Decide whether a corpus analysis task should use classical NLP, a local LLM, or a cloud LLM (OpenRouter) given corpus size, task complexity, and cost tolerance. Use this skill first, before any other skill in this plugin, especially when the corpus is large (thousands of documents or more) or when an LLM pass could get expensive.
```
npx claudepluginhub danielrosehill/claude-code-plugins --plugin text-corpus-analysis
```

This skill uses the workspace's default tool permissions.
Decision helper. Picks the right execution lane for a text-analysis task.
| Lane | Strengths | Weaknesses | Cost model |
|---|---|---|---|
| Classical NLP (spaCy, scikit-learn, gensim, TextBlob, BlackLab) | Deterministic, fast, free, scales to millions of docs, strong for frequency/NER/TF-IDF/LDA | Brittle on short or noisy text; no semantic judgment | CPU time only |
| Local LLM (Ollama: llama3.1, qwen2.5, gemma2) | No per-token cost, private, good for classification/labeling on small batches | Slow on big corpora (GPU-bound), lower ceiling than frontier models | Electricity + wall-clock time |
| Cloud LLM (OpenRouter → Claude, GPT, Gemini, DeepSeek, Llama) | Best judgment, handles nuance, parallelizable | Per-token cost — dangerous on 10k+ docs without planning | $ per M tokens |
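The cost model in the last row can be sketched as a quick back-of-the-envelope calculation. This is a minimal sketch, not part of the skill itself: the words-to-tokens ratio of 1.3 and the example price per million tokens are illustrative assumptions.

```python
def estimate_cloud_cost(num_docs, avg_words_per_doc,
                        price_per_m_input_tokens,
                        tokens_per_word=1.3):
    """Return (total_input_tokens, estimated_dollars) for a full-corpus pass.

    tokens_per_word=1.3 is a rough English-text heuristic, not an exact count.
    """
    total_tokens = num_docs * avg_words_per_doc * tokens_per_word
    dollars = total_tokens / 1_000_000 * price_per_m_input_tokens
    return int(total_tokens), round(dollars, 2)

# Example: 10,000 docs of ~500 words each at an assumed $3 per M input tokens.
tokens, cost = estimate_cloud_cost(10_000, 500, 3.0)
# → (6500000, 19.5)
```

Even a rough estimate like this makes the danger in the table concrete: a 10k-document pass that looks cheap per document adds up to real money per run, and repeated runs multiply it.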
Estimate total input tokens (words × 1.3), then report:

```
Task: <restatement>
Corpus: N docs, ~M tokens total
Recommended lane: <Classical NLP | Local LLM | Cloud LLM via OpenRouter>
Recommended model (if LLM): <name> — est. $X.XX for full corpus
Cheaper fallback: <sampling strategy OR cheaper model> — est. $Y.YY
Reasoning: <2-3 sentences>
```
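The lane choice above can be sketched as a simple heuristic. The thresholds here (a 100k-token ceiling for the local-LLM lane, the caller-supplied cloud budget) are illustrative assumptions, not values defined by the skill:

```python
def pick_lane(total_tokens, needs_semantic_judgment,
              cloud_budget_dollars, price_per_m_tokens=3.0):
    """Pick an execution lane for a text-analysis task (illustrative sketch)."""
    if not needs_semantic_judgment:
        # Frequency, NER, TF-IDF, LDA: classical NLP is free and deterministic.
        return "Classical NLP"
    est_cost = total_tokens / 1_000_000 * price_per_m_tokens
    if est_cost <= cloud_budget_dollars:
        return "Cloud LLM via OpenRouter"
    if total_tokens <= 100_000:
        # Small batch: a local model avoids per-token cost entirely.
        return "Local LLM"
    # Too big for local, too expensive in full: sample, then extrapolate.
    return "Cloud LLM via OpenRouter (on a sample)"
```

For example, a keyword-frequency task routes to classical NLP regardless of size, while a nuanced classification task over millions of tokens with a small budget falls back to sampling.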