From text-corpus-analysis
Derive N categories from the dominant themes of a corpus — the user says "give me 10 categories for these 1000 notes" or "propose 20 labels that would cover most of this data". Produces a proposed category list with definitions, coverage estimates, and example documents.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin text-corpus-analysisThis skill uses the workspace's default tool permissions.
Bottom-up category derivation. Given a corpus and a target *N*, produce a workable categorization scheme.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Analyzes multiple pages for keyword overlap, SEO cannibalization risks, and content duplication. Suggests differentiation, consolidation, and resolution strategies when reviewing similar content.
Share bugs, ideas, or general feedback.
Bottom-up category derivation. Given a corpus and a target N, produce a workable categorization scheme.
choose-approach recommends this):
all-MiniLM-L6-v2 (local, free) or text-embedding-3-small (cloud, cheap).k=N if exact count is required, or BERTopic for topic-word output.{label, 1-sentence definition, distinguishing examples}.categorize-corpus on the sample with a confidence threshold. Anything that lands in "none" or "low-confidence" tells you where the scheme has gaps.{
"proposed_categories": [
{
"label": "Infrastructure & DevOps",
"definition": "Notes about servers, containers, CI/CD, deployment, monitoring.",
"estimated_share": 0.14,
"exemplars": ["doc_id_123", "doc_id_456"],
"top_terms": ["docker", "k8s", "deploy", ...]
}
],
"uncovered_share": 0.07,
"notes": "..."
}
categorize-corpus with current N categories.