From text-corpus-analysis
Identify tokens or phrases that refer to the same concept but appear in different forms — transcription variants from voice notes, spelling variants, acronyms vs expansions, aliases. Use on any voice-note or STT-derived corpus before frequency/NER/topic work, or to deduplicate entity lists.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin text-corpus-analysisThis skill uses the workspace's default tool permissions.
Collapse variant surface forms to a canonical concept.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Analyzes multiple pages for keyword overlap, SEO cannibalization risks, and content duplication. Suggests differentiation, consolidation, and resolution strategies when reviewing similar content.
Share bugs, ideas, or general feedback.
Collapse variant surface forms to a canonical concept.
word-frequency / ner-extraction. Keep surface counts.token_sort_ratio, Jaro-Winkler, or Levenshtein normalized by length. Fast, handles typos and transcription slips well.[
{"canonical": "GitHub", "variants": ["git hub", "get hub", "gitbub"], "total_count": 1423},
{"canonical": "large language model", "variants": ["LLM", "LLMs"], "total_count": 892}
]
re.sub with word boundaries), or keep as a lookup for downstream skills.No LLM needed for the clustering pass itself. Embeddings (if used) are a one-time cost on the unique-vocabulary set — typically 5-50k unique tokens, not N docs. Cheap.