From the text-corpus-analysis plugin
Audits which local LLM runtimes are installed (Ollama, llama.cpp, vLLM, LM Studio) and suggests or installs a model suited to corpus-analysis tasks: classification, labeling, summarization. Use when a skill in this plugin wants a local-LLM lane but none is available, or when the user wants to move a cloud workload local for cost or privacy.
```bash
npx claudepluginhub danielrosehill/claude-code-plugins --plugin text-corpus-analysis
```

This skill uses the workspace's default tool permissions.
Get a local model ready for text-corpus work.
Audit: check which runtimes are installed (ollama, llama.cpp, vllm, LM Studio, llamafile) and check the GPU (nvidia-smi, rocm-smi) to establish a VRAM budget, as sketched below.
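A minimal audit sketch, assuming the usual CLI entry points for each runtime (llama.cpp's server binary is `llama-server`, LM Studio's CLI is `lms`; adjust for your install) and an NVIDIA GPU for the VRAM query:

```python
import shutil
import subprocess

# Assumed binary names; a given install may expose different entry points.
RUNTIMES = ["ollama", "llama-server", "vllm", "lms", "llamafile"]

def audit_runtimes() -> dict:
    """Report which local-LLM runtimes are on PATH (path or None)."""
    return {name: shutil.which(name) for name in RUNTIMES}

def vram_budget_mb():
    """Best-effort total VRAM via nvidia-smi; None if no NVIDIA GPU."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        # One integer (MiB) per GPU line; sum across GPUs.
        return sum(int(line) for line in out.stdout.split())
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # try rocm-smi on AMD, or fall back to CPU models

if __name__ == "__main__":
    print(audit_runtimes())
    print("VRAM (MiB):", vram_budget_mb())
```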
Recommend by task profile:
| Task profile | Suggested models |
|---|---|
| Classification / labeling into known categories (1-3k tokens, short answer) | llama3.1:8b, qwen2.5:7b, gemma2:9b — 8-16GB VRAM |
| Semantic judgment on longer docs | qwen2.5:14b, llama3.1:70b (quantized) — 24GB+ VRAM |
| Embedding generation | nomic-embed-text, mxbai-embed-large via Ollama, or all-MiniLM-L6-v2 via sentence-transformers (CPU-fast; sketch after this table) |
| Topic labeling (one-off, 20-50 calls) | Any 7B+ instruct model |
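For the embedding row, a minimal CPU-only sketch using sentence-transformers with all-MiniLM-L6-v2; the two-document corpus is a placeholder:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small model, fast on CPU
docs = ["first document", "second document"]     # placeholder corpus
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384): one 384-dim vector per document
```

The Ollama models suit a setup where the daemon is already running; sentence-transformers avoids the daemon entirely and stays fast without a GPU.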
Install (Ollama preferred):

```bash
ollama pull qwen2.5:7b-instruct
ollama pull nomic-embed-text
```
Verify with a test prompt and measure throughput (tokens/sec). For a classification run, estimate wall-clock time as (N docs × ~300 output tokens) / throughput; a worked example follows. If the estimate exceeds 8 hours, consider cloud instead.
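A worked version of that estimate; the 35 tokens/sec figure is an assumption, so substitute the throughput you actually measured:

```python
def wall_clock_hours(n_docs: int, tokens_per_doc: int = 300,
                     tps: float = 35.0) -> float:
    """Estimated wall-clock hours for a generation run at `tps` tokens/sec."""
    return n_docs * tokens_per_doc / tps / 3600

# 10k docs at ~300 output tokens each, 35 tok/s:
print(f"{wall_clock_hours(10_000):.1f} h")  # ≈ 23.8 h, well past the 8-hour cutoff
```

At ~300 output tokens per doc, a 10k-doc run needs roughly 105 tokens/sec to finish inside 8 hours, so mid-size models on modest GPUs often fail this check.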
Expose via Ollama's OpenAI-compatible endpoint at http://localhost:11434/v1; skills in this plugin can hit it with the OpenAI SDK, as in the sketch below.
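A minimal sketch against that endpoint using the official openai Python package. Ollama ignores the API key, but the SDK requires a non-empty string; the prompt is a placeholder:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="qwen2.5:7b-instruct",  # must already be pulled (see above)
    messages=[{"role": "user",
               "content": "Label this doc: 'quarterly revenue up 12%'"}],
)
print(resp.choices[0].message.content)
```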
If present, see docs/local-llm-sizing.md for VRAM/throughput guidance.