From llm-externalizer
Use when extracting the SAME structured metadata from many files with a cheap LLM. Trigger with "mass scout", "scan many files for X", "extract structured data from a folder", "classify all my files", "audit thousands of files", "run a fieldset over a codebase", "audit my plugin", "PR review all changed files", "security-scan this repo".
npx claudepluginhub emasoft/emasoft-plugins --plugin llm-externalizer
[register | preclassify | estimate | scout | search | search-xjob | get | export | jobs-list | audit-sample | body-get | build-fieldset | propose-fieldset | list-bundled-fieldsets | diff | chain]
This skill uses the workspace's default tool permissions.
Bulk LLM-driven structured-output file analysis. Point a cheap model
(default qwen/qwen-2.5-7b-instruct) at hundreds-to-millions of files; get
back a queryable SQLite registry of extractions defined by a per-call
dynamic JSON Schema. Every response is forced through that schema via
OpenRouter's response_format: json_schema.
Use when the user wants the SAME shape of metadata from every file. For
free-form prose, use the chat tool instead.
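As a rough illustration of how a per-call schema can be attached to a request, the sketch below builds a chat-completions body that uses OpenRouter's `response_format: { type: "json_schema", ... }` shape. The fieldset, the prompt wording, the schema name `scout_extraction`, and the helper `buildScoutRequest` are all hypothetical and are not taken from the plugin's source; only the default model name and the `response_format` mechanism come from the text above.

```typescript
// Sketch only: buildScoutRequest and exampleFieldset are hypothetical names,
// not the plugin's actual implementation.
type JsonSchema = Record<string, unknown>;

interface ScoutRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  response_format: {
    type: "json_schema";
    json_schema: { name: string; strict: boolean; schema: JsonSchema };
  };
}

// Hypothetical fieldset: the same three fields are requested for every file.
const exampleFieldset: JsonSchema = {
  type: "object",
  properties: {
    summary: { type: "string" },
    language: { type: "string" },
    talks_to_database: { type: "boolean" },
  },
  required: ["summary", "language", "talks_to_database"],
  additionalProperties: false,
};

// Builds the request body for one file; nothing is sent over the network here.
function buildScoutRequest(
  filePath: string,
  body: string,
  schema: JsonSchema,
  model: string = "qwen/qwen-2.5-7b-instruct",
): ScoutRequest {
  return {
    model,
    messages: [
      { role: "system", content: "Extract the requested fields from the file. Reply with JSON only." },
      { role: "user", content: `File: ${filePath}\n\n${body}` },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "scout_extraction", strict: true, schema },
    },
  };
}

const request = buildScoutRequest(
  "src/db.ts",
  "export const pool = createPool();",
  exampleFieldset,
);
console.log(JSON.stringify(request.response_format, null, 2));
```

Because the schema is passed per call, the same builder could serve any fieldset; only the `schema` argument changes between jobs.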
OPENROUTER_API_KEY in env or via userConfig.openrouter_api_key.
Files to scan (source_root or file_paths[]) and a fieldset (author one with mass_scout_build_fieldset / mass_scout_propose_fieldset, or pass bundled:<name>; bundled sets: code-audit, skill-audit, security-audit, pr-review).
reports/ and reports_dev/ in .gitignore.

Five phases, one MCP tool per phase, source in
mcp-server/src/mass_scouting/cli.ts.
Walks the source tree (honors .gitignore; no_gitignore: true to override) or takes file_paths[]; hashes + caches every body.
budget_usd is a hard gate. live_context: true queries OpenRouter for the model's real context cap.
Emits notifications/progress per file.
mass_scout_search (per-job) / mass_scout_search_xjob (cross-job): regex bypass / FTS5 / structured JSON1 / combined.

Follow-on tools: jobs_list, audit_sample, body_get,
build_fieldset, propose_fieldset, diff (compare two jobs),
chain (re-scout a filter-matched subset with a fresh fieldset).
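The hard budget gate mentioned above can be pictured with a minimal sketch. It assumes cost is estimated from input token counts and a flat per-million-token price; the function names, field names, and the price used here are hypothetical and do not reflect the plugin's real estimator or any real model pricing.

```typescript
// Sketch only: estimateScout / assertWithinBudget are hypothetical helpers.
interface EstimateInput {
  fileTokenCounts: number[];        // assumed: one token count per eligible file
  usdPerMillionInputTokens: number; // assumed flat price, illustrative only
  budgetUsd: number;                // plays the role of budget_usd
}

interface EstimateResult {
  filesEligible: number;
  estCostUsd: number;
  budgetAllowed: boolean;
}

function estimateScout(input: EstimateInput): EstimateResult {
  const totalTokens = input.fileTokenCounts.reduce((sum, t) => sum + t, 0);
  const estCostUsd = (totalTokens / 1e6) * input.usdPerMillionInputTokens;
  return {
    filesEligible: input.fileTokenCounts.length,
    estCostUsd,
    budgetAllowed: estCostUsd <= input.budgetUsd,
  };
}

// A "hard gate": refuse to proceed rather than warn and continue.
function assertWithinBudget(result: EstimateResult, budgetUsd: number): void {
  if (!result.budgetAllowed) {
    throw new Error(
      `estimated cost $${result.estCostUsd.toFixed(4)} exceeds budget $${budgetUsd.toFixed(2)}`,
    );
  }
}

// 50 files of 1000 tokens each at a made-up price of $0.04 per million tokens.
const fiftyFiles: number[] = new Array<number>(50).fill(1000);
const estimate = estimateScout({
  fileTokenCounts: fiftyFiles,
  usdPerMillionInputTokens: 0.04,
  budgetUsd: 0.5,
});
assertWithinBudget(estimate, 0.5);
console.log(estimate);
```

Throwing instead of returning a warning keeps the expensive scout step from starting at all when the estimate is over budget.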
mass_scout writes ONE markdown report under
<main-repo-root>/reports/mass_scouting/<TIMESTAMP>-scout-<slug>.md and
returns the file path plus counts. Hand the path to the user — never
re-print the report. Search/get/export emit JSON or JSONL/CSV.
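To make the report location concrete, here is a small sketch that assembles a path of the form `<main-repo-root>/reports/mass_scouting/<TIMESTAMP>-scout-<slug>.md`. The timestamp format (`YYYYMMDD-HHMMSS`, UTC) and the slug rule are assumptions chosen for illustration; the source does not specify either, and `reportPath` / `slugify` are hypothetical helpers.

```typescript
// Sketch only: timestamp format and slug rule are assumptions, not the plugin's.
function slugify(jobId: string): string {
  return jobId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric to "-"
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes
}

function reportPath(mainRepoRoot: string, jobId: string, now: Date): string {
  // Assumed timestamp shape: YYYYMMDD-HHMMSS in UTC.
  const ts = now
    .toISOString()          // e.g. 2024-01-02T03:04:05.000Z
    .replace(/[-:]/g, "")   // 20240102T030405.000Z
    .replace("T", "-")      // 20240102-030405.000Z
    .slice(0, 15);          // 20240102-030405
  const root = mainRepoRoot.replace(/\/+$/, ""); // drop trailing slashes
  return [root, "reports", "mass_scouting", `${ts}-scout-${slugify(jobId)}.md`].join("/");
}

const examplePath = reportPath("/repo/", "Audit 1", new Date(Date.UTC(2024, 0, 2, 3, 4, 5)));
console.log(examplePath);
```

Whatever the real naming rule is, the caller only needs the returned path: it is handed to the user instead of the report contents.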
Prefer bundled:<name> over authoring JSON when a shipped set fits.
Classify files by bucket (sourcecode / documentation / …) so scout skips binaries automatically.
Use mass_scout_search (regex / FTS5 / structured) instead of audit_sample when you can; search returns matching rows only.
Pass json: true + limit_per_job / limit_merged on large queries.

HTTP 400 context length exceeded → file > cap. Lower max_context_pct_scout or set live_context: true.
scout failed after N attempts → see mass_scout_skipped table.
circuit_tripped=true → ≥5 consecutive failures; investigate first.
Missing OPENROUTER_API_KEY → set env / userConfig.
Flowchart: troubleshooting.
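The `circuit_tripped=true` condition (five or more consecutive failures) behaves like a consecutive-failure circuit breaker. The sketch below shows that general technique under the assumption that any success resets the counter; the class and function names are hypothetical and the plugin's actual logic may differ.

```typescript
// Sketch only: a generic consecutive-failure breaker, not the plugin's code.
class ConsecutiveFailureBreaker {
  private consecutiveFailures = 0;
  private readonly threshold: number;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // assumed: any success resets the streak
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }

  get tripped(): boolean {
    return this.consecutiveFailures >= this.threshold;
  }
}

// Processes per-file outcomes (true = ok, false = failed) until the breaker trips.
function runWithBreaker(
  outcomes: boolean[],
  breaker: ConsecutiveFailureBreaker,
): { processed: number; circuitTripped: boolean } {
  let processed = 0;
  for (const ok of outcomes) {
    if (breaker.tripped) break; // stop before spending on more files
    processed += 1;
    if (ok) {
      breaker.recordSuccess();
    } else {
      breaker.recordFailure();
    }
  }
  return { processed, circuitTripped: breaker.tripped };
}

const breakerRun = runWithBreaker(
  [false, false, true, false, false, false, false, false, true],
  new ConsecutiveFailureBreaker(5),
);
console.log(breakerRun);
```

Stopping early matters here because a run of failures usually points to a systemic cause (bad key, wrong model, oversized files) that retrying more files will not fix.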
Trigger phrases: "audit every .ts file under src/ for complexity", "scan all skills for weak triggers", "PR review every changed file", "find every Python module that talks to a database".
Concrete input → output:
mass_scout_estimate { db_path:/tmp/x.db, fields_file:bundled:code-audit,
budget_usd:0.50 }
→ files_eligible=50 est_cost_usd=$0.0015 budget_allowed=true
mass_scout { db_path:/tmp/x.db, fields_file:bundled:code-audit,
job_id:audit-1, source_root:/path/src }
→ files_ok=50 files_failed=0 cost_usd=$0.0014
report=<main-root>/reports/mass_scouting/<TS>-scout-audit-1.md
End-to-end: worked-example.
mcp-server/src/mass_scouting/, mcp-server/fieldsets/.