Normalize mixed inputs like code, docs, PDFs, screenshots, diagrams, audio, and transcripts into a structured corpus. Use when the task depends on combining multiple artifact types before analysis or retrieval.
`npx claudepluginhub v1truv1us/ai-eng-system --plugin ai-eng-learning`

This skill uses the workspace's default tool permissions.
Mixed corpora break down when everything is treated like plain text. Ingest code, prose, visuals, and transcripts according to what each artifact can actually tell you, then normalize them into one corpus with provenance intact.
- Code: use deterministic extraction first.
- Docs and PDFs: extract concepts and claims.
- Diagrams and screenshots: extract labeled structure, not generic captions.
- Audio and video: transcribe first, then treat the transcript as prose.
List inputs by type and extraction mode:
Do not start with one giant prompt containing every artifact.
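Listing inputs by type before any prompting can be sketched as a small classifier. This is a minimal sketch, assuming a hypothetical extension-to-mode mapping; the specific extensions and mode names are illustrative, not part of the skill:

```python
from pathlib import Path

# Hypothetical mapping from file extension to (kind, extraction mode).
# Extend per project; anything unmapped is flagged for manual triage.
EXTRACTION_MODES = {
    ".py": ("code", "ast_or_regex"),
    ".md": ("markdown", "headings_and_sections"),
    ".pdf": ("pdf", "text_extraction"),
    ".png": ("diagram", "ocr_and_labels"),
    ".mp3": ("audio", "transcription"),
}

def inventory(paths):
    """Group input paths by artifact kind and extraction mode."""
    records = []
    for p in map(Path, paths):
        kind, mode = EXTRACTION_MODES.get(p.suffix, ("unknown", "manual_triage"))
        records.append({"path": str(p), "kind": kind, "mode": mode})
    return records
```

Producing this inventory first keeps each extraction pass scoped to one modality instead of one giant mixed prompt.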
Create one record format for every source:
```json
{
  "id": "doc:adr-001",
  "kind": "markdown",
  "path": "docs/decisions/2026-01-15-auth.md",
  "title": "Auth ADR",
  "summary": "Why session tokens were chosen",
  "entities": ["session token", "refresh token"],
  "evidence": ["section: Decision", "section: Tradeoffs"]
}
```
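A minimal constructor for that record shape can make provenance non-optional. The field names come from the example above; the validation rule is an assumption of this sketch:

```python
def make_record(id, kind, path, title, summary, entities=None, evidence=None):
    """Build one normalized corpus record; provenance (path) is required."""
    if not path:
        raise ValueError("provenance path is required")
    return {
        "id": id,
        "kind": kind,
        "path": path,
        "title": title,
        "summary": summary,
        "entities": entities or [],
        "evidence": evidence or [],
    }
```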
Always preserve:
Favor the smallest correct extraction:
Avoid turning every source into flat chunks with no type information.
Every normalized record should answer:
Before large ingests, define limits:
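One way to express such limits is a small config checked before ingest begins. The numbers here are hypothetical placeholders, not recommendations:

```python
# Hypothetical ingest limits; tune the values to your corpus and budget.
LIMITS = {
    "max_files": 500,
    "max_file_bytes": 2_000_000,
    "max_records": 10_000,
}

def within_limits(file_count, file_bytes, record_count, limits=LIMITS):
    """Return True only if the planned ingest stays inside every limit."""
    return (
        file_count <= limits["max_files"]
        and file_bytes <= limits["max_file_bytes"]
        and record_count <= limits["max_records"]
    )
```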
| Source | First Pass | Second Pass |
|---|---|---|
| Code | AST or regex structure | semantic labeling |
| Markdown or docs | headings and sections | entity extraction |
| PDF or screenshot | text extraction | concept extraction |
| Diagram | OCR and labels | relationship extraction |
| Audio or video | transcription | concept extraction |
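For the code row, the deterministic first pass can be sketched with the standard-library `ast` module; the semantic-labeling second pass would run separately on this output:

```python
import ast

def structural_pass(source: str):
    """First pass: extract function and class names deterministically via the AST."""
    tree = ast.parse(source)
    return sorted(
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    )
```

Because this pass is deterministic, its output is stable and cheap to recompute, which is what makes it a safe foundation for the costlier semantic pass.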
| Rationalization | Reality |
|---|---|
| "Just chunk everything" | Flattening loses structure, modality, and provenance. |
| "Images are optional context" | Architecture and intent often live only in screenshots or diagrams. |
| "One extraction pass is enough" | Different sources need different extraction methods. |