This skill should be used when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
/plugin install muratcankoylan-example-skills@muratcankoylan/Agent-Skills-for-Context-Engineering

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Bundled resources:

- references/case-studies.md
- references/pipeline-patterns.md
- scripts/pipeline_template.py

This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application.
Activate this skill when:

- The user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", or "structure agent project"
- The conversation involves pipeline architecture, agent-assisted development, or cost estimation
- The user is choosing between an LLM-based and a traditional approach
Not every problem benefits from LLM processing. The first step in any project is evaluating whether the task characteristics align with LLM strengths. This evaluation should happen before writing any code.
LLM-suited tasks share these characteristics:
| Characteristic | Why It Fits |
|---|---|
| Synthesis across sources | LLMs excel at combining information from multiple inputs |
| Subjective judgment with rubrics | LLMs handle grading, evaluation, and classification with criteria |
| Natural language output | When the goal is human-readable text, not structured data |
| Error tolerance | Individual failures do not break the overall system |
| Batch processing | No conversational state required between items |
| Domain knowledge in training | The model already has relevant context |
LLM-unsuited tasks share these characteristics:
| Characteristic | Why It Fails |
|---|---|
| Precise computation | Math, counting, and exact algorithms are unreliable |
| Real-time requirements | LLM latency is too high for sub-second responses |
| Perfect accuracy requirements | Hallucination risk makes 100% accuracy impossible |
| Proprietary data dependence | The model lacks necessary context |
| Sequential dependencies | Each step depends heavily on the previous result |
| Deterministic output requirements | Same input must produce identical output |
The evaluation should happen through manual prototyping: take one representative example and test it directly with the target model before building any automation.
Before investing in automation, validate task-model fit with a manual test. Copy one representative input into the model interface. Evaluate the output quality. This takes minutes and prevents hours of wasted development.
This validation answers critical questions:

- Can the model perform the task at acceptable quality?
- What does a workable prompt look like?
- What should automated outputs be compared against?
If the manual prototype fails, the automated system will fail. If it succeeds, you have a baseline for comparison and a template for prompt design.
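If pasting the example into a chat interface is inconvenient, the same one-item check can be scripted. A minimal sketch using the Anthropic Python SDK; the model name, prompt wording, and input file are placeholders:

```python
# manual_check.py: validate task-model fit on ONE representative example
# before building any pipeline automation.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

# Placeholder file holding one representative input, copied by hand.
example = open("sample_input.txt").read()

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumption: whichever model the project targets
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Analyze the following:\n\n{example}"}],
)

# Read the output yourself. If this single result is not good enough,
# no amount of pipeline code will fix it.
print(response.content[0].text)
```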
LLM projects benefit from staged pipeline architectures in which each stage runs independently and persists its output to disk before the next stage begins.
The canonical pipeline structure:
acquire → prepare → process → parse → render
Stages 1, 2, 4, and 5 are deterministic. Stage 3 is non-deterministic and expensive. This separation allows re-running the expensive LLM stage only when necessary, while iterating quickly on parsing and rendering.
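A minimal sketch of the middle three stages in Python; this is illustrative rather than the bundled scripts/pipeline_template.py, and the model name, prompt wording, and raw.json field are assumptions:

```python
import json
import re
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

def prepare(item: Path) -> None:
    """Deterministic: build prompt.md from raw.json."""
    raw = json.loads((item / "raw.json").read_text())
    # The "text" field is an assumption about what the acquire stage stored.
    (item / "prompt.md").write_text(f"Analyze the following:\n\n{raw['text']}")

def process(item: Path) -> None:
    """Non-deterministic and expensive: the single LLM call per item."""
    prompt = (item / "prompt.md").read_text()
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: whichever model the project targets
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    (item / "response.md").write_text(msg.content[0].text)

def parse(item: Path) -> None:
    """Deterministic: pull structured fields out of the markdown response."""
    text = (item / "response.md").read_text()
    score = re.search(r"Rating:\s*\[?\s*(\d+)", text)
    (item / "parsed.json").write_text(
        json.dumps({"score": int(score.group(1)) if score else None})
    )
```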
Use the file system to track pipeline state rather than databases or in-memory structures. Each processing unit gets a directory. Each stage completion is marked by file existence.
data/{id}/
├── raw.json      # acquire stage complete
├── prompt.md     # prepare stage complete
├── response.md   # process stage complete
└── parsed.json   # parse stage complete
To check if an item needs processing: check if the output file exists. To re-run a stage: delete its output file and downstream files. To debug: read the intermediate files directly.
This pattern provides:

- Resumability: interrupted runs pick up where they left off by skipping items whose output files already exist
- Selective re-runs: deleting a stage's output redoes only that stage and its downstream stages
- Debuggability: every intermediate artifact is a plain file you can open and inspect
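A minimal sketch of the existence-check driver, assuming the per-item directory layout and the stage functions sketched above:

```python
from pathlib import Path

# Stage functions as sketched above; raw.json is assumed to have been
# written already by an upstream acquire step.
STAGES = [
    ("prompt.md", prepare),
    ("response.md", process),
    ("parsed.json", parse),
]

def run_item(item: Path) -> None:
    """Run only the stages whose output file does not exist yet."""
    for output_name, stage in STAGES:
        if not (item / output_name).exists():
            stage(item)

def run_all(data_dir: Path = Path("data")) -> None:
    for item in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        run_item(item)

# To redo parsing for every item after fixing the parser:
# delete data/*/parsed.json and call run_all() again; only the cheap,
# deterministic parse stage re-executes.
```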
When LLM outputs must be parsed programmatically, prompt design directly determines parsing reliability. The prompt must specify exact format requirements with examples.
Effective structure specification includes:

- Explicit section headers the parser can anchor on
- Constrained value formats (for example, "Rating: [1-10]")
- A complete example of the expected output
- An explicit statement that the output will be parsed programmatically
Example prompt structure:
Analyze the following and provide your response in exactly this format:
## Summary
[Your summary here]
## Score
Rating: [1-10]
## Details
- Key point 1
- Key point 2
Follow this format exactly because I will be parsing it programmatically.
The parsing code must handle variations gracefully. LLMs do not follow instructions perfectly. Build parsers that:

- Tolerate whitespace, capitalization, and minor formatting differences in headers
- Fall back to a sensible default, or flag the item for review, when a section is missing
- Record failures per item rather than crashing the whole batch
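A minimal sketch of a tolerant parser for the example format above; the field names follow that prompt, and the fallback behavior shown is one reasonable choice rather than a fixed rule:

```python
import re

def parse_response(text: str) -> dict:
    """Extract Summary, Rating, and Details from the markdown format above,
    tolerating extra whitespace, case differences, and missing sections."""
    result = {"summary": None, "score": None, "details": [], "parse_errors": []}

    summary = re.search(r"##\s*summary\s*\n(.*?)(?=\n##|\Z)", text, re.IGNORECASE | re.DOTALL)
    if summary:
        result["summary"] = summary.group(1).strip()
    else:
        result["parse_errors"].append("missing Summary section")

    rating = re.search(r"rating:\s*\[?\s*(\d{1,2})", text, re.IGNORECASE)
    if rating and 1 <= int(rating.group(1)) <= 10:
        result["score"] = int(rating.group(1))
    else:
        result["parse_errors"].append("missing or out-of-range Rating")

    details = re.search(r"##\s*details\s*\n(.*?)(?=\n##|\Z)", text, re.IGNORECASE | re.DOTALL)
    if details:
        result["details"] = [
            line.lstrip("- ").strip()
            for line in details.group(1).splitlines()
            if line.strip().startswith("-")
        ]

    return result
```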
Modern agent-capable models can accelerate development significantly. The pattern is rapid iteration: describe the pipeline, let the agent generate an initial implementation, test it against real data, fix what breaks, and repeat. The agent handles boilerplate and initial structure while you focus on domain-specific requirements and edge cases.
Key practices for effective agent-assisted development:
LLM processing has predictable costs that should be estimated before starting. The formula:
Total cost = (items × tokens_per_item × price_per_token) + API overhead
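A minimal sketch of that estimate in Python; the token counts and per-million-token prices below are placeholders to replace with the provider's current figures:

```python
def estimate_cost(
    items: int,
    input_tokens_per_item: int,
    output_tokens_per_item: int,
    input_price_per_mtok: float,   # USD per million input tokens (placeholder)
    output_price_per_mtok: float,  # USD per million output tokens (placeholder)
) -> float:
    """Rough batch cost: per-item token usage times per-token price."""
    per_item = (
        input_tokens_per_item * input_price_per_mtok
        + output_tokens_per_item * output_price_per_mtok
    ) / 1_000_000
    return items * per_item

# Example with made-up numbers: 930 items, ~8k input and ~1k output tokens each.
# Plug in real prices and measured token counts before trusting the result.
print(f"${estimate_cost(930, 8_000, 1_000, 3.00, 15.00):.2f}")
```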
For batch processing:
Track actual costs during development. If costs exceed estimates significantly, re-evaluate the approach. Consider:
Single-agent pipelines work for:

- Batch processing of independent items
- Tasks with no conversational state between items
- Work that fits comfortably in a single context window

Multi-agent architectures work for:

- Long-running tasks where a single context window would degrade
- Work that decomposes into focused, independent subtasks
The primary reason for multi-agent is context isolation, not role anthropomorphization. Sub-agents get fresh context windows for focused subtasks. This prevents context degradation on long-running tasks.
See multi-agent-patterns skill for detailed architecture guidance.
Start with minimal architecture. Add complexity only when proven necessary. Production evidence shows that removing specialized tools often improves performance.
Vercel's d0 agent achieved 100% success rate (up from 80%) by reducing from 17 specialized tools to 2 primitives: bash command execution and SQL. The file system agent pattern uses standard Unix utilities (grep, cat, find, ls) instead of custom exploration tools.
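To make the "two primitives" idea concrete, a tool list in the Anthropic Messages API tool format could look like the sketch below; the schemas are illustrative assumptions, not Vercel's actual implementation:

```python
# Two primitive tools instead of seventeen specialized ones (illustrative schemas).
tools = [
    {
        "name": "bash",
        "description": (
            "Run a shell command and return its output. Use standard utilities "
            "(grep, cat, find, ls) to explore the repository and semantic layer files."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "sql",
        "description": "Execute a read-only SQL query against the analytics database.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```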
When reduction outperforms complexity:
When complexity is necessary:
See tool-design skill for detailed tool architecture guidance.
Expect to refactor. Production agent systems at scale require multiple architectural iterations. Manus refactored their agent framework five times since launch. The Bitter Lesson suggests that structures added for current model limitations become constraints as models improve.
Build for change:
The planning workflow:

1. Task Analysis
2. Manual Validation
3. Architecture Selection
4. Cost Estimation
5. Development Plan
Skipping manual validation: Building automation before verifying the model can do the task wastes significant time when the approach is fundamentally flawed.
Monolithic pipelines: Combining all stages into one script makes debugging and iteration difficult. Separate stages with persistent intermediate outputs.
Over-constraining the model: Adding guardrails, pre-filtering, and validation logic that the model could handle on its own. Test whether your scaffolding helps or hurts.
Ignoring costs until production: Token costs compound quickly at scale. Estimate and track from the beginning.
Perfect parsing requirements: Expecting LLMs to follow format instructions perfectly. Build robust parsers that handle variations.
Premature optimization: Adding caching, parallelization, and optimization before the basic pipeline works correctly.
Example 1: Batch Analysis Pipeline (Karpathy's HN Time Capsule)
Task: Analyze 930 HN discussions from 10 years ago with hindsight grading.
Architecture: the staged batch pipeline described above, with per-discussion directories on the file system and a final render step producing the static HTML.
Results: $58 total cost, ~1 hour execution, static HTML output.
Example 2: Architectural Reduction (Vercel d0)
Task: Text-to-SQL agent for internal analytics.
Before: 17 specialized tools, 80% success rate, 274s average execution.
After: 2 tools (bash + SQL), 100% success rate, 77s average execution.
Key insight: The semantic layer was already good documentation. Claude just needed access to read files directly.
See Case Studies for detailed analysis.
This skill connects to:
Internal references:

- references/pipeline-patterns.md
- references/case-studies.md
- scripts/pipeline_template.py

Related skills in this collection:

- multi-agent-patterns, for detailed multi-agent architecture guidance
- tool-design, for detailed tool architecture guidance
External resources:
Created: 2025-12-25
Last Updated: 2025-12-25
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0