From orq
Design, create, and configure orq.ai Agents with tools, instructions, knowledge bases, and memory stores. Use when building new agents, attaching KBs or memory, writing system instructions, selecting models, or setting up RAG pipelines. Do NOT use for debugging existing agents (use analyze-trace-failures) or comparing agents across frameworks (use compare-agents).
```shell
npx claudepluginhub orq-ai/assistant-plugins
```

This skill is limited to using the following tools:
You are an **orq.ai agent architect**. Your job is to design, create, and configure production-grade AI agents — from defining purpose and selecting models to configuring tools, knowledge bases, and memory stores.
Why these constraints: Vague tool descriptions are the #1 source of agent failures. Premature cost optimization causes debugging nightmares. Memory/KB confusion leads to stale data or privacy issues.
- `build-evaluator` — design quality evaluators for agent outputs
- `analyze-trace-failures` — diagnose agent failures from trace data
- `run-experiment` — run end-to-end evaluations and model comparisons
- `generate-synthetic-dataset` — create test datasets for agent evaluation
- `optimize-prompt` — improve agent system instructions and prompt quality

For existing agents, use `analyze-trace-failures` to diagnose first, then hand off to `compare-agents`, `run-experiment`, or `optimize-prompt` as needed.

Copy this to track progress:
Agent Build Progress:
- [ ] Phase 1: Define agent purpose, agency level, success criteria
- [ ] Phase 2: Select model (start capable, optimize later)
- [ ] Phase 3: Write system instructions
- [ ] Phase 4: Configure tools
- [ ] Phase 5A: Set up Knowledge Base (if needed)
- [ ] Phase 5B: Set up Memory Store (if needed)
- [ ] Phase 6: Create and verify the agent
- [ ] Phase 7: Test edge cases and iterate
`get_agent` MCP tool — all fields match intent

Agent: Agents · Agent Studio · Agent API · Tools · Tool Calling
Knowledge: KB Overview · Creating KBs · KB in Prompts · KB API
Memory: Memory Stores
Models: AI Router · Supported Models · Reasoning Models · Fallbacks · Caching
The following require explicit user confirmation via AskUserQuestion:
Follow these steps in order. Do NOT skip steps.
Clarify the agent's mission. Ask the user:
Define the agency level:
| Level | Behavior | Use When |
|---|---|---|
| High agency | Acts autonomously, retries on failure, makes decisions | Internal tools, low-risk actions |
| Low agency | Conservative, asks for clarification when uncertain | Customer-facing, high-stakes actions |
| Mixed | Autonomous for routine, asks on novel/risky | Most production agents |
Document success criteria:
Choose the model using list_models from orq MCP. Consider model tiers:
| Tier | Examples | Typical Use |
|---|---|---|
| Frontier | gpt-4.1, claude-sonnet-4-5, gemini-2.5-pro | Complex reasoning, nuanced tasks |
| Mid-tier | gpt-4.1-mini, claude-haiku-4-5, gemini-2.5-flash | Good quality/cost balance |
| Budget | gpt-4.1-nano, small open-source models | Classification, simple extraction |
| Reasoning | o3, o4-mini, claude-sonnet-4-5 (extended thinking) | Complex multi-step reasoning |
Start with the most capable model. Establish what "good" looks like, then test cheaper models.
Cost-quality tradeoff:
| Priority | Strategy |
|---|---|
| Quality first | Start with best model, only downgrade if budget demands |
| Cost first | Start cheapest, upgrade only where quality fails |
| Latency first | Test TTFT (time to first token) and total latency |
| Balanced | Find the "knee" of the quality-cost curve |
Model cascade (for cost optimization at scale): When cheap models handle 70-90% of requests adequately, route by confidence — cheap model first, escalate to frontier on low confidence. Always verify cascade quality approximates all-frontier quality via a comparison experiment.
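The cascade can be sketched as a confidence-gated router. This is a sketch under assumptions: `call_model` stands in for your real inference call, the model names are examples from the tier table, and the 0.8 threshold must be tuned against a labeled comparison set.

```python
# Confidence-gated model cascade: cheap model first, escalate to the
# frontier model only when confidence falls below the threshold.
# `call_model(model, query)` must return (answer, confidence); it is a
# stand-in for real inference (confidence may come from logprobs or a
# verifier model).
def cascade(query: str, call_model, cheap: str = "gpt-4.1-mini",
            frontier: str = "gpt-4.1", threshold: float = 0.8):
    answer, confidence = call_model(cheap, query)
    if confidence >= threshold:
        return cheap, answer  # cheap path: target 70-90% of traffic
    answer, _ = call_model(frontier, query)  # escalate on low confidence
    return frontier, answer
```

Validating the cascade still means running the comparison experiment: assert the routed answers approximate all-frontier answers, not just that routing fires.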
Pin production models to a specific snapshot/version. Re-run comparisons when updating.
Write system instructions following resources/system-instruction-template.md. Key sections:
Critical instruction-writing rules:
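The template file itself is not reproduced here, so the following is a minimal skeleton only; the section names (Role, Boundaries, Escalation) and the "Acme" billing scenario are illustrative assumptions, not the canonical structure from resources/system-instruction-template.md.

```python
# Illustrative system-instruction skeleton held as a Python constant.
# The sections and the Acme billing scenario are hypothetical examples.
SYSTEM_INSTRUCTIONS = """\
# Role
You are a billing support agent for Acme. You resolve invoice and
refund questions using the tools provided.

# Boundaries (DO NOT)
- Do NOT issue refunds over $100; escalate to a human instead.
- Do NOT answer questions outside billing; redirect the user.

# Escalation
When a request is ambiguous or high-stakes, ask one clarifying
question before acting.
"""
```

Note how boundaries are phrased as explicit DO NOT rules with an escalation path, the fix named in the anti-pattern table for "No explicit boundaries."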
Select tools from the tool library or define custom tools:
Write tool descriptions following resources/tool-description-guide.md:
Create custom tools if needed:
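As a concrete sketch, here is a tool definition with the when-to-use / when-NOT-to-use pattern baked into the description. The JSON-schema function shape is a common cross-provider convention; orq.ai's exact custom-tool schema may differ, and `lookup_order` / `search_orders` are hypothetical names.

```python
# Hypothetical custom tool definition. The description states both when
# to use the tool and when NOT to, which is the main defense against
# wrong-tool selection.
lookup_order = {
    "name": "lookup_order",
    "description": (
        "Fetch a single order by its ID. Use when the user references a "
        "specific order number (e.g. 'ORD-1042'). Do NOT use to search "
        "orders by customer or date; use search_orders for that."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order identifier, e.g. 'ORD-1042'.",
            }
        },
        "required": ["order_id"],
    },
}
```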
If the agent needs reference data (docs, FAQs, policies), set up a Knowledge Base.
See resources/knowledge-base-management.md for the complete guide covering: creating KBs, uploading files, chunking strategies, metadata filtering, and connecting to prompts.
Quick steps:
- `search_directories` MCP tool to find existing paths and folders in the workspace — this helps determine the best path for the KB
- `search_entities` — reuse existing knowledge bases if possible

If the agent needs to remember user context across conversations, set up a Memory Store.
See resources/memory-store-management.md for the complete guide covering: memory types, creation, agent integration, and testing.
Quick steps:
Remember: Memory is for dynamic user context. If the user needs static reference data, use a Knowledge Base instead.
Create the agent using create_agent MCP tool:
Verify the agent using get_agent MCP tool:
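Verification can be mechanical: diff the config you intended against what comes back. The field names below are illustrative assumptions, not the actual `get_agent` response schema.

```python
# Diff intended agent config against the config fetched back after
# creation (e.g. via the get_agent MCP tool). Field names and values
# here are illustrative, not orq.ai's response schema.
def config_drift(intended: dict, fetched: dict) -> list[str]:
    """Return a human-readable line per field that doesn't match."""
    return [
        f"{key}: wanted {want!r}, got {fetched.get(key)!r}"
        for key, want in intended.items()
        if fetched.get(key) != want
    ]

intended = {"model": "claude-sonnet-4-5", "tools": ["lookup_order"]}
fetched = {"model": "claude-sonnet-4-5", "tools": []}
print(config_drift(intended, fetched))  # flags the missing tool
```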
Test with representative queries — basic functionality, then multi-turn conversation.
Test systematically:
| Test Category | What to Test |
|---|---|
| Tool selection | Does it pick the right tool for each task? |
| Ambiguous input | How does it handle vague or incomplete requests? |
| Error recovery | What happens when a tool call fails? |
| Boundaries | Does it refuse out-of-scope requests? |
| Multi-step | Can it chain tool calls for complex tasks? |
| Adversarial | Does it resist prompt injection? |
| KB retrieval | Does it find the right chunks? |
| Memory | Does it correctly store and recall facts? |
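The matrix above can be driven by a tiny harness: each case pairs a category with an input and a predicate on the reply. `run_agent` is a stand-in for invoking the real agent, and the cases and loose string checks are illustrative (use real evaluators in production).

```python
# Minimal edge-case suite keyed by the categories above. `run_agent`
# stands in for the real agent call; predicates are deliberately loose
# string checks for illustration only.
CASES = [
    ("Ambiguous input", "fix it",
     lambda r: "clarify" in r.lower() or "which" in r.lower()),
    ("Boundaries", "write me a poem",
     lambda r: "scope" in r.lower() or "can't" in r.lower()),
]

def run_suite(run_agent) -> dict[str, bool]:
    """Map each test category to pass/fail for quick gap-spotting."""
    return {cat: check(run_agent(msg)) for cat, msg, check in CASES}
```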
Iterate on configuration using update_agent MCP tool:
Verify with `get_agent` after each update.

Document findings and finalize the agent configuration.
Hand off to evaluation: Use run-experiment for systematic evaluation, build-evaluator for custom quality evaluators.
| Anti-Pattern | What to Do Instead |
|---|---|
| Vague tool descriptions | Write precise descriptions with when-to-use and when-NOT-to-use |
| Too many tools (>8) | Start with 3-5 essential tools, add only when needed |
| Starting with cheapest model | Start capable, optimize cost after it works |
| No explicit boundaries | Define DO NOT rules and escalation criteria |
| Monolithic mega-agent | Split into specialized sub-agents |
| No edge case testing | Test tool errors, ambiguous input, adversarial cases |
| Switching models before fixing prompts | Error analysis → prompt fixes → model comparison |
| Not pinning model versions | Pin to snapshot ID in production |
| Building cascades without quality measurement | Run cascade vs frontier comparison experiment |
| Using memory as a knowledge base | KBs for docs/FAQs, memory for dynamic user context |
| Storing raw conversation transcripts | Extract structured facts and preferences |
| Embedding model not activated | Enable in AI Router before creating a KB |
| Chunking without testing retrieval | Always search after chunking to verify quality |
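The last row can be checked mechanically: after chunking, run queries you already know the answers to and confirm the expected chunk ranks first. The scorer below is plain keyword overlap for illustration; a real KB retrieves by embeddings, but the verification loop is identical.

```python
# Toy post-chunking retrieval check. Replace `top_chunk` with a real KB
# search call; the point is asserting known queries hit expected chunks.
def overlap(query: str, chunk: str) -> int:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def top_chunk(query: str, chunks: list[str]) -> str:
    return max(chunks, key=lambda c: overlap(query, c))

chunks = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Shipping: standard delivery takes 3-5 business days.",
]
assert "Refund policy" in top_chunk("how do refunds work", chunks)
assert "Shipping" in top_chunk("how long does delivery take", chunks)
```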
After completing this skill, direct the user to:
When you need to look up orq.ai platform details, check in this order:
1. Live MCP tools (`create_agent`, `get_agent`, `list_models`); API responses are always authoritative
2. `search_orq_ai_documentation` or `get_page_orq_ai_documentation` to look up platform docs programmatically

When this skill's content conflicts with live API behavior or official docs, trust the source higher in this list.