By orq-ai
Agent skills for building, deploying, evaluating, and monitoring LLM pipelines on the orq.ai platform.
npx claudepluginhub orq-ai/assistant-plugins
Show workspace analytics: requests, cost, tokens, errors, top models, and drill-down trends
List available AI models and their capabilities
Interactive onboarding guide — set up credentials, connect to orq.ai, and learn every command and skill
Query and summarize traces with filters — debugging entry point before analyze-trace-failures
Show a workspace overview — agents, deployments, prompts, datasets, experiments, projects, and knowledge bases
Read production traces, identify what's failing, and build failure taxonomies using open coding and axial coding methodology. Use when debugging agent or pipeline quality, investigating "why are my outputs bad?", or before building any evaluator — error analysis must come first. Do NOT use when you already have identified failure modes and need evaluators (use build-evaluator) or datasets (use generate-synthetic-dataset).
Design, create, and configure orq.ai Agents with tools, instructions, knowledge bases, and memory stores. Use when building new agents, attaching KBs or memory, writing system instructions, selecting models, or setting up RAG pipelines. Do NOT use for debugging existing agents (use analyze-trace-failures) or comparing agents across frameworks (use compare-agents).
Create validated LLM-as-a-Judge evaluators following best practices — binary Pass/Fail judges with TPR/TNR validation for measuring specific failure modes. Use when you need to automate quality checks, build guardrails, or measure a specific failure mode identified during trace analysis. Do NOT use when failures are fixable with prompt changes (use optimize-prompt) or when failure modes are unknown (use analyze-trace-failures first).
Run cross-framework agent comparisons using evaluatorq from orqkit — compares any combination of agents (orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK) head-to-head on the same dataset with LLM-as-a-judge scoring. Use when comparing agents, benchmarking, or wanting side-by-side evaluation. Do NOT use when comparing only orq.ai configurations with no external agents (use run-experiment instead).
Generate and curate evaluation datasets — structured generation via dimensions-tuples-NL, quick from description, expansion from existing data, plus dataset maintenance through deduplication, rebalancing, and gap-filling. Use when creating eval data, expanding test coverage, or cleaning datasets. Do NOT use when sufficient real production data exists (use analyze-trace-failures instead). Do NOT use for evaluator creation (use build-evaluator).
Invoke orq.ai deployments, agents, and models via the Python SDK or HTTP API. Use when a user wants to call a deployment with prompt variables, invoke an agent in a conversation, or call a model directly through the AI Router. Do NOT use for creating or editing deployments/agents (use optimize-prompt or build-agent). Do NOT use for running evaluations (use run-experiment).
Analyze and optimize system prompts using a structured prompting guidelines framework — AI-powered analysis and rewriting. Use when a prompt needs improvement, experiment results show quality gaps, or you want a structured review of an existing system prompt. Do NOT use when production traces show failures (use analyze-trace-failures first to identify patterns). Do NOT use to build evaluators (use build-evaluator).
Create and run orq.ai experiments — compare configurations against datasets using evaluators, analyze results, and generate prioritized action plans. Use when evaluating LLM agents, deployments, conversations, or RAG pipelines end-to-end. Do NOT use without a dataset and evaluators. Do NOT use for cross-framework comparisons with external agents (use compare-agents).
Set up orq.ai observability for LLM applications. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
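The build-evaluator description mentions validating binary Pass/Fail judges with TPR/TNR. As a rough illustration of what those two rates measure, here is a minimal sketch that compares a judge's verdicts against human labels. The function name `judge_rates` and the sample labels are hypothetical and are not part of the orq.ai plugin or SDK; `True` is assumed to mean Pass.

```python
def judge_rates(human_labels, judge_labels):
    """Return (TPR, TNR) for a binary judge, treating True as Pass.

    Hypothetical helper for illustration only; not an orq.ai API.
    """
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(human_labels, judge_labels))
    tp = sum(1 for h, j in pairs if h and j)          # human Pass, judge Pass
    fn = sum(1 for h, j in pairs if h and not j)      # human Pass, judge Fail
    tn = sum(1 for h, j in pairs if not h and not j)  # human Fail, judge Fail
    fp = sum(1 for h, j in pairs if not h and j)      # human Fail, judge Pass

    # A rate is undefined (None) when its class has no human-labeled examples.
    tpr = tp / (tp + fn) if (tp + fn) else None
    tnr = tn / (tn + fp) if (tn + fp) else None
    return tpr, tnr


# Made-up labels: three human Pass cases, two human Fail cases.
human = [True, True, True, False, False]
judge = [True, True, False, False, True]
tpr, tnr = judge_rates(human, judge)
```

A judge that agrees with human reviewers on both passing and failing examples scores close to 1.0 on both rates; a low TNR means the judge lets real failures through.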