**NEXUS** — *A nexus is the central point where all connections converge.*
When invoked: assesses the system type (RAG / agent / prompt / evaluation), loads the relevant pattern file, and applies AI-specific engineering discipline: hallucination guards, context budgets, injection defenses, cost tracking.
Core principle: LLM applications have unique failure modes — hallucination, prompt injection, context overflow, cost explosion. Engineer systems, not just prompts.
Announce at start: "Running NEXUS for AI application patterns."
SYSTEM TYPE ASSESSMENT:
"What are you building/debugging?"
A) RAG / knowledge retrieval system
B) Autonomous agent / tool-using agent
C) Prompt engineering / LLM integration
D) LLM evaluation / benchmarking
E) Multi-agent system
F) Debugging a hallucination / quality problem
G) Cost/latency optimization
Type → Section mapping:
After identifying the type, ask: "What model are you using and what's the context window limit?"
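Where it helps, turn that answer into a hard check. Below is a minimal sketch of a context-budget guard, assuming the tiktoken library with a cl100k_base encoding; the window size and output reserve are illustrative defaults, not values the skill prescribes.

```python
import tiktoken

def context_budget_remaining(system_prompt: str, retrieved: str, question: str,
                             window: int = 128_000, output_reserve: int = 4_000) -> int:
    """Tokens left for additional input after reserving room for the model's output."""
    enc = tiktoken.get_encoding("cl100k_base")  # assumption: pick the encoding for your model
    used = sum(len(enc.encode(part)) for part in (system_prompt, retrieved, question))
    remaining = window - output_reserve - used
    if remaining < 0:
        raise ValueError(f"Context overflow: {used} input tokens exceed the "
                         f"{window - output_reserve}-token input budget")
    return remaining
```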
RAG / KNOWLEDGE RETRIEVAL:
Load patterns: patterns/rag-architecture.md
Key decisions in order:
Rule: Test retrieval quality (precision/recall) before testing generation quality.
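A minimal sketch of those retrieval metrics over document IDs; the function name and signature are illustrative, not part of the pattern file.

```python
def retrieval_metrics(retrieved: list[str], relevant: set[str], k: int = 5) -> tuple[float, float]:
    """Precision@k and recall@k for one query, given gold relevant doc IDs."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k                                # fraction of top-k that is relevant
    recall = hits / len(relevant) if relevant else 0.0  # fraction of relevant docs retrieved
    return precision, recall

# Example: 1 of the top 5 is relevant, and 1 of the 2 relevant docs was found.
p, r = retrieval_metrics(["d3", "d7", "d1", "d9", "d2"], {"d1", "d4"})
assert (p, r) == (0.2, 0.5)
```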
AGENTS:
Load patterns: patterns/agent-patterns.md
| Pattern | Best For | Iteration Limit |
|---|---|---|
| ReAct | Factual QA, tool use | 10 |
| Plan-Execute | Multi-step tasks | 5 plans |
| Reflection | Quality-critical output | 3 cycles |
| Multi-Agent Debate | High-stakes decisions | 3 rounds |
| Tool Routing | Multiple specialized tools | N/A |
Always set max iteration limits. Agents without limits will loop indefinitely on failure.
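A minimal sketch of an iteration-capped loop in the ReAct style; the `llm` callable and its action schema are hypothetical stand-ins for whatever framework is in use.

```python
def run_agent(task: str, llm, tools: dict, max_iterations: int = 10):
    """ReAct-style loop that fails loudly instead of looping forever."""
    history = [f"Task: {task}"]
    for _ in range(max_iterations):
        action = llm("\n".join(history))  # hypothetical: returns {"type", "tool", "input", "content"}
        if action["type"] == "final_answer":
            return action["content"]
        observation = tools[action["tool"]](action["input"])
        history.append(f"Action: {action['tool']}({action['input']})")
        history.append(f"Observation: {observation}")
    raise RuntimeError(f"Agent hit the {max_iterations}-iteration limit without finishing")
```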
PROMPT ENGINEERING:
Load patterns: patterns/prompt-engineering.md
Process:
Rule: Never deploy a prompt tested on fewer than 10 diverse examples.
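One way to enforce that rule mechanically, sketched under assumptions: each example carries its own pass/fail predicate, and `llm` is whatever completion callable the project uses.

```python
def evaluate_prompt(template: str, examples: list[dict], llm, min_examples: int = 10):
    """Refuse to score a prompt until the example set is large enough, then report failures."""
    if len(examples) < min_examples:
        raise ValueError(f"Need at least {min_examples} diverse examples, got {len(examples)}")
    failures = [ex["name"] for ex in examples
                if not ex["check"](llm(template.format(**ex["inputs"])))]
    return 1 - len(failures) / len(examples), failures  # (pass rate, failing example names)
```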
LLM EVALUATION:
Load patterns: patterns/llm-evaluation.md
| Metric type | Method | Use when |
|---|---|---|
| Exact match | string equality | Factual QA, code gen |
| F1 score | token overlap | Extractive QA |
| Semantic similarity | cosine >0.8 | Open-ended QA |
| Rubric-based | LLM grades | Complex tasks |
| Hallucination | fact verification + self-consistency | High-stakes output |
| Human preference | blind A/B, win rate >0.55 | Model comparison |
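For the first two rows, a minimal sketch: exact match with simple normalization, and the SQuAD-style multiset token-overlap F1. Real harnesses normalize more aggressively (punctuation, articles); lowercasing and whitespace splitting here are simplifying assumptions.

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between prediction and gold answer (multiset overlap)."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```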
Cost and latency budgets: p99 latency <5s, cost per 1k requests <$10.
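A minimal sketch of checking both budgets from logged per-request data, standard library only; the thresholds mirror the numbers above and everything else is illustrative.

```python
import statistics

def check_budgets(latencies_s: list[float], costs_usd: list[float],
                  p99_limit: float = 5.0, per_1k_limit: float = 10.0) -> tuple[float, float]:
    """Raise if p99 latency or cost per 1k requests breaks budget; needs >= 2 samples."""
    p99 = statistics.quantiles(latencies_s, n=100)[98]    # 99th-percentile cut point
    cost_per_1k = 1000 * sum(costs_usd) / len(costs_usd)  # mean cost scaled to 1k requests
    if p99 >= p99_limit:
        raise RuntimeError(f"p99 latency {p99:.2f}s exceeds {p99_limit}s budget")
    if cost_per_1k >= per_1k_limit:
        raise RuntimeError(f"${cost_per_1k:.2f} per 1k requests exceeds ${per_1k_limit} budget")
    return p99, cost_per_1k
```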
Never:
Always:
SKILL INTEGRATION:
| Skill | Integration |
|---|---|
forge | Write eval tests before model/prompt changes |
hunter | Debug hallucination, retrieval failures |
sentinel | Verify eval metrics before claiming success |
chronicle | Store prompt patterns that worked |
vector | Route queries to appropriate model tier |