npx claudepluginhub arbazkhan971/godmode

This skill uses the workspace's default tool permissions.
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Triggers: /godmode:rag, "build RAG system", "knowledge base"

Use case: <questions the system must answer>
Data sources: <docs, wiki, DB, PDFs, code>
Corpus: <N documents, N tokens, N MB>
Update frequency: static|daily|real-time
Query patterns:
Factual lookup (single-hop retrieval)
Analytical (multi-document retrieval)
Conversational (multi-turn Q&A)
Structured (metadata filtering + retrieval)
| Model | Dims | MTEB | Cost |
|---|---|---|---|
| text-embedding-3-large | 3072 | 64.6 | $0.13/1M |
| text-embedding-3-small | 1536 | 62.3 | $0.02/1M |
| Cohere embed-v3 | 1024 | 64.5 | $0.10/1M |
| Voyage voyage-3 | 1024 | 67.1 | $0.06/1M |
| BGE-large-en-v1.5 | 1024 | 64.2 | Free* |

*Self-hosted: no per-token fee, but you pay for inference compute.
IF budget-constrained: text-embedding-3-small. IF quality-critical: Voyage or BGE fine-tuned. IF multi-language: Cohere embed-v3.
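The per-1M-token prices in the table make corpus embedding cost easy to estimate up front. A minimal sketch (prices copied from the table above; treat them as assumptions and verify against current provider pricing):

```python
# One-time cost to embed a corpus, from its token count and per-1M-token price.
# Prices mirror the table above -- assumptions, not live pricing.
PRICE_PER_1M = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "voyage-3": 0.06,
}

def embedding_cost(corpus_tokens: int, model: str) -> float:
    """Dollar cost to embed the whole corpus once."""
    return corpus_tokens / 1_000_000 * PRICE_PER_1M[model]

# Example: a 50M-token corpus
for model in PRICE_PER_1M:
    print(f"{model}: ${embedding_cost(50_000_000, model):.2f}")
```

Re-embedding on every model change costs this much again, which is one reason to settle the embedding choice before indexing at scale.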
| Strategy | Best For |
|---|---|
| Fixed-size (token) | Baseline, uniform docs |
| Recursive character | General-purpose |
| Semantic | Varying topic density |
| Code-aware (AST) | Source code repos |
| Markdown headers | Structured docs |
| Sliding window | Boundary context critical |
ALWAYS set overlap >= 10% of chunk size. IF chunk_size > 1000 tokens: information dilution risk. IF chunk_size < 100 tokens: context too fragmented. Default: 500 tokens, 50 token overlap.
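The fixed-size strategy with the overlap rule above can be sketched in a few lines. Whitespace tokens stand in for real tokenizer tokens here; in practice swap in a tokenizer such as tiktoken:

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Fixed-size chunking with overlap; stride = chunk_size - overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks

words = ("lorem ipsum " * 600).split()  # 1200 pseudo-tokens
chunks = chunk_tokens(words, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks: 500, 500, 300
```

Each chunk repeats the last 50 tokens of the previous one, so sentences straddling a boundary stay intact in at least one chunk.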
| Store | Type | Scale | Best For |
|---|---|---|---|
| Pinecone | Managed | Billions | Production |
| Weaviate | Managed/Self | Millions | Hybrid search |
| Chroma | Embedded | Millions | Prototyping |
| pgvector | Extension | Millions | Existing PG |
| Qdrant | Managed/Self | Billions | High perf |
IF already using PostgreSQL: start with pgvector. IF < 100K chunks: Chroma for development. IF > 10M chunks: Pinecone or Qdrant.
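Whichever store you pick, the core operation is the same: cosine top-K over embeddings. A dependency-free brute-force sketch of that operation (real stores add ANN indexes such as HNSW on top for scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=10):
    """index: list of (chunk_id, embedding). Returns k best ids by cosine."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in index]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]

# Toy 2-D index; real embeddings have 1024+ dims
index = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
print(top_k([1.0, 0.1], index, k=2))  # ['a', 'b']
```

Brute force is fine up to roughly 100K chunks, which is another reason Chroma or pgvector suffices for prototyping.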
Document -> Parse/Extract -> Clean/Transform -> Chunk -> Embed -> Index
Loaders:
PDF: PyMuPDF, pdfplumber, Unstructured
HTML: BeautifulSoup, Unstructured
Code: tree-sitter AST parser
# Verify indexing
python -c "from chromadb import Client; \
c=Client(); print(c.list_collections())"
Hybrid search (RECOMMENDED for production):
Dense (vector): semantic similarity
Sparse (BM25): keyword/exact matching
Fusion: Reciprocal Rank Fusion (RRF)
Top-K: 5-20 chunks (start with 10)
Reranker: cross-encoder on top-20 results
(highest-impact single optimization)
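The RRF fusion step above is small enough to write out. Each document scores the sum of 1/(k + rank) over every result list it appears in; k = 60 is the conventional constant from the original RRF paper:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion.
    rankings: list of ranked doc-id lists (e.g. [dense_ids, bm25_ids]).
    Returns doc ids sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # vector-search order
bm25  = ["d3", "d1", "d4"]   # keyword-search order
print(rrf([dense, bm25]))    # ['d1', 'd3', 'd2', 'd4']
```

Documents ranked well by both retrievers (d1, d3) float to the top, which is exactly why hybrid search beats either retriever alone on mixed query workloads.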
IF recall < 70%: increase overlap, add BM25, try domain-specific embeddings. IF recall > 90% but bad answers: generation problem.
Context window budget:
System prompt: <N tokens>
Retrieved context: <N tokens>
Conversation history: <N tokens>
Output reservation: <N tokens>
Total < model context limit
Assembly: rank by relevance, include until budget.
Format with source attribution.
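The assembly rule above (rank by relevance, include until the budget is spent, attribute sources) is a short greedy loop. Whitespace word counts approximate tokens here; replace with a real tokenizer in production:

```python
def assemble_context(chunks, budget_tokens):
    """chunks: list of (score, source, text).
    Greedily packs the highest-scoring chunks that fit the token budget,
    formatting each with source attribution."""
    parts, used = [], 0
    for score, source, text in sorted(chunks, reverse=True):
        n = len(text.split())          # crude token count
        if used + n > budget_tokens:
            continue                   # skip chunks that do not fit
        parts.append(f"[source: {source}]\n{text}")
        used += n
    return "\n\n".join(parts)

chunks = [
    (0.9, "faq.md", "Refunds are issued within 14 days."),
    (0.8, "policy.pdf", "Shipping takes three to five business days."),
    (0.4, "blog.html", "Our company was founded in 2015."),
]
ctx = assemble_context(chunks, budget_tokens=12)
print(ctx)
```

Note the greedy skip: a mid-ranked chunk that overflows the budget is passed over, and a smaller lower-ranked chunk may still be included.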
Retrieval metrics:
Hit rate @ K: % queries with answer in top-K
MRR: average 1/rank of first correct result
Generation metrics:
Faithfulness: grounded in retrieved context
Hallucination rate: answers without evidence
Targets:
Recall@10 >= 80%, MRR >= 0.7
Faithfulness >= 90%, Hallucination < 5%
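Hit rate@K and MRR as defined above, computed over (retrieved ids, gold id) pairs from an evaluation set:

```python
def hit_rate_at_k(results, k=10):
    """results: list of (retrieved_ids, gold_id).
    Fraction of queries whose gold chunk appears in the top-k results."""
    hits = sum(1 for ids, gold in results if gold in ids[:k])
    return hits / len(results)

def mrr(results):
    """Mean reciprocal rank of the first correct result (0 for a miss)."""
    total = 0.0
    for ids, gold in results:
        if gold in ids:
            total += 1.0 / (ids.index(gold) + 1)
    return total / len(results)

evals = [
    (["c1", "c2", "c3"], "c1"),   # hit at rank 1
    (["c4", "c7", "c5"], "c7"),   # hit at rank 2
    (["c8", "c9"], "c6"),         # miss
]
print(hit_rate_at_k(evals, k=10), round(mrr(evals), 3))  # 0.667 0.5
```

Faithfulness and hallucination rate need an LLM or human judge over (question, context, answer) triples and are not shown here.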
# RAG pipeline testing
python -m pytest tests/test_rag.py -v
curl -s 'http://localhost:8080/api/search?q=test' | jq .results
Append to .godmode/rag.tsv:
timestamp action chunks recall_at_10 faithfulness hallucination status
KEEP if: target metric improved AND hallucination did not increase.
DISCARD if: hallucination increased OR no improvement.
Never keep a change that increases hallucination.
STOP when FIRST of:
- Recall@10 >= 80%, faithfulness >= 90%, hallucination < 5%
- Two iterations < 2% improvement
- Latency meets requirements
On failure: git reset --hard HEAD~1. Never pause.
| Failure | Action |
|---|---|
| Recall < 70% | Increase overlap, add BM25, add a reranker |
| High hallucination | Instruct "answer only from the provided context"; reduce retrieved chunks |
| High latency | Cache frequent queries, reduce top-K |
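For the high-latency row, caching frequent queries can be as simple as memoizing the retrieval call. A sketch with functools.lru_cache, keyed on the query string (the retrieval body is stubbed here; your real pipeline call goes in its place):

```python
from functools import lru_cache

calls = 0  # counts how often retrieval actually runs

@lru_cache(maxsize=1024)
def cached_search(query: str):
    """Memoized retrieval; repeated identical queries skip the vector store.
    Normalize queries (lowercase, strip) before calling for better hit rates."""
    global calls
    calls += 1
    # Real retrieval (embed + vector search + rerank) would run here.
    return f"results for {query}"

cached_search("what is our refund policy")
cached_search("what is our refund policy")   # served from cache
print(calls)  # retrieval ran once
```

lru_cache suits a single process; a shared service would use an external cache (e.g. Redis) with a TTL matched to the corpus update frequency above.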