From tea-rags-setup
Benchmarks hardware and tunes performance parameters for TeaRAGs (embedding throughput, Qdrant storage, pipeline concurrency, git trajectory). Useful when indexing is slow or to find optimal batch sizes.
Slash command: /tea-rags-setup:tune
Runs `tea-rags tune` to find optimal hardware perf params, saves results to
setup progress file for MCP config.
Prerequisite: tea-rags installed (tea-rags --version works).

Check args provided. If not, check progress file for saved values.

- Provider: from arg --provider, or progress file embeddingProvider, or detect from current MCP config. Default: ollama.
- Full mode: from arg --full. Default: quick mode (~2-3 min).
- Qdrant URL: omit --qdrant-url when possible (see section 1a). Use progress file qdrantUrl only when it is a real external URL.
- Embedding URL: from progress file or default http://localhost:11434.
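The lookup order above (argument, then saved value, then default) can be sketched as a small helper. This is only an illustration: `resolve_setting` and the `ARG_*` / `SAVED_*` variable names are hypothetical, and the "detect from current MCP config" step for the provider is left out.

```shell
# Resolve one setting: CLI argument first, then the value saved in the
# progress file, then a built-in default. Empty strings count as "not set".
resolve_setting() {
  arg_value=$1
  saved_value=$2
  default_value=$3
  if [ -n "$arg_value" ]; then
    printf '%s\n' "$arg_value"
  elif [ -n "$saved_value" ]; then
    printf '%s\n' "$saved_value"
  else
    printf '%s\n' "$default_value"
  fi
}

# Hypothetical inputs: ARG_PROVIDER comes from the slash-command arguments,
# SAVED_* are whatever was read from the progress file beforehand.
PROVIDER=$(resolve_setting "${ARG_PROVIDER:-}" "${SAVED_PROVIDER:-}" "ollama")
EMBEDDING_URL=$(resolve_setting "" "${SAVED_EMBEDDING_URL:-}" "http://localhost:11434")
```

The embedding URL has no command-line source in the list above, so its first argument is passed empty.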
Why this matters. The install wizard runs tune at step 6, BEFORE the MCP harness is configured at step 8. At tune time:

- The MCP server is not in ~/.claude.json yet, so mcp__tea-rags__* tools are unavailable.
- No embedded Qdrant daemon is running yet: setup-qdrant.sh embedded only downloaded the binary.
- Passing --qdrant-url http://localhost:6333 fails with a connection error.

What to do. Omit --qdrant-url. The tea-rags tune CLI handles the full cascade internally:

1. Probes http://localhost:6333 and uses it if a Docker or native Qdrant answers.
2. Otherwise starts the embedded daemon from ~/.tea-rags/qdrant/ (downloading the binary first if needed), reads the random port from daemon.port, and targets http://127.0.0.1:<port> for the benchmark.

When to pass --qdrant-url explicitly. Only if qdrantMode is docker or native and the progress file qdrantUrl is a real http URL (not the literal string "embedded"). In embedded mode the progress file value is "embedded": a marker, not a URL, and it must NOT be passed on the command line.
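The rule for passing --qdrant-url can be written down as a guard. The sketch below assumes the two progress-file values (qdrantMode and qdrantUrl) are already available as strings; the function name `qdrant_url_flag` is hypothetical, and accepting https:// alongside http:// is an assumption.

```shell
# Print "--qdrant-url <url>" only when the flag should be passed.
# Prints nothing for embedded mode or for the "embedded" marker value.
qdrant_url_flag() {
  mode=$1
  url=$2
  case "$mode" in
    docker|native)
      case "$url" in
        # Only a real URL qualifies; the marker string "embedded" does not.
        http://*|https://*) printf '%s\n' "--qdrant-url $url" ;;
      esac
      ;;
  esac
}
```

An empty result means the flag is simply left off, so tune runs its own cascade.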
Sanity check before invoking tune in embedded mode:
# Confirm the embedded binary is present — tune relies on it.
test -x "$HOME/.tea-rags/qdrant/bin/qdrant" || echo "Embedded binary missing — re-run setup-qdrant.sh embedded"
Run the benchmark (2-3 min in quick mode, 10-15 min in full mode):
# Omit --qdrant-url in embedded mode (tune auto-spawns the daemon).
tea-rags tune \
  --provider <provider> \
  [--qdrant-url <url>] \
  --embedding-url <url> \
  [--full]
Show the user: "Running performance benchmark (~2-3 min). This tests embedding throughput, Qdrant storage speed, and pipeline concurrency."
Do NOT run in a background agent: the output lets the user follow progress in real time. Run in the foreground via the Bash tool with a 600000 ms timeout.
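Because a shell line continuation cannot carry a trailing comment, optional flags are easier to add conditionally than to edit out of a template. A minimal sketch, assuming the four flags shown above are the only ones needed; `run_tune` and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```shell
# Assemble and run `tea-rags tune` in the foreground.
# Arguments: provider, embedding URL, Qdrant URL (empty = embedded mode),
# and "yes" to request full mode.
run_tune() {
  provider=$1
  embedding_url=$2
  qdrant_url=$3
  full=$4

  set -- tune --provider "$provider"
  # An empty Qdrant URL means the flag is left out entirely.
  if [ -n "$qdrant_url" ]; then
    set -- "$@" --qdrant-url "$qdrant_url"
  fi
  set -- "$@" --embedding-url "$embedding_url"
  if [ "$full" = "yes" ]; then
    set -- "$@" --full
  fi

  if [ "${DRY_RUN:-}" = "1" ]; then
    # Show the command instead of running it.
    echo "tea-rags $*"
  else
    tea-rags "$@"
  fi
}
```

Calling it with `DRY_RUN=1` prints the assembled command, which is a cheap way to check that embedded mode really omits --qdrant-url before spending minutes on the benchmark.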
After tune completes, read tuned_environment_variables.env from project root
(or current directory).
Extract these values:
| Variable | Description |
|---|---|
| EMBEDDING_BATCH_SIZE | Optimal embedding batch size |
| EMBEDDING_CONCURRENCY | Optimal embedding concurrency |
| QDRANT_UPSERT_BATCH_SIZE | Optimal Qdrant batch size |
| QDRANT_BATCH_ORDERING | Optimal ordering mode (weak/medium/strong) |
| QDRANT_FLUSH_INTERVAL_MS | Optimal flush interval |
| BATCH_FORMATION_TIMEOUT_MS | Optimal batch formation timeout |
| QDRANT_DELETE_BATCH_SIZE | Optimal delete batch size |
| QDRANT_DELETE_CONCURRENCY | Optimal delete concurrency |
| INGEST_TUNE_CHUNKER_POOL_SIZE | Optimal chunker pool size |
| INGEST_TUNE_FILE_CONCURRENCY | Optimal file concurrency |
| INGEST_TUNE_IO_CONCURRENCY | Optimal IO concurrency |
| QDRANT_TUNE_DELETE_FLUSH_TIMEOUT_MS | Optimal delete flush timeout |
| EMBEDDING_TUNE_MIN_BATCH_SIZE | Optimal min batch size |
| TRAJECTORY_GIT_CHUNK_CONCURRENCY | Optimal git chunk concurrency |
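Reading these values out of the file might look like the sketch below. The exact layout of tuned_environment_variables.env is not shown here, so this assumes plain unquoted KEY=VALUE lines plus rate comments of the form `# Embedding rate: 1200 chunks/s`; both helper names are hypothetical.

```shell
# Print the value of one KEY=VALUE line from an env file (first match only).
env_value() {
  key=$1
  file=$2
  sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Print the number from a metric comment, e.g. label "Embedding rate"
# for a line like "# Embedding rate: 1200 chunks/s" (assumed format).
metric_value() {
  label=$1
  file=$2
  sed -n "s/^#[[:space:]]*${label}:[[:space:]]*\([0-9.]*\).*/\1/p" "$file" | head -n 1
}
```

For example, `env_value EMBEDDING_BATCH_SIZE tuned_environment_variables.env` prints the tuned batch size, and a missing key prints nothing, which is worth checking before saving anything.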
Also extract perf metrics from comments:

- Embedding rate: N chunks/s
- Storage rate: N chunks/s
- Deletion rate: N del/s

Use progress script to save tuned values:
SCRIPTS="${CLAUDE_PLUGIN_ROOT}/scripts/setup/unix" # or windows/
$SCRIPTS/progress.sh set tuneValues '{"EMBEDDING_BATCH_SIZE":"256",...}'
$SCRIPTS/progress.sh set steps.tune '{"status":"completed","at":"<now>"}'
If progress file missing, create it first:
$SCRIPTS/progress.sh init
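The JSON argument for tuneValues does not have to be typed by hand. As a rough sketch, assuming the env file contains unquoted KEY=VALUE lines whose values hold no double quotes or backslashes, a small awk filter can turn it into a flat object of strings; `env_to_json` is a hypothetical helper, and the ISO 8601 UTC timestamp for `<now>` is likewise an assumption.

```shell
# Read KEY=VALUE lines on stdin and print one flat JSON object.
# Comment lines and anything that is not an assignment are skipped.
env_to_json() {
  awk -F= '
    /^[A-Z_][A-Z0-9_]*=/ {
      # Everything after the first "=" is the value, kept as a string.
      value = substr($0, length($1) + 2)
      printf "%s\"%s\":\"%s\"", (n++ ? "," : "{"), $1, value
    }
    END { print (n ? "}" : "{}") }
  '
}

# Possible use with the progress script (not executed here):
#   $SCRIPTS/progress.sh set tuneValues "$(env_to_json < tuned_environment_variables.env)"
#   now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
#   $SCRIPTS/progress.sh set steps.tune "{\"status\":\"completed\",\"at\":\"$now\"}"
```

Values stay quoted as strings, matching the `"256"` form used in the example above.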
Display results to user:
Performance tuning complete!
Embedding: BATCH_SIZE=256, CONCURRENCY=4
Throughput: 1200 chunks/sec
Qdrant: UPSERT_BATCH_SIZE=384, ORDERING=weak
FLUSH_INTERVAL=100ms, FORMATION_TIMEOUT=2000ms
Storage: 3500 chunks/sec
Pipeline: CHUNKER_POOL=4, FILE_CONC=50, IO_CONC=50
Estimated indexing times:
Small project (50K LoC): ~30s
Medium project (200K LoC): ~2min
Large project (1M LoC): ~10min
Results saved to ~/.tea-rags/setup-progress.json
Use /tea-rags-setup:install to apply these values to your MCP config.
Delete tuned_environment_variables.env after parsing — values now in progress
file.
ONNX tune not yet fully supported. When provider is onnx:

- [...] /tea-rags-setup:install

- [...] ~/.tea-rags/qdrant/bin/qdrant and re-run setup-qdrant.sh embedded.
- Cannot connect to Qdrant at http://localhost:6333 in embedded mode: you passed --qdrant-url explicitly with the literal string "embedded" or http://localhost:6333. Re-run tune WITHOUT --qdrant-url (see section 1a).