---
name: aqe-costs
description: Display inference cost analysis and savings from local vs cloud providers
---

# AQE Inference Costs

From the `agentic-qe-fleet` plugin. Install with:

```bash
npx claudepluginhub proffesor-for-testing/agentic-qe --plugin agentic-qe-fleet
```
Display comprehensive inference cost analysis showing local vs cloud inference costs and estimated savings.
## Usage

```bash
aqe costs [options]
# or
/aqe-costs [options]
```
## Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--period` | string | `24h` | Time period: 1h, 24h, 7d, 30d, all |
| `--provider` | string | - | Filter by provider: ruvllm, anthropic, openrouter, openai |
| `--format` | string | `text` | Output format: text, json |
| `--detailed` | boolean | `false` | Show detailed per-request breakdown |
| `--reset` | boolean | `false` | Reset cost tracking data |
## Examples

```bash
aqe costs
```

Displays the cost summary for the last 24 hours with savings analysis.

```bash
aqe costs --period 7d
```

Shows cost trends and savings over the past 7 days.

```bash
aqe costs --provider ruvllm
```

Displays costs for local ruvllm inference only.

```bash
aqe costs --detailed
```

Shows a per-request cost breakdown with agent and task attribution.

```bash
aqe costs --format json > costs.json
```

Exports cost data in JSON format for integration with monitoring dashboards.

```bash
aqe costs --reset
```

Clears all tracked cost data (useful for testing or new billing periods).
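The JSON export above can be post-processed in scripts. A minimal sketch, assuming the field names shown in the sample JSON report in this document (`overall.totalRequests`, `savings.totalSavings`, etc.) — the `CostReport` interface here is illustrative, not a published schema:

```typescript
// Illustrative shape of a parsed `aqe costs --format json` report
// (field names taken from the sample report in this document).
interface CostReport {
  overall: { totalRequests: number; totalCost: number };
  savings: { totalSavings: number; savingsPercentage: number };
}

// Render a one-line summary from a parsed report.
function summarize(report: CostReport): string {
  const { totalRequests, totalCost } = report.overall;
  const { totalSavings, savingsPercentage } = report.savings;
  return `${totalRequests} requests cost $${totalCost.toFixed(2)}; ` +
    `saved $${totalSavings.toFixed(2)} (${savingsPercentage.toFixed(1)}%)`;
}

// Example using the figures from the sample report:
const sample: CostReport = {
  overall: { totalRequests: 1248, totalCost: 5.234 },
  savings: { totalSavings: 13.531, savingsPercentage: 72.1 },
};
console.log(summarize(sample));
// → 1248 requests cost $5.23; saved $13.53 (72.1%)
```

In practice the report object would come from `JSON.parse` over the exported `costs.json` rather than an inline literal.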
```javascript
// Use Claude Code's Task tool for cost monitoring
Task("Monitor inference costs", `
  Analyze AQE inference costs and provide recommendations:
  - Check cost trends over the past 24 hours
  - Identify high-cost agents or tasks
  - Calculate savings from local inference
  - Recommend optimizations to reduce cloud costs
  Store findings in memory: aqe/costs/analysis/{timestamp}
`, "qe-quality-gate")
```
```javascript
// Daily cost report generation
[Single Message]:
  Task("Generate cost report", "Create daily inference cost summary", "qe-quality-gate")
  Task("Analyze cost trends", "Identify cost optimization opportunities", "qe-quality-gate")
  // Note: TodoWrite expects exactly one task in_progress at a time
  TodoWrite({ todos: [
    {content: "Fetch cost data from tracker", status: "in_progress", activeForm: "Fetching data"},
    {content: "Calculate savings metrics", status: "pending", activeForm: "Calculating savings"},
    {content: "Generate recommendations", status: "pending", activeForm: "Generating recommendations"},
    {content: "Store report in memory", status: "pending", activeForm: "Storing report"}
  ]})
```
Example output (text format):

```text
Inference Cost Report
=====================
Period: 2025-12-15T00:00:00Z to 2025-12-15T23:59:59Z

Overall Metrics:
  Total Requests: 1,248
  Total Tokens: 3,456,789
  Total Cost: $5.2340
  Requests/Hour: 52.0
  Cost/Hour: $0.2181

Cost Savings Analysis:
  Actual Cost: $5.2340
  Cloud Baseline Cost: $18.7650
  Total Savings: $13.5310 (72.1%)
  Local Requests: 892 (71.5%)
  Cloud Requests: 356 (28.5%)

By Provider:
  🏠 ruvllm:
    Requests: 892
    Tokens: 2,234,567
    Cost: $0.0000
    Avg Cost/Request: $0.000000
    Top Model: meta-llama/llama-3.1-8b-instruct
  ☁️ anthropic:
    Requests: 245
    Tokens: 891,234
    Cost: $4.5678
    Avg Cost/Request: $0.018644
    Top Model: claude-sonnet-4-6
  ☁️ openrouter:
    Requests: 111
    Tokens: 330,988
    Cost: $0.6662
    Avg Cost/Request: $0.006002
    Top Model: meta-llama/llama-3.1-70b-instruct
```
Example output (`--format json`):

```json
{
  "timestamp": "2025-12-15T23:59:59Z",
  "period": {
    "start": "2025-12-15T00:00:00Z",
    "end": "2025-12-15T23:59:59Z"
  },
  "overall": {
    "totalRequests": 1248,
    "totalTokens": 3456789,
    "totalCost": 5.234,
    "requestsPerHour": 52.0,
    "costPerHour": 0.2181
  },
  "savings": {
    "actualCost": 5.234,
    "cloudBaselineCost": 18.765,
    "totalSavings": 13.531,
    "savingsPercentage": 72.1,
    "localRequestPercentage": 71.5,
    "cloudRequestPercentage": 28.5,
    "localRequests": 892,
    "cloudRequests": 356,
    "totalRequests": 1248
  },
  "byProvider": {
    "ruvllm": {
      "provider": "ruvllm",
      "providerType": "local",
      "requestCount": 892,
      "inputTokens": 1489711,
      "outputTokens": 744856,
      "totalTokens": 2234567,
      "totalCost": 0,
      "avgCostPerRequest": 0,
      "topModel": "meta-llama/llama-3.1-8b-instruct",
      "modelCounts": {
        "meta-llama/llama-3.1-8b-instruct": 892
      }
    },
    "anthropic": {
      "provider": "anthropic",
      "providerType": "cloud",
      "requestCount": 245,
      "inputTokens": 594156,
      "outputTokens": 297078,
      "totalTokens": 891234,
      "totalCost": 4.5678,
      "avgCostPerRequest": 0.018644,
      "topModel": "claude-sonnet-4-6",
      "modelCounts": {
        "claude-sonnet-4-6": 187,
        "claude-haiku-4-5-20251001": 58
      }
    },
    "openrouter": {
      "provider": "openrouter",
      "providerType": "cloud",
      "requestCount": 111,
      "inputTokens": 220659,
      "outputTokens": 110329,
      "totalTokens": 330988,
      "totalCost": 0.6662,
      "avgCostPerRequest": 0.006002,
      "topModel": "meta-llama/llama-3.1-70b-instruct",
      "modelCounts": {
        "meta-llama/llama-3.1-70b-instruct": 111
      }
    }
  }
}
```
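The savings fields in the JSON report are internally consistent: `totalSavings` is the baseline minus the actual spend, and `savingsPercentage` is that savings as a share of the baseline. A minimal sketch with the sample figures — the relationships are inferred from the sample values, not from a published formula:

```typescript
// Assumed relationships between the savings fields, checked against
// the sample report's figures.
const cloudBaselineCost = 18.765; // what the same traffic would cost all-cloud
const actualCost = 5.234;         // what was actually spent

const totalSavings = cloudBaselineCost - actualCost;
const savingsPercentage = (totalSavings / cloudBaselineCost) * 100;

console.log(totalSavings.toFixed(3));      // 13.531
console.log(savingsPercentage.toFixed(1)); // 72.1
```

Both values match the `savings` block in the sample report above.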
Example output (`--detailed`):

```text
Inference Cost Report (Detailed)
================================
Period: 2025-12-15T00:00:00Z to 2025-12-15T23:59:59Z

Recent Requests (Last 20):
  [2025-12-15T23:58:45Z] ruvllm/meta-llama/llama-3.1-8b-instruct
    Agent: qe-test-generator
    Tokens: 1,234 input / 567 output = 1,801 total
    Cost: $0.0000
  [2025-12-15T23:57:23Z] anthropic/claude-sonnet-4-6
    Agent: qe-quality-gate
    Task: quality-check-456
    Tokens: 3,456 input / 1,789 output = 5,245 total
    Cost: $0.0372
  [2025-12-15T23:56:12Z] ruvllm/meta-llama/llama-3.1-8b-instruct
    Agent: qe-test-executor
    Task: test-run-789
    Tokens: 876 input / 432 output = 1,308 total
    Cost: $0.0000
  ... (17 more)

Provider Summary:
  🏠 Local (ruvllm, onnx): 892 requests (71.5%)
  ☁️ Cloud (anthropic, openrouter, openai): 356 requests (28.5%)

Cost Optimization Recommendations:
  ✅ Excellent local inference usage (71.5%)
  ✅ Saving $13.53 per day vs full cloud inference
  💡 Consider migrating more quality-gate checks to local inference
  💡 Estimated monthly savings: $405.93
```
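The monthly figure in the recommendations appears to be a simple linear extrapolation of one day's savings over a 30-day month; a sketch under that assumption (no weekday/weekend variation modeled):

```typescript
// Assumed derivation of the "estimated monthly savings" line:
// one day's savings scaled to a 30-day month.
const dailySavings = 13.531; // from the sample report
const monthlySavings = dailySavings * 30;

console.log(`$${monthlySavings.toFixed(2)}`); // $405.93
```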
```bash
# Retrieve stored cost data
npx claude-flow@alpha memory retrieve --key "aqe/costs/tracker-data"

# Retrieve previous cost reports
npx claude-flow@alpha memory retrieve --key "aqe/costs/reports/latest"

# Store cost report
npx claude-flow@alpha memory store \
  --key "aqe/costs/reports/${timestamp}" \
  --value '{"totalCost": 5.234, "savings": 13.531}'

# Store cost optimization recommendations
npx claude-flow@alpha memory store \
  --key "aqe/costs/recommendations" \
  --value '[{"action": "migrate-to-local", "potentialSavings": 13.53}]'
```
```typescript
import { getInferenceCostTracker } from 'agentic-qe/core/metrics';

const tracker = getInferenceCostTracker();

// Track local inference (free)
tracker.trackRequest({
  provider: 'ruvllm',
  model: 'meta-llama/llama-3.1-8b-instruct',
  tokens: {
    inputTokens: 1000,
    outputTokens: 500,
    totalTokens: 1500,
  },
  agentId: 'qe-test-generator',
  taskId: 'task-123',
});

// Track cloud inference
tracker.trackRequest({
  provider: 'anthropic',
  model: 'claude-sonnet-4-6',
  tokens: {
    inputTokens: 2000,
    outputTokens: 1000,
    totalTokens: 3000,
  },
  agentId: 'qe-quality-gate',
});
```
```typescript
import { getInferenceCostTracker, formatCostReport } from 'agentic-qe/core/metrics';

const tracker = getInferenceCostTracker();

// Get report for the last 24 hours
const report = tracker.getCostReport();

// Format as text
const textReport = formatCostReport(report);
console.log(textReport);

// Get savings
console.log(`Total savings: $${report.savings.totalSavings.toFixed(2)}`);
console.log(`Savings rate: ${report.savings.savingsPercentage.toFixed(1)}%`);
```
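Conceptually, a tracker like this prices each request as tokens times a per-token rate, with local providers at zero. A minimal sketch of that idea — the rate table and `estimateCost` helper below are illustrative placeholders, not the library's actual API or real provider pricing:

```typescript
// Hypothetical per-million-token rate table. These numbers are
// PLACEHOLDERS for illustration, not real provider pricing.
type Rates = { inputPerMTok: number; outputPerMTok: number };

const exampleRates: Record<string, Rates> = {
  ruvllm: { inputPerMTok: 0, outputPerMTok: 0 },         // local: free
  anthropic: { inputPerMTok: 3.0, outputPerMTok: 15.0 }, // placeholder rates
};

// Estimate the dollar cost of a single request.
function estimateCost(provider: string, inputTokens: number, outputTokens: number): number {
  const r = exampleRates[provider] ?? { inputPerMTok: 0, outputPerMTok: 0 };
  return (inputTokens * r.inputPerMTok + outputTokens * r.outputPerMTok) / 1_000_000;
}

console.log(estimateCost("ruvllm", 1000, 500));     // 0
console.log(estimateCost("anthropic", 2000, 1000)); // 0.021
```

This is why routing requests to a local provider drops their marginal cost to zero while cloud requests accrue per-token charges.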
Cost optimization strategies:

**Route routine tasks to local inference.**
Potential savings: up to 90% cost reduction.

**Reserve cloud inference for:**

**Balance quality vs. cost.**
Implement a fallback strategy:

```javascript
// Try local inference first; fall back to cloud if it fails
async function generateTests(spec) {
  try {
    return await localInference(spec);
  } catch (err) {
    return await cloudInference(spec);
  }
}
```

Result: an optimal cost-quality balance.
**Review costs regularly.**

```bash
# Weekly review
aqe costs --period 7d --detailed
# Identify high-cost agents
# Migrate eligible workloads to local
```

Target: >70% local inference ratio.
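The >70% target can be checked directly from the report's request counts; a minimal sketch using the figures from the sample report in this document:

```typescript
// Compute the local-inference ratio from request counts and check it
// against the 70% target. Figures are from the sample report.
const localRequests = 892;
const cloudRequests = 356;

const localRatio = localRequests / (localRequests + cloudRequests);

console.log((localRatio * 100).toFixed(1) + "%");          // 71.5%
console.log(localRatio > 0.7 ? "target met" : "below target");
```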
Provider pricing reference:

- Anthropic Claude Sonnet 4.5 (January 2025)
- OpenRouter (99% savings vs Claude)
- OpenAI GPT-4 Turbo
Common use cases:

```bash
# Quick check of daily costs and savings
aqe costs

# Export for finance review
aqe costs --period 30d --format json > monthly-costs.json

# Identify high-cost agents and tasks for optimization
aqe costs --detailed

# Track costs per CI/CD pipeline run
aqe costs --period 1h --format json
```
Troubleshooting:

⚠️ **No inference requests tracked in the specified period.**
Use `aqe costs --period all` to see all-time data. Solution: inference tracking may need to be enabled.

❌ **Error: Invalid period '5y'.**
Valid periods: 1h, 24h, 7d, 30d, all. Solution: use a supported time period.

⚠️ **Warning: No requests found for provider 'unknown'.**
Available providers: ruvllm, anthropic, openrouter, openai, onnx. Solution: check the provider name spelling.
Related commands:

- `/aqe-fleet-status` - Fleet health and agent status with cost attribution
- `/aqe-execute` - Track execution costs
- `/aqe-generate` - Track generation costs
- `/aqe-report` - Quality reports with cost analysis included
- `/aqe-benchmark` - Performance benchmarking