From economist-agents
Track article quality metrics over time and alert on degradation. Use when adding a new metric to the quality dashboard, when investigating a quality trend, or when configuring alert thresholds.
`npx claudepluginhub oviney/economist-agents`

This skill uses the workspace's default tool permissions.
Reads from evaluation logs and produces a dashboard JSON after each pipeline run. Tracks article pass rates, failure modes, quality scores, and revision frequency so degradation is detected early and improvement is measurable.
Script: scripts/quality_dashboard.py
Tags: article-evaluation, defect-prevention

1. Pipeline run completes (success or failure)
↓
2. Read evaluation logs from logs/article_evals.json
↓
3. Compute weekly aggregates: pass rate, failure modes, dimension scores
↓
4. Append to logs/quality_dashboard.json
↓
5. Check alert thresholds
↓
6. If threshold breached → create GitHub issue with severity
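A minimal Python sketch of that flow follows. It is not the actual scripts/quality_dashboard.py: the eval-record fields (`published`, `revisions`), the shape of the weekly entry, and the use of the `gh` CLI for step 6 are assumptions made for the example, and only one of the metrics from the table below is computed.

```python
import json
import subprocess
from datetime import date
from pathlib import Path

EVALS = Path("logs/article_evals.json")
DASHBOARD = Path("logs/quality_dashboard.json")

def update_dashboard():
    """Hook run after every pipeline run, success or failure."""
    # 2. Read per-article eval records (field names are assumptions)
    evals = json.loads(EVALS.read_text()) if EVALS.exists() else []

    # 3. Weekly aggregates; only first-attempt publish rate shown here
    first_try = sum(1 for e in evals
                    if e.get("published") and e.get("revisions", 0) == 0)
    entry = {
        "week": date.today().isoformat(),
        "articles": len(evals),
        "first_attempt_publish_rate": first_try / max(len(evals), 1),
    }

    # 4. Append to the dashboard file; never rewrite or drop old entries
    history = json.loads(DASHBOARD.read_text()) if DASHBOARD.exists() else []
    history.append(entry)
    DASHBOARD.write_text(json.dumps(history, indent=2))

    # 5-6. Check a threshold and open a GitHub issue on a breach
    rate = entry["first_attempt_publish_rate"]
    if rate < 0.70:
        severity = "critical" if rate < 0.50 else "warn"
        subprocess.run(
            ["gh", "issue", "create",
             "--title", f"[quality-{severity}] first-attempt publish rate {rate:.0%}",
             "--body", json.dumps(entry, indent=2)],
            check=True,
        )

if __name__ == "__main__":
    update_dashboard()
```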
| Category | Metrics |
|---|---|
| Article pass rate | First-attempt publish %, publish-after-revision %, total fail % |
| Failure modes | Count per type, top 3 per week, trend direction |
| Quality scores | Avg total per week, per-dimension averages, min/max |
| Revision loops | Avg retries per article, distribution (0/1/2 revisions) |
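To make those categories concrete, one week's dashboard entry might look like the dict below. Field names, dimension names, and failure-mode labels are illustrative only, not the skill's actual schema; the 50-point total and 10-point dimensions match the thresholds in the next table.

```python
# Illustrative shape of one weekly dashboard entry (all names are assumptions).
weekly_entry = {
    "week": "2024-06-03",
    "articles": 12,
    # Article pass rate
    "first_attempt_publish_rate": 0.67,
    "publish_after_revision_rate": 0.25,
    "total_fail_rate": 0.08,
    # Failure modes
    "failure_modes": {"unsupported_claim": 3, "weak_lede": 2, "style_drift": 1},
    "top_failure_modes": ["unsupported_claim", "weak_lede", "style_drift"],
    # Quality scores (total out of 50, each dimension out of 10)
    "avg_total_score": 38.4,
    "dimension_averages": {"accuracy": 8.1, "clarity": 7.6, "structure": 7.9,
                           "sourcing": 7.2, "style": 7.6},
    "score_min": 29,
    "score_max": 47,
    # Revision loops
    "avg_revisions": 0.4,
    "revision_distribution": {"0": 8, "1": 3, "2": 1},
}
```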
| Metric | Warn | Critical |
|---|---|---|
| First-attempt publish rate | <70% | <50% |
| Avg eval score | <35/50 | <25/50 |
| Any dimension avg | <6/10 | <4/10 |
| Same failure mode repeated (consecutive runs) | 3+ in a row | 5+ in a row |
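A sketch of how those thresholds could be checked against a weekly entry shaped like the example above. The numbers come straight from the table; the function name, entry fields, and the way consecutive failures are counted are assumptions for illustration.

```python
def check_thresholds(entry, consecutive_same_failure=0):
    """Return (severity, message) pairs for every breached threshold.

    `entry` is a weekly dashboard record like the example above;
    `consecutive_same_failure` is how many runs in a row the top
    failure mode has repeated (tracked by the caller).
    """
    alerts = []

    rate = entry["first_attempt_publish_rate"]
    if rate < 0.50:
        alerts.append(("critical", f"first-attempt publish rate {rate:.0%}"))
    elif rate < 0.70:
        alerts.append(("warn", f"first-attempt publish rate {rate:.0%}"))

    score = entry["avg_total_score"]
    if score < 25:
        alerts.append(("critical", f"avg eval score {score}/50"))
    elif score < 35:
        alerts.append(("warn", f"avg eval score {score}/50"))

    for dim, avg in entry["dimension_averages"].items():
        if avg < 4:
            alerts.append(("critical", f"{dim} dimension avg {avg}/10"))
        elif avg < 6:
            alerts.append(("warn", f"{dim} dimension avg {avg}/10"))

    if consecutive_same_failure >= 5:
        alerts.append(("critical", "same failure mode 5+ runs in a row"))
    elif consecutive_same_failure >= 3:
        alerts.append(("warn", "same failure mode 3+ runs in a row"))

    return alerts
```

Each returned pair maps to step 6 of the flow above: one GitHub issue per breach, labeled with its severity.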
Output to logs/quality_dashboard.json — append-only, never delete historical data.
| Rationalization | Reality |
|---|---|
| "We can eyeball quality from the articles" | Degradation is gradual — you won't notice a 5% weekly decline until it's 30% down |
| "The eval scores are good enough" | Scores without trends are snapshots; only trends reveal whether you're improving or decaying |
| "Alerts are noisy, let's remove them" | Tune thresholds, don't remove alerting — silent failures are the most expensive |
| "We'll check the dashboard when something breaks" | By then you've shipped 3 bad articles; proactive monitoring catches the first one |
- logs/quality_dashboard.json updated after each pipeline run — evidence: timestamp matches latest run
- logs/article_evals.json — evidence: article count matches eval log entries
- orjson for serialization, file-based storage only
- logs/article_evals.json — per-article evaluation scores
- logs/pipeline_runs.json — pipeline execution metadata
- editorial-judge — post-deployment failures
- content-pipeline.yml
- scripts/quality_dashboard.py — existing code-quality scoring (separate concern)
- data/skills_state/quality_history.json — historical quality scores (infrastructure)
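Since the list above names orjson for serialization and file-based storage only, a minimal append-only write using it could look like the sketch below (orjson.dumps returns bytes, hence write_bytes; the helper name is made up for the example).

```python
import orjson
from pathlib import Path

DASHBOARD = Path("logs/quality_dashboard.json")

def append_dashboard_entry(entry: dict) -> None:
    """Append one weekly entry; history is read back and kept intact, never deleted."""
    history = orjson.loads(DASHBOARD.read_bytes()) if DASHBOARD.exists() else []
    history.append(entry)  # grow only; no filtering or truncation of old entries
    DASHBOARD.write_bytes(orjson.dumps(history, option=orjson.OPT_INDENT_2))
```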