Trigger with "show me the improvement chart", "how are we improving", "progress report", "graph the eval scores", "show cycle of improvement", "what's the trend", "are we getting better". Produces a visual/text summary of how the agentic loop is improving across cycles. Do NOT use this to run the learning loop or evaluate a specific skill change.
From agent-agentic-os:

```shell
npx claudepluginhub richfrem/agent-plugins-skills --plugin agent-agentic-os
```

This skill is limited to using the following bundled files:

- evals/evals.json
- evals/results.tsv
- improvement-ledger-spec.md
- post_run_survey.md
- references/chart-reading-guide.md
- references/memory/improvement-ledger-spec.md
- references/memory/post_run_survey.md
- references/operations/chart-reading-guide.md
- references/testing/test-scenarios-seed.md
- requirements.in
- requirements.txt
- scripts/analysis.ipynb
- scripts/generate_report.py
- test-scenarios-seed.md
This skill requires Python 3.8+, pandas, and matplotlib. To install its dependencies:

```shell
pip-compile ./requirements.in
pip install -r ./requirements.txt
```

See ./requirements.txt for the dependency lockfile.
Visual and text reporting on the agentic loop improvement cycle — across any plugin that
maintains an improvement-ledger.md and results.tsv per skill.
The reference output is the autoresearch progress chart: green KEEP dots on a timeline, gray DISCARD dots, running-best step line, annotations showing what each improvement was. This skill produces the same chart for agentic-os and exploration-cycle-plugin improvement cycles.
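A minimal sketch of that chart shape, using made-up cycle data (the real script reads these values from the ledger; the colors and running-best logic follow the description above, but the variable names and scores are illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

# Fabricated cycle history for illustration only.
cycles = [1, 2, 3, 4, 5]
scores = [0.62, 0.70, 0.66, 0.78, 0.75]
verdicts = ["KEEP", "KEEP", "DISCARD", "KEEP", "DISCARD"]

# Running best: highest score among KEEP cycles seen so far.
best, running_best = float("-inf"), []
for score, verdict in zip(scores, verdicts):
    if verdict == "KEEP":
        best = max(best, score)
    running_best.append(best)

fig, ax = plt.subplots()
for c, s, v in zip(cycles, scores, verdicts):
    # Green dots for kept improvements, gray for discarded attempts.
    ax.plot(c, s, "o", color="green" if v == "KEEP" else "gray")
ax.step(cycles, running_best, where="post", color="black", label="running best")
ax.set_xlabel("cycle")
ax.set_ylabel("eval score")
ax.legend()
fig.savefig("progress_sketch.png")
```

This is a sketch of the chart's visual grammar, not the script's actual plotting code.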
| Source | Content |
|---|---|
| `context/memory/improvement-ledger.md` | Eval score progression (Section 1), survey-to-action trace (Section 2), north star metric (Section 3) |
| `.agents/skills/*/evals/results.tsv` | Per-skill detailed eval score history (supplement to ledger) |
The improvement ledger is the primary source. It is written at every loop close (Stage 4.7
of os-improvement-loop). See references/memory/improvement-ledger-spec.md for the format.
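As an illustration only, assuming Section 1 is a markdown table with cycle, skill, score, and verdict columns (check the spec for the authoritative layout; the sample text and column order below are assumptions), a parse might look like:

```python
# Fabricated ledger excerpt; the real format is defined in the spec.
LEDGER_SAMPLE = """\
## Section 1: Eval score progression
| cycle | skill | score | verdict |
|---|---|---|---|
| 1 | session-memory-manager | 0.62 | KEEP |
| 2 | session-memory-manager | 0.70 | KEEP |
"""

def parse_section1(text):
    """Return (cycle, skill, score, verdict) tuples from the Section 1 table."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Keep only 4-column rows whose first cell is a cycle number;
        # this skips the header, the separator, and non-table lines.
        if len(cells) == 4 and cells[0].isdigit():
            rows.append((int(cells[0]), cells[1], float(cells[2]), cells[3]))
    return rows

rows = parse_section1(LEDGER_SAMPLE)
```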
| Output | Description |
|---|---|
| `context/memory/reports/progress_YYYYMMDD_HHMM.png` | Progress chart: KEEP/DISCARD timeline, running-best step line, change annotations |
| `context/memory/reports/summary_YYYYMMDD_HHMM.md` | Text summary: baseline vs best, top hits by delta, survey effectiveness, north star trend |
```shell
LEDGER="${CLAUDE_PROJECT_DIR}/context/memory/improvement-ledger.md"
if [ ! -f "$LEDGER" ]; then
  echo "No improvement ledger found. Run at least one full loop cycle first."
  echo "The ledger is created at Stage 4.7 of os-improvement-loop."
  exit 0
fi
wc -l "$LEDGER"
```
If the ledger exists but Section 1 table is empty (no rows beyond the header), inform the user that no cycles have been completed yet and the first loop run will establish the baseline. Do not run the report script on an empty ledger — it will produce an empty chart.
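One hedged way to detect the empty-table case, assuming Section 1 data rows begin with a pipe followed by a cycle number (the demo ledger below is fabricated so the snippet is self-contained):

```shell
# Fabricated demo ledger; in practice point this at the real file.
DEMO=$(mktemp)
printf '## Section 1\n| cycle | score | verdict |\n|---|---|---|\n| 1 | 0.62 | KEEP |\n' > "$DEMO"

# Count rows that start with "|" followed by a digit, i.e. data rows.
DATA_ROWS=$(grep -cE '^\| *[0-9]' "$DEMO")
if [ "$DATA_ROWS" -eq 0 ]; then
  echo "Ledger exists but no cycles recorded yet; skipping report."
else
  echo "Found $DATA_ROWS recorded cycle(s)."
fi
```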
```shell
PLUGIN_DIR="${CLAUDE_PLUGIN_ROOT:-$(pwd)/.agents/skills/agent-agentic-os}"
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$(pwd)}"
python3 "${PLUGIN_DIR}/skills/os-improvement-report/scripts/generate_report.py" \
  --project-dir "$PROJECT_DIR" \
  --plugin-dir "$PLUGIN_DIR" \
  [--skill SESSION-MEMORY-MANAGER]  # optional: filter to one skill
```
The script exits 0 on success and prints the chart path and text summary to stdout.
After the script completes, show the user the chart at context/memory/reports/progress_[TIMESTAMP].png.

If the user wants improvement tracking across both agent-agentic-os AND exploration-cycle-plugin, run the report twice, once per plugin, passing each plugin's project dir:
```shell
# agentic-os cycles
python3 "$SCRIPT" --project-dir "$AGENTIC_OS_PROJECT" --plugin-dir "$AGENTIC_OS_PLUGIN"

# exploration-cycle cycles
python3 "$SCRIPT" --project-dir "$EXPLORATION_PROJECT" --plugin-dir "$EXPLORATION_PLUGIN"
```
Both plugins write to context/memory/improvement-ledger.md in their respective project dirs.
Each produces its own chart. The text summaries can be concatenated for a combined view.
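A sketch of the concatenation step, using fabricated summary files in place of real report output (the summary_* filenames are placeholders, not the script's actual naming):

```shell
# Stand-in summaries; the real ones live under context/memory/reports/.
REPORTS=$(mktemp -d)
echo "## agentic-os summary" > "$REPORTS/summary_os.md"
echo "## exploration-cycle summary" > "$REPORTS/summary_explore.md"

# Concatenate the per-plugin summaries into one combined view.
COMBINED="$REPORTS/combined_summary.md"
cat "$REPORTS/summary_os.md" "$REPORTS/summary_explore.md" > "$COMBINED"
wc -l < "$COMBINED"
```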
The chart mirrors the autoresearch progress.png:
- A flat or declining step line = the loop is not improving the skill.
- Frequent DISCARD clusters = hypothesis quality needs work (check the test scenarios seed).
- Steep step-line rises = the survey-to-action trace is working.
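One way to quantify "flat" from a results.tsv history is cycles since the last new running best. The two-column layout below is an assumption for illustration, not the real results.tsv schema:

```python
import csv, io

# Fabricated TSV history; the real file is per-skill evals/results.tsv.
RESULTS_TSV = """\
timestamp\tscore
20250101_0900\t0.62
20250108_0900\t0.70
20250115_0900\t0.70
20250122_0900\t0.70
"""

scores = [float(row["score"])
          for row in csv.DictReader(io.StringIO(RESULTS_TSV), delimiter="\t")]

def cycles_since_improvement(scores):
    """Count consecutive trailing cycles with no new running best."""
    best, stale = float("-inf"), 0
    for s in scores:
        if s > best:
            best, stale = s, 0
        else:
            stale += 1
    return stale

stale = cycles_since_improvement(scores)
```

A large `stale` value is one signal that the step line has gone flat.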
Any plugin that runs eval cycles can plug into this report by writing context/memory/improvement-ledger.md in the three-section format (see references/memory/improvement-ledger-spec.md, which includes a bash init snippet). The generate_report.py script works on any ledger with this format; it is not tied to agent-agentic-os specifically.
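For orientation only, a hypothetical init sketch; the authoritative snippet is the one in the spec, and only the three section names below are taken from the sources table above:

```shell
# Create an empty three-section ledger if one does not exist yet.
LEDGER="${CLAUDE_PROJECT_DIR:-$(pwd)}/context/memory/improvement-ledger.md"
mkdir -p "$(dirname "$LEDGER")"
if [ ! -f "$LEDGER" ]; then
  {
    echo "## Section 1: Eval score progression"
    echo "## Section 2: Survey-to-action trace"
    echo "## Section 3: North star metric"
  } > "$LEDGER"
fi
```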