From devboy
Aggregates last N days of session traces, merge requests, and CI pipelines to surface patterns in skill success rates, review feedback, and flaky jobs. Use for weekly retros, quantifying skill failures, or pre-upgrade checks.
npx claudepluginhub meteora-pro/devboy-tools --plugin devboy
This skill uses the workspace's default tool permissions.
Looks back over the last N days of session traces, recent merge requests, and CI pipelines to surface recurring patterns: skills whose success rate slipped, review feedback that keeps coming back, flaky jobs. The output is a user-facing report with suggestions — the skill never files tickets, never edits other skills, and never opens MRs.
--days 7 (default). Collect traces from the last N calendar days
under <scope>/.devboy/sessions/<YYYY-MM-DD>/, where <scope> is
either the repo root (default) or ~/.devboy/ when --global is
passed.
result=$(devboy trace begin --skill retro)
SESSION_DIR=$(echo "$result" | jq -r .session_dir)
SESSION_ID=$(echo "$result" | jq -r .session_id)
Emit a decision event recording the window and the scope.
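The collection window can be sketched as a list of per-day directories. This is a minimal illustration, not the tool's own implementation: the helper name `window_dirs` is hypothetical, and it assumes that "last N calendar days" means today plus the N - 1 days before it.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List, Optional


def window_dirs(scope: Path, days: int = 7, today: Optional[date] = None) -> List[Path]:
    """Return the per-day session directories for the last `days` calendar days.

    Assumption: the window includes `today` and the `days - 1` days before it.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    today = today or date.today()
    sessions = scope / ".devboy" / "sessions"
    # Oldest day first, so later aggregation sees sessions in chronological order.
    return [
        sessions / (today - timedelta(days=offset)).isoformat()
        for offset in range(days - 1, -1, -1)
    ]
```

For the default scope, pass the repo root as `scope`; for `--global`, pass `Path.home()`, so the directories resolve under `~/.devboy/sessions/`.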
Walk every <date>/<skill>/<session_id>/meta.json in the window —
the trace subsystem nests each session one level below <skill>/.
Per skill, aggregate across all its session directories:
total runs, success / failure / aborted counts, total tool_calls,
total errors, total duration, average duration, and the most
common summary strings for failing runs.
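The per-skill roll-up described above might look like the following sketch. The meta.json field names (`outcome`, `tool_calls`, `errors`, `duration_s`, `summary`) and the outcome value `"failure"` are assumptions made for illustration; the real schema may differ. The skill name is read from the directory layout rather than from the file.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, Iterable, Iterator


def iter_meta(day_dirs: Iterable[Path]) -> Iterator[dict]:
    """Yield each parsed <date>/<skill>/<session_id>/meta.json, tagged with its skill."""
    for day in day_dirs:
        for meta_path in sorted(day.glob("*/*/meta.json")):
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            # <skill> is the directory two levels above meta.json.
            meta["skill"] = meta_path.parent.parent.name
            yield meta


def aggregate(metas: Iterable[dict]) -> Dict[str, dict]:
    """Roll session metadata up into one stats record per skill."""
    stats: Dict[str, dict] = defaultdict(lambda: {
        "runs": 0, "outcomes": Counter(), "tool_calls": 0,
        "errors": 0, "duration": 0.0, "failure_summaries": Counter(),
    })
    for meta in metas:
        s = stats[meta["skill"]]
        s["runs"] += 1
        s["outcomes"][meta.get("outcome", "unknown")] += 1
        s["tool_calls"] += meta.get("tool_calls", 0)
        s["errors"] += meta.get("errors", 0)
        s["duration"] += meta.get("duration_s", 0.0)
        # Only failing runs contribute to the "most common summary" ranking.
        if meta.get("outcome") == "failure" and meta.get("summary"):
            s["failure_summaries"][meta["summary"]] += 1
    for s in stats.values():
        s["avg_duration"] = s["duration"] / s["runs"]
    return dict(stats)
```

Calling `stats[skill]["failure_summaries"].most_common(3)` then gives the most frequent failure summaries for a skill.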
Additionally, read each failing session's trace.jsonl to find
retry loops — sequences of verify events with ok: false followed
by more tool_call attempts. A skill with many retry loops is a
skill that could benefit from a stronger precondition check; record
the ratio retried / total_failures per skill.
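One way to detect retry loops is a single pass over the events of a failing session. The sketch below assumes each trace.jsonl line is a JSON object with a `type` field (`"verify"` or `"tool_call"`) and, for verify events, a boolean `ok`; these field names and the helper names are illustrative guesses, not the documented trace schema.

```python
import json
from typing import Iterable, List


def load_events(trace_path: str) -> List[dict]:
    """Parse a trace.jsonl file into a list of event dicts, skipping blank lines."""
    with open(trace_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def count_retry_loops(events: Iterable[dict]) -> int:
    """Count failed verify events that are followed by another tool_call."""
    loops = 0
    pending_failure = False  # a failed verify was seen and not yet retried
    for event in events:
        kind = event.get("type")
        if kind == "verify":
            pending_failure = event.get("ok") is False
        elif kind == "tool_call" and pending_failure:
            loops += 1
            pending_failure = False
    return loops


def retry_ratio(failing_sessions: List[List[dict]]) -> float:
    """Ratio retried / total_failures across one skill's failing sessions."""
    if not failing_sessions:
        return 0.0
    retried = sum(1 for events in failing_sessions if count_retry_loops(events) > 0)
    return retried / len(failing_sessions)
```

Here a session counts as "retried" if it contains at least one loop; a stricter threshold (for example, more than two loops) is an equally plausible reading of the ratio.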
Emit one note event per skill containing the aggregate numbers so
future retros have a stable trail.
For every merged MR in the window:
devboy tools call get_merge_requests '{"state":"merged","limit":100}'
Filter the result to MRs whose merge timestamp falls inside the window, then call get_pipeline for the first ~20 of them:
devboy tools call get_pipeline \
'{"mrKey":"mr#482","includeFailedLogs":true}'
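The window filter on merged MRs can be sketched as follows. The `merged_at` field name, the ISO 8601 timestamp format, the rolling window ending at `now`, and the choice to keep the most recently merged MRs as "the first ~20" are all assumptions for illustration; adapt them to the actual shape of the get_merge_requests response.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is accepted as UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Treat naive timestamps as UTC so comparisons never mix naive and aware values.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def merged_in_window(mrs: List[dict], now: datetime, days: int = 7, cap: int = 20) -> List[dict]:
    """Keep MRs merged within the last `days` days, newest first, at most `cap`."""
    start = now - timedelta(days=days)
    recent = [
        mr for mr in mrs
        if mr.get("merged_at") and start <= parse_ts(mr["merged_at"]) <= now
    ]
    recent.sort(key=lambda mr: parse_ts(mr["merged_at"]), reverse=True)
    return recent[:cap]
```

The cap keeps the number of follow-up get_pipeline calls bounded even when the window contains many merges.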
Collect failing-job frequency keyed by job name. For the top three
failing jobs, call get_job_logs in search mode to pull the most
common error signature:
devboy tools call get_job_logs \
'{"jobId":"<id>","pattern":"error|fail|panic","context":2,"maxMatches":10}'
Keep only the error shapes that repeat across multiple runs — a single broken job is signal for the developer, not a pattern.
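The frequency count and the "repeats across multiple runs" filter could be expressed as below. The input tuples are a hypothetical intermediate form, not the raw output of get_pipeline or get_job_logs, and the threshold of two distinct runs is an assumed reading of "multiple".

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_failing_jobs(failures: Iterable[Tuple[str, str]], top: int = 3) -> List[Tuple[str, int]]:
    """failures: (job_name, pipeline_id) pairs, one per failed job run."""
    return Counter(job for job, _ in failures).most_common(top)


def repeated_signatures(
    matches: Iterable[Tuple[str, str, str]], min_runs: int = 2
) -> Dict[str, List[str]]:
    """matches: (job_name, job_id, error_signature) triples from log searches.

    Keeps only signatures seen in at least `min_runs` distinct runs of a job.
    """
    runs = defaultdict(set)
    for job, job_id, signature in matches:
        # A set de-duplicates several matches inside the same job run.
        runs[(job, signature)].add(job_id)
    result: Dict[str, List[str]] = defaultdict(list)
    for (job, signature), ids in sorted(runs.items()):
        if len(ids) >= min_runs:
            result[job].append(signature)
    return dict(result)
```

Counting distinct job IDs rather than raw matches prevents one noisy log from looking like a recurring pattern.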
For the same merged MRs:
devboy tools call get_merge_request_discussions \
'{"key":"mr#482","limit":50}'
Group the discussion bodies by naïve keyword bucket (type-safety, error-handling, testing, naming, i18n, performance, security). Count how often each bucket appears across MRs. The top three buckets go into the report.
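A naive keyword bucketing pass might look like this sketch. The bucket names come from the text above; the keyword lists are invented for illustration and would need tuning on real review comments. Each MR is counted at most once per bucket, matching the "mentioned in N MRs" phrasing of the report.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical keyword lists; only the bucket names come from the skill text.
BUCKETS: Dict[str, List[str]] = {
    "type-safety": ["type", "cast", "nullable"],
    "error-handling": ["error", "exception", "unwrap", "panic"],
    "testing": ["test", "coverage", "mock"],
    "naming": ["naming", "rename", "identifier"],
    "i18n": ["i18n", "locale", "translation"],
    "performance": ["performance", "slow", "allocation"],
    "security": ["security", "injection", "secret", "sanitize"],
}

# Case-insensitive match on word prefixes, so "test" also matches "tests".
PATTERNS = {
    bucket: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + ")", re.IGNORECASE)
    for bucket, words in BUCKETS.items()
}


def bucket_counts(discussions: Dict[str, List[str]]) -> Counter:
    """Map MR key -> discussion bodies to bucket -> number of MRs mentioning it."""
    counts: Counter = Counter()
    for bodies in discussions.values():
        text = "\n".join(bodies)
        for bucket, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[bucket] += 1
    return counts
```

`bucket_counts(discussions).most_common(3)` then yields the three buckets for the report.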
Markdown to stdout:
# Retro — last 7 days
## Skills with degraded success rate
- solve-issue — 6/10 success (was 9/10 the previous week);
60% of failures retry more than twice; top summary:
"gitlab returned 429".
## Frequent review feedback
- testing (mentioned in 9 MRs)
- error-handling (mentioned in 5 MRs)
- type-safety (mentioned in 4 MRs)
## Flaky CI signal
- integration::auth — 5/20 runs failed with "connection refused"
- clippy — 3/20 runs failed with "-D warnings" on a single rule
## Suggestions
- Add a 429 back-off to the get_issues call inside solve-issue.
- Update review-mr's checklist to call out type-safety explicitly.
- Investigate integration::auth — likely a race on the test fixture.
Omit sections with no entries. Keep the report tight; two screens of text at most.
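The "omit sections with no entries" rule can be sketched as a small renderer. The function name and the section-to-entries mapping are hypothetical; the heading and bullet layout follow the sample report above.

```python
from typing import Dict, List


def render_report(days: int, sections: Dict[str, List[str]]) -> str:
    """Render the retro as Markdown, skipping any section that has no entries."""
    # \u2014 is the dash used in the sample report heading.
    lines = [f"# Retro \u2014 last {days} days"]
    for heading, entries in sections.items():
        if not entries:
            continue  # empty sections are left out entirely
        lines.append(f"## {heading}")
        lines.extend(f"- {entry}" for entry in entries)
    return "\n".join(lines) + "\n"
```

Passing the sections in a fixed order (skills, review feedback, CI, suggestions) keeps successive reports easy to compare.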
devboy trace end \
--session-dir "$SESSION_DIR" --session-id "$SESSION_ID" \
--skill retro \
--outcome "$OUTCOME" \
--summary "<N> sessions, <M> MRs, <K> jobs analysed"
Never edit another skill's SKILL.md, never post a
comment. The suggestions are text for a human to read.
Redaction markers (<redacted:credential>, <redacted:token-pattern>)
are treated as opaque. Count them, do not try to un-redact them.
If get_pipeline or get_merge_request_discussions fails, note
the degradation in the report ("CI section omitted: pipeline
lookup failed") rather than pretending everything is fine.
Distinct from daily-report: that is a single-day
summary, this one is a multi-day pattern detector.