From antigravity-awesome-skills
Forensic root cause analyzer for Antigravity sessions. Classifies scope deltas, rework patterns, root causes, hotspots, and auto-improves prompts/health.
```
npx claudepluginhub absjaded/antigravity-awesome-skills
```

This skill uses the workspace's default tool permissions.
Analyze AI-assisted coding sessions in `brain/` and produce a diagnostic report that explains not just **what happened**, but **why it happened**, **who/what caused it**, and **what should change next time**.
This is not a simple metrics dashboard. It is a forensic analysis workflow for AI coding sessions.
For each session, determine:

- `conversation_id`
- `title`
- `objective`
- `created`
- `last_modified`

Treat `.resolved.N` files as signals of iteration intensity, not proof of failure.

Output: an indexed list of conversations to analyze.
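As a rough sketch of the discovery step, the following walks `brain/` and builds the indexed list. The field names follow the spec above; the assumption that `*.metadata.json` files carry those same keys is mine, not the spec's.

```python
# Sketch: enumerate conversations under brain/ and index their metadata.
import json
from pathlib import Path

def index_conversations(brain_dir: str = "brain") -> list[dict]:
    """Return an indexed list of conversations to analyze."""
    conversations = []
    for conv_dir in sorted(Path(brain_dir).iterdir()):
        if not conv_dir.is_dir():
            continue
        record = {
            "conversation_id": conv_dir.name,
            "title": None,
            "objective": None,
            "created": None,
            "last_modified": None,
        }
        # Merge in any *.metadata.json fields that match the schema
        # (assumed layout: one metadata file per artifact).
        for meta_file in conv_dir.glob("*.metadata.json"):
            meta = json.loads(meta_file.read_text())
            for key in record:
                record[key] = meta.get(key, record[key])
        conversations.append(record)
    return conversations
```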
For each conversation, read all structured artifacts that exist.
Structured artifacts to look for:

- `task.md`
- `implementation_plan.md`
- `walkthrough.md`
- `*.metadata.json`
- `task.md.resolved.0 ... N`
- `implementation_plan.md.resolved.0 ... N`
- `walkthrough.md.resolved.0 ... N`
- other `.md` artifacts

Fields to extract per conversation:

- `has_task`, `has_plan`, `has_walkthrough`
- `is_completed`
- `is_abandoned_candidate` = has task but no walkthrough
- `task_versions`, `plan_versions`, `walkthrough_versions`
- `extra_artifacts`
- `task_items_initial`, `task_items_final`, `task_completed_pct`
- `scope_delta_raw`, `scope_creep_pct_raw`
- `created_at`, `completed_at`, `duration_minutes`
- `objective_text`
- `initial_plan_summary`, `final_plan_summary`
- `initial_task_excerpt`, `final_task_excerpt`
- `walkthrough_summary`
- `mentioned_files_or_subsystems`
- `validation_requirements_present`, `acceptance_criteria_present`, `non_goals_present`, `scope_boundaries_present`, `file_targets_present`, `constraints_present`

For each conversation, score the opening objective/request on a 0–2 scale for each dimension:
Create:

- `prompt_sufficiency_score`
- `prompt_sufficiency_band` = High / Medium / Low

Then note which missing ingredients likely contributed to later friction.
Important: Do not assume a low-detail prompt is bad by default. Short prompts can still be good if the task is narrow and the repo context is obvious.
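The scoring step can be sketched as a simple aggregation of the six 0–2 dimension scores. The dimension names mirror the `*_present` fields above; the band thresholds here are illustrative assumptions, not part of the spec.

```python
# Sketch: aggregate six 0-2 dimension scores into a sufficiency score/band.
DIMENSIONS = [
    "validation_requirements",
    "acceptance_criteria",
    "non_goals",
    "scope_boundaries",
    "file_targets",
    "constraints",
]

def prompt_sufficiency(scores: dict[str, int]) -> tuple[int, str]:
    total = sum(scores.get(d, 0) for d in DIMENSIONS)  # range 0..12
    # Band cut-offs are assumed, not specified by the workflow.
    if total >= 9:
        band = "High"
    elif total >= 5:
        band = "Medium"
    else:
        band = "Low"
    return total, band
```

A short, narrow prompt can still score Medium or High here if it hits the dimensions that matter for the task, which matches the caveat above.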
Do not treat all scope growth as the same.
For each conversation, classify scope delta into:
- **Human-driven expansion** — new items clearly introduced beyond the initial ask.
- **Necessary discovered work** — work that was not in the opening ask but appears required to complete it correctly.
- **Unnecessary agent expansion** — work that appears neither requested nor necessary, likely introduced by agent overreach.
For each conversation, record:

- `scope_change_type_primary`
- `scope_change_type_secondary` (optional)
- `scope_change_confidence`

Do not just count revisions. Determine the shape of session rework.
Classify each conversation into one of these patterns:
Record:

- `rework_shape`
- `rework_shape_confidence`

For every non-clean session, assign:
Choose one:

- `SPEC_AMBIGUITY`
- `HUMAN_SCOPE_CHANGE`
- `REPO_FRAGILITY`
- `AGENT_ARCHITECTURAL_ERROR`
- `VERIFICATION_CHURN`
- `LEGITIMATE_TASK_COMPLEXITY`

A secondary root cause is optional; assign one only if a second factor materially contributed.
Every root cause assignment must include:
`SPEC_AMBIGUITY` — use when the opening ask lacked boundaries, targets, criteria, or constraints, and the plan had to invent them.

`HUMAN_SCOPE_CHANGE` — use when the task set expanded due to new asks, broadened goals, or post-hoc additions.

`REPO_FRAGILITY` — use when hidden coupling, unclear architecture, brittle files, or environmental issues forced extra work.

`AGENT_ARCHITECTURAL_ERROR` — use when the agent chose the wrong approach, wrong files, wrong assumptions, or hallucinated structure.

`VERIFICATION_CHURN` — use when implementation mostly succeeded but tests, validation, QA, or fixes created repeated loops.

`LEGITIMATE_TASK_COMPLEXITY` — use when revisions were reasonable given the difficulty and do not strongly indicate avoidable failure.
Across all conversations, cluster repeated struggle by subsystem, folder, or file mentions.
Examples:

- `frontend/auth/*`
- `db.py`
- `ui.py`
- `video_pipeline/*`

For each cluster, calculate:
Output the top recurring friction zones.
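The clustering step above can be sketched with a frequency counter over each conversation's `mentioned_files_or_subsystems` field. Counting each subsystem once per conversation (rather than per mention) is my assumption; deeper per-cluster stats are left to the workflow.

```python
# Sketch: cluster struggle by mentioned files/subsystems.
from collections import Counter

def friction_zones(conversations: list[dict], top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top recurring friction zones across all conversations."""
    counts = Counter()
    for conv in conversations:
        # Deduplicate within a conversation so one noisy session
        # cannot dominate the cluster counts.
        counts.update(set(conv.get("mentioned_files_or_subsystems", [])))
    return counts.most_common(top_n)
```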
Goal: Identify whether struggle is prompt-driven, agent-driven, or concentrated in specific repo areas.
Compare these cohorts:
For each comparison, identify:
Do not merely restate averages. Extract causal-looking patterns cautiously and label them as inference where appropriate.
Generate 3–7 findings that are not simple metric restatements.
Good examples:
Bad examples:
Each finding must include:
Create `session_analysis_report.md` in the current conversation's `brain/` folder.
Use this structure:
Generated: [timestamp]
Conversations Analyzed: [N]
Date Range: [earliest] → [latest]
| Metric | Value | Rating |
|---|---|---|
| First-Shot Success Rate | X% | 🟢/🟡/🔴 |
| Completion Rate | X% | 🟢/🟡/🔴 |
| Avg Scope Growth | X% | 🟢/🟡/🔴 |
| Replan Rate | X% | 🟢/🟡/🔴 |
| Median Duration | Xm | — |
| Avg Revision Intensity | X | 🟢/🟡/🔴 |
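The scorecard values can be derived from the per-conversation fields extracted earlier. One caveat: the definition of "first-shot success" as zero plan revisions is my assumption, since the spec does not pin it down.

```python
# Sketch: compute scorecard metrics from per-conversation records.
from statistics import median

def scorecard(convs: list[dict]) -> dict:
    n = len(convs)
    completed = sum(1 for c in convs if c.get("is_completed"))
    # Assumed definition: a first-shot success has zero plan revisions.
    first_shot = sum(1 for c in convs if c.get("plan_versions", 0) == 0)
    replans = sum(1 for c in convs if c.get("plan_versions", 0) > 0)
    return {
        "first_shot_success_rate": 100 * first_shot / n,
        "completion_rate": 100 * completed / n,
        "avg_scope_growth": sum(c.get("scope_creep_pct_raw", 0) for c in convs) / n,
        "replan_rate": 100 * replans / n,
        "median_duration_min": median(c.get("duration_minutes", 0) for c in convs),
    }
```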
Then include a short narrative summary:
| Root Cause | Count | % | Notes |
|---|---|---|---|
| Spec Ambiguity | X | X% | ... |
| Human Scope Change | X | X% | ... |
| Repo Fragility | X | X% | ... |
| Agent Architectural Error | X | X% | ... |
| Verification Churn | X | X% | ... |
| Legitimate Task Complexity | X | X% | ... |
Separate:
Show top offenders in each category.
Summarize how sessions tend to fail:
Cluster repeated struggle by subsystem/file/domain. Show which areas correlate with:
List the cleanest sessions and extract what made them work:
List 3–7 high-value findings with evidence and confidence.
Each recommendation must use this format:
Recommendations must be specific, not generic.
| # | Title | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Complete? |
|---|---|---|---|---|---|---|---|---|
Add short notes only where meaningful.
Update `~/.gemini/antigravity/.agent/skills/project-health-state/SKILL.md`:
Create `prompt_improvement_tips.md`.
Do not give generic advice. Instead extract:
If multiple struggle sessions cluster around the same subsystem or repeated sequence, recommend:
Only recommend workflows when the pattern appears repeatedly.
The workflow must produce:
If evidence is weak, say so. Do not overclaim. Prefer explicit uncertainty over fake precision.
How to invoke this skill
Just say any of these in a new conversation:
The agent will automatically discover and use the skill.