From antigravity-awesome-skills
Forensic root cause analyzer for Antigravity sessions. Classifies scope deltas, rework patterns, root causes, hotspots, and auto-improves prompts/health.
Install with `npx claudepluginhub mit-network/antigravity-awesome-skills`. This skill uses the workspace's default tool permissions.
Analyze AI-assisted coding sessions in `~/.gemini/antigravity/brain/` and produce a report that explains not just **what happened**, but **why it happened**, **who/what caused it**, and **what should change next time**.
For each session, determine the items below. Treat `.resolved.N` files as iteration signals, not proof of failure.

Classify the primary session intent from objective + artifacts:
- DELIVERY
- DEBUGGING
- REFACTOR
- RESEARCH
- EXPLORATION
- AUDIT_ANALYSIS

Record:
- `session_intent`
- `session_intent_confidence`

Use intent to contextualize severity and rework shape. Do not judge exploratory or research sessions by the same standards as narrow delivery sessions.
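One way to sketch the intent classification is a simple keyword heuristic over the objective text. This is an illustrative assumption, not the skill's mandated method; the keyword lists and the `DELIVERY` fallback are hypothetical.

```python
# Hypothetical keyword heuristic for session intent (lists are illustrative).
INTENT_KEYWORDS = {
    "DEBUGGING": ["fix", "bug", "error", "crash", "regression"],
    "REFACTOR": ["refactor", "clean up", "restructure", "rename"],
    "RESEARCH": ["investigate", "compare", "evaluate", "research"],
    "EXPLORATION": ["explore", "prototype", "spike", "experiment"],
    "AUDIT_ANALYSIS": ["audit", "analyze", "review", "report"],
}

def classify_intent(objective: str) -> tuple[str, float]:
    """Return (session_intent, session_intent_confidence)."""
    text = objective.lower()
    scores = {
        intent: sum(kw in text for kw in kws)
        for intent, kws in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No signal: assume a plain delivery task, with low confidence.
        return "DELIVERY", 0.3
    return best, round(scores[best] / sum(scores.values()), 2)
```

In practice the LLM doing the analysis can do this judgment directly; the sketch only shows the shape of the `(intent, confidence)` record.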
Scan the `brain/` directory and record for each conversation:

- `conversation_id`
- `title`
- `objective`
- `created`
- `last_modified`

Output: indexed list of conversations to analyze.
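A minimal indexing sketch, assuming one directory per conversation and a `*.metadata.json` file carrying the fields above (the exact on-disk layout of `brain/` is an assumption here):

```python
import json
from pathlib import Path

BRAIN = Path.home() / ".gemini" / "antigravity" / "brain"

def index_conversations(brain: Path = BRAIN) -> list[dict]:
    """Build an indexed list of conversations; metadata layout is assumed."""
    sessions = []
    for conv_dir in sorted(p for p in brain.iterdir() if p.is_dir()):
        meta_path = next(conv_dir.glob("*.metadata.json"), None)
        meta = json.loads(meta_path.read_text()) if meta_path else {}
        sessions.append({
            "conversation_id": conv_dir.name,
            "title": meta.get("title"),
            "objective": meta.get("objective"),
            "created": meta.get("created"),
            "last_modified": meta.get("last_modified"),
            # .resolved.N files are counted as iteration signals.
            "task_versions": len(list(conv_dir.glob("task.md.resolved.*"))),
        })
    return sessions
```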
For each conversation, read if present:
- `task.md`
- `implementation_plan.md`
- `walkthrough.md`
- `*.metadata.json`
- `task.md.resolved.0 ... N`
- `implementation_plan.md.resolved.0 ... N`
- `walkthrough.md.resolved.0 ... N`
- other `.md` artifacts

Record per conversation:
- `has_task`
- `has_plan`
- `has_walkthrough`
- `is_completed`
- `is_abandoned_candidate` = task exists but no walkthrough
- `task_versions`
- `plan_versions`
- `walkthrough_versions`
- `extra_artifacts`
- `task_items_initial`
- `task_items_final`
- `task_completed_pct`
- `scope_delta_raw`
- `scope_creep_pct_raw`
- `created_at`
- `completed_at`
- `duration_minutes`
- `objective_text`
- `initial_plan_summary`
- `final_plan_summary`
- `initial_task_excerpt`
- `final_task_excerpt`
- `walkthrough_summary`
- `mentioned_files_or_subsystems`
- `validation_requirements_present`
- `acceptance_criteria_present`
- `non_goals_present`
- `scope_boundaries_present`
- `file_targets_present`
- `constraints_present`

Score the opening request on a 0–2 scale for:
Create:
- `prompt_sufficiency_score`
- `prompt_sufficiency_band` = High / Medium / Low

Then note which missing prompt ingredients likely contributed to later friction.
Do not punish short prompts by default; a narrow, obvious task can still have high sufficiency.
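The score-to-band mapping can be sketched as a ratio of points earned to points available, so short-but-sufficient prompts are not penalized. The 2/3 and 1/3 cutoffs are illustrative assumptions; the spec does not fix them.

```python
def sufficiency_band(score: int, max_score: int) -> str:
    """Map a prompt_sufficiency_score to High / Medium / Low.

    Band cutoffs (2/3 and 1/3 of the maximum) are assumed, not specified.
    """
    ratio = score / max_score if max_score else 0.0
    if ratio >= 2 / 3:
        return "High"
    if ratio >= 1 / 3:
        return "Medium"
    return "Low"
```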
Classify scope change into:
Record:
- `scope_change_type_primary`
- `scope_change_type_secondary` (optional)
- `scope_change_confidence`

Keep one short example in mind for calibration.
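The raw scope numbers recorded earlier reduce to a straightforward calculation; a minimal sketch using the `task_items_*` fields from the per-conversation record:

```python
def scope_metrics(task_items_initial: int, task_items_final: int) -> dict:
    """Compute scope_delta_raw and scope_creep_pct_raw from task item counts."""
    delta = task_items_final - task_items_initial
    creep_pct = (
        100.0 * delta / task_items_initial
        if task_items_initial else float("nan")  # no baseline: undefined creep
    )
    return {"scope_delta_raw": delta, "scope_creep_pct_raw": creep_pct}
```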
Classify each session into one primary pattern:
Record:
- `rework_shape`
- `rework_shape_confidence`

For every non-clean session, assign:
One of:
- SPEC_AMBIGUITY
- HUMAN_SCOPE_CHANGE
- REPO_FRAGILITY
- AGENT_ARCHITECTURAL_ERROR
- VERIFICATION_CHURN
- LEGITIMATE_TASK_COMPLEXITY

A secondary root cause is optional, included only if materially relevant.
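A small validator for root-cause records might look like the sketch below. The function name and record shape are hypothetical; only the cause names and the evidence requirement come from the spec.

```python
ROOT_CAUSES = {
    "SPEC_AMBIGUITY", "HUMAN_SCOPE_CHANGE", "REPO_FRAGILITY",
    "AGENT_ARCHITECTURAL_ERROR", "VERIFICATION_CHURN",
    "LEGITIMATE_TASK_COMPLEXITY",
}

def assign_root_cause(primary, secondary=None, confidence=0.5, evidence=()):
    """Build a root-cause record; evidence is mandatory per the workflow."""
    assert primary in ROOT_CAUSES
    assert secondary is None or secondary in ROOT_CAUSES
    assert evidence, "every root-cause assignment must include evidence"
    return {"primary": primary, "secondary": secondary,
            "confidence": confidence, "evidence": list(evidence)}
```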
Every root-cause assignment must include:
Assign each session a severity score to prioritize attention.
Components (sum, clamp 0–100):
- … (abandoned = 25)
- … (low = 10)
- … (REPO_FRAGILITY / AGENT_ARCHITECTURAL_ERROR highest)

Bands:
Record:
- `session_severity_score`
- `severity_band`
- `severity_drivers` = top 2–4 contributors
- `severity_confidence`

Use severity as a prioritization signal, not a verdict. Always explain the drivers. Contextualize severity using session intent so research/exploration sessions are not over-penalized.
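The sum-and-clamp scoring above can be sketched directly. The band cutoffs (60/30) are illustrative assumptions, since the original band definitions are not reproduced here:

```python
def severity(components: dict[str, int]) -> dict:
    """Sum component points, clamp to 0-100, band, and list top drivers."""
    score = max(0, min(100, sum(components.values())))
    # Band cutoffs are assumed; the spec's band list is not shown here.
    band = "high" if score >= 60 else "medium" if score >= 30 else "low"
    drivers = sorted(components, key=components.get, reverse=True)[:4]
    return {
        "session_severity_score": score,
        "severity_band": band,
        "severity_drivers": drivers,
    }
```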
Across all conversations, cluster repeated struggle by file, folder, or subsystem.
For each cluster, calculate:
Goal: identify whether friction is mostly prompt-driven, agent-driven, or concentrated in specific repo areas.
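The clustering step reduces to counting mentions across non-clean sessions; a minimal sketch over the per-session records (field names as recorded earlier, `"clean"` as the assumed label for no-rework sessions):

```python
from collections import Counter

def hotspots(sessions: list[dict], top_n: int = 5) -> list[tuple[str, int]]:
    """Rank files/subsystems by how often they appear in non-clean sessions."""
    counts = Counter()
    for s in sessions:
        if s.get("rework_shape") == "clean":
            continue  # clean sessions do not contribute struggle signal
        counts.update(s.get("mentioned_files_or_subsystems", []))
    return counts.most_common(top_n)
```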
Compare:
For each comparison, identify:
Do not just restate averages; extract cautious evidence-backed patterns.
Generate 3–7 findings that are not simple metric restatements.
Each finding must include:
Examples of strong findings:
Create session_analysis_report.md with this structure:
Generated: [timestamp]
Conversations Analyzed: [N]
Date Range: [earliest] → [latest]
| Metric | Value | Rating |
|---|---|---|
| First-Shot Success Rate | X% | 🟢/🟡/🔴 |
| Completion Rate | X% | 🟢/🟡/🔴 |
| Avg Scope Growth | X% | 🟢/🟡/🔴 |
| Replan Rate | X% | 🟢/🟡/🔴 |
| Median Duration | Xm | — |
| Avg Session Severity | X | 🟢/🟡/🔴 |
| High-Severity Sessions | X / N | 🟢/🟡/🔴 |
Thresholds:
Avg severity guidance:
Note: avg severity is an aggregate health signal, not the same as per-session severity bands.
Then add a short narrative summary of what is going well, what is breaking down, and whether the main issue is prompt quality, repo fragility, workflow discipline, or validation churn.
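The header metrics can be aggregated mechanically from the per-session records; a sketch, assuming "first-shot success" means a clean rework shape and "replan" means at least one `implementation_plan.md.resolved.*` revision:

```python
def summary_metrics(sessions: list[dict]) -> dict:
    """Aggregate report-header metrics; field semantics are assumptions."""
    n = len(sessions)
    def pct(pred):
        return round(100.0 * sum(1 for s in sessions if pred(s)) / n, 1) if n else 0.0
    durations = sorted(
        s["duration_minutes"] for s in sessions
        if s.get("duration_minutes") is not None
    )
    return {
        "first_shot_success_rate": pct(lambda s: s.get("rework_shape") == "clean"),
        "completion_rate": pct(lambda s: s.get("is_completed")),
        "replan_rate": pct(lambda s: s.get("plan_versions", 0) > 0),
        "median_duration_minutes": durations[len(durations) // 2] if durations else None,
        "avg_session_severity": round(
            sum(s.get("session_severity_score", 0) for s in sessions) / n, 1
        ) if n else 0.0,
    }
```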
| Root Cause | Count | % | Notes |
|---|---|---|---|
Separate:
Summarize the main failure patterns across sessions.
Show the files/folders/subsystems most associated with replanning, abandonment, verification churn, and high severity.
List the cleanest sessions and extract what made them work.
List 3–7 evidence-backed findings with confidence.
List the highest-severity sessions and say whether the best intervention is:
For each recommendation, use:
| # | Title | Intent | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Severity | Complete? |
|---|---|---|---|---|---|---|---|---|---|---|
If appropriate, also:
- `prompt_improvement_tips.md` from high-sufficiency / first-shot-success sessions

Only recommend workflows/skills when the pattern appears repeatedly.
The workflow must produce:
Prefer explicit uncertainty over fake precision.