Analyze experiment results with statistical rigor and produce a clear Ship / Investigate / Extend / Stop / Don't Ship recommendation.
This skill complements ab-test-setup (which handles experiment design). Use this skill when you have results to analyze.
A/B test interpretations commonly go wrong in the same few ways: teams call tests too early, ignore guardrail metrics, or ship on directional trends without statistical significance. This skill enforces disciplined analysis.
Gather these inputs (from ab-test-setup):

| Field | Description |
|---|---|
| Primary metric | What the test is trying to improve (e.g., conversion rate) |
| Control group | Sample size (N) and conversions (C) for the control |
| Variant group | Sample size (N) and conversions (C) for the variant |
| Test duration | How long the test ran |
| Planned duration | How long it was designed to run |
| Guardrail metrics | Metrics that must not degrade (e.g., revenue, page load time) |
| MDE | Minimum Detectable Effect used in power calculation |
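The MDE row above ties directly to the sample size the test needed: if the groups never reached it, "not significant" may just mean "underpowered". A minimal sketch of the standard two-proportion power calculation (the function name and the 80%-power default are illustrative assumptions, not part of this skill):

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-proportion z-test.

    baseline_rate: control conversion rate (e.g. 0.05)
    relative_mde:  minimum detectable effect as a relative lift (0.10 = +10%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# Detecting a +10% relative lift on a 5% baseline needs roughly 31,000 users per group
print(required_sample_size(0.05, 0.10))
```

Comparing this number against the actual group sizes tells you whether "Extend" is even a viable recommendation.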
Before analyzing results, check:
- The test ran for its full planned duration.
- Both groups reached the sample size implied by the power calculation for the stated MDE.
- The traffic split between control and variant matches the designed allocation (no sample ratio mismatch).

If any check fails, the test results may be unreliable. Flag this before proceeding.
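One validity check that is easy to automate is sample ratio mismatch (SRM): if traffic was split 50/50 by design but the observed group sizes differ by more than chance allows, the assignment mechanism itself is suspect. A minimal chi-square sketch (the function name is illustrative):

```python
def sample_ratio_mismatch(n_control, n_variant, expected_control_share=0.5):
    """Flag sample ratio mismatch with a one-degree-of-freedom chi-square test."""
    total = n_control + n_variant
    expected_control = total * expected_control_share
    expected_variant = total * (1 - expected_control_share)
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_variant - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom, chi2 > 3.84 corresponds to p < 0.05
    return chi2 > 3.84

print(sample_ratio_mismatch(10000, 10050))  # small imbalance: False
print(sample_ratio_mismatch(10000, 11000))  # 5% skew on 21k users: True
```

An SRM flag means the results should not be analyzed at all until the assignment bug is found.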
For conversion rate tests:
Control conversion rate: p_c = C_control / N_control
Variant conversion rate: p_v = C_variant / N_variant
Relative lift: (p_v - p_c) / p_c × 100%
Pooled proportion: p = (C_control + C_variant) / (N_control + N_variant)
Standard error: SE = sqrt(p × (1-p) × (1/N_control + 1/N_variant))
Z-score: Z = (p_v - p_c) / SE
P-value: two-tailed from Z, i.e. p = 2 × (1 − Φ(|Z|)), where Φ is the standard normal CDF
95% Confidence Interval: (p_v - p_c) ± 1.96 × SE
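The formulas above can be sketched in a few lines of standard-library Python (the function name and return keys are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def analyze_ab_test(n_control, c_control, n_variant, c_variant):
    """Two-proportion z-test implementing the formulas above."""
    p_c = c_control / n_control
    p_v = c_variant / n_variant
    pooled = (c_control + c_variant) / (n_control + n_variant)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    z = (p_v - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    diff = p_v - p_c
    return {
        "lift_pct": diff / p_c * 100,                    # relative lift
        "z": z,
        "p_value": p_value,
        "ci_95": (diff - 1.96 * se, diff + 1.96 * se),   # absolute difference
    }

r = analyze_ab_test(10000, 500, 10000, 560)
print(f"lift {r['lift_pct']:+.1f}%, p = {r['p_value']:.3f}")  # +12% lift, but p ≈ 0.058
```

Note what the example shows: a +12% relative lift that still fails significance. That is exactly the situation where an undisciplined reader calls a win.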
| Criterion | Threshold | Status |
|---|---|---|
| Statistical significance | p-value < 0.05 | Pass / Fail |
| Practical significance | Lift > MDE | Pass / Fail |
| Confidence interval | Does CI exclude 0? | Pass / Fail |
Both statistical AND practical significance are required to ship.
For each guardrail metric:
| Guardrail | Control | Variant | Change | Status |
|---|---|---|---|---|
| [metric name] | [value] | [value] | [+/- %] | OK / Warning / Degraded |
A guardrail is degraded if it shows a statistically significant negative change.
Use this decision matrix:
| Primary Metric | Guardrails | Recommendation |
|---|---|---|
| Significant positive | All OK | Ship — roll out to 100% |
| Significant positive | Some degraded | Investigate — understand trade-off before deciding |
| Not significant, positive trend | All OK | Extend — run longer if sample size was insufficient |
| Not significant, flat | All OK | Stop — no effect detected, free up the experiment slot |
| Significant negative | Any | Don't Ship — revert and learn from the result |
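The matrix above can be expressed as a small function; this sketch also folds in the practical-significance requirement (lift > MDE), and all names are illustrative:

```python
def recommend(p_value, lift_pct, mde_pct, guardrails_ok, alpha=0.05):
    """Map test outcomes onto the decision matrix above."""
    significant = p_value < alpha
    if significant and lift_pct < 0:
        return "Don't Ship"   # significant negative: revert regardless of guardrails
    if significant and lift_pct >= mde_pct:
        return "Ship" if guardrails_ok else "Investigate"
    if not significant and lift_pct > 0:
        return "Extend"       # positive trend: run longer if underpowered
    return "Stop"             # flat, or significant but below the MDE

print(recommend(p_value=0.03, lift_pct=4.2, mde_pct=2.0, guardrails_ok=True))  # Ship
```

Treating the matrix as code makes the edge case explicit: a statistically significant lift that is still below the MDE does not earn a Ship.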
# A/B Test Results: [Test Name]
## Summary
| Field | Value |
| --- | --- |
| Test name | [name] |
| Hypothesis | [We believed X would cause Y] |
| Primary metric | [metric name] |
| Duration | [start] — [end] ([N] days) |
| Decision | **Ship / Investigate / Extend / Stop / Don't Ship** |
---
## Results
| Group | Sample Size | Conversions | Rate |
| --- | --- | --- | --- |
| Control | [N] | [C] | [rate]% |
| Variant | [N] | [C] | [rate]% |
**Relative lift:** [+/- X.X%]
**P-value:** [value]
**95% CI:** [[lower]%, [upper]%]
**Statistically significant:** Yes / No
**Practically significant:** Yes / No (MDE was [X]%)
---
## Guardrail Metrics
| Metric | Control | Variant | Change | Status |
| --- | --- | --- | --- | --- |
| [metric] | [val] | [val] | [change] | OK / Warning |
---
## Recommendation
**Decision: [Ship / Investigate / Extend / Stop / Don't Ship]**
**Rationale:** [2–3 sentences explaining the decision]
**Next steps:**
1. [action]
2. [action]
---
## Learnings
- [What we learned from this test, regardless of outcome]
- [How this informs future experiments]
| Pitfall | Why It's Wrong | Correct Approach |
|---|---|---|
| Peeking at results daily | Inflates false positive rate | Wait for planned duration and sample size |
| Calling it at p=0.06 | "Almost significant" isn't significant | Set the threshold before the test, stick to it |
| Ignoring guardrails | Winning on one metric while losing on another | Always check guardrails before shipping |
| Post-hoc segmentation | Finding "it worked for mobile users!" after the fact is data mining | Pre-register segments or treat as hypothesis for next test |
| Running too many variants | Each variant needs full sample size | Limit to 1–2 variants per test |
| Not learning from losses | "It didn't work" is not a learning | Document WHY it didn't work and what to try next |
| Avoid | Why | Instead |
|---|---|---|
| "Directional win" | Not a statistical standard | Require p < 0.05 and lift > MDE |
| Shipping without guardrail check | May degrade critical metrics | Always check before shipping |
| Ending early because it "looks good" | Sequential testing bias | Run to planned duration |
| Not documenting learnings | Same failed experiments get repeated | Maintain an experiment log |