Systematic comparison of segments, cohorts, or time periods - ensure fair apples-to-apples comparisons, identify meaningful differences, explain WHY differences exist
- /plugin marketplace add tilmon-engineering/claude-skills
- /plugin install datapeeker@tilmon-eng-skills

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Templates:
- templates/phase-1.md
- templates/phase-2.md
- templates/phase-3.md
- templates/phase-4.md
- templates/phase-5.md

This skill guides you through systematic comparison of two or more groups, segments, cohorts, or time periods. Unlike exploratory-analysis (where you discover patterns) or guided-investigation (where you answer broad questions), comparative analysis helps you rigorously compare specific groups and explain why they differ.
Comparative analysis is appropriate when:
Before using this skill, you MUST:
- Completed the importing-data skill
- Completed the cleaning-data skill (MANDATORY - never skip)
- Started a session (just start-analysis comparative-analysis <name>)

Related skills:
- understanding-data - for data profiling
- writing-queries - for SQL query construction
- interpreting-results - for result analysis
- creating-visualizations - for text-based visualizations

You MUST use TodoWrite to track progress through all 5 phases. Create todos at the start:
- Phase 1: Define Comparison - pending
- Phase 2: Segment Definition - pending
- Phase 3: Metric Comparison - pending
- Phase 4: Difference Explanation - pending
- Phase 5: Conclusions and Recommendations - pending
Update status as you progress. Mark phases complete ONLY after checkpoint verification.
CHECKPOINT: Before proceeding, you MUST have:
01 - comparison-definition.md

Ask clarifying questions:
Create analysis/[session-name]/01-comparison-definition.md with: ./templates/phase-1.md
Common Rationalization: "The comparison is obvious, I'll just start querying" Reality: Without explicit definition, you'll make unstated assumptions about what "fair" means.
Common Rationalization: "I'll compare everything and see what's different" Reality: Unfocused comparison produces noise. Define specific metrics and materiality thresholds.
CHECKPOINT: Before proceeding, you MUST have:
02 - segment-definition.md

Create analysis/[session-name]/02-segment-definition.md with: ./templates/phase-2.md
If groups are not comparable:
If you exclude outliers, filter dates, or adjust definitions, document clearly:
## Adjustments Made for Fair Comparison
1. **Outlier handling:** Excluded 8 transactions >$50,000 as data entry errors (confirmed with field validation)
2. **Date alignment:** Limited both groups to Feb 1 - Apr 30 to match shorter group's coverage
3. **Null handling:** Excluded 127 transactions with NULL customer_id from both groups
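To make such adjustments reproducible, they can be captured once in a filtered base query that every comparison reads from. A minimal sketch, assuming a transactions table with amount, transaction_date, and customer_id columns (the names and the year are illustrative, not from your data):

```sql
-- Hypothetical base view that applies the documented adjustments once,
-- so every comparison query starts from the same fairly-filtered data.
CREATE VIEW comparison_base AS
SELECT *
FROM transactions
WHERE amount <= 50000                                         -- outlier handling: drop confirmed data-entry errors
  AND transaction_date BETWEEN '2024-02-01' AND '2024-04-30'  -- date alignment to the shared coverage window
  AND customer_id IS NOT NULL;                                -- null handling: exclude rows missing customer_id
```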
Common Rationalization: "The groups look fine, I'll skip validation" Reality: Unstated data quality issues or sample size problems will invalidate your comparison.
Common Rationalization: "Sample sizes are different but that's okay" Reality: Large sample size differences require per-capita normalization. Raw totals are misleading.
CHECKPOINT: Before proceeding, you MUST have:
03 - metric-comparison.md

Create analysis/[session-name]/03-metric-comparison.md with: ./templates/phase-3.md
Compare apples-to-apples:
Wrong: "Northeast has $458K revenue vs Southeast's $392K" Right: "Northeast has $161/customer vs Southeast's $150/customer (7.5% higher)"
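A per-capita comparison can come straight from SQL. A sketch, assuming a transactions table with region, customer_id, and amount columns (all names are illustrative):

```sql
-- Revenue per customer by region: normalizes away differences in group size.
SELECT
  region,
  SUM(amount)                                     AS total_revenue,
  COUNT(DISTINCT customer_id)                     AS customers,
  SUM(amount) * 1.0 / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM transactions
GROUP BY region
ORDER BY region;
```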
Statistical significance: Is difference larger than random variation would explain?
Practical significance: Is difference large enough to matter for decisions?
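One way to check both at once is to compute the difference alongside a rough test statistic. A sketch using the same illustrative transactions table; STDDEV support varies by SQL dialect, and a statistics library is the better tool for a rigorous test:

```sql
-- Compare per-customer revenue for two groups.
-- relative_diff is judged against the materiality threshold (practical significance);
-- the Welch-style t statistic gives a rough read on statistical significance.
WITH per_customer AS (
  SELECT region, customer_id, SUM(amount) AS revenue
  FROM transactions
  WHERE region IN ('Northeast', 'Southeast')
  GROUP BY region, customer_id
),
stats AS (
  SELECT region, AVG(revenue) AS mean_rev, STDDEV(revenue) AS sd_rev, COUNT(*) AS n
  FROM per_customer
  GROUP BY region
)
SELECT
  a.mean_rev - b.mean_rev                AS absolute_diff,
  (a.mean_rev - b.mean_rev) / b.mean_rev AS relative_diff,
  (a.mean_rev - b.mean_rev)
    / SQRT(a.sd_rev * a.sd_rev / a.n + b.sd_rev * b.sd_rev / b.n) AS t_statistic
FROM stats a
JOIN stats b ON a.region = 'Northeast' AND b.region = 'Southeast';
```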
Common Rationalization: "I'll just show the raw numbers and let the user interpret" Reality: Your job is interpretation. Show differences clearly and explain what they mean.
Common Rationalization: "Northeast has more revenue, that's the answer" Reality: Explain WHY - more customers? Higher spend per customer? Different product mix? Dig deeper.
Common Rationalization: "These differences are statistically significant, so they matter" Reality: Statistical significance ≠ practical significance. A 1% difference might be "significant" but not meaningful.
CHECKPOINT: Before proceeding, you MUST have:
04 - difference-explanation.md

Don't try to explain every small difference. Focus on:
Create analysis/[session-name]/04-difference-explanation.md with: ./templates/phase-4.md
Comparative analysis shows WHAT differs, not always WHY:
When metrics differ, break them into components:
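For example, total revenue decomposes into customers × orders per customer × revenue per order, which pins down where a gap actually comes from. A sketch, assuming an illustrative orders table with region, customer_id, and amount columns:

```sql
-- Decompose revenue: total_revenue = customers × orders_per_customer × revenue_per_order.
SELECT
  region,
  COUNT(DISTINCT customer_id)                  AS customers,
  COUNT(*) * 1.0 / COUNT(DISTINCT customer_id) AS orders_per_customer,
  SUM(amount) / COUNT(*)                       AS revenue_per_order,
  SUM(amount)                                  AS total_revenue
FROM orders
GROUP BY region;
```

Comparing the three components across groups shows whether a revenue gap is driven by customer count, purchase frequency, or order size.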
Common Rationalization: "I found the difference, that's enough" Reality: Finding the difference is half the job. Explaining WHY is equally important.
Common Rationalization: "This factor correlates with the difference, so it's the cause" Reality: Correlation ≠ causation. Multiple factors may correlate. Be cautious about causal claims.
Common Rationalization: "I'll ignore confounds since I can't measure them" Reality: Acknowledge unmeasured confounds explicitly. They limit your conclusions but shouldn't be ignored.
CHECKPOINT: Before proceeding, you MUST have:
05 - conclusions-and-recommendations.md
00 - overview.md with comparison summary

Create analysis/[session-name]/05-conclusions-and-recommendations.md with: ./templates/phase-5.md
Update: 00 - overview.md
Add at the end:
## Comparison Summary
**Groups Compared:** [Groups]
**Time Period:** [Date range]
**Comparison Completed:** [Date]
---
## Key Differences Identified
1. **[Difference 1]:** [Brief description with magnitude]
- Driver: [Primary explanation]
- Confidence: [High/Medium/Low]
2. **[Difference 2]:** [Brief description with magnitude]
- Driver: [Primary explanation]
- Confidence: [High/Medium/Low]
3. **[Difference 3]:** [Brief description with magnitude]
- Driver: [Primary explanation]
- Confidence: [High/Medium/Low]
---
## Top Recommendations
1. **[Recommendation 1]:** [One sentence]
- Expected impact: [Magnitude]
2. **[Recommendation 2]:** [One sentence]
- Expected impact: [Magnitude]
---
## File Index
- 01 - Comparison Definition
- 02 - Segment Definition
- 03 - Metric Comparison
- 04 - Difference Explanation
- 05 - Conclusions and Recommendations
Present conclusions clearly:
Common Rationalization: "I'll just present all the numbers and let the user draw conclusions" Reality: Your job is to interpret and synthesize. Provide clear conclusions, not just data dumps.
Common Rationalization: "I'm 100% confident in these conclusions" Reality: Be honest about confidence levels and limitations. Overconfidence undermines credibility.
Common Rationalization: "Comparison complete, no follow-up needed" Reality: Comparisons often raise more questions than they answer. Identify high-value follow-up investigations.
Why this is wrong: "Obvious" differences often have unstated assumptions. Explicit definition prevents misunderstandings and ensures you're answering the right question.
Do instead: Complete Phase 1 fully. Define groups, metrics, and materiality thresholds explicitly.
Why this is wrong: Sample size is only one aspect. Data quality, temporal alignment, and outliers can invalidate comparisons even with large samples.
Do instead: Validate groups thoroughly in Phase 2. Check quality, coverage, and comparability.
Why this is wrong: Raw totals are misleading when groups have different sizes. $500K vs $400K tells you nothing if one group is 2x the size of the other.
Do instead: Normalize by customers, days, or transactions. Compare per-capita or rate metrics.
Why this is wrong: Stating WHAT differs without explaining WHY provides limited value. The explanation is where actionable insights live.
Do instead: Decompose differences. Explain whether higher revenue comes from more customers, higher per-customer value, different mix, etc.
Why this is wrong: Different doesn't mean better. Context matters. Lower-revenue group might serve a different market, have different goals, or optimize for different metrics.
Do instead: Interpret differences in context. Consider whether "better" is even the right framing.
Why this is wrong: Correlation ≠ causation. Many factors correlate with outcomes without causing them.
Do instead: Be cautious with causal language. Say "associated with" or "correlated with" rather than "caused by" unless you have experimental evidence.
Why this is wrong: Unmeasured confounds don't disappear by ignoring them. They limit what you can conclude.
Do instead: Explicitly acknowledge unmeasured confounds in Phase 4. Explain how they limit causal interpretation.
Why this is wrong: You don't know if differences are due to replicable practices or immutable characteristics (market size, demographics, etc.).
Do instead: Recommend further investigation to understand whether differences are actionable or structural.
Why this is wrong: Comparisons typically reveal new questions about root causes, generalizability, and interventions.
Do instead: Identify high-value follow-up questions in Phase 5. Guide next investigations.
Why this is wrong: With large samples, tiny differences can be statistically significant but practically meaningless.
Do instead: Focus on practical significance (materiality threshold). A 1% difference might be "significant" but not meaningful.
This skill ensures rigorous, fair comparisons. Follow this process and you'll deliver comparisons that explain not just WHAT differs but WHY, identify actionable opportunities, and guide follow-up investigations.