Rigorous hypothesis testing process for data analysis - formulate hypotheses before looking at data, design tests, analyze systematically, interpret with skepticism
Rigorous hypothesis testing process for data analysis - formulate hypotheses before looking at data, design tests, analyze systematically, interpret with skepticism. Use this when you need to test specific claims (e.g., "Does X affect Y?") and want to avoid confirmation bias and p-hacking.
/plugin marketplace add tilmon-engineering/claude-skills/plugin install datapeeker@tilmon-eng-skillsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
templates/overview-summary.mdtemplates/phase-1.mdtemplates/phase-2.mdtemplates/phase-3-query.mdtemplates/phase-4.mdtemplates/phase-5.mdThis skill guides you through rigorous hypothesis testing in data analysis. The core principle is scientific rigor: formulate your hypotheses BEFORE looking at the data to avoid p-hacking and confirmation bias.
Hypothesis testing is appropriate when:
Before using this skill, you MUST:
importing-data skillcleaning-data skill (MANDATORY - never skip)just start-analysis hypothesis-testing <name>)understanding-data - for data profilingwriting-queries - for SQL query constructioninterpreting-results - for result analysiscreating-visualizations - for text-based visualizationsYou MUST use TodoWrite to track progress through all 5 phases. Create todos at the start:
- Phase 1: Hypothesis Formulation (H0/H1) - pending
- Phase 2: Test Design - pending
- Phase 3: Data Analysis - pending
- Phase 4: Result Interpretation - pending
- Phase 5: Conclusion and Follow-up - pending
Update status as you progress. Mark phases complete ONLY after checkpoint verification.
CHECKPOINT: Before proceeding, you MUST have:
01 - hypothesis-formulation.mdAsk clarifying questions about the user's analytical goal
Write hypotheses in 01 - hypothesis-formulation.md with: ./templates/phase-1.md
STOP and get user confirmation
Common Rationalization: "I'll just peek at the data to refine the hypothesis" Reality: This creates confirmation bias. Formulate hypothesis FIRST, always.
CHECKPOINT: Before proceeding, you MUST have:
02 - test-design.mdDesign your test in 02 - test-design.md with: ./templates/phase-2.md
STOP and verify test design
Common Rationalization: "I'll just run one quick query to see if the data exists" Reality: That's looking at data before test design is complete. Finish design FIRST.
CHECKPOINT: Before proceeding, you MUST have:
Create separate numbered files:
03 - schema-check.md04 - data-quality-check.md05 - main-analysis.md06 - supporting-analysis.md (if needed)For each query file, use this structure: ./templates/phase-3-query.md
Execute queries using appropriate tool
Handle data quality issues
Common Rationalization: "I'll skip data quality checks since the data looks fine" Reality: ALWAYS check data quality. Surprises happen. Document what you checked.
Common Rationalization: "I'll combine all queries into one file to save time" Reality: Separate files create clear audit trail. One query, one file.
CHECKPOINT: Before proceeding, you MUST have:
07 - interpretation.md (or next number)Create interpretation file: XX - interpretation.md with: ./templates/phase-4.md
Be intellectually honest
Common Rationalization: "The pattern is obvious, I don't need to consider alternatives" Reality: Always consider alternatives. Obvious patterns often have surprising explanations.
Common Rationalization: "I'll downplay limitations so the conclusion looks stronger" Reality: Stating limitations INCREASES credibility. Be honest about uncertainty.
CHECKPOINT: Before proceeding, you MUST have:
08 - conclusion.md (or next number)00 - overview.md with summaryCreate conclusion file: XX - conclusion.md with: ./templates/phase-5.md
Update overview file: 00 - overview.md
Add summary section with: ./templates/overview-summary.md
Common Rationalization: "I found the answer, I'm done" Reality: Good analysis always identifies the NEXT question. List follow-ups.
Common Rationalization: "I'll skip updating the overview since it's all in the detailed files" Reality: Overview provides navigational summary. Always update it.
User wants to test: "Do we get more sales on weekends?"
# Hypothesis Formulation
## Analytical Goal
Determine if weekend days (Saturday, Sunday) have higher sales than weekday days (Monday-Friday)
## Context
Business wants to optimize staffing and inventory. If weekends are significantly busier, we should staff up. If not, we can balance resources across the week.
## Hypotheses
### Null Hypothesis (H0)
Weekend days and weekday days have equal average sales. Any observed differences are due to random variation.
### Alternative Hypothesis (H1)
Weekend days have significantly different average sales compared to weekday days.
## Success Criteria
- Difference >25% would be practically meaningful (enough to justify staffing changes)
- Pattern should be consistent across multiple weeks
## Potential Confounds
- Holidays: Holiday Monday might inflate weekday averages
- Promotions: Weekend promotions might inflate weekend sales
- Store hours: Different hours on weekends might affect opportunity
- Seasonality: Analysis period might not be representative of full year
# Test Design
## Metrics
### Primary Metric
Average daily sales amount: SUM(amount) / COUNT(DISTINCT date) for weekend vs weekday groups
### Supporting Metrics
- Total transaction count per day-of-week (to assess sample size)
- Median sales per transaction (to check if average is representative)
- Week-over-week consistency (to see if pattern is stable)
## Comparison Structure
Group days into two categories:
- Weekend: Saturday (day 6), Sunday (day 0)
- Weekday: Monday-Friday (days 1-5)
Calculate average daily sales for each group, compare the ratio.
## Data Requirements
### Required Tables/Columns
- Table: `sales` (or similar)
- `transaction_date` (DATE or TEXT in ISO format)
- `amount` (NUMERIC)
### Data Quality Checks
1. Check for NULL dates or amounts
2. Verify date range covers complete weeks
3. Confirm adequate sample size (>100 transactions per day-of-week)
4. Identify any outlier days (holidays, system outages)
### Queries Needed
1. Schema check: PRAGMA table_info(sales)
2. Data quality: NULL checks, date range, outlier detection
3. Main analysis: Daily sales by day-of-week
4. Supporting: Transaction counts, weekly pattern consistency
## Statistical Considerations
Look for differences >25% between weekend and weekday averages. Check if pattern is consistent across multiple weeks (not just one unusual weekend).
03 - schema-check.md:
# Schema Check
## Rationale
Verify the sales table exists and has the required columns for date and amount
## Query
```sql
PRAGMA table_info(sales);
cid | name | type | notnull | dflt_value | pk
0 | transaction_id | INTEGER | 0 | NULL | 1
1 | transaction_date | TEXT | 0 | NULL | 0
2 | amount | REAL | 0 | NULL | 0
3 | product_id | INTEGER | 0 | NULL | 0
sales existstransaction_date column (TEXT type - will need STRFTIME to extract day-of-week)amount column (REAL type - good for calculations)
**04 - data-quality-check.md:**
```markdown
# Data Quality Check
## Rationale
Verify data completeness and identify any issues before main analysis
## Query
```sql
SELECT
COUNT(*) as total_rows,
COUNT(CASE WHEN transaction_date IS NULL THEN 1 END) as null_dates,
COUNT(CASE WHEN amount IS NULL THEN 1 END) as null_amounts,
MIN(transaction_date) as earliest_date,
MAX(transaction_date) as latest_date,
MIN(amount) as min_amount,
MAX(amount) as max_amount
FROM sales;
total_rows | null_dates | null_amounts | earliest_date | latest_date | min_amount | max_amount
15847 | 0 | 0 | 2024-01-01 | 2024-03-31 | 5.00 | 899.99
**05 - main-analysis.md:**
```markdown
# Main Analysis: Sales by Day of Week
## Rationale
Calculate average sales by day of week to test if weekend days differ from weekdays
## Query
```sql
SELECT
CAST(STRFTIME('%w', transaction_date) AS INTEGER) as day_of_week,
CASE
WHEN CAST(STRFTIME('%w', transaction_date) AS INTEGER) IN (0, 6) THEN 'Weekend'
ELSE 'Weekday'
END as day_type,
COUNT(DISTINCT transaction_date) as days_count,
COUNT(*) as transaction_count,
SUM(amount) as total_sales,
ROUND(SUM(amount) / COUNT(DISTINCT transaction_date), 2) as avg_daily_sales,
ROUND(AVG(amount), 2) as avg_transaction_amount
FROM sales
GROUP BY day_of_week, day_type
ORDER BY day_of_week;
day_of_week | day_type | days_count | transaction_count | total_sales | avg_daily_sales | avg_transaction_amount
0 | Weekend | 13 | 1456 | 52389.44 | 4030.00 | 35.98
1 | Weekday | 13 | 2234 | 102345.67 | 7872.90 | 45.81
2 | Weekday | 13 | 2198 | 98234.55 | 7556.50 | 44.70
3 | Weekday | 13 | 2301 | 105678.90 | 8129.15 | 45.93
4 | Weekday | 13 | 2345 | 108901.23 | 8377.02 | 46.44
5 | Weekday | 13 | 2456 | 115432.11 | 8879.39 | 47.00
6 | Weekend | 13 | 2857 | 98765.43 | 7597.34 | 34.56
**06 - supporting-analysis.md:**
```markdown
# Supporting Analysis: Weekend vs Weekday Comparison
## Rationale
Aggregate weekend and weekday categories to test the hypothesis directly
## Query
```sql
SELECT
CASE
WHEN CAST(STRFTIME('%w', transaction_date) AS INTEGER) IN (0, 6) THEN 'Weekend'
ELSE 'Weekday'
END as day_type,
COUNT(DISTINCT transaction_date) as days_count,
COUNT(*) as transaction_count,
SUM(amount) as total_sales,
ROUND(SUM(amount) / COUNT(DISTINCT transaction_date), 2) as avg_daily_sales,
ROUND(AVG(amount), 2) as avg_transaction_amount
FROM sales
GROUP BY day_type;
day_type | days_count | transaction_count | total_sales | avg_daily_sales | avg_transaction_amount
Weekday | 65 | 11534 | 530592.46 | 8163.00 | 46.01
Weekend | 26 | 4313 | 151154.87 | 5813.65 | 35.04
### Phase 4: Interpretation (07 - interpretation.md)
```markdown
# Result Interpretation
## Summary of Findings
### Primary Metric Results
Weekends have significantly LOWER sales than weekdays:
- Weekday avg: $8,163 per day
- Weekend avg: $5,814 per day
- Difference: 29% lower on weekends
This contradicts the original hypothesis that weekends would have higher sales.
### Statistical Assessment
- Magnitude: 29% difference is both statistically meaningful (large sample) and practically significant (>25% threshold)
- Consistency: Pattern holds across 13 weeks of data
- Practical importance: This magnitude justifies different staffing/inventory approaches
## Alternative Explanations
1. **Store hours:** Maybe stores close earlier on weekends
- Evidence: Transaction count is 62% lower on weekends, suggesting operational constraint
- Impact: Could fully explain lower daily totals if hours are reduced
2. **Customer behavior:** Maybe weekend shoppers buy smaller items
- Evidence: Average transaction is 24% smaller on weekends ($35 vs $46)
- Impact: Explains some but not all of the difference
3. **Product mix:** Maybe high-value products aren't available on weekends
- Evidence: Would need product category data to verify
- Impact: Unknown
4. **Sunday effect:** Maybe Sunday drags down weekend average
- Evidence: Sunday has only $4,030 avg (47% lower than Saturday's $7,597)
- Impact: If we exclude Sunday, Saturday is closer to weekdays but still 7% lower
## Hypothesis Test Result
### Null Hypothesis (H0)
Weekend days and weekday days have equal average sales.
### Decision
**REJECT H0**
### Rationale
The 29% difference between weekend and weekday average sales is large, consistent across 13 weeks, and exceeds our 25% practical significance threshold. The pattern is clear: weekends have lower sales than weekdays.
HOWEVER: This is the OPPOSITE of what we hypothesized. The alternative hypothesis stated "weekend days have significantly different sales" which is true, but we expected higher, not lower.
## Limitations
1. **Cannot determine causation:** Data shows weekends are lower but doesn't explain why
2. **Store operational factors unknown:** Don't have data on hours, staffing, inventory availability
3. **Sunday is dramatically lower:** Weekend average is heavily influenced by very low Sunday sales
4. **Seasonal effects:** 3 months (Jan-Mar) may not represent full year (could be post-holiday slump)
5. **No control for promotions/holidays:** Any holidays in the period could skew results
# Conclusion and Follow-up
## Main Conclusion
Weekends have 29% lower average daily sales than weekdays, contradicting the initial hypothesis that weekends would be busier. This pattern is consistent across 13 weeks and appears driven by both fewer transactions (62% lower) and smaller average purchases (24% lower).
## Actionable Insights
**DO NOT increase weekend staffing based on this data.** The hypothesis that weekends are busier is not supported.
Instead:
1. **Investigate Sunday specifically:** With only $4,030 avg daily sales (vs $7,600 on Saturday), Sunday operations may not be profitable. Consider:
- Reduced hours on Sunday
- Sunday-specific promotions to drive traffic
- Or closing on Sundays if fixed costs exceed contribution margin
2. **Understand operational constraints:** Before making staffing decisions, determine:
- Are weekend hours reduced? (This might explain lower transaction counts)
- Is weekend inventory limited? (This might explain smaller purchases)
- Are certain high-value products unavailable on weekends?
3. **Focus weekday resources:** If pattern holds year-round, optimize for Monday-Friday peak demand
## Follow-up Questions
1. **What are actual store operating hours by day?**
- Data needed: Store hours table, or time-of-day in transaction timestamps
- Approach: Calculate sales per operating hour rather than per day
- Why: Would distinguish "fewer hours" from "fewer customers per hour"
2. **Does the weekend pattern hold across all seasons?**
- Data needed: Full year of transaction data
- Approach: Repeat this analysis for each quarter
- Why: Jan-Mar might be post-holiday slump; need to verify pattern is year-round
3. **Are there product category differences on weekends?**
- Data needed: Product category or department in transaction records
- Approach: Analyze category mix and average prices by day-of-week
- Why: Would explain the 24% smaller average transaction on weekends
4. **How do weekends compare if we exclude Sunday?**
- Data needed: Current dataset (already have this)
- Approach: Re-run analysis treating Sunday separately
- Why: Sunday is so dramatically lower it may be distorting the weekend average
## Confidence Level
**Medium confidence** in the finding that weekends are lower, but **low confidence** in the explanation.
Reasoning:
- Pattern is clear and consistent across 13 weeks (strengthens confidence)
- Sample sizes are adequate (4,313 weekend transactions is plenty)
- BUT: Cannot distinguish operational constraints from customer behavior (weakens confidence)
- AND: Only 3 months of data may not represent full year (weakens confidence)
**Next step:** Investigate store hours and expand to full year data before making operational changes.
Why this is wrong: Looking at data before formulating hypotheses creates confirmation bias. You'll unconsciously form hypotheses that match what you see, then "test" them. This isn't science.
Do instead: Formulate hypothesis from domain knowledge, business context, or theory. Then look at data.
Why this is wrong: Hindsight bias makes everything seem obvious after you see the answer. Writing hypothesis first creates accountability.
Do instead: Always write H0 and H1 in Phase 1 before any query execution.
Why this is wrong: You can't know data is clean until you check. Surprises happen often.
Do instead: ALWAYS run data quality queries. Document what you checked and what you found (even if it's "no issues").
Why this is wrong: Obvious patterns often have non-obvious causes. Confounding factors are common.
Do instead: Always list 2-3 alternative explanations in Phase 4, even if you think they're unlikely.
Why this is wrong: One file per query creates clear audit trail and makes analysis reproducible.
Do instead: One query, one file. Use consistent numbering (03-, 04-, etc.).
Why this is wrong: Good analysis always identifies the next question. Every conclusion should raise new questions.
Do instead: Always list 2-3 follow-up questions in Phase 5.
Why this is wrong: Acknowledging limitations increases credibility. Readers trust honest uncertainty more than false certainty.
Do instead: State limitations clearly. Be honest about what you don't know.
Why this is wrong: Overview provides navigation and quick summary. Future readers (including you) will thank you.
Do instead: Always update 00 - overview.md with results summary in Phase 5.
This skill ensures rigorous, reproducible hypothesis testing by:
Follow this process and you'll produce defensible, credible analysis that stands up to scrutiny.
Use when working with Payload CMS projects (payload.config.ts, collections, fields, hooks, access control, Payload API). Use when debugging validation errors, security issues, relationship queries, transactions, or hook behavior.