Exploratory Analysis Process

Overview

This skill guides you through systematic exploration of unfamiliar datasets when you don't yet have a specific question to answer. Unlike hypothesis-testing (where you test a specific claim) or guided-investigation (where you answer a specific question), exploratory analysis helps you discover what's interesting in the data and identify what questions you should be asking.

Exploratory analysis is appropriate when:

You have a new dataset and need to understand what's in it
The user says "Just see what's interesting" or "Tell me what stands out"
You need to discover patterns before formulating specific questions
You're looking for unexpected insights or anomalies
You want to generate hypotheses for later testing

Prerequisites

Before using this skill, you MUST:

Have data imported into SQLite database using the importing-data skill
Have data quality validated and cleaned using the cleaning-data skill (MANDATORY - never skip)
Have created an analysis workspace (just start-analysis exploratory-analysis <name>)
Have NO preconceived hypotheses or specific questions (if you do, use hypothesis-testing or guided-investigation instead)
Be familiar with the component skills:
- understanding-data - for data profiling
- writing-queries - for SQL query construction
- interpreting-results - for result analysis
- creating-visualizations - for text-based visualizations

Mandatory Process Structure

You MUST use TodoWrite to track progress through all 5 phases. Create todos at the start:

- Phase 1: Data Familiarization - pending
- Phase 2: Pattern Discovery - pending
- Phase 3: Anomaly Investigation - pending
- Phase 4: Insight Generation - pending
- Phase 5: Question Formulation - pending

Update status as you progress. Mark phases complete ONLY after checkpoint verification.

Phase 1: Data Familiarization

CHECKPOINT: Before proceeding, you MUST have:

Catalogued all tables and their apparent purposes
Documented schema (columns, types, relationships)
Assessed data quality (completeness, ranges, distributions)
Identified temporal coverage and granularity
Saved to 01 - data-familiarization.md

Instructions

Understand what data you have

Create analysis/[session-name]/01 - data-familiarization.md with: ./templates/01-data-familiarization.md

Use the understanding-data component skill
- Profile the database systematically
- Don't skip quality checks - surprises are common
- Look for obvious data issues that would affect exploration
Resist premature pattern-hunting
- Don't start with "I wonder if weekends are different"
- First understand WHAT data you have, THEN explore patterns
- If you catch yourself forming hypotheses, note them for Phase 2 but finish familiarization first

Common Rationalization: "I can see the tables, I don't need detailed familiarization" Reality: Skipping familiarization leads to missing important context about data quality, coverage, and structure.

Common Rationalization: "I'll explore patterns while I familiarize myself with the data" Reality: Mixing familiarization and pattern-hunting creates confusion. Separate concerns.

Phase 2: Pattern Discovery

CHECKPOINT: Before proceeding, you MUST have:

Explored temporal patterns (trends, seasonality, cycles)
Explored segmentation patterns (groups, categories, clusters)
Explored relationship patterns (correlations, associations)
Documented each exploration with rationale, query, results, observations
Created separate files for each exploration vector
Files saved as 02-temporal-patterns.md, 03-segmentation-patterns.md, 04-relationship-patterns.md

Instructions

Explore SYSTEMATICALLY along three vectors

You MUST explore all three vectors, even if some seem less promising. Surprises often come from unexpected places.

Vector 1: Temporal Patterns

Create analysis/[session-name]/02-temporal-patterns.md with: ./templates/02-temporal-patterns.md

Vector 2: Segmentation Patterns

Create analysis/[session-name]/03-segmentation-patterns.md with: ./templates/03-segmentation-patterns.md

Vector 3: Relationship Patterns

Create analysis/[session-name]/04-relationship-patterns.md with: ./templates/04-relationship-patterns.md

Be systematic, not random
- Explore ALL three vectors (temporal, segmentation, relationship)
- Don't skip a vector just because first query seems uninteresting
- Patterns often hide in places you don't expect
Separate observation from interpretation
- In each analysis, state FACTS (what the numbers show)
- Save interpretation for Phase 4 (Insight Generation)
- Resist the urge to explain patterns yet - just catalog them
Use visualizations liberally
- Text-based tables, ASCII charts, markdown formatting
- Help patterns become visible
- Refer to creating-visualizations component skill

Common Rationalization: "I found an interesting pattern, I'll just focus on that" Reality: Focusing on first interesting pattern causes you to miss better patterns elsewhere. Complete all three vectors.

Common Rationalization: "This dimension looks boring, I'll skip it" Reality: "Boring" dimensions often hide surprising patterns. Explore systematically.

Common Rationalization: "I'll combine all patterns into one analysis file" Reality: Separate files by vector creates structure and makes findings easy to locate.

Phase 3: Anomaly Investigation

CHECKPOINT: Before proceeding, you MUST have:

Identified 3-5 specific anomalies, outliers, or unexpected patterns from Phase 2
Investigated each anomaly with follow-up queries
Determined if anomalies are data quality issues or real phenomena
Documented findings
Saved to 05 - anomaly-investigation.md

Instructions

Review Phase 2 for anomalies

Go back through temporal, segmentation, and relationship explorations and identify:

Unexpected spikes or drops
Outlier values or segments
Counterintuitive patterns
Violations of expected business logic
Inconsistencies or irregularities

Investigate each anomaly

Create analysis/[session-name]/05 - anomaly-investigation.md with: ./templates/05-anomaly-investigation.md

Distinguish data quality from real patterns
- NULL values, duplicates, data entry errors → data quality issues
- Seasonal events, one-time promotions, external shocks → real phenomena
- When unclear, state that explicitly
Don't ignore anomalies
- Anomalies are often where the most interesting insights hide
- Investigate thoroughly, don't dismiss as "probably nothing"
- Data quality issues found now save headaches later

Common Rationalization: "That spike is probably a holiday, I'll ignore it" Reality: VERIFY your assumption. "Probably" isn't good enough. Check.

Common Rationalization: "This anomaly is a data quality issue, I'll just exclude it" Reality: Document what you're excluding and why. Future analysts need to know.

Common Rationalization: "I found 20 anomalies, I'll investigate them all" Reality: Focus on the 3-5 most significant. You're exploring, not auditing every data point.

Phase 4: Insight Generation

CHECKPOINT: Before proceeding, you MUST have:

Reviewed all patterns and anomalies from Phases 2-3
Identified 3-5 insights that are actionable, surprising, or meaningful
For each insight, documented: finding, significance, context, caveats
Assessed confidence level for each insight
Saved to 06 - insights.md

Instructions

Extract insights from patterns

Not every pattern is an insight. An insight must be:

Actionable: Suggests a decision or further investigation
Surprising: Non-obvious, counter to expectations, or revealing hidden structure
Meaningful: Materially significant magnitude or impact

Document insights systematically

Create analysis/[session-name]/06 - insights.md with: ./templates/06-insights.md

Be selective
- Don't try to turn every pattern into an insight
- 3-5 high-quality insights beat 20 marginal observations
- If you have more than 5, rank them and focus on top 5
Quantify magnitude
- "Sales vary by region" is not enough
- "Top region has 3x sales of bottom region" is specific
- Numbers make insights actionable
Assess confidence honestly
- High confidence: Strong evidence, large sample, consistent pattern, clear interpretation
- Medium confidence: Clear pattern but causation unclear OR limited time window OR potential confounds
- Low confidence: Interesting but small sample OR inconsistent OR multiple plausible explanations

Common Rationalization: "Every pattern I found is an insight" Reality: Most patterns are noise or expected. Be selective. Insights must clear the bar: actionable, surprising, meaningful.

Common Rationalization: "I'll just state the pattern and let the user figure out if it matters" Reality: Your job is to interpret. Explain WHY the pattern matters and WHAT it suggests.

Common Rationalization: "I'm very confident in this insight" Reality: Exploratory analysis rarely produces high confidence. Be honest about uncertainty and limitations.

Phase 5: Question Formulation

CHECKPOINT: Before proceeding, you MUST have:

Generated 3-5 specific, answerable questions based on insights
For each question, identified: what process to use, what data is needed, why it matters
Prioritized questions by potential value
Saved to 07 - next-questions.md
Updated 00 - overview.md with exploration summary

Instructions

Convert insights into questions

Each insight should suggest 1-2 follow-up questions. Good questions:

Build on what you learned in exploration
Are specific enough to be answerable
Have clear business value
Can be addressed with available (or obtainable) data

Document questions for follow-up

Create analysis/[session-name]/07 - next-questions.md with: ./templates/07-next-questions.md

Update exploration overview

Update analysis/[session-name]/00 - overview.md by adding content from: ./templates/overview-summary.md

Final verification checklist
- Explored all three vectors (temporal, segmentation, relationship)
- Investigated significant anomalies
- Generated 3-5 qualified insights (actionable, surprising, meaningful)
- Formulated specific follow-up questions
- Prioritized questions by value
- Updated overview with summary
- Communicated findings to user

Common Rationalization: "I found interesting patterns, I'm done" Reality: Exploration isn't complete until you've formulated what to investigate next. Always end with questions.

Common Rationalization: "I'll just suggest broad areas to explore further" Reality: Be specific. "Investigate customer behavior" is not actionable. "Compare weekend vs weekday sales per operating hour using comparative-analysis skill" is actionable.

Common Rationalization: "I'll list every possible question I can think of" Reality: Focus on the 3-5 highest-value questions. Too many options create decision paralysis.

Common Rationalizations

"I can see interesting patterns already, I'll skip data familiarization"

Why this is wrong: Without understanding data quality, coverage, and structure, your pattern discoveries may be artifacts or noise.

Do instead: Complete Phase 1 fully. Familiarization prevents false discoveries and wasted effort.

"This exploration vector looks boring, I'll skip it"

Why this is wrong: The most surprising insights often come from places you didn't expect. "Boring" dimensions frequently hide interesting patterns.

Do instead: Explore ALL three vectors (temporal, segmentation, relationship) systematically. Be comprehensive.

"I found one great insight, that's enough"

Why this is wrong: One insight doesn't exhaust a dataset. You likely missed other valuable patterns.

Do instead: Continue systematic exploration. Aim for 3-5 insights across different dimensions.

"Every pattern I found is an insight"

Why this is wrong: Most patterns are noise, expected, or immaterial. Calling everything an insight dilutes the valuable discoveries.

Do instead: Apply strict criteria: actionable, surprising, meaningful. Be selective.

"I'll just describe patterns and let the user decide if they're important"

Why this is wrong: Your job is interpretation, not just data reporting. Users expect you to identify what matters and why.

Do instead: Assess significance, provide context, explain business implications. Do the analytical thinking.

"I'm very confident in these exploratory findings"

Why this is wrong: Exploratory analysis generates hypotheses, not confirmations. Patterns need validation through targeted investigation.

Do instead: Be honest about confidence levels. Exploratory findings are typically medium-low confidence until validated.

"I found interesting patterns, analysis is complete"

Why this is wrong: Exploration is the beginning, not the end. The goal is to identify what to investigate deeply.

Do instead: Always complete Phase 5. Convert insights into specific, answerable questions for follow-up.

"I'll combine all patterns into one big report"

Why this is wrong: Mixing temporal, segmentation, and relationship analyses creates confusion and makes findings hard to locate.

Do instead: Separate files by exploration vector (02-temporal, 03-segmentation, 04-relationship). Clear structure aids comprehension.

"This anomaly is probably a data error, I'll just exclude it"

Why this is wrong: Assumptions about anomalies are often wrong. "Probably" isn't good enough. Also, excluding data without documentation creates reproducibility issues.

Do instead: Investigate anomalies in Phase 3. Document what you exclude and why. Verify your assumptions.

"I have 15 follow-up questions, I'll list them all"

Why this is wrong: Too many options create decision paralysis. Not all questions are equally valuable.

Do instead: Prioritize ruthlessly. Focus on 3-5 highest-value questions. Help the user know where to start.

Summary

This skill ensures systematic, thorough exploration of unfamiliar datasets by:

Familiarizing with data first: Understand structure, quality, and coverage before pattern-hunting
Exploring systematically: Three vectors (temporal, segmentation, relationship) ensure comprehensive discovery
Investigating anomalies: Surprises and outliers often contain the most valuable insights
Generating selective insights: Apply strict criteria (actionable, surprising, meaningful) to separate signal from noise
Formulating specific questions: Convert insights into answerable questions for deeper investigation
Prioritizing next steps: Help users know what to investigate first and why

Follow this process and you'll discover what's truly interesting in unfamiliar data, avoid random pattern-chasing, and identify high-value questions for targeted investigation.

exploratory-analysis

Exploratory Analysis Process

Overview

Prerequisites

Mandatory Process Structure

Phase 1: Data Familiarization

Instructions

Phase 2: Pattern Discovery

Instructions

Vector 1: Temporal Patterns

Vector 2: Segmentation Patterns

Vector 3: Relationship Patterns

Phase 3: Anomaly Investigation

Instructions

Phase 4: Insight Generation

Instructions

Phase 5: Question Formulation

Instructions

Common Rationalizations

"I can see interesting patterns already, I'll skip data familiarization"

"This exploration vector looks boring, I'll skip it"

"I found one great insight, that's enough"

"Every pattern I found is an insight"

"I'll just describe patterns and let the user decide if they're important"

"I'm very confident in these exploratory findings"

"I found interesting patterns, analysis is complete"

"I'll combine all patterns into one big report"

"This anomaly is probably a data error, I'll just exclude it"

"I have 15 follow-up questions, I'll list them all"

Summary

Similar Skills