Systematic exploratory data analysis process - discover patterns in unfamiliar data, identify meaningful insights, formulate specific questions for deeper investigation
Guides systematic exploration of unfamiliar datasets to discover patterns, identify anomalies, and formulate targeted follow-up questions.
npx claudepluginhub tilmon-engineering/claude-skillsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
templates/01-data-familiarization.mdtemplates/02-temporal-patterns.mdtemplates/03-segmentation-patterns.mdtemplates/04-relationship-patterns.mdtemplates/05-anomaly-investigation.mdtemplates/06-insights.mdtemplates/07-next-questions.mdtemplates/overview-summary.mdThis skill guides you through systematic exploration of unfamiliar datasets when you don't yet have a specific question to answer. Unlike hypothesis-testing (where you test a specific claim) or guided-investigation (where you answer a specific question), exploratory analysis helps you discover what's interesting in the data and identify what questions you should be asking.
Exploratory analysis is appropriate when:
Before using this skill, you MUST:
importing-data skillcleaning-data skill (MANDATORY - never skip)just start-analysis exploratory-analysis <name>)understanding-data - for data profilingwriting-queries - for SQL query constructioninterpreting-results - for result analysiscreating-visualizations - for text-based visualizationsYou MUST use TodoWrite to track progress through all 5 phases. Create todos at the start:
- Phase 1: Data Familiarization - pending
- Phase 2: Pattern Discovery - pending
- Phase 3: Anomaly Investigation - pending
- Phase 4: Insight Generation - pending
- Phase 5: Question Formulation - pending
Update status as you progress. Mark phases complete ONLY after checkpoint verification.
CHECKPOINT: Before proceeding, you MUST have:
01 - data-familiarization.mdCreate analysis/[session-name]/01 - data-familiarization.md with: ./templates/01-data-familiarization.md
Use the understanding-data component skill
Resist premature pattern-hunting
Common Rationalization: "I can see the tables, I don't need detailed familiarization" Reality: Skipping familiarization leads to missing important context about data quality, coverage, and structure.
Common Rationalization: "I'll explore patterns while I familiarize myself with the data" Reality: Mixing familiarization and pattern-hunting creates confusion. Separate concerns.
CHECKPOINT: Before proceeding, you MUST have:
02-temporal-patterns.md, 03-segmentation-patterns.md, 04-relationship-patterns.mdYou MUST explore all three vectors, even if some seem less promising. Surprises often come from unexpected places.
Create analysis/[session-name]/02-temporal-patterns.md with: ./templates/02-temporal-patterns.md
Create analysis/[session-name]/03-segmentation-patterns.md with: ./templates/03-segmentation-patterns.md
Create analysis/[session-name]/04-relationship-patterns.md with: ./templates/04-relationship-patterns.md
Be systematic, not random
Separate observation from interpretation
Use visualizations liberally
creating-visualizations component skillCommon Rationalization: "I found an interesting pattern, I'll just focus on that" Reality: Focusing on first interesting pattern causes you to miss better patterns elsewhere. Complete all three vectors.
Common Rationalization: "This dimension looks boring, I'll skip it" Reality: "Boring" dimensions often hide surprising patterns. Explore systematically.
Common Rationalization: "I'll combine all patterns into one analysis file" Reality: Separate files by vector creates structure and makes findings easy to locate.
CHECKPOINT: Before proceeding, you MUST have:
05 - anomaly-investigation.mdGo back through temporal, segmentation, and relationship explorations and identify:
Create analysis/[session-name]/05 - anomaly-investigation.md with: ./templates/05-anomaly-investigation.md
Distinguish data quality from real patterns
Don't ignore anomalies
Common Rationalization: "That spike is probably a holiday, I'll ignore it" Reality: VERIFY your assumption. "Probably" isn't good enough. Check.
Common Rationalization: "This anomaly is a data quality issue, I'll just exclude it" Reality: Document what you're excluding and why. Future analysts need to know.
Common Rationalization: "I found 20 anomalies, I'll investigate them all" Reality: Focus on the 3-5 most significant. You're exploring, not auditing every data point.
CHECKPOINT: Before proceeding, you MUST have:
06 - insights.mdNot every pattern is an insight. An insight must be:
Create analysis/[session-name]/06 - insights.md with: ./templates/06-insights.md
Be selective
Quantify magnitude
Assess confidence honestly
Common Rationalization: "Every pattern I found is an insight" Reality: Most patterns are noise or expected. Be selective. Insights must clear the bar: actionable, surprising, meaningful.
Common Rationalization: "I'll just state the pattern and let the user figure out if it matters" Reality: Your job is to interpret. Explain WHY the pattern matters and WHAT it suggests.
Common Rationalization: "I'm very confident in this insight" Reality: Exploratory analysis rarely produces high confidence. Be honest about uncertainty and limitations.
CHECKPOINT: Before proceeding, you MUST have:
07 - next-questions.md00 - overview.md with exploration summaryEach insight should suggest 1-2 follow-up questions. Good questions:
Create analysis/[session-name]/07 - next-questions.md with: ./templates/07-next-questions.md
Update analysis/[session-name]/00 - overview.md by adding content from: ./templates/overview-summary.md
Common Rationalization: "I found interesting patterns, I'm done" Reality: Exploration isn't complete until you've formulated what to investigate next. Always end with questions.
Common Rationalization: "I'll just suggest broad areas to explore further" Reality: Be specific. "Investigate customer behavior" is not actionable. "Compare weekend vs weekday sales per operating hour using comparative-analysis skill" is actionable.
Common Rationalization: "I'll list every possible question I can think of" Reality: Focus on the 3-5 highest-value questions. Too many options create decision paralysis.
Why this is wrong: Without understanding data quality, coverage, and structure, your pattern discoveries may be artifacts or noise.
Do instead: Complete Phase 1 fully. Familiarization prevents false discoveries and wasted effort.
Why this is wrong: The most surprising insights often come from places you didn't expect. "Boring" dimensions frequently hide interesting patterns.
Do instead: Explore ALL three vectors (temporal, segmentation, relationship) systematically. Be comprehensive.
Why this is wrong: One insight doesn't exhaust a dataset. You likely missed other valuable patterns.
Do instead: Continue systematic exploration. Aim for 3-5 insights across different dimensions.
Why this is wrong: Most patterns are noise, expected, or immaterial. Calling everything an insight dilutes the valuable discoveries.
Do instead: Apply strict criteria: actionable, surprising, meaningful. Be selective.
Why this is wrong: Your job is interpretation, not just data reporting. Users expect you to identify what matters and why.
Do instead: Assess significance, provide context, explain business implications. Do the analytical thinking.
Why this is wrong: Exploratory analysis generates hypotheses, not confirmations. Patterns need validation through targeted investigation.
Do instead: Be honest about confidence levels. Exploratory findings are typically medium-low confidence until validated.
Why this is wrong: Exploration is the beginning, not the end. The goal is to identify what to investigate deeply.
Do instead: Always complete Phase 5. Convert insights into specific, answerable questions for follow-up.
Why this is wrong: Mixing temporal, segmentation, and relationship analyses creates confusion and makes findings hard to locate.
Do instead: Separate files by exploration vector (02-temporal, 03-segmentation, 04-relationship). Clear structure aids comprehension.
Why this is wrong: Assumptions about anomalies are often wrong. "Probably" isn't good enough. Also, excluding data without documentation creates reproducibility issues.
Do instead: Investigate anomalies in Phase 3. Document what you exclude and why. Verify your assumptions.
Why this is wrong: Too many options create decision paralysis. Not all questions are equally valuable.
Do instead: Prioritize ruthlessly. Focus on 3-5 highest-value questions. Help the user know where to start.
This skill ensures systematic, thorough exploration of unfamiliar datasets by:
Follow this process and you'll discover what's truly interesting in unfamiliar data, avoid random pattern-chasing, and identify high-value questions for targeted investigation.