From workflows
Interviews users about their datasets and databases to generate reusable data context skills that document schema, entities, metrics, and domain knowledge.
Slash command: /workflows:data-context
Extract tribal knowledge about a dataset or database and generate a reusable data context skill.
## The Iron Law of Data Context
YOU MUST interview the user before generating ANY skill content. This is not negotiable.
You MUST NOT:
If you're about to write a skill based on your assumptions, STOP. Interview first.
Trigger: No existing data context skill for this project/dataset.
Create a new data context skill from scratch by interviewing the user about their data.
Trigger: A data context skill already exists and the user wants to add a new domain or update it.
Read existing skill, identify gaps, interview for the new domain, merge into existing skill.
Before interviewing, check for existing data access skills that already encode tribal knowledge:
1. READ existing skills: /wrds, /lseg-data, or any project-local data skills
2. IDENTIFY what they already cover (table names, filters, field mappings, gotchas)
3. DO NOT re-document what existing skills handle
4. FOCUS the interview on the project-specific layer:
- Which specific tables/fields from WRDS or LSEG does THIS study use?
- How are identifiers linked across sources? (permno ↔ gvkey, RIC ↔ ISIN ↔ cusip)
- What sample filters define the study universe? (date range, exchange, firm type)
- What derived variables or transformations are project-specific?
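Step 1 can be partly mechanized for project-local skills. The following is a minimal sketch, assuming local skills follow the `.claude/skills/<name>/SKILL.md` layout that this document itself generates; `find_existing_skills` is a hypothetical helper, and shared skills such as `/wrds` or `/lseg-data` may be installed elsewhere and are not covered by it.

```python
from pathlib import Path


def find_existing_skills(project_root: str) -> list:
    """Return names of directories under .claude/skills that contain a SKILL.md.

    Assumption: project-local skills live at <root>/.claude/skills/<name>/SKILL.md.
    """
    skills_dir = Path(project_root) / ".claude" / "skills"
    if not skills_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in skills_dir.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )
```

An empty result suggests starting from scratch; a non-empty one means those skills should be read before any interview question is written.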
The generated data context skill should reference existing skills rather than duplicate them:
## Data Sources
| Source | Skill | Tables/Fields Used |
|--------|-------|--------------------|
| WRDS | `/wrds` | comp.funda (at, lt, ceq), crsp.msf (ret, prc) |
| LSEG | `/lseg-data` | TR.F.TotRevenue, TR.GICSSector |
| Local | DuckDB | data/processed/merged_panel.parquet |
For connection details and critical filters, see the referenced skills.
1. DISCOVER data sources
→ What databases/files/APIs? Connection details?
→ For each: what dialect? (PostgreSQL, DuckDB, SQLite, Snowflake, etc.)
→ IMPORTANT: If user mentions WRDS or LSEG, read the corresponding skill first
and ask only about project-specific usage, not general access patterns
2. MAP entities
→ What are the core entities? (users, transactions, products, etc.)
→ How do they relate? (foreign keys, join paths)
→ CRITICAL: Disambiguate entity names
- "user" vs "account" vs "customer" — are these the same?
- "order" vs "transaction" vs "purchase" — clarify overlaps
- Document the canonical name and any aliases
3. DEFINE key metrics
→ What are the business-critical metrics?
→ For each metric:
- Exact definition (SQL or formula)
- Known edge cases
- Common misinterpretations
- Time grain (daily, monthly, etc.)
4. DOCUMENT data hygiene
→ Known data quality issues
→ Fields that lie (e.g., "created_at" that's actually "imported_at")
→ Nulls that mean something specific
→ Enums/codes that need translation
→ Date ranges with reliable data vs backfill periods
5. CAPTURE common gotchas
→ Joins that explode (many-to-many lurking as one-to-many)
→ Filters that are always needed (e.g., "WHERE is_deleted = false")
→ Time zones and their traps
→ Slowly changing dimensions
→ Tables that look useful but aren't (deprecated, partial, test data)
6. COLLECT common query patterns
→ Frequently needed aggregations
→ Standard date filters or cohort definitions
→ Boilerplate CTEs that everyone copies
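One gotcha from step 5, joins that explode, can be checked once the user has named the tables involved. The sketch below is illustrative only: it uses an in-memory SQLite database with made-up `users` and `accounts` tables, and `join_fanout` is a hypothetical helper rather than part of any skill. It compares the joined row count with the left table's row count.

```python
import sqlite3


def join_fanout(conn: sqlite3.Connection, left: str, right: str, key: str) -> float:
    """Ratio of inner-joined rows to left-table rows.

    A value above 1.0 means at least one left row matched several right rows.
    Table and column names are interpolated directly, so pass trusted names only.
    """
    left_rows = conn.execute(f"SELECT COUNT(*) FROM {left}").fetchone()[0]
    joined_rows = conn.execute(
        f"SELECT COUNT(*) FROM {left} AS l JOIN {right} AS r ON l.{key} = r.{key}"
    ).fetchone()[0]
    return joined_rows / left_rows if left_rows else 0.0


# Hypothetical demo data: user 1 holds two accounts.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER);
    CREATE TABLE accounts (user_id INTEGER, is_deleted INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO accounts VALUES (1, 0), (1, 0), (2, 0);
    """
)
fanout = join_fanout(conn, "users", "accounts", "user_id")
print(fanout)  # 1.5
```

A ratio above 1.0 is a prompt for an interview question, not a conclusion: the user still has to say whether multiple accounts per user are expected. Because the join is an inner join, unmatched left rows can also pull the ratio below 1.0.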
Ask questions in batches of 3-5. Don't overwhelm with everything at once.
Round 1: Data Sources
Round 2: Core Entities (after Round 1 answers)
Round 3: Metrics & Definitions (after Round 2 answers)
Round 4: Data Quality & Gotchas (after Round 3 answers)
Round 5: Common Patterns (after Round 4 answers)
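The round ordering and batching can be expressed as a small sketch. The round names come from the list above; `next_round` and `batch_questions` are hypothetical helpers, and the batching simply caps each batch at five questions, so a final batch may hold fewer than three.

```python
from typing import Optional

# Round order from the list above; each round waits on the previous round's answers.
ROUNDS = [
    "Data Sources",
    "Core Entities",
    "Metrics & Definitions",
    "Data Quality & Gotchas",
    "Common Patterns",
]


def next_round(answered: set) -> Optional[str]:
    """Return the first round with no recorded answers, or None when all are done."""
    for name in ROUNDS:
        if name not in answered:
            return name
    return None


def batch_questions(questions: list, max_batch: int = 5) -> list:
    """Split one round's questions into consecutive batches of at most max_batch."""
    return [questions[i:i + max_batch] for i in range(0, len(questions), max_batch)]
```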
After the interview, generate a skill with this structure:
project-name/
├── .claude/
│   └── skills/
│       └── data-context/
│           ├── SKILL.md              # Main skill file
│           └── references/
│               ├── entities.md       # Entity definitions and relationships
│               ├── metrics.md        # Metric definitions with SQL/formulas
│               └── gotchas.md        # Data quality issues and common pitfalls
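The layout above could be scaffolded with a few lines of Python. This is a sketch under the assumption that the skill is created inside the project root; `scaffold_data_context` is a hypothetical helper, and it creates empty files only, so the content still has to come from the interview.

```python
from pathlib import Path

# File names taken from the directory layout above.
REFERENCE_FILES = ("entities.md", "metrics.md", "gotchas.md")


def scaffold_data_context(project_root: str) -> Path:
    """Create the data-context skill directory with empty placeholder files."""
    skill_dir = Path(project_root) / ".claude" / "skills" / "data-context"
    references = skill_dir / "references"
    references.mkdir(parents=True, exist_ok=True)
    targets = [skill_dir / "SKILL.md"] + [references / name for name in REFERENCE_FILES]
    for path in targets:
        path.touch(exist_ok=True)  # leaves existing file content untouched
    return skill_dir
```

Because `touch` does not truncate, running this against a project that already has a data context skill leaves its files intact.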
---
name: [project]-data-context
description: "Data context for [project]. Entity definitions, metric calculations, data quality notes, and common patterns for [data domain]."
---
# [Project] Data Context
## Data Sources
| Source | Skill/Dialect | Tables/Fields Used |
|--------|---------------|--------------------|
| [WRDS] | `/wrds` | [specific tables and fields for this project] |
| [LSEG] | `/lseg-data` | [specific fields for this project] |
| [Local] | [DuckDB/CSV/Parquet] | [file paths or database] |
For connection details and critical filters, see the referenced skill. This context covers only project-specific usage.
## Entity Map
[Entity relationship summary — which entities exist, how they connect]
See `references/entities.md` for full definitions.
## Key Metrics
[Top 3-5 metrics with brief definitions]
See `references/metrics.md` for exact calculations and edge cases.
## Critical Gotchas
[Top 3-5 gotchas that catch analysts]
See `references/gotchas.md` for full list.
## Common Patterns
[Frequently used query snippets or data access patterns]
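A generated SKILL.md can be checked against this template before it is saved. The sketch below assumes the five `##` headings shown above are all required; `missing_sections` is a hypothetical helper and checks only that each heading is present, not that the section has been filled in.

```python
# Section headings taken from the SKILL.md template above.
REQUIRED_SECTIONS = (
    "## Data Sources",
    "## Entity Map",
    "## Key Metrics",
    "## Critical Gotchas",
    "## Common Patterns",
)


def missing_sections(skill_md: str) -> list:
    """Return required headings that do not appear as their own line in the text."""
    lines = {line.strip() for line in skill_md.splitlines()}
    return [heading for heading in REQUIRED_SECTIONS if heading not in lines]
```

For example, `missing_sections("## Data Sources\n## Entity Map\n")` returns the three remaining headings in template order.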
For each reference file, include:
Before writing skill files, execute this gate:
Skipping this gate produces a skill based on your assumptions, not the user's knowledge. That skill will mislead every future analysis.
Before finalizing the skill:
When adding to an existing data context skill:
.claude/skills/data-context/ in the project root
ds-delegate will have access to it automatically
revenue may be gross, net, or recognized; a user_id foreign key does not say whether users can hold multiple active accounts. A metric definition or relationship cardinality inferred from names is an unverified claim presented as fact; every downstream analysis that loads the generated skill inherits the error. The interview takes 10 minutes; weeks of wrong analysis is what it prevents.
/wrds, and LSEG field prefixes by /lseg-data. Re-documenting them here duplicates and drifts; document only the project-specific tables, fields, and filters THIS project uses.
npx claudepluginhub edwinhu/workflows --plugin workflows