Document data flow, transformation lineage, and impact analysis for traceability and compliance.
Tracks data flow from source to destination, documenting transformations and dependencies for impact analysis. Use when mapping column-level lineage, creating transformation documentation, or analyzing change impacts across data pipelines.
/plugin marketplace add melodic-software/claude-code-plugins/plugin install data-architecture@melodic-softwareThis skill is limited to using the following tools:
Use this skill when:
Data lineage tracks data from origin to destination, documenting every transformation along the way. Essential for impact analysis, debugging, compliance, and trust in data.
| Level | Description | Use Case |
|---|---|---|
| Dataset | Table/file level relationships | High-level architecture |
| Column | Field-level mappings | Detailed impact analysis |
| Cell | Record-level tracking | Audit trails |
| Business | Logical concept flow | Business understanding |
SOURCE-TO-TARGET LINEAGE
┌──────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────────┐ ┌─────────┐ ┌───────────┐ │
│ │ CRM │───►│ STAGING │───►│ DWH │───►│ REPORTS │ │
│ │ System │ │ Tables │ │ Facts │ │ & BI │ │
│ └─────────┘ └─────────────┘ └─────────┘ └───────────┘ │
│ │ │ │ │ │
│ │ │ │ │ │
│ ┌────┴────┐ ┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐ │
│ │customers│ │stg_customer│ │dim_customer│ │sales_report│ │
│ │ orders │ │stg_order │ │fact_sales │ │cust_dashboard│
│ │products │ │stg_product │ │dim_product│ │product_analytics
│ └─────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
# Source-to-Target Mapping: dim_customer
## Target Table
- Schema: dbo
- Table: dim_customer
- Type: Dimension (SCD Type 2)
## Column Mappings
| Target Column | Source System | Source Table | Source Column | Transformation |
|---------------|---------------|--------------|---------------|----------------|
| customer_key | Generated | N/A | N/A | IDENTITY |
| customer_id | CRM | customers | customer_id | TRIM, UPPER |
| full_name | CRM | customers | first_name + last_name | CONCAT, TRIM |
| email | CRM | customers | email | LOWER, TRIM |
| segment | CRM | customers | tier | CASE mapping |
| city | CRM | addresses | city | TRIM |
| state | CRM | addresses | state_code | Lookup to ref_state |
| country | CRM | addresses | country_code | Lookup to ref_country |
| effective_date | Generated | N/A | N/A | GETUTCDATE() |
| is_current | Generated | N/A | N/A | 1 (new records) |
## Transformation Rules
### Rule 1: Full Name Concatenation
```sql
CONCAT(TRIM(first_name), ' ', TRIM(last_name))
CASE tier
WHEN 'G' THEN 'Gold'
WHEN 'S' THEN 'Silver'
WHEN 'B' THEN 'Bronze'
ELSE 'Standard'
END
customers c
LEFT JOIN addresses a ON c.customer_id = a.customer_id
AND a.address_type = 'PRIMARY'
LEFT JOIN ref_state rs ON a.state_code = rs.state_code
flowchart LR
subgraph Sources
CRM[(CRM)]
ERP[(ERP)]
ECOM[(E-Commerce)]
end
subgraph Staging
STG_CUST[stg_customer]
STG_ORD[stg_order]
STG_PROD[stg_product]
end
subgraph Transformations
T1[Cleanse]
T2[Match & Merge]
T3[Enrich]
end
subgraph Data Warehouse
DIM_CUST[dim_customer]
DIM_PROD[dim_product]
FACT_SALES[fact_sales]
end
subgraph Consumption
RPT[Reports]
DASH[Dashboards]
API[Data API]
end
CRM --> STG_CUST
ERP --> STG_ORD
ECOM --> STG_PROD
STG_CUST --> T1
STG_ORD --> T1
STG_PROD --> T1
T1 --> T2
T2 --> T3
T3 --> DIM_CUST
T3 --> DIM_PROD
T3 --> FACT_SALES
DIM_CUST --> RPT
DIM_PROD --> DASH
FACT_SALES --> API
| Type | Description | Example |
|---|---|---|
| Direct Copy | No transformation | customer_id → customer_id |
| Formatting | Format change | date → YYYY-MM-DD |
| Concatenation | Combine fields | first + last → full_name |
| Lookup | Reference data | code → description |
| Calculation | Derived value | qty * price → amount |
| Aggregation | Summarize | SUM(amount) → total |
| Conditional | Business logic | CASE statements |
| Type Conversion | Data type change | VARCHAR → INT |
# Transformation: Calculate Order Total
## Description
Calculates the extended amount for each order line item.
## Input Columns
| Column | Source | Type |
|--------|--------|------|
| quantity | stg_order | INT |
| unit_price | stg_product | DECIMAL(10,2) |
| discount_pct | stg_order | DECIMAL(5,2) |
## Output Column
| Column | Target | Type |
|--------|--------|------|
| extended_amount | fact_sales | DECIMAL(18,2) |
## Logic
```sql
(quantity * unit_price) * (1 - COALESCE(discount_pct, 0) / 100)
# Impact Analysis: customers.email
## Direct Dependents
| Object | Type | Relationship |
|--------|------|--------------|
| stg_customer.email | Staging Table | Source |
| dim_customer.email | Dimension | Target |
| marketing_list.email | Report | Display |
## Indirect Dependents
| Object | Type | Impact Path |
|--------|------|-------------|
| email_campaign_report | Report | dim_customer → email_metrics → report |
| customer_dashboard | Dashboard | dim_customer → customer_360 → dashboard |
## Change Impact Assessment
- **Schema Change**: Would break ETL pipeline
- **Data Quality Issue**: Affects all downstream reports
- **Removal**: Requires updates to 12 dependent objects
# Backward Lineage: fact_sales.total_revenue
## Immediate Sources
| Source | Transformation |
|--------|----------------|
| stg_order.quantity | Direct |
| stg_product.unit_price | Direct |
| stg_order.discount_pct | Direct |
## Origin Sources
| System | Table | Column |
|--------|-------|--------|
| ERP | orders | qty |
| ERP | order_lines | unit_price |
| CRM | promotions | discount |
## Full Lineage Path
ERP.orders.qty → stg_order.quantity → (calculation) → fact_sales.total_revenue
ERP.order_lines.unit_price → stg_product.unit_price → (calculation) → fact_sales.total_revenue
CRM.promotions.discount → stg_order.discount_pct → (calculation) → fact_sales.total_revenue
-- Lineage storage schema (PascalCase - SQL Server Convention)
CREATE TABLE LineageDataset (
DatasetId INT IDENTITY PRIMARY KEY,
DatasetName VARCHAR(200) NOT NULL,
DatasetType VARCHAR(50), -- Table, View, File, API
SystemName VARCHAR(100),
SchemaName VARCHAR(100),
Description VARCHAR(500),
CreatedAt DATETIME2 DEFAULT GETUTCDATE()
);
CREATE TABLE LineageColumn (
ColumnId INT IDENTITY PRIMARY KEY,
DatasetId INT FOREIGN KEY REFERENCES LineageDataset(DatasetId),
ColumnName VARCHAR(200) NOT NULL,
DataType VARCHAR(50),
IsKey BIT,
IsSensitive BIT,
Description VARCHAR(500)
);
CREATE TABLE LineageMapping (
MappingId INT IDENTITY PRIMARY KEY,
SourceColumnId INT FOREIGN KEY REFERENCES LineageColumn(ColumnId),
TargetColumnId INT FOREIGN KEY REFERENCES LineageColumn(ColumnId),
TransformationType VARCHAR(50),
TransformationLogic VARCHAR(2000),
PipelineName VARCHAR(200),
CreatedAt DATETIME2 DEFAULT GETUTCDATE()
);
CREATE TABLE LineagePipeline (
PipelineId INT IDENTITY PRIMARY KEY,
PipelineName VARCHAR(200) NOT NULL,
PipelineType VARCHAR(50), -- ETL, CDC, Streaming
Schedule VARCHAR(100),
Owner VARCHAR(100),
DocumentationUrl VARCHAR(500)
);
public class LineageTracker
{
private readonly ILineageRepository _repository;
public async Task TrackTransformation(
string pipelineName,
IEnumerable<ColumnMapping> mappings,
CancellationToken ct)
{
foreach (var mapping in mappings)
{
var lineage = new LineageMapping
{
SourceDataset = mapping.Source.Dataset,
SourceColumn = mapping.Source.Column,
TargetDataset = mapping.Target.Dataset,
TargetColumn = mapping.Target.Column,
TransformationType = mapping.TransformationType,
TransformationLogic = mapping.Logic,
PipelineName = pipelineName,
ExecutedAt = DateTime.UtcNow
};
await _repository.AddAsync(lineage, ct);
}
}
public async Task<IEnumerable<LineageMapping>> GetForwardLineage(
string dataset,
string column,
int depth = 3,
CancellationToken ct = default)
{
var results = new List<LineageMapping>();
var queue = new Queue<(string Dataset, string Column, int Level)>();
queue.Enqueue((dataset, column, 0));
while (queue.Count > 0)
{
var (ds, col, level) = queue.Dequeue();
if (level >= depth) continue;
var mappings = await _repository.GetMappingsBySource(ds, col, ct);
results.AddRange(mappings);
foreach (var m in mappings)
{
queue.Enqueue((m.TargetDataset, m.TargetColumn, level + 1));
}
}
return results;
}
}
graph LR
subgraph "Source Layer"
A[crm.customers] --> B[email]
A --> C[name]
end
subgraph "Staging Layer"
D[stg.customer] --> E[email]
D --> F[full_name]
end
subgraph "DWH Layer"
G[dim.customer] --> H[email]
G --> I[customer_name]
end
B -->|LOWER, TRIM| E
C -->|CONCAT| F
E -->|Direct| H
F -->|Direct| I
style B fill:#f9f,stroke:#333
style E fill:#bbf,stroke:#333
style H fill:#bfb,stroke:#333
Inputs from:
er-modeling skill → Table relationshipsmigration-planning skill → ETL specificationsschema-design skill → Physical schemasOutputs to:
data-governance skill → Data catalogThis skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.