Master dimensional data modeling including star schema design, slowly changing dimensions, fact tables, and data warehouse architecture
Designs star schemas and implements dimensional data modeling with SCD patterns.
/plugin marketplace add pluginagentmarketplace/custom-plugin-bi-analyst/plugin install developer-roadmap@pluginagentmarketplace-bi-analystThis skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/config.yamlassets/schema.jsonreferences/GUIDE.mdreferences/PATTERNS.mdscripts/validate.pyMaster dimensional data modeling for business intelligence, including star schema design, slowly changing dimensions, and modern data warehouse patterns.
DIMENSIONAL MODELING BASICS
1. Identify the business process (e.g., Sales)
2. Declare the grain (one row = one order line)
3. Choose dimensions (Who, What, When, Where)
4. Identify facts (measures: quantity, amount, cost)
┌─────────────┐
│ Dim_Date │
└──────┬──────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
│Dim_Customer │ │ Fact_Sales │ │ Dim_Product │
└─────────────┘ └──────┬──────┘ └─────────────┘
│
┌──────┴──────┐
│ Dim_Store │
└─────────────┘
STAR = One fact table surrounded by dimensions
Dimensions join directly to fact (no chains)
┌──────────────────────────────────────────────────────────┐
│ FACT TABLE TYPES │
├──────────────────────────────────────────────────────────┤
│ │
│ TRANSACTION FACT │
│ • One row per event/transaction │
│ • Most common type │
│ • Example: Order line, click, payment │
│ • Measures: Quantity, Amount, Count │
│ │
│ PERIODIC SNAPSHOT │
│ • One row per time period per entity │
│ • Point-in-time state │
│ • Example: Daily account balance, Monthly inventory │
│ • Measures: Balance, Quantity on Hand │
│ │
│ ACCUMULATING SNAPSHOT │
│ • One row per entity lifecycle │
│ • Multiple date columns (milestones) │
│ • Example: Order fulfillment, Loan processing │
│ • Measures: Days in stage, Count │
│ │
│ FACTLESS FACT │
│ • Records events without measures │
│ • Just dimension keys │
│ • Example: Student attendance, Product promotions │
│ │
└──────────────────────────────────────────────────────────┘
SCD TYPE 0: Retain Original
┌────────────────────────────────────────────────────┐
│ Never update. Keep original value forever. │
│ Use for: Birth date, Original signup source │
└────────────────────────────────────────────────────┘
SCD TYPE 1: Overwrite
┌────────────────────────────────────────────────────┐
│ Update in place. No history preserved. │
│ Use for: Error corrections, Non-critical data │
│ │
│ Before: | ID | Name | Address | │
│ | 1 | John | 123 Main St | │
│ │
│ After: | ID | Name | Address | │
│ | 1 | John | 456 Oak Ave | │
└────────────────────────────────────────────────────┘
SCD TYPE 2: Add New Row
┌────────────────────────────────────────────────────┐
│ Insert new row. Expire old row. Full history. │
│ Use for: Critical tracking (address, segment) │
│ │
│ | Key | ID | Address | Start | End |Curr │
│ | 1 | A | 123 Main | 2020-01 | 2024-06 | N │
│ | 2 | A | 456 Oak | 2024-06 | 9999-12 | Y │
└────────────────────────────────────────────────────┘
SCD TYPE 3: Add Column
┌────────────────────────────────────────────────────┐
│ Keep current and previous value in columns. │
│ Use for: When only one prior value needed │
│ │
│ | ID | Current_Address | Previous_Address | │
│ | 1 | 456 Oak Ave | 123 Main St | │
└────────────────────────────────────────────────────┘
SCD TYPE 6: Hybrid (1+2+3)
┌────────────────────────────────────────────────────┐
│ Combination: History + current column │
│ Use for: Maximum query flexibility │
│ │
│ | Key | ID | Addr | Curr_Addr | Start | End | │
│ | 1 | A | 123 Main| 456 Oak | 2020 | 2024 | │
│ | 2 | A | 456 Oak | 456 Oak | 2024 | 9999 | │
└────────────────────────────────────────────────────┘
CREATE TABLE dim_customer (
-- Surrogate Key (Primary Key)
customer_key INT IDENTITY(1,1) PRIMARY KEY,
-- Natural Key (Business Key)
customer_id VARCHAR(50) NOT NULL,
-- Descriptive Attributes
customer_name VARCHAR(200) NOT NULL,
email VARCHAR(200),
phone VARCHAR(50),
-- Hierarchy Attributes
segment VARCHAR(50),
tier VARCHAR(20),
-- Geographic Hierarchy
address VARCHAR(500),
city VARCHAR(100),
state VARCHAR(50),
postal_code VARCHAR(20),
country VARCHAR(100),
region VARCHAR(50),
-- SCD Type 2 Tracking
effective_start_date DATE NOT NULL,
effective_end_date DATE DEFAULT '9999-12-31',
is_current BIT DEFAULT 1,
version_number INT DEFAULT 1,
-- Audit Columns
created_at DATETIME DEFAULT GETDATE(),
updated_at DATETIME,
source_system VARCHAR(50),
etl_batch_id BIGINT
);
-- Indexes
CREATE UNIQUE INDEX idx_customer_natural ON dim_customer(customer_id, effective_end_date);
CREATE INDEX idx_customer_current ON dim_customer(customer_id) WHERE is_current = 1;
CREATE INDEX idx_customer_segment ON dim_customer(segment);
CREATE TABLE fact_sales (
-- Surrogate Key
sales_key BIGINT IDENTITY(1,1) PRIMARY KEY,
-- Dimension Foreign Keys
date_key INT NOT NULL,
customer_key INT NOT NULL,
product_key INT NOT NULL,
store_key INT NOT NULL,
promotion_key INT,
-- Degenerate Dimensions (no separate table needed)
order_number VARCHAR(50) NOT NULL,
order_line_number INT NOT NULL,
-- Additive Measures (can SUM across all dimensions)
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
discount_amount DECIMAL(10,2) DEFAULT 0,
sales_amount DECIMAL(12,2) NOT NULL,
cost_amount DECIMAL(12,2),
profit_amount DECIMAL(12,2),
tax_amount DECIMAL(10,2),
-- Non-Additive Measures (cannot SUM directly)
unit_cost DECIMAL(10,2),
discount_percent DECIMAL(5,2),
margin_percent DECIMAL(5,2),
-- Audit Columns
created_at DATETIME DEFAULT GETDATE(),
source_system VARCHAR(50),
etl_batch_id BIGINT,
-- Foreign Key Constraints
FOREIGN KEY (date_key) REFERENCES dim_date(date_key),
FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key),
FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
FOREIGN KEY (store_key) REFERENCES dim_store(store_key)
);
-- Indexes for common query patterns
CREATE INDEX idx_sales_date ON fact_sales(date_key);
CREATE INDEX idx_sales_customer ON fact_sales(customer_key);
CREATE INDEX idx_sales_product ON fact_sales(product_key);
CREATE INDEX idx_sales_composite ON fact_sales(date_key, product_key, customer_key);
-- Partitioning (for large tables)
-- CREATE TABLE fact_sales PARTITION BY RANGE (date_key);
-- Generate comprehensive date dimension
WITH date_spine AS (
SELECT
DATEADD(DAY, n, '2020-01-01') AS date_value
FROM (
SELECT TOP 3650 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
FROM sys.objects a CROSS JOIN sys.objects b
) numbers
)
INSERT INTO dim_date (
date_key,
date_value,
year,
quarter,
month,
month_name,
week,
day_of_week,
day_name,
day_of_month,
day_of_year,
is_weekend,
is_holiday,
fiscal_year,
fiscal_quarter
)
SELECT
CONVERT(INT, FORMAT(date_value, 'yyyyMMdd')) AS date_key,
date_value,
YEAR(date_value) AS year,
DATEPART(QUARTER, date_value) AS quarter,
MONTH(date_value) AS month,
DATENAME(MONTH, date_value) AS month_name,
DATEPART(WEEK, date_value) AS week,
DATEPART(WEEKDAY, date_value) AS day_of_week,
DATENAME(WEEKDAY, date_value) AS day_name,
DAY(date_value) AS day_of_month,
DATEPART(DAYOFYEAR, date_value) AS day_of_year,
CASE WHEN DATEPART(WEEKDAY, date_value) IN (1, 7) THEN 1 ELSE 0 END AS is_weekend,
0 AS is_holiday, -- Update with actual holidays
CASE WHEN MONTH(date_value) >= 4 THEN YEAR(date_value) ELSE YEAR(date_value) - 1 END AS fiscal_year,
CASE
WHEN MONTH(date_value) IN (4,5,6) THEN 1
WHEN MONTH(date_value) IN (7,8,9) THEN 2
WHEN MONTH(date_value) IN (10,11,12) THEN 3
ELSE 4
END AS fiscal_quarter
FROM date_spine;
tables:
dimensions: dim_{entity}
facts: fact_{business_process}
bridges: bridge_{relationship}
aggregates: agg_{fact}_{grain}
columns:
surrogate_key: "{table}_key"
natural_key: "{entity}_id"
foreign_key: "{dimension}_key"
measures: "{name}_amount", "{name}_count", "{name}_qty"
dates: "{name}_date", "created_at", "updated_at"
flags: "is_{condition}", "has_{feature}"
examples:
- dim_customer.customer_key
- dim_customer.customer_id
- fact_sales.customer_key
- fact_sales.sales_amount
- dim_date.is_weekend
## Fact Table: fact_order_line
### Grain Statement
One row represents one line item on one customer order.
### Grain Components
- One order line item
- On one order
- For one product
- Sold to one customer
- At one store location
- On one specific date
### Test Query
SELECT order_number, line_number, COUNT(*)
FROM fact_order_line
GROUP BY order_number, line_number
HAVING COUNT(*) > 1;
-- Should return 0 rows
Conformed = Same dimension used across multiple facts
BENEFITS:
• Consistent reporting
• Drill-across queries
• Single version of truth
REQUIREMENTS:
• Same surrogate keys
• Same attributes
• Same hierarchies
• Same business rules
EXAMPLES:
• dim_date: Used by sales, inventory, finance
• dim_customer: Used by sales, support, marketing
• dim_product: Used by sales, inventory, returns
-- Customer can have multiple accounts
-- Account can have multiple customers
CREATE TABLE bridge_customer_account (
bridge_key INT IDENTITY PRIMARY KEY,
customer_key INT NOT NULL,
account_key INT NOT NULL,
relationship_type VARCHAR(50), -- Primary, Secondary, Authorized
weight_factor DECIMAL(5,4), -- For allocation (sum to 1.0)
effective_start DATE,
effective_end DATE,
FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key),
FOREIGN KEY (account_key) REFERENCES dim_account(account_key)
);
-- Usage in query
SELECT
c.customer_name,
SUM(f.balance * b.weight_factor) AS allocated_balance
FROM fact_account f
JOIN bridge_customer_account b ON f.account_key = b.account_key
JOIN dim_customer c ON b.customer_key = c.customer_key
WHERE c.customer_id = 'CUST001'
GROUP BY c.customer_name;
-- Pre-aggregate for performance
CREATE TABLE agg_sales_monthly_product (
date_key_month INT NOT NULL, -- Monthly grain
product_key INT NOT NULL,
total_quantity INT,
total_sales_amount DECIMAL(15,2),
total_cost_amount DECIMAL(15,2),
order_count INT,
customer_count INT,
row_count INT, -- For drill-through verification
PRIMARY KEY (date_key_month, product_key)
);
-- Refresh aggregate
INSERT INTO agg_sales_monthly_product
SELECT
(YEAR(d.date_value) * 100 + MONTH(d.date_value)) AS date_key_month,
f.product_key,
SUM(f.quantity),
SUM(f.sales_amount),
SUM(f.cost_amount),
COUNT(DISTINCT f.order_number),
COUNT(DISTINCT f.customer_key),
COUNT(*)
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
GROUP BY (YEAR(d.date_value) * 100 + MONTH(d.date_value)), f.product_key;
const executeETL = async (job: ETLJob) => {
const retryConfig = {
maxRetries: 3,
backoffMs: [60000, 180000, 300000] // 1min, 3min, 5min
};
for (let attempt = 0; attempt <= retryConfig.maxRetries; attempt++) {
try {
return await dataWarehouse.runJob(job);
} catch (error) {
if (attempt === retryConfig.maxRetries) throw error;
if (error.code === 'DEADLOCK') {
await sleep(retryConfig.backoffMs[attempt]);
continue;
}
throw error;
}
}
};
const dwHooks = {
onDimensionLoad: (dimName, rowCount, duration) => {
console.log(`[DW] Loaded ${dimName}: ${rowCount} rows in ${duration}s`);
},
onFactLoad: (factName, rowCount, duration) => {
console.log(`[DW] Loaded ${factName}: ${rowCount} rows in ${duration}s`);
},
onSCDProcess: (dimName, inserts, updates) => {
console.log(`[DW] SCD on ${dimName}: ${inserts} inserts, ${updates} updates`);
}
};
describe('Data Warehousing Skill', () => {
describe('Grain Validation', () => {
it('should have unique grain in fact table', async () => {
const duplicates = await db.query(`
SELECT order_number, line_number, COUNT(*)
FROM fact_sales
GROUP BY order_number, line_number
HAVING COUNT(*) > 1
`);
expect(duplicates.rowCount).toBe(0);
});
});
describe('Referential Integrity', () => {
it('should have no orphan facts', async () => {
const orphans = await db.query(`
SELECT COUNT(*)
FROM fact_sales f
LEFT JOIN dim_customer c ON f.customer_key = c.customer_key
WHERE c.customer_key IS NULL
`);
expect(orphans.rows[0].count).toBe(0);
});
});
describe('SCD Type 2', () => {
it('should have only one current record per entity', async () => {
const duplicateCurrent = await db.query(`
SELECT customer_id, COUNT(*)
FROM dim_customer
WHERE is_current = 1
GROUP BY customer_id
HAVING COUNT(*) > 1
`);
expect(duplicateCurrent.rowCount).toBe(0);
});
});
});
| Issue | Cause | Solution |
|---|---|---|
| Duplicate facts | Grain violation | Review ETL, add unique constraint |
| Missing dimension rows | Late-arriving data | Add "Unknown" row, fix SCD logic |
| Slow queries | Missing indexes | Add indexes on FK and filter columns |
| Wrong totals | Measure additivity | Check semi/non-additive handling |
| History gaps | SCD implementation | Verify date continuity |
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2024-01 | Initial release |
| 2.0.0 | 2025-01 | Production-grade with SCD patterns |
Use when working with Payload CMS projects (payload.config.ts, collections, fields, hooks, access control, Payload API). Use when debugging validation errors, security issues, relationship queries, transactions, or hook behavior.
Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.