Analytics Engineering
Design and implement analytics engineering practices: dbt-style SQL transformations, dimensional data modeling, testing strategies, and automated documentation for data warehouses and lakehouses.
Guiding Principle
"Analytics engineering applies software engineering discipline to data transformation — version control, testing, documentation, and code review are non-negotiable."
Procedure
Step 1 — Data Modeling Design
- Identify business processes and grain for each fact table
- Design dimensional model: facts, dimensions, bridges, and degenerate dimensions
- Apply naming conventions: fct_, dim_, stg_, int_ prefixes
- Define slowly changing dimension strategy (SCD Type 1, 2, or 3) per dimension
- Produce an ER diagram with grain, cardinality, and key relationships annotated
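To make the slowly changing dimension choice concrete, here is a minimal sketch of SCD Type 2 behavior in plain Python. The `CustomerRow` shape, the `apply_scd2` helper, and the `valid_from`/`valid_to` column names are illustrative assumptions, not part of any particular tool; in a warehouse this logic would normally live in SQL or a snapshot feature.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dim_customer table."""
    customer_id: int          # natural (business) key
    city: str                 # attribute whose history we track
    valid_from: date
    valid_to: Optional[date]  # None means "current version"


def apply_scd2(dim: List[CustomerRow], customer_id: int,
               new_city: str, change_date: date) -> List[CustomerRow]:
    """Return a new row list with an SCD Type 2 change applied."""
    result = []
    found_current = False
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            found_current = True
            if row.city == new_city:
                return list(dim)  # attribute unchanged: keep history as-is
            # Close the current version instead of overwriting it (Type 1
            # would overwrite), then open a new current version.
            result.append(replace(row, valid_to=change_date))
            result.append(CustomerRow(customer_id, new_city, change_date, None))
        else:
            result.append(row)
    if not found_current:
        # First time we see this key: insert it as the current version.
        result.append(CustomerRow(customer_id, new_city, change_date, None))
    return result


dim_customer = [CustomerRow(1, "Lisbon", date(2023, 1, 1), None)]
dim_customer = apply_scd2(dim_customer, 1, "Porto", date(2024, 6, 1))
for row in dim_customer:
    print(row)
```

After the change, the dimension holds two rows for customer 1: the closed Lisbon version and the open Porto version. Facts dated before the change can still join to the attribute values that were valid at the time.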
Step 2 — Transformation Layer Architecture
- Design staging models: one per source table (1:1), limited to renaming, type casting, and basic cleaning
- Design intermediate models: business logic, joins, calculations
- Design mart models: final consumer-facing tables optimized for query patterns
- Implement incremental models where data volume warrants it
- Define materialization strategy per model: view, table, incremental, ephemeral
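The layering above can be sketched with an in-memory SQLite database driven from Python. The table and column names (`raw_orders`, `stg_orders`, `int_orders_completed`, `fct_customer_revenue`) are hypothetical, and SQLite views and tables stand in for the view and table materializations; a dbt project would express each layer as its own model file instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw source table (hypothetical), with awkward names and string-typed amounts.
CREATE TABLE raw_orders (ID INTEGER, CUST INTEGER, AMT TEXT, STATUS TEXT);
INSERT INTO raw_orders VALUES
    (1, 10, '20.00', ' Shipped '),
    (2, 10, '5.00',  'shipped'),
    (3, 11, '7.50',  'CANCELLED');

-- Staging: 1:1 with the source; rename, cast, and trim only. Kept as a view.
CREATE VIEW stg_orders AS
SELECT ID                  AS order_id,
       CUST                AS customer_id,
       CAST(AMT AS REAL)   AS amount,
       LOWER(TRIM(STATUS)) AS status
FROM raw_orders;

-- Intermediate: business logic (here, exclude cancelled orders).
CREATE VIEW int_orders_completed AS
SELECT order_id, customer_id, amount
FROM stg_orders
WHERE status <> 'cancelled';

-- Mart: consumer-facing aggregate, materialized as a table.
CREATE TABLE fct_customer_revenue AS
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS revenue
FROM int_orders_completed
GROUP BY customer_id;
""")

rows = conn.execute(
    "SELECT customer_id, order_count, revenue "
    "FROM fct_customer_revenue ORDER BY customer_id"
).fetchall()
print(rows)
```

Keeping staging and intermediate layers as views avoids storing redundant copies, while the mart is a table because it is the layer consumers query repeatedly.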
Step 3 — Testing & Quality
- Implement schema tests: not_null, unique, accepted_values, relationships
- Design custom data tests for business logic validation
- Implement freshness checks on source tables
- Define test severity levels: error (blocks pipeline) vs. warn (alerts only)
- Track test coverage as the share of critical columns with at least one test, targeting more than 80%
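As an illustration of the test types and severity levels above, the sketch below expresses each test as a query that returns failing rows, so zero rows means a pass. The tables, the `TESTS` list, and the `run_tests` helper are assumptions made for this example; in a dbt project the same tests would typically be declared in YAML rather than hand-written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER, segment TEXT);
INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'wholesale');

CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, status TEXT);
INSERT INTO fct_orders VALUES
    (100, 1, 'shipped'),
    (101, 2, 'pending'),
    (101, 2, 'shipped'),   -- duplicate order_id
    (102, 3, 'shipped'),   -- customer_id 3 is missing from dim_customer
    (103, 1, 'unknown');   -- status outside the accepted set
""")

# (test name, column under test, severity, query returning failing rows)
TESTS = [
    ("not_null", "fct_orders.order_id", "error",
     "SELECT * FROM fct_orders WHERE order_id IS NULL"),
    ("unique", "fct_orders.order_id", "error",
     "SELECT order_id FROM fct_orders GROUP BY order_id HAVING COUNT(*) > 1"),
    ("accepted_values", "fct_orders.status", "warn",
     "SELECT * FROM fct_orders WHERE status NOT IN ('pending', 'shipped')"),
    ("relationships", "fct_orders.customer_id", "error",
     "SELECT f.* FROM fct_orders f "
     "LEFT JOIN dim_customer d ON f.customer_id = d.customer_id "
     "WHERE d.customer_id IS NULL"),
]


def run_tests(conn, tests):
    """Run each test query and record (severity, number of failing rows)."""
    results = {}
    for name, _column, severity, sql in tests:
        results[name] = (severity, len(conn.execute(sql).fetchall()))
    return results


results = run_tests(conn, TESTS)

# Only failing tests with severity "error" should block the pipeline.
blocking = [name for name, (sev, fails) in results.items()
            if sev == "error" and fails > 0]

# Coverage: share of critical columns that have at least one test.
critical_columns = {
    "fct_orders.order_id", "fct_orders.customer_id", "fct_orders.status",
    "dim_customer.customer_id", "dim_customer.segment",
}
tested_columns = {column for _name, column, _sev, _sql in TESTS}
coverage = len(tested_columns & critical_columns) / len(critical_columns)

print(results)
print("blocking:", blocking)
print(f"coverage: {coverage:.0%}")
```

In this sample the `accepted_values` failure only warns, while the `unique` and `relationships` failures would stop the run. Coverage comes out below the 80% target because neither `dim_customer` column is tested.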
Step 4 — Documentation & Lineage
- Write model descriptions and column-level documentation
- Generate and publish dbt docs site for data consumers
- Capture model-level lineage through model references (ref and source), and add column-level lineage where the tooling supports it
- Create a data dictionary with business definitions for key metrics
- Establish a change management process for model modifications
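One way to keep documentation from drifting is to check it automatically, for example in CI. The sketch below assumes a simplified, hypothetical metadata dictionary (`MODELS`) and a hypothetical `find_doc_gaps` helper; a real project would load this information from its own YAML or generated artifacts, whose structure differs from this.

```python
# Hypothetical, simplified project metadata: model -> description and columns.
MODELS = {
    "stg_orders": {
        "description": "One row per source order, renamed and typed.",
        "columns": {
            "order_id": "Primary key of the order.",
            "amount": "",  # missing column description
        },
    },
    "fct_customer_revenue": {
        "description": "",  # missing model description
        "columns": {
            "customer_id": "Customer natural key.",
            "revenue": "Sum of completed order amounts.",
        },
    },
}


def find_doc_gaps(models):
    """Return names of models and model.column pairs lacking a description."""
    gaps = []
    for model_name, meta in sorted(models.items()):
        if not meta.get("description", "").strip():
            gaps.append(model_name)
        for column, description in sorted(meta.get("columns", {}).items()):
            if not description.strip():
                gaps.append(f"{model_name}.{column}")
    return gaps


gaps = find_doc_gaps(MODELS)
for gap in gaps:
    print("undocumented:", gap)
```

A check like this could fail a pull request when `gaps` is non-empty, which ties documentation to the change management process instead of leaving it to a later cleanup.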
Quality Criteria
- Every model has a description and column-level documentation for key fields
- Schema tests cover all primary keys (unique + not_null) and foreign keys (relationships)
- Incremental models handle late-arriving data correctly, for example by reprocessing a lookback window and merging on a unique key
- Data dictionary defines all business metrics with calculation logic
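The late-arriving-data criterion can be illustrated with a small incremental load that reprocesses a lookback window and upserts on a unique key. This is a sketch under stated assumptions: the `stg_events` and `fct_events` tables, the three-day `LOOKBACK_DAYS` window, and the `run_incremental` helper are all hypothetical, and the upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3

LOOKBACK_DAYS = 3  # assumed window; tune to how late source data can arrive

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_events (event_id INTEGER, event_date TEXT, amount REAL);
CREATE TABLE fct_events (event_id INTEGER PRIMARY KEY, event_date TEXT, amount REAL);
INSERT INTO stg_events VALUES (1, '2024-01-10', 10.0), (2, '2024-01-11', 20.0);
""")


def run_incremental(conn):
    """Reload rows newer than (max loaded date - lookback) and upsert them.

    On the first run the target is empty, so the cutoff falls back to a
    very early date and everything is loaded.
    """
    conn.execute(
        """
        INSERT INTO fct_events (event_id, event_date, amount)
        SELECT event_id, event_date, amount
        FROM stg_events
        WHERE event_date >= COALESCE(
            (SELECT DATE(MAX(event_date), ?) FROM fct_events),
            '0001-01-01'
        )
        ON CONFLICT(event_id) DO UPDATE SET
            event_date = excluded.event_date,
            amount = excluded.amount
        """,
        (f"-{LOOKBACK_DAYS} days",),
    )
    conn.commit()


run_incremental(conn)  # initial load: events 1 and 2

conn.executescript("""
INSERT INTO stg_events VALUES (3, '2024-01-09', 5.0);    -- arrives late
INSERT INTO stg_events VALUES (4, '2024-01-12', 1.0);    -- genuinely new
UPDATE stg_events SET amount = 25.0 WHERE event_id = 2;  -- late correction
""")
run_incremental(conn)  # picks up the late row, the new row, and the correction

rows = conn.execute(
    "SELECT event_id, event_date, amount FROM fct_events ORDER BY event_id"
).fetchall()
print(rows)
```

A filter of the form `event_date > MAX(event_date)` would have loaded only event 4 and silently dropped event 3 and the correction to event 2. The lookback window is a trade-off: rows arriving later than the window are still missed, so a periodic full refresh remains a reasonable safety net.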
Anti-Patterns
- Models with hundreds of lines of SQL and no intermediate abstractions
- Testing only at the mart layer, missing quality issues in staging
- Materializing everything as tables when views would suffice
- Undocumented models that only the author understands