This skill should be used when the user asks to "design analytics models", "set up a dbt project", "plan data transformations", "define data contracts", or "model a star schema", or mentions staging models, marts, incremental strategies, or materializations. It produces analytics pipeline designs with dbt-style transformations, data modeling patterns, testing strategies, and documentation plans. [EXPLICIT] Use this skill whenever the user needs source-to-target mapping, materialization decisions, or transformation framework architecture, even if they don't explicitly ask for "analytics engineering". [EXPLICIT]
Analytics engineering defines how raw data is transformed into reliable, documented, and tested analytical models — source-to-target mapping, modeling patterns, transformation frameworks, testing, and documentation. This skill produces analytics engineering documentation that enables teams to build maintainable, trustworthy data transformation pipelines. [EXPLICIT]
An analytical model without tests is an opinion in table form. Tests are first-class citizens, not an afterthought. Documentation is part of the model, not a separate artifact. Prefer incremental over full-refresh whenever the data volume justifies it. Every model has an explicit grain, an identified owner, and a contract enforced in CI.
The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts. [EXPLICIT]
Parameters:
{MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (~40%: S1 source-to-target + S2 modeling patterns + S4 testing strategy) | técnica (full 6 sections, default)
Before generating architecture, detect the project context:
!find . -name "*.sql" -o -name "*.yml" -o -name "dbt_project.yml" -o -name "profiles.yml" -o -name "*.py" | head -30
Use detected tools (dbt, SQLMesh, Dataform, stored procedures, etc.) to tailor recommendations. [EXPLICIT]
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/analytics-patterns.md
Maps the journey from raw sources through staging to consumption-ready marts. [EXPLICIT]
dbt project structure conventions:
models/
staging/ # 1:1 source mappings — stg_{source}_{entity}.sql
salesforce/
stripe/
intermediate/ # Business logic joins — int_{entity}_{verb}.sql
finance/
marketing/
marts/ # Consumption models
finance/ # fct_{event}.sql, dim_{entity}.sql
marketing/
core/ # Shared dimensions — dim_customer, dim_date
Naming conventions (enforce via CI linting):
stg_ — staging: rename, cast, deduplicate; one model per source table
int_ — intermediate: joins, pivots, aggregations that simplify mart logic
fct_ — fact: event/transaction grain, numeric measures, foreign keys
dim_ — dimension: descriptive attributes, surrogate keys, SCD tracking
mrt_ — mart-level aggregations when pre-aggregating for BI performance
Includes:
Key decisions:
Selects the modeling approach based on query patterns and data characteristics. [EXPLICIT]
Includes:
Key decisions:
Documents tool configuration, model organization, and materialization strategies. [EXPLICIT]
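Layer-level materialization defaults can be declared once in dbt_project.yml rather than per model. A minimal sketch, assuming the layered folder structure above; the project name my_project is a placeholder:

```yaml
# dbt_project.yml — illustrative materialization defaults per layer
models:
  my_project:                      # hypothetical project name
    staging:
      +materialized: view          # cheap 1:1 source views, always fresh
    intermediate:
      +materialized: ephemeral     # inlined into downstream models, not persisted
    marts:
      +materialized: table         # consumption models rebuilt as tables
```

Individual large fact models can then override the mart default with materialized: incremental in their own config.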
Incremental strategy comparison:
| Strategy | Mechanism | Best For | Watch Out |
|---|---|---|---|
| append | Insert new rows only | Immutable event streams (logs, clicks) | Cannot handle late-arriving updates |
| merge | Upsert on unique_key | Mutable entities (orders, users) | Requires stable unique key; expensive on large tables |
| delete+insert | Delete partition, re-insert | Late-arriving data in known partitions | Partition key must be deterministic |
| insert_overwrite | Overwrite entire partition | Cost-efficient on BigQuery/Hive | Not supported on all warehouses |
| microbatch | Process in time-windowed batches | Very large event tables (1B+ rows) | Newer dbt feature; requires event_time column |
Acceptance criteria for incremental models: test incremental runs against full refresh monthly to catch drift; always define unique_key and updated_at; set on_schema_change: 'append_new_columns' as default. [EXPLICIT]
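The acceptance criteria above can be captured directly in model configuration. A sketch for a hypothetical fct_orders model, assuming the merge strategy from the comparison table:

```yaml
# models/marts/finance/fct_orders.yml — illustrative incremental config
models:
  - name: fct_orders
    config:
      materialized: incremental
      incremental_strategy: merge             # upsert on the declared unique_key
      unique_key: order_id                    # stable key required by merge
      on_schema_change: append_new_columns    # default per the acceptance criteria
```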
ref() vs source() conventions:
source() only in staging models — never reference raw tables in intermediate or mart layers
ref() everywhere else to maintain DAG integrity and enable state-aware builds
Includes:
Key decisions:
Defines testing strategy and contract enforcement for data reliability. [EXPLICIT]
Testing pyramid for data (invest effort bottom-up):
| Level | What | Tools | Blocks Deploy? |
|---|---|---|---|
| Source freshness | Data arrived on time | dbt source freshness | Warn at 2x SLA, error at 4x |
| Schema tests | not_null, unique, accepted_values, relationships | dbt generic tests in YAML | Yes — mart layer always, staging for critical |
| Custom data tests | Business rule validation, cross-model consistency | dbt singular tests (.sql files) | Yes for mart layer |
| Unit tests | Macro logic, complex SQL transformations | dbt unit tests (v1.8+), SQLMesh audits | Yes — CI blocks merge |
| Contract tests | Column names, types, constraints between teams | dbt model contracts (contract: {enforced: true}) | Yes — breaking changes blocked |
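Schema tests and contract enforcement from the pyramid can live in the same YAML file. A sketch for a hypothetical fct_orders model (column list abbreviated; enforced contracts require a data_type on every column):

```yaml
# models/marts/finance/fct_orders.yml — illustrative tests + contract
models:
  - name: fct_orders
    config:
      contract:
        enforced: true            # breaking name/type changes fail the build
    columns:
      - name: order_id
        data_type: string
        constraints:
          - type: not_null
        tests:
          - unique
          - not_null
      - name: customer_id
        data_type: string
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id
      - name: status
        data_type: string
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```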
CI/CD for dbt:
dbt build --select state:modified+ on PRs — test only changed models and downstream dependencies
--defer --state prod-artifacts/ — reference production tables for unchanged models, avoiding full rebuilds
dbt source freshness as pre-build step; skip stale sources with --exclude source:stale
Includes:
Key decisions:
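The PR commands above can be wired into CI. A sketch assuming GitHub Actions, a Snowflake adapter, and a prod-artifacts/ directory restored from the last production run (all three are assumptions, not part of this skill's requirements):

```yaml
# .github/workflows/dbt-ci.yml — illustrative slim CI sketch
name: dbt-ci
on: pull_request
jobs:
  build-modified:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-core dbt-snowflake   # adapter choice is an assumption
      - run: dbt source freshness                 # pre-build freshness gate
      - run: >
          dbt build --select state:modified+
          --defer --state prod-artifacts/         # only changed models + downstream
```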
Plans auto-generated and manually enriched documentation for data discovery. [EXPLICIT]
Exposure and metric definitions (connecting to BI):
exposures:
- name: weekly_revenue_dashboard
type: dashboard
maturity: high
url: https://bi-tool.company.com/dashboard/123
depends_on:
- ref('fct_orders')
- ref('dim_customer')
owner:
name: Finance Analytics
email: finance-analytics@company.com
metrics:
- name: monthly_recurring_revenue
label: MRR
type: derived
description: Sum of active subscription revenue, normalized to monthly
calculation_method: derived
expression: "sum(amount) where status = 'active'"
time_grains: [day, week, month]
dimensions: [plan_type, region, customer_segment]
Exposures create accountability: when a model breaks, the owner of every downstream exposure is notified. Define exposures for every L1-L2 dashboard and every ML pipeline consuming marts. [EXPLICIT]
Includes:
Key decisions:
Optimizes warehouse performance and controls transformation costs. [EXPLICIT]
Includes:
Key decisions:
Set a query_tag in profiles.yml to enable per-model cost tracking.
| Decision | Enables | Constrains | Threshold |
|---|---|---|---|
| Star Schema | Fast queries, intuitive for BI, clear grain | More joins, ETL complexity | Multiple consumption patterns, 3+ BI consumers |
| One Big Table | No joins, fast development | Redundancy, update complexity | Single-use analytics, <100M rows |
| Incremental Models | Fast builds, cost efficient | Harder debugging, late-arriving data risk | Fact tables >10M rows, frequent builds |
| Full Refresh | Simple, deterministic | Expensive at scale, slow | Dimension tables, prototyping, <10M rows |
| Strict Data Contracts | Reliability, breaking change prevention | Slower iteration | Production-critical marts, multi-team |
| Column-Level Lineage | Precise impact analysis | Tooling cost, maintenance | Regulated industries, 100+ models |
Legacy Stored Procedures Migration: Map existing logic to dbt models, preserve business rules, run parallel validation. Expect 20-30% of stored procedure logic to be obsolete or duplicated. [EXPLICIT]
Multi-Warehouse Environment: Models consumed across Snowflake, BigQuery, and Redshift. Use cross-database macros, abstract warehouse-specific SQL, test on each target platform. [EXPLICIT]
Real-Time Transformation Needs: dbt is batch-oriented. For streaming transformations, consider Materialize, RisingWave, or SQLMesh with streaming support. Hybrid architecture: batch marts enriched by streaming aggregates. [EXPLICIT]
Massive Scale (10B+ Rows): Incremental models mandatory. Microbatch strategy, partition pruning, and clustering are critical. Profile query plans before and after optimization. [EXPLICIT]
Single Analytics Engineer: Skip intermediate layers initially. Start with staging + marts. Add layers as complexity grows. Documentation is critical for bus-factor mitigation. [EXPLICIT]
Before finalizing delivery, verify:
CI strategy defined (state:modified+, deferred execution)
| Format | Default | Description |
|---|---|---|
| markdown | Yes | Markdown with embedded Mermaid (DAG and star schema diagrams). |
| html | On demand | Branded HTML (Design System), for visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter. [EXPLICIT]
Primary: A-01_Analytics_Engineering.md (or .html when {FORMATO}=html) — Source-to-target mapping, modeling patterns, transformation framework, testing strategy, documentation plan, performance optimization.
Secondary: Model dependency DAG, naming convention guide, test coverage report template, cost attribution dashboard spec.
Author: Javier Montano | Last updated: March 18, 2026