Analyze data source characteristics including update frequency, volume patterns, and schema stability.
Autonomous agent that analyzes data sources for ETL planning. Characterizes volume patterns, update frequency, schema stability, and data quality to generate extraction specifications.
/plugin marketplace add majesticlabs-dev/majestic-marketplace
/plugin install majestic-data@majestic-marketplace

Autonomous agent that characterizes data sources for ETL planning.
1. Inventory the source
2. Sample multiple time periods
3. Profile the schema
4. Assess quality
5. Document extraction requirements
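The schema-profiling and quality-assessment steps can be sketched roughly as follows. This is a minimal illustration, not the agent's actual implementation: it assumes a DB-API connection (an in-memory SQLite database stands in for the real source), and `profile_table` is a hypothetical helper name.

```python
import sqlite3


def profile_table(conn, table):
    """Return the row count and per-column null rate for one table.

    Hypothetical helper; table and column names are interpolated directly,
    so only pass trusted identifiers.
    """
    # LIMIT 0 returns no rows but still populates cursor.description.
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = [d[0] for d in cur.description]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    null_rates = {}
    for col in columns:
        nulls = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" IS NULL'
        ).fetchone()[0]
        null_rates[col] = nulls / total if total else 0.0
    return {"rows": total, "null_rates": null_rates}


# Tiny stand-in dataset (assumed for the example only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders (order_id INTEGER PRIMARY KEY, status TEXT)"
)
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?)",
    [(1, "pending"), (2, None), (3, "shipped"), (4, "delivered")],
)

profile = profile_table(conn, "customer_orders")
print(profile)
```

Running the same profile against samples from several time periods would show whether null rates or row counts drift over time.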
source:
  name: customer_orders
  type: postgresql
  connection: orders_db
extraction:
  method: incremental
  key_column: updated_at
  batch_size: 100000
  frequency: hourly
schema:
  columns:
    - name: order_id
      type: bigint
      nullable: false
      primary_key: true
    - name: customer_id
      type: bigint
      nullable: false
      foreign_key: customers.id
    - name: order_date
      type: date
      nullable: false
    - name: total_amount
      type: decimal(10,2)
      nullable: false
    - name: status
      type: varchar(20)
      nullable: false
      values: [pending, confirmed, shipped, delivered, cancelled]
    - name: updated_at
      type: timestamp
      nullable: false
      incremental_key: true
volume:
  current_rows: 5_200_000
  daily_growth: 15_000
  peak_hours: [10, 14, 18]
quality:
  known_issues:
    - "status can be null for orders before 2023"
    - "total_amount occasionally negative (refunds)"
  null_rates:
    customer_id: 0%
    order_date: 0%
    status: 0.5%
recommendations:
  - "Use updated_at for incremental loads"
  - "Add check constraint for status values"
  - "Consider partitioning by order_date"
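One way an extractor might consume this specification is a watermark-based incremental load on `updated_at`, read in batches. The sketch below is an assumption about how such a loop could look, not the agent's own code: SQLite stands in for PostgreSQL, timestamps are ISO-8601 strings, and `extract_incremental` is a hypothetical name. It pages on the pair `(updated_at, order_id)` so that rows sharing a timestamp are not skipped at a batch boundary.

```python
import sqlite3

MAX_ID = 2**63 - 1  # sentinel: no order_id can be greater, so the first query is strictly "> last_sync"


def extract_incremental(conn, last_sync, batch_size=100_000):
    """Yield batches of rows changed after last_sync, oldest first."""
    cursor_ts, cursor_id = last_sync, MAX_ID
    while True:
        rows = conn.execute(
            """
            SELECT order_id, status, updated_at
            FROM customer_orders
            WHERE updated_at > ? OR (updated_at = ? AND order_id > ?)
            ORDER BY updated_at, order_id
            LIMIT ?
            """,
            (cursor_ts, cursor_ts, cursor_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Advance the watermark to the last row of this batch.
        cursor_id, cursor_ts = rows[-1][0], rows[-1][2]


# Stand-in data (assumed for the example only); rows 2 and 3 share a timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders (order_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?, ?)",
    [
        (1, "pending", "2024-01-01T00:00:00"),
        (2, "confirmed", "2024-01-01T01:00:00"),
        (3, "shipped", "2024-01-01T01:00:00"),
        (4, "delivered", "2024-01-01T02:00:00"),
    ],
)

batches = list(extract_incremental(conn, "2024-01-01T00:00:00", batch_size=2))
batch_ids = [[row[0] for row in batch] for batch in batches]
print(batch_ids)
```

At the stated growth of 15,000 rows per day, an hourly run sees about 625 changed rows on average, so a batch size of 100,000 mostly matters for backfills and catch-up runs.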
When analyzing multiple related sources:
## Source Comparison: Orders vs Order_Items
| Attribute | Orders | Order_Items |
|-----------|--------|-------------|
| Row count | 5.2M | 18.7M |
| Daily growth | 15K | 52K |
| Key column | order_id | item_id |
| Join key | order_id | order_id |
| Update lag | < 1 hour | < 1 hour |
**Relationship:** 1:N (avg 3.6 items per order)
**Join strategy:** Hash join on order_id
**Load order:** Orders first, then Order_Items
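The load order and the items-per-order ratio above can be derived mechanically. The following is a small sketch using Python's standard `graphlib` module; the `dependencies` mapping is an assumed representation of the foreign-key relationship, not something the agent is documented to emit.

```python
from graphlib import TopologicalSorter

# Each table maps to the set of tables it depends on (its parents).
dependencies = {
    "orders": set(),
    "order_items": {"orders"},  # order_items.order_id references orders.order_id
}

# Parents come before children, so orders loads before order_items.
load_order = list(TopologicalSorter(dependencies).static_order())
print(load_order)

# Ratio of the row counts from the comparison table.
avg_items_per_order = 18_700_000 / 5_200_000
print(round(avg_items_per_order, 1))
```

With more than two related sources, the same mapping extends naturally, and `TopologicalSorter` raises `CycleError` if the declared dependencies are circular.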
For API-based sources:
api_source:
  name: stripe_payments
  base_url: https://api.stripe.com/v1
  auth: bearer_token
  endpoints:
    - path: /charges
      method: GET
      pagination: cursor
      rate_limit: 100/sec
      params:
        created[gte]: "{last_sync}"
extraction:
  strategy: cursor_pagination
  page_size: 100
  sync_frequency: 15min
  full_refresh_weekly: true
data_characteristics:
  avg_response_size: 50KB
  records_per_page: 100
  typical_daily_volume: 5000
error_handling:
  retry_codes: [429, 500, 502, 503]
  max_retries: 3
  backoff: exponential
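The extraction and error-handling settings above could translate into a loop like the one below. This is a sketch under stated assumptions: `fetch_page` is a hypothetical callable returning `(status_code, body)`, the body is assumed to carry `data` and `has_more` fields with the last record's `id` used as the next cursor, and `make_fake_api` is an in-memory stand-in so the example runs without network access.

```python
import time

RETRY_CODES = {429, 500, 502, 503}
MAX_RETRIES = 3


def fetch_with_retry(fetch_page, params, sleep=time.sleep):
    """Call fetch_page, retrying retryable statuses with exponential backoff."""
    for attempt in range(MAX_RETRIES + 1):
        status, body = fetch_page(params)
        if status == 200:
            return body
        if status not in RETRY_CODES or attempt == MAX_RETRIES:
            raise RuntimeError(f"request failed with status {status}")
        sleep(2 ** attempt)  # waits 1s, 2s, 4s between attempts


def extract_all(fetch_page, last_sync, page_size=100, sleep=time.sleep):
    """Follow the cursor until the source reports no more pages."""
    params = {"created[gte]": last_sync, "limit": page_size}
    records = []
    while True:
        body = fetch_with_retry(fetch_page, params, sleep)
        records.extend(body["data"])
        if not body["has_more"] or not body["data"]:
            return records
        # Cursor: resume after the last record of the page just read.
        params["starting_after"] = body["data"][-1]["id"]


def make_fake_api(items, fail_first=1):
    """Hypothetical in-memory API: fails fail_first times with 429, then serves pages."""
    state = {"failures_left": fail_first}

    def fetch_page(params):
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            return 429, {}
        start = 0
        after = params.get("starting_after")
        if after is not None:
            start = next(i for i, it in enumerate(items) if it["id"] == after) + 1
        page = items[start:start + params["limit"]]
        return 200, {"data": page, "has_more": start + len(page) < len(items)}

    return fetch_page


charges = [{"id": f"ch_{i}"} for i in range(5)]
delays = []  # record backoff delays instead of actually sleeping
result = extract_all(make_fake_api(charges), last_sync=0, page_size=2, sleep=delays.append)
print([r["id"] for r in result], delays)
```

Passing `sleep` as a parameter keeps the backoff testable; a real extractor would also need to respect the endpoint's rate limit between successful requests.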