Skill

source-analyzer

Install

Install the plugin:

$ npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data

Description

Analyze data source characteristics including update frequency, volume patterns, and schema stability.

Tool Access

This skill is limited to using the following tools:

Read, Grep, Glob, Bash

Skill Content

Source Analyzer

Audience: Data engineers planning ETL pipelines.

Goal: Characterize data sources for extraction planning by analyzing volume, update patterns, schema, and quality baselines.

Analysis Dimensions

Volume Analysis

  • Current row count
  • Historical growth rate (if multiple snapshots available)
  • Projected growth
  • Peak vs average volume
  • Seasonality patterns
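As a rough illustration of the growth items above, the sketch below derives an average daily growth rate from two dated row-count snapshots and projects it forward linearly. The function names and the snapshot values are hypothetical, and a straight-line projection ignores seasonality and peaks.

```python
from datetime import date

def daily_growth(snapshots):
    """Average rows added per day between the earliest and latest snapshot."""
    ordered = sorted(snapshots)  # sort by snapshot date
    (first_day, first_rows), (last_day, last_rows) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need snapshots from at least two different days")
    return (last_rows - first_rows) / days

def project_rows(current_rows, growth_per_day, days_ahead):
    """Linear projection only; seasonality is not modeled."""
    return current_rows + growth_per_day * days_ahead

# Hypothetical snapshots: (snapshot date, row count)
snapshots = [
    (date(2024, 1, 1), 4_750_000),
    (date(2024, 1, 31), 5_200_000),
]
rate = daily_growth(snapshots)
print(rate)                               # 15000.0
print(project_rows(5_200_000, rate, 90))  # 6550000.0
```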

Update Patterns

  • Full refresh vs incremental
  • Update frequency (hourly, daily, weekly)
  • Batch arrival times
  • Late-arriving data windows
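One possible way to estimate update frequency and the late-arrival window from observed timestamps is sketched below. It assumes you can collect batch arrival times and (event time, load time) pairs; the function names and sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import median

def typical_interval(arrivals):
    """Median gap between consecutive batch arrivals."""
    ordered = sorted(arrivals)
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return timedelta(seconds=median(gaps))

def late_arrival_window(records):
    """Largest observed lag between event time and load time."""
    return max(loaded - event for event, loaded in records)

# Hypothetical hourly batches, the last one arriving 15 minutes late
arrivals = [datetime(2024, 1, 1, hour, 5) for hour in range(4)]
arrivals.append(datetime(2024, 1, 1, 4, 20))
print(typical_interval(arrivals))  # 1:00:00

# Hypothetical (event_time, loaded_time) pairs
records = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 5)),
    (datetime(2024, 1, 1, 0, 30), datetime(2024, 1, 1, 2, 5)),
]
print(late_arrival_window(records))  # 1:35:00
```

The median is used rather than the mean so that a single delayed batch does not distort the inferred frequency.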

Schema Characteristics

  • Column count and types
  • Nullable patterns
  • Primary key candidates
  • Foreign key relationships
  • Schema change history
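Primary key candidates can be shortlisted from a sample by looking for columns that are never null and never repeat. The sketch below assumes the sample is a list of dicts with identical keys; `primary_key_candidates` is a hypothetical helper, and a column that is unique in a sample may still contain duplicates in the full source.

```python
def primary_key_candidates(rows):
    """Columns whose sampled values are all non-null and all distinct."""
    if not rows:
        return []
    candidates = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(column)
    return candidates

# Hypothetical sample rows
sample = [
    {"order_id": 1, "customer_id": 10, "status": "pending"},
    {"order_id": 2, "customer_id": 10, "status": "shipped"},
    {"order_id": 3, "customer_id": 11, "status": None},
]
print(primary_key_candidates(sample))  # ['order_id']
```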

Data Quality Baseline

  • Typical null rates per column
  • Expected value ranges
  • Cardinality expectations
  • Known data issues
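A minimal sketch of a per-column baseline covering null rate, cardinality, and value range follows. It assumes sampled rows are dicts; `column_profile` and the sample values are illustrative, not part of the skill.

```python
def column_profile(rows, column):
    """Null rate, distinct count, and min/max for one column of a sample."""
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "cardinality": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
    }

# Hypothetical sample: one refund (negative amount) and one null
sample = [
    {"total_amount": 25.0},
    {"total_amount": -5.0},
    {"total_amount": 40.0},
    {"total_amount": None},
]
print(column_profile(sample, "total_amount"))
# {'null_rate': 0.25, 'cardinality': 3, 'min': -5.0, 'max': 40.0}
```

Recording the observed minimum makes quirks such as negative amounts visible in the baseline instead of surfacing later as pipeline failures.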

Workflow

  1. Inventory the source: file location or database connection, access patterns (API, file drop, DB query), authentication requirements
  2. Sample multiple time periods: analyze data from different dates and identify temporal patterns
  3. Profile the schema: document all columns, identify stable vs volatile columns, note computed/derived columns
  4. Assess quality: baseline quality metrics, known quirks, required transformations
  5. Document extraction requirements: optimal extraction method, incremental key columns, filtering criteria
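For step 2, one simple way to surface temporal patterns is to bucket sampled rows by weekday. This is only a sketch: `rows_per_weekday` is a hypothetical helper, and the dates are invented sample values.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def rows_per_weekday(row_dates):
    """Count sampled rows per weekday, omitting days with no rows."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in row_dates)
    return {day: counts[day] for day in WEEKDAYS if counts[day]}

# Hypothetical order dates drawn from two different weeks
sampled = [
    date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 2),
    date(2024, 1, 6), date(2024, 1, 8),
]
print(rows_per_weekday(sampled))  # {'Mon': 3, 'Tue': 1, 'Sat': 1}
```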

Output: Source Specification

source:
  name: customer_orders
  type: postgresql
  connection: orders_db

  extraction:
    method: incremental
    key_column: updated_at
    batch_size: 100000
    frequency: hourly

  schema:
    columns:
      - name: order_id
        type: bigint
        nullable: false
        primary_key: true
      - name: customer_id
        type: bigint
        nullable: false
        foreign_key: customers.id
      - name: order_date
        type: date
        nullable: false
      - name: total_amount
        type: decimal(10,2)
        nullable: false
      - name: status
        type: varchar(20)
        nullable: true  # null for some orders before 2023 (see known_issues)
        values: [pending, confirmed, shipped, delivered, cancelled]
      - name: updated_at
        type: timestamp
        nullable: false
        incremental_key: true

  volume:
    current_rows: 5_200_000
    daily_growth: 15_000
    peak_hours: [10, 14, 18]

  quality:
    known_issues:
      - "status can be null for orders before 2023"
      - "total_amount occasionally negative (refunds)"
    null_rates:
      customer_id: 0%
      order_date: 0%
      status: 0.5%

  recommendations:
    - "Use updated_at for incremental loads"
    - "Add check constraint for status values"
    - "Consider partitioning by order_date"
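The extraction settings in the specification above (incremental, keyed on updated_at, read in batches) could be exercised roughly as follows. This sketch uses an in-memory SQLite table as a stand-in for the PostgreSQL source; `extract_incremental`, the sample rows, and the tiny batch size are assumptions for illustration only.

```python
import sqlite3

def extract_incremental(conn, last_sync, batch_size):
    """Yield batches of rows changed after last_sync, oldest first."""
    cursor = conn.execute(
        "SELECT order_id, status, updated_at FROM customer_orders "
        "WHERE updated_at > ? ORDER BY updated_at, order_id",
        (last_sync,),
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch

# Stand-in source table with hypothetical rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders "
    "(order_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?, ?)",
    [
        (1, "shipped", "2024-01-01T09:00:00"),
        (2, "pending", "2024-01-01T10:30:00"),
        (3, "confirmed", "2024-01-01T11:15:00"),
    ],
)

watermark = "2024-01-01T10:00:00"  # value saved by the previous run
for batch in extract_incremental(conn, watermark, batch_size=2):
    watermark = batch[-1][2]       # advance to the newest updated_at seen
print(watermark)  # 2024-01-01T11:15:00
```

A strict `>` comparison can skip rows that share the watermark timestamp but commit later, so a real pipeline may want to re-read a small overlap window and deduplicate on order_id.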

Comparison Analysis

When analyzing multiple related sources:

Attribute      Orders      Order_Items
Row count      5.2M        18.7M
Daily growth   15K         52K
Key column     order_id    item_id
Join key       order_id    order_id
Update lag     < 1 hour    < 1 hour

Relationship: 1:N (avg 3.6 items per order)
Join strategy: Hash join on order_id
Load order: Orders first, then Order_Items
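The 3.6 figure follows from the two row counts, and the load order exists so that child rows never reference a parent that has not been loaded yet. The sketch below shows both checks; the helper names and the sample key lists are hypothetical.

```python
def avg_children_per_parent(parent_rows, child_rows):
    """Average number of child rows per parent row."""
    return child_rows / parent_rows

def orphaned_keys(parent_keys, child_keys):
    """Join-key values present in the child source but missing from the parent."""
    return sorted(set(child_keys) - set(parent_keys))

print(round(avg_children_per_parent(5_200_000, 18_700_000), 1))  # 3.6

# Hypothetical sampled order_id values from each source
order_ids = [1, 2, 3]
item_order_ids = [1, 1, 2, 3, 3, 3, 4]
print(orphaned_keys(order_ids, item_order_ids))  # [4]
```

A non-empty orphan list in a sample suggests the child source is running ahead of the parent, which is worth noting alongside the update lag.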

API Source Analysis

For API-based sources:

api_source:
  name: stripe_payments
  base_url: https://api.stripe.com/v1
  auth: bearer_token

  endpoints:
    - path: /charges
      method: GET
      pagination: cursor
      rate_limit: 100/sec
      params:
        created[gte]: "{last_sync}"

  extraction:
    strategy: cursor_pagination
    page_size: 100
    sync_frequency: 15min
    full_refresh_weekly: true

  data_characteristics:
    avg_response_size: 50KB
    records_per_page: 100
    typical_daily_volume: 5000

  error_handling:
    retry_codes: [429, 500, 502, 503]
    max_retries: 3
    backoff: exponential
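The extraction and error_handling settings above can be read as a loop like the one sketched here: follow a cursor until it runs out, and retry retryable status codes with exponentially growing delays. Nothing here calls a real API. `fetch_page`, `ApiError`, the `next_cursor` field, and the 1-second base delay are hypothetical placeholders for whatever client and response shape the actual source uses.

```python
import time

RETRY_CODES = {429, 500, 502, 503}

class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def fetch_with_retry(fetch_page, cursor, max_retries=3, base_delay=1.0,
                     sleep=time.sleep):
    """Call fetch_page, retrying retryable errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fetch_page(cursor)
        except ApiError as err:
            if err.status not in RETRY_CODES or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults

def fetch_all(fetch_page, **retry_options):
    """Follow the cursor until the source reports no further pages."""
    records, cursor = [], None
    while True:
        page = fetch_with_retry(fetch_page, cursor, **retry_options)
        records.extend(page["data"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return records

# Fake two-page source that fails once with a 429 before succeeding
pages = {
    None: {"data": [1, 2], "next_cursor": "a"},
    "a": {"data": [3], "next_cursor": None},
}
failures = [ApiError(429)]

def fake_fetch(cursor):
    if failures:
        raise failures.pop()
    return pages[cursor]

delays = []
print(fetch_all(fake_fetch, sleep=delays.append))  # [1, 2, 3]
print(delays)                                      # [1.0]
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting.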

Stats

Stars: 30
Forks: 6
Last Commit: Mar 15, 2026