From ai-analyst
Guides connection of CSV files, DuckDB, MotherDuck, PostgreSQL, BigQuery, and Snowflake datasets with credential setup, validation, schema profiling, and knowledge brain integration.
npx claudepluginhub ai-analyst-lab/ai-analyst-plugin --plugin ai-analyst

This skill uses the workspace's default tool permissions.
Interviews users to extract tribal knowledge about datasets/databases, generating reusable data context skills for documentation and analysis.
Switches active dataset: validates target in data_sources.yaml with fuzzy matching, checks manifest.yaml, updates active.yaml pointer and history, confirms with summary of tables, date range, connection. Triggered by /switch-dataset or phrases like 'switch to dataset'.
Plans DataHub connectors by classifying source systems, researching via agent or inline, and generating _PLANNING.md blueprints with entity mappings and architecture decisions. For new connector design or source research.
Guided wizard to connect a new dataset. Walks the user through selecting a connection type, configuring credentials, validating the connection, profiling the schema, and setting up the knowledge brain.
Triggers: /connect-data, or phrases like "connect my database" or "add a new dataset"; also /switch-dataset when the target dataset doesn't exist yet.

/connect-data — start the connection wizard
/connect-data type=postgres — skip type selection
Present options:
For CSV:
For DuckDB:
Validate the connection with SELECT 1.
For MotherDuck:
For PostgreSQL / BigQuery / Snowflake:
Use the connection templates in connection_templates/, keeping credentials in environment variables (e.g. $PG_PASSWORD).
Create the <workspace>/.knowledge/datasets/{id}/ directory.
Generate manifest.yaml from the connection template + user inputs.
Create quirks.md with section headers.
Create metrics/index.yaml.
Use ConnectionManager from helpers/connection_manager.py:
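The real helpers/connection_manager.py isn't reproduced in this document, so the sketch below is a minimal stand-in built on Python's built-in sqlite3 driver. The method names (test_connection, list_tables, get_table_schema) and schema_to_markdown mirror the helpers referenced here, but their implementations are illustrative assumptions, not the plugin's actual code.

```python
import sqlite3

class ConnectionManager:
    """Minimal stand-in for helpers/connection_manager.py (illustrative only)."""

    def __init__(self, database=":memory:"):
        self.conn = sqlite3.connect(database)

    def test_connection(self):
        # Same cheap validation query used for DuckDB above: SELECT 1.
        return self.conn.execute("SELECT 1").fetchone() == (1,)

    def list_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        return [r[0] for r in rows]

    def get_table_schema(self, table):
        # (column_name, column_type) pairs from PRAGMA table_info.
        return [(r[1], r[2]) for r in self.conn.execute(f"PRAGMA table_info({table})")]


def schema_to_markdown(table, columns):
    """Simplified sketch of the helper in helpers/data_helpers.py."""
    lines = [f"## {table}", "", "| column | type |", "| --- | --- |"]
    lines += [f"| {name} | {ctype} |" for name, ctype in columns]
    return "\n".join(lines)


mgr = ConnectionManager()
mgr.conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
assert mgr.test_connection()
print(schema_to_markdown("orders", mgr.get_table_schema("orders")))
```

For a real PostgreSQL, BigQuery, or Snowflake source, the same interface would wrap the corresponding driver instead of sqlite3.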
Call test_connection() to validate, list_tables() to enumerate tables, and get_table_schema() for each table.
Generate schema.md using schema_to_markdown() from helpers/data_helpers.py and write it to <workspace>/.knowledge/datasets/{id}/schema.md.
Update <workspace>/.knowledge/active.yaml to point to the new dataset.
Suggest next steps: /explore to browse, /metrics to define metrics, or just ask a question.
Keep the <workspace>/.knowledge/datasets/{id}/ tree even if profiling fails.

When invoked as /datasets, display all connected datasets with their status.
Read <workspace>/data_sources.yaml to get the list of registered sources.
Read <workspace>/.knowledge/active.yaml to determine which dataset is currently active.
For each registered source, check if <workspace>/.knowledge/datasets/{name}/manifest.yaml exists. If it does, read summary stats (table_count, date_range, analysis_count, last_used).
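The per-source manifest check above can be sketched as follows. The workspace layout matches the paths described here, but the temporary directory and hard-coded source list stand in for a real data_sources.yaml, and reading the summary stats out of manifest.yaml is omitted.

```python
from pathlib import Path
import tempfile

def dataset_status(workspace, registered):
    """For each source in data_sources.yaml order, report whether
    .knowledge/datasets/{name}/manifest.yaml exists (i.e. it was profiled)."""
    return [
        (name, (workspace / ".knowledge" / "datasets" / name / "manifest.yaml").exists())
        for name in registered
    ]

# Demo with a throwaway workspace: one profiled dataset, one not.
ws = Path(tempfile.mkdtemp())
(ws / ".knowledge" / "datasets" / "sales").mkdir(parents=True)
(ws / ".knowledge" / "datasets" / "sales" / "manifest.yaml").write_text("table_count: 12\n")

print(dataset_status(ws, ["sales", "web_logs"]))
# → [('sales', True), ('web_logs', False)]
```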
Connected Datasets:

* your_dataset (active)
  Your Dataset Name — {table_count} tables, {date_range}
  Connection: {type} ({database})
  Analyses: 0

- {other_dataset}
  {display_name} — {table_count} tables, {date_range}
  Connection: {type} ({details})
  Analyses: {count}
Commands:
/switch-dataset {name} — switch active dataset
/connect-data — connect a new dataset
/data — inspect active dataset schema
Mark the active dataset with *. Mark others with -.
<workspace>/.knowledge/datasets/ dirs without a data_sources.yaml entry should be ignored.

See references/connection-guide.md for SQL dialect-specific guidance for each connection type.
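For a flavor of what that dialect-specific guidance covers: even enumerating tables differs by engine. The queries below are common forms for each dialect — illustrative assumptions, not excerpts from references/connection-guide.md — with {dataset}/{database}/{schema} placeholders to be filled per connection.

```python
# Typical table-listing queries per SQL dialect (illustrative; a real
# connector may use driver/catalog APIs instead of raw SQL).
LIST_TABLES = {
    "postgres": "SELECT table_name FROM information_schema.tables "
                "WHERE table_schema = 'public'",
    "bigquery": "SELECT table_name FROM {dataset}.INFORMATION_SCHEMA.TABLES",
    "snowflake": "SHOW TABLES IN SCHEMA {database}.{schema}",
    "duckdb": "SELECT table_name FROM duckdb_tables()",
}

for dialect, query in LIST_TABLES.items():
    print(f"{dialect}: {query}")
```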