From Claude-Data-Wrangler
Analyse the user's dataset (structure, volume, relationships, query patterns, access latency needs) and recommend the most suitable database system — relational (Postgres, MySQL, SQLite), analytical (DuckDB, ClickHouse, BigQuery), document (MongoDB), key-value (Redis, DynamoDB), graph (Neo4j), vector (Pinecone, pgvector, Qdrant, Weaviate), or time-series (InfluxDB, TimescaleDB). Produces a ranked recommendation with rationale. Use when the user is choosing where to store a new dataset.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin Claude-Data-Wrangler
This skill uses the workspace's default tool permissions.
Recommend a database backend based on dataset shape, volume, and intended use.
| Category | Options | Fits when |
|---|---|---|
| Relational (OLTP) | PostgreSQL, MySQL/MariaDB, SQLite | Structured, normalised, transactional; <100GB hot; SQL is the query language |
| Analytical (OLAP) | DuckDB, ClickHouse, BigQuery, Snowflake, Redshift | Wide tables, heavy aggregations, append-mostly, columnar compression wins |
| Document | MongoDB, Couchbase, Firestore | Heterogeneous nested documents, schema-on-read, developer ergonomics |
| Key-value / wide-column | Redis, DynamoDB, Cassandra | Known-key lookups, very high throughput, simple shape |
| Graph | Neo4j, ArangoDB, Memgraph, Postgres + Apache AGE | Relationship traversal is the dominant query (k-hop, shortest path) |
| Vector | pgvector, Qdrant, Pinecone, Weaviate, Milvus | Similarity / semantic search on embeddings |
| Time-series | TimescaleDB, InfluxDB, QuestDB | High-cardinality timestamped metrics / events, downsampling, retention |
| Search | Elasticsearch, OpenSearch, Meilisearch, Typesense | Full-text search, faceting, analyzer pipelines |
| Hybrid starter | PostgreSQL + extensions (pgvector, PostGIS, TimescaleDB, AGE) | "One DB to rule them all" for small teams; defer specialisation |
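As a rough illustration of how the table above could be turned into a ranked recommendation, here is a minimal Python sketch. The `DatasetProfile` fields, the `rank_categories` function, and the score weights are all hypothetical and chosen only for illustration; they are not part of the skill itself, and a real recommendation would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical summary of a dataset; field names are illustrative only."""
    size_gb: float
    transactional: bool = False
    heavy_aggregations: bool = False
    nested_documents: bool = False
    key_lookups_only: bool = False
    relationship_traversal: bool = False
    has_embeddings: bool = False
    timestamped_metrics: bool = False
    full_text_search: bool = False


def rank_categories(p: DatasetProfile) -> list[tuple[str, int]]:
    """Score each category from the table and return them best-first.

    The weights are arbitrary assumptions: 2 points for a matching signal,
    plus small bonuses taken from the "Fits when" column.
    """
    signals = [
        p.transactional, p.heavy_aggregations, p.nested_documents,
        p.key_lookups_only, p.relationship_traversal, p.has_embeddings,
        p.timestamped_metrics, p.full_text_search,
    ]
    small = p.size_gb < 100  # the table's "<100GB hot" threshold
    scores = {
        "Relational (OLTP)": 2 * int(p.transactional) + int(p.transactional and small),
        "Analytical (OLAP)": 2 * int(p.heavy_aggregations),
        "Document": 2 * int(p.nested_documents),
        "Key-value / wide-column": 2 * int(p.key_lookups_only),
        "Graph": 2 * int(p.relationship_traversal),
        "Vector": 2 * int(p.has_embeddings),
        "Time-series": 2 * int(p.timestamped_metrics),
        "Search": 2 * int(p.full_text_search),
        # Always offered as a fallback; nudged up when several needs
        # coexist on a small dataset ("defer specialisation").
        "Hybrid starter": 1 + int(sum(signals) >= 2 and small),
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]


# Example: a small transactional dataset
ranking = rank_categories(DatasetProfile(size_gb=5, transactional=True))
```

In this sketch the hybrid option always appears in the list, so the ranking never comes back empty when no signal is set.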
sql-load (for relational), vector-upsert (for vector), graph-database (for graph), and note analogous steps for other backends.
database_recommendation.md alongside the dataset.
Standard library; pandas / pyarrow for profiling.
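For the profiling step, a standard-library-only sketch might look like the following. The `profile_csv` function and the keys of its result are assumptions made for illustration, not the skill's actual output format; pandas or pyarrow would be the more practical choice for large files.

```python
import csv
import io


def profile_csv(text: str) -> dict:
    """Collect simple shape statistics from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])

    null_counts = {col: 0 for col in columns}
    distinct_values = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                null_counts[col] += 1  # treat empty or missing cells as nulls
            else:
                distinct_values[col].add(value)

    return {
        "row_count": len(rows),
        "columns": columns,
        "null_counts": null_counts,
        "distinct_counts": {c: len(v) for c, v in distinct_values.items()},
    }


sample = "id,name,ts\n1,a,2024-01-01\n2,,2024-01-02\n"
profile = profile_csv(sample)
```

Counts like these feed the shape and volume questions the recommendation depends on, for example whether a column looks like a unique key or a timestamp axis.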