Use SQLite for structured data analysis when parsing large output (>100 lines), correlating data from multiple sources, tracking state across operations, aggregating results (counts, averages, grouping), or needing to query the same dataset multiple times. Triggers when thinking "I wish I could query this," when about to write custom JSON/CSV/jq parsing code, or when running similar grep/jq commands with slight variations. Not for tiny datasets (<50 records) with a single simple query.
Analyzes structured data using SQLite for complex queries, aggregations, and multi-source correlation instead of writing custom parsing code.
npx claudepluginhub sjungling/sjungling-claude-plugins

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Before writing any data analysis code, answer these questions:
If you answered YES to any question above, use SQLite. Don't write custom parsing code.
# Custom code for every query:
cat data.json | jq '.[] | select(.status=="failed")' | jq -r '.error_type' | sort | uniq -c
# SQL does the work:
sqlite3 data.db "SELECT error_type, COUNT(*) FROM errors WHERE status='failed' GROUP BY error_type"
SQLite is just a file: no server, no setup, zero dependencies. Use it when you'd otherwise write custom parsing code or re-process the same data for each query.
Use when ANY of these apply:
Don't use when ALL of these are true:
→ For tiny datasets with simple access, JSON/grep is fine.
STOP and use SQLite if you're about to:
All of these mean: Load into SQLite once, query with SQL.
| Scenario | Tool | Why |
|---|---|---|
| 50 test results, one-time summary | Python/jq | Fast, appropriate |
| 200+ test results, find flaky tests | SQLite | GROUP BY simpler than code |
| 3 log files, correlate by time | SQLite | JOIN simpler than manual grep |
| Track 1000+ file processing state | SQLite | Queries beat JSON parsing |
Rule of thumb: If you're writing parsing code or re-processing data → use SQLite instead.
# Built-in command
sqlite3 ~/.claude-logs/project.db
# User has sqlite-utils installed (even easier)
sqlite-utils insert data.db table_name data.json --pk=id
sqlite-utils query data.db "SELECT * FROM table_name"
~/.claude-logs/<project-name>.db # Persists across sessions
# 1. Connect
mkdir -p ~/.claude-logs
PROJECT=$(basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
sqlite3 ~/.claude-logs/"$PROJECT".db
# 2. Create table (first time)
CREATE TABLE results (
  id INTEGER PRIMARY KEY,
  name TEXT,
  status TEXT,
  duration_ms INTEGER
);
# 3. Load data
INSERT INTO results (name, status, duration_ms)
SELECT json_extract(value, '$.name'),
       json_extract(value, '$.status'),
       json_extract(value, '$.duration_ms')
FROM json_each(readfile('data.json'));
# 4. Query (SQL does the work)
SELECT status, COUNT(*), AVG(duration_ms)
FROM results
GROUP BY status;
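The same load-and-query workflow can also be driven from Python's stdlib sqlite3 module. A minimal sketch, using hypothetical in-memory sample records in place of data.json (switch `:memory:` to a file path for the persistent-database setup described above):

```python
import sqlite3

# Hypothetical records standing in for the contents of data.json
records = [
    {"name": "test_login", "status": "pass", "duration_ms": 120},
    {"name": "test_upload", "status": "fail", "duration_ms": 340},
    {"name": "test_search", "status": "pass", "duration_ms": 80},
]

conn = sqlite3.connect(":memory:")  # use a file path to persist across sessions
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, name TEXT, status TEXT, duration_ms INTEGER)"
)
# Load once
conn.executemany(
    "INSERT INTO results (name, status, duration_ms) VALUES (:name, :status, :duration_ms)",
    records,
)

# SQL does the aggregation
rows = conn.execute(
    "SELECT status, COUNT(*), AVG(duration_ms) FROM results GROUP BY status ORDER BY status"
).fetchall()
print(rows)  # -> [('fail', 1, 340.0), ('pass', 2, 100.0)]
```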
Task: Find error types and their counts from test results.
import json
from collections import defaultdict

# Load and parse
with open('test-results.json') as f:
    data = json.load(f)

# Custom aggregation logic
errors_by_type = defaultdict(int)
for record in data:
    if record['status'] == 'failed' and record['error_message']:
        # Extract error type (custom parsing)
        error_type = record['error_message'].split(':')[0]
        errors_by_type[error_type] += 1

# Sort and display (more custom code)
sorted_errors = sorted(errors_by_type.items(), key=lambda x: x[1], reverse=True)
for error_type, count in sorted_errors:
    print(f"{error_type}: {count}")
# Load once
sqlite3 data.db <<'EOF'
CREATE TABLE IF NOT EXISTS results (status TEXT, error_message TEXT);
INSERT INTO results
SELECT json_extract(value, '$.status'),
       json_extract(value, '$.error_message')
FROM json_each(readfile('test-results.json'));
EOF
# Query (SQL does aggregation)
sqlite3 data.db "
SELECT substr(error_message, 1, instr(error_message, ':')-1) AS error_type,
       COUNT(*) AS count
FROM results
WHERE status='failed' AND error_message IS NOT NULL
GROUP BY error_type
ORDER BY count DESC
"
With Python you write parsing/aggregation logic. With SQL you declare what you want and query the same data repeatedly.
Without SQLite: Wrote Python code, loaded to memory, custom aggregation logic, data discarded.
With SQLite:
CREATE TABLE test_runs (test_name TEXT, status TEXT, duration_ms INT, run_id TEXT);
-- Load once
INSERT INTO test_runs SELECT ...;
-- Query many times (no re-processing!)
-- Find flaky tests
SELECT test_name,
       SUM(CASE WHEN status='pass' THEN 1 ELSE 0 END) AS passes,
       SUM(CASE WHEN status='fail' THEN 1 ELSE 0 END) AS fails
FROM test_runs
GROUP BY test_name
HAVING passes > 0 AND fails > 0;
-- Slowest tests
SELECT test_name, AVG(duration_ms) FROM test_runs GROUP BY test_name ORDER BY AVG(duration_ms) DESC LIMIT 10;
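The flaky-test query above can be exercised end to end with Python's stdlib sqlite3 module. The run data here is hypothetical sample input, not from any real test suite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_runs (test_name TEXT, status TEXT, duration_ms INT, run_id TEXT)"
)
# Hypothetical runs: test_auth both passes and fails across runs (flaky)
runs = [
    ("test_auth", "pass", 100, "r1"),
    ("test_auth", "fail", 110, "r2"),
    ("test_auth", "pass", 90, "r3"),
    ("test_io", "pass", 50, "r1"),
    ("test_io", "pass", 55, "r2"),
]
conn.executemany("INSERT INTO test_runs VALUES (?, ?, ?, ?)", runs)

# A test is flaky if it has both passes and fails
flaky = conn.execute("""
    SELECT test_name,
           SUM(CASE WHEN status='pass' THEN 1 ELSE 0 END) AS passes,
           SUM(CASE WHEN status='fail' THEN 1 ELSE 0 END) AS fails
    FROM test_runs
    GROUP BY test_name
    HAVING passes > 0 AND fails > 0
""").fetchall()
print(flaky)  # -> [('test_auth', 2, 1)]
```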
Without SQLite: Used jq/grep/pipes, re-processed for each question, manual timestamp correlation.
With SQLite:
CREATE TABLE errors (timestamp TEXT, service TEXT, error_type TEXT, message TEXT);
-- Load all 3 files once
INSERT INTO errors SELECT ...;
-- Find cross-service failures (JOIN is easier than grep)
SELECT e1.timestamp, e1.service, e2.service
FROM errors e1 JOIN errors e2
ON datetime(e1.timestamp) BETWEEN datetime(e2.timestamp, '-5 minutes') AND datetime(e2.timestamp, '+5 minutes')
WHERE e1.service != e2.service;
-- Error frequency
SELECT error_type, COUNT(*) FROM errors GROUP BY error_type ORDER BY COUNT(*) DESC;
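The time-window self-join is a runnable pattern too. A sketch with Python's stdlib sqlite3 and made-up error rows (service names and timestamps are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE errors (timestamp TEXT, service TEXT, error_type TEXT, message TEXT)"
)
# Hypothetical rows: the first two errors fall within 5 minutes of each other
conn.executemany("INSERT INTO errors VALUES (?, ?, ?, ?)", [
    ("2024-01-01 10:00:00", "api", "Timeout", "upstream timeout"),
    ("2024-01-01 10:03:00", "db", "ConnLost", "connection dropped"),
    ("2024-01-01 11:00:00", "api", "Timeout", "upstream timeout"),
])

# Pair errors from different services within a 5-minute window
pairs = conn.execute("""
    SELECT e1.service, e2.service
    FROM errors e1 JOIN errors e2
      ON datetime(e1.timestamp) BETWEEN datetime(e2.timestamp, '-5 minutes')
                                    AND datetime(e2.timestamp, '+5 minutes')
    WHERE e1.service != e2.service
""").fetchall()
print(sorted(pairs))  # -> [('api', 'db'), ('db', 'api')]
```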
Without SQLite: Wrote custom JSON parsing code, manual state updates, custom query logic.
With SQLite:
CREATE TABLE files (name TEXT PRIMARY KEY, status TEXT, error TEXT, completed_at TEXT);
-- Initialize
INSERT INTO files (name, status) VALUES ('file1.txt', 'pending');
-- Update state
UPDATE files SET status='completed', completed_at=datetime('now') WHERE name='file1.txt';
-- Queries (no custom code!)
SELECT COUNT(*) FROM files WHERE status='completed';
SELECT * FROM files WHERE status='failed';
SELECT * FROM files WHERE status='pending' LIMIT 1; -- next file
Note: For only 50 files with simple queries, JSON was actually fine! Use SQLite when you need complex queries or >100 files.
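For completeness, the state-tracking pattern as a self-contained Python sketch. File names are placeholders, and an in-memory database is used here; a file path gives the cross-session persistence described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path to persist across sessions
conn.execute(
    "CREATE TABLE files (name TEXT PRIMARY KEY, status TEXT, error TEXT, completed_at TEXT)"
)
# Initialize with placeholder file names
conn.executemany("INSERT INTO files (name, status) VALUES (?, ?)",
                 [("file1.txt", "pending"), ("file2.txt", "pending")])

# Update state as work completes
conn.execute(
    "UPDATE files SET status='completed', completed_at=datetime('now') WHERE name='file1.txt'"
)

# Queries replace custom bookkeeping code
done = conn.execute("SELECT COUNT(*) FROM files WHERE status='completed'").fetchone()[0]
nxt = conn.execute(
    "SELECT name FROM files WHERE status='pending' ORDER BY name LIMIT 1"
).fetchone()[0]
print(done, nxt)  # -> 1 file2.txt
```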
From: "Process data once and done" To: "Make data queryable"
Benefits: