Analyzes log files, especially JSONL, for structured extraction, cross-log correlation, timeline reconstruction, and pattern search using ripgrep and jq.
Install: npx claudepluginhub 0xdarkmatter/claude-mods
Practical patterns for analyzing log files -- especially JSONL format used in agent conversation logs, benchmark outputs, and structured application logs.
Unknown Log File
│
├─ Is it one JSON object per line?
│ ├─ Yes ──────────────────────── JSONL
│ │ ├─ Small file (<100MB)
│ │ │ └─ jq for extraction, jq -s for aggregation
│ │ ├─ Large file (100MB-1GB)
│ │ │ └─ rg prefilter then pipe to jq
│ │ └─ Huge file (>1GB)
│ │ └─ split + parallel jq, or jq --stream
│ │
│ └─ No
│ ├─ Is it one large JSON object/array?
│ │ └─ Yes ──────────────── Single JSON
│ │ └─ jq --stream for SAX-style, or jq directly if fits in memory
│ │
│ ├─ Does it have key=value pairs?
│ │ └─ Yes ──────────────── Structured (logfmt / key-value)
│ │ └─ rg for search, awk/sd for extraction, angle-grinder for aggregation
│ │
│ ├─ Does it follow syslog format? (timestamp hostname service[pid]: message)
│ │ └─ Yes ──────────────── Syslog
│ │ └─ rg for search, awk for column extraction, lnav for interactive
│ │
│ ├─ Is it space/tab delimited with consistent columns?
│ │ └─ Yes ──────────────── Column-based (access logs, CSV)
│ │ └─ awk for extraction, mlr for CSV, rg for pattern search
│ │
│ └─ Mixed or unstructured
│ └─ Plain text ─────────── Freeform
│ └─ rg for search, rg -A/-B for context, lnav for exploration
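To answer the first branch quickly, a minimal sniff test (a sketch; the filename is a placeholder):
# If the first few lines each parse as standalone JSON, treat the file as JSONL
head -n 5 unknown.log | jq -c . > /dev/null 2>&1 && echo "looks like JSONL" || echo "not JSONL"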
Required (must be installed):
rg (ripgrep) - text search, prefiltering. Install: cargo install ripgrep / choco install ripgrep
jq - JSON/JSONL extraction and transformation. Install: brew install jq / choco install jq
Optional (enhanced capabilities, gracefully degraded without):
lnav - interactive log exploration with SQL queries. Install: brew install lnav / WSL: apt install lnav
agrind (angle-grinder) - pipeline aggregation syntax. Install: cargo install ag
mlr (Miller) - CSV/TSV log analysis. Install: brew install miller / choco install miller
GNU parallel - parallel processing of split files. Install: brew install parallel
All patterns in this skill work with just rg + jq. Optional tools add interactive exploration (lnav), pipeline aggregation (agrind), and tabular analysis (mlr).
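To check up front which of these tools are available, a quick probe (a sketch using only POSIX command -v):
# Report which tools used by these patterns are on PATH
for t in rg jq lnav agrind mlr parallel; do
  command -v "$t" > /dev/null && echo "$t: installed" || echo "$t: missing"
done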
| Tool | Best For | Speed | Required? |
|---|---|---|---|
| rg (ripgrep) | Raw pattern matching in any format | Fastest | Yes |
| jq | JSONL structured extraction and transformation | Fast | Yes |
| jq -s | JSONL aggregation (slurp all lines into array) | Medium (loads all into memory) | Yes (part of jq) |
| lnav | Interactive exploration, SQL over logs | Interactive | Optional |
| agrind (angle-grinder) | Pipeline aggregation and counting | Fast | Optional |
| awk | Column-based log formats, field extraction | Fast | Pre-installed |
| mlr (Miller) | CSV/TSV log analysis, statistics | Fast | Optional |
| fd + rg | Searching across many log directories | Fast | Pre-installed in dev-shell |
| GNU parallel | Splitting large files for parallel processing | N/A (orchestrator) | Optional |
Need to...
│
├─ Find lines matching a pattern
│ └─ rg (always fastest for text search)
│
├─ Extract specific fields from JSONL
│ └─ jq -r '[.field1, .field2] | @tsv'
│
├─ Count/aggregate over JSONL
│ └─ jq -sc 'group_by(.field) | map({key: .[0].field, n: length})'
│
├─ Search JSONL by value then format results
│ └─ rg '"error"' file.jsonl | jq -r '.message' (two-stage)
│
├─ Explore interactively with filtering/SQL
│ └─ lnav file.log
│
├─ Aggregate with pipeline syntax
│ └─ agrind '* | parse "* * *" as ts, level, msg | count by level'
│
├─ Extract columns from space-delimited logs
│ └─ awk '{print $1, $4, $7}' access.log
│
└─ Process CSV/TSV logs with headers
└─ mlr --csv filter '$status >= 400' then stats1 -a count -f status
The most common format for structured logs. One JSON object per line, no trailing commas, no wrapping array.
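For reference, the examples below assume records along these lines (a hypothetical app.jsonl; field names are illustrative):
{"timestamp":"2026-03-08T10:00:01Z","level":"info","message":"request received","status":200,"duration_ms":12}
{"timestamp":"2026-03-08T10:00:02Z","level":"error","message":"upstream timeout","status":504,"duration_ms":3001}
{"timestamp":"2026-03-08T10:00:03Z","level":"debug","message":"cache hit","status":200,"duration_ms":2}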
# Filter by field value
jq -c 'select(.level == "error")' app.jsonl
# Filter by nested field
jq -c 'select(.request.method == "POST")' app.jsonl
# Filter by multiple conditions
jq -c 'select(.level == "error" and .status >= 500)' app.jsonl
# Filter by array contains
jq -c 'select(.tags | index("critical"))' app.jsonl
# Filter by field existence
jq -c 'select(.stack_trace != null)' app.jsonl
# Negate a filter
jq -c 'select(.level != "debug")' app.jsonl
# Extract single field
jq -r '.message' app.jsonl
# Extract multiple fields as TSV
jq -r '[.timestamp, .level, .message] | @tsv' app.jsonl
# Extract with default for missing fields
jq -r '.error_code // "none"' app.jsonl
# Extract nested field safely
jq -r '.response.headers["content-type"] // "unknown"' app.jsonl
# Count by field value
jq -sc 'group_by(.level) | map({level: .[0].level, count: length})' app.jsonl
# Top-N most common values
jq -sc '[.[].error_type] | group_by(.) | map({type: .[0], count: length}) | sort_by(-.count) | .[:10]' app.jsonl
# Sum a numeric field
jq -sc 'map(.duration_ms) | add' app.jsonl
# Average
jq -sc 'map(.duration_ms) | add / length' app.jsonl
# Min and max
jq -sc 'map(.duration_ms) | {min: min, max: max}' app.jsonl
# Extract tool calls from conversation logs
jq -c '.content[]? | select(.type == "tool_use") | .name' conversation.jsonl
# De-escape nested JSON strings
jq -c '.content | fromjson' app.jsonl
# Flatten nested arrays
jq -c '[.events[]? | .action]' app.jsonl
# Extract from arrays of objects
jq -c '.results[]? | select(.passed == false) | {test: .name, error: .message}' results.jsonl
# Fast prefilter then structured extraction
rg '"error"' app.jsonl | jq -r '[.timestamp, .message] | @tsv'
# Search for specific value then aggregate
rg '"timeout"' app.jsonl | jq -sc 'length'
# Pattern match then extract
rg '"user_id":"u-123"' app.jsonl | jq -c '{ts: .timestamp, action: .action}'
# Filter by timestamp range (ISO 8601 string comparison works)
jq -c 'select(.timestamp > "2026-03-08T10:00" and .timestamp < "2026-03-08T11:00")' app.jsonl
# Events in the last N minutes (using epoch seconds)
jq -c --arg cutoff "$(date -d '30 minutes ago' +%s)" 'select((.timestamp | sub("\\.[0-9]+Z$"; "Z") | fromdate) > ($cutoff | tonumber))' app.jsonl
# Extract hour for histogram
jq -r '.timestamp | split("T")[1] | split(":")[0]' app.jsonl | sort | uniq -c
# Extract IDs from one file, search in another
jq -r '.request_id' errors.jsonl | while read id; do
rg "\"$id\"" responses.jsonl | jq -c '{id: .request_id, status: .status}'
done
# Faster: build lookup, then join
jq -r '.request_id' errors.jsonl | sort -u > /tmp/error_ids.txt
rg -Ff /tmp/error_ids.txt responses.jsonl | jq -c '{id: .request_id, status: .status}'
# Join two JSONL files by key using jq --slurpfile
jq --slurpfile lookup <(jq -sc 'map({(.id): .}) | add' lookup.jsonl) \
'. + ($lookup[0][.ref_id] // {})' main.jsonl
# Show 5 lines before and after each match
rg -B5 -A5 "OutOfMemoryError" app.log
# Show only matching files
rg -l "FATAL" /var/log/
# Count matches per file
rg -c "ERROR" /var/log/*.log | sort -t: -k2 -rn
# Multiline patterns (stack traces)
rg -U "Exception.*\n(\s+at .*\n)+" app.log
# Apache/nginx access log: extract status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn
# Extract specific time range from syslog
awk '$0 >= "Mar 8 10:00" && $0 <= "Mar 8 11:00"' syslog
# Calculate average response time (column 11)
awk '{sum += $11; n++} END {print sum/n}' access.log
# Filter by status code and show URL + response time
awk '$9 >= 500 {print $7, $11"ms"}' access.log
# Follow with filtering
tail -f app.log | rg --line-buffered "ERROR"
# Follow JSONL and extract fields
tail -f app.jsonl | jq --unbuffered -r '[.timestamp, .level, .message] | @tsv'
# Follow multiple files
tail -f /var/log/service-*.log | rg --line-buffered "error|warn"
# Merge multiple log files by timestamp
sort -t' ' -k1,2 service-a.log service-b.log > timeline.log
# JSONL: sort by timestamp field
jq -sc 'sort_by(.timestamp)[]' combined.jsonl > sorted.jsonl
# Extract timestamps and calculate gaps
jq -r '.timestamp' app.jsonl | awk '
NR > 1 {
cmd = "date -d \"" prev "\" +%s"; cmd | getline t1; close(cmd)
cmd = "date -d \"" $0 "\" +%s"; cmd | getline t2; close(cmd)
gap = t2 - t1
if (gap > 5) print gap "s gap before " $0
}
{ prev = $0 }
'
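The awk loop above shells out to date once per line; a jq-only variant avoids that (a sketch, assuming ISO 8601 UTC timestamps such as 2026-03-08T10:00:00Z and a 5-second threshold):
# Compute gaps between consecutive events entirely in jq
jq -sc '
  [.[].timestamp] as $ts |
  ($ts | map(sub("\\.[0-9]+Z$"; "Z") | fromdate)) as $ep |
  [range(1; $ts | length) | {gap_s: ($ep[.] - $ep[. - 1]), before: $ts[.]}] |
  map(select(.gap_s > 5))
' app.jsonl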
# Quick duration between first and last event
jq -sc '{start: .[0].timestamp, end: .[-1].timestamp}' app.jsonl
# Duration between paired events (start/end)
jq -sc '
group_by(.request_id) |
map(
(map(select(.event == "start")) | .[0].timestamp) as $start |
(map(select(.event == "end")) | .[0].timestamp) as $end |
{id: .[0].request_id, start: $start, end: $end}
)
' events.jsonl
# Identify the slowest phase
jq -sc '
  sort_by(.timestamp) as $e |
  [range(1; $e | length) | {
    from: $e[. - 1].event,
    to: $e[.].event,
    gap: ($e[.].ts_epoch - $e[. - 1].ts_epoch)
  }] |
  sort_by(-.gap) | .[0]
' events.jsonl
# Find a request across all service logs
fd -e jsonl . /var/log/services/ -x rg "\"req-abc-123\"" {}
# Build a timeline for a single request
fd -e jsonl . /var/log/services/ -x rg "\"req-abc-123\"" {} | jq -sr 'sort_by(.timestamp)[] | [.timestamp, .service, .event] | @tsv'
# Find events within 2 seconds of a known event
# First get the target timestamp
TARGET="2026-03-08T14:23:15"
jq -c --arg t "$TARGET" '
select(
.timestamp > ($t | sub("15$"; "13")) and
.timestamp < ($t | sub("15$"; "17"))
)
' other-service.jsonl
# Reconstruct a user session across log files
fd -e jsonl . /var/log/ -x rg "\"user-42\"" {} |
jq -sr 'sort_by(.timestamp)[] | [.timestamp, .service, .action] | @tsv'
# Last 10,000 lines (fast for append-only logs)
tail -n 10000 huge.log | rg "pattern"
# Last N lines of JSONL with structured extraction
tail -n 5000 huge.jsonl | jq -c 'select(.level == "error")'
# Split into 100K-line chunks
split -l 100000 huge.jsonl /tmp/chunk_
# Process in parallel
fd 'chunk_' /tmp/ -x jq -c 'select(.level == "error")' {} > errors.jsonl
# With GNU parallel
split -l 100000 huge.jsonl /tmp/chunk_
ls /tmp/chunk_* | parallel 'jq -c "select(.level == \"error\")" {} >> /tmp/errors.jsonl'
# SAX-style processing of a huge JSON array
jq --stream 'select(.[0][0] == "results" and .[0][-1] == "status") | .[1]' huge.json
# Extract items from a huge array without loading all
jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' huge-array.json
# ALWAYS faster: rg filters text, jq parses survivors
rg '"error"' huge.jsonl | jq -r '.message'
# vs. SLOW: jq reads and parses every line
jq -r 'select(.level == "error") | .message' huge.jsonl
# Find all JSONL files with errors across trial directories
fd -e jsonl . trials/ -x rg -l '"error"' {}
# Count errors per log file across directories
fd -e jsonl . trials/ -x bash -c 'echo "$(rg -c "\"error\"" "$1" 2>/dev/null || echo 0) $1"' _ {}
# Extract and aggregate across directories
fd -e jsonl . trials/ -x jq -c 'select(.level == "error") | {file: input_filename, msg: .message}' {}
# Build summary table from multiple runs
for dir in trials/*/; do
total=$(wc -l < "$dir/results.jsonl")
errors=$(rg -c '"error"' "$dir/results.jsonl" 2>/dev/null || echo 0)
echo -e "$dir\t$total\t$errors"
done | column -t -N DIR,TOTAL,ERRORS
| Gotcha | Why It Hurts | Fix |
|---|---|---|
| jq -s on huge files loads everything into memory | OOM crash or swap thrashing on files over ~500MB | Use streaming: rg prefilter, jq --stream, or split + parallel |
| JSONL with embedded newlines in string values | Line-by-line tools (rg, awk, head) split a single record across lines | Use jq -c to re-compact, or jq -R 'fromjson?' to skip malformed lines |
| rg matches JSON keys, not just values | rg "error" matches {"error_count": 0} which is not an error | Use rg '"level":"error"' or pipe to jq 'select(.level == "error")' |
| Timezone mismatches in timestamp comparisons | Events appear out of order or time ranges miss data | Normalize to UTC before comparing, e.g. convert timestamps to epoch seconds with jq's fromdate |
| Unicode and escape sequences in log messages | jq chokes on invalid UTF-8 or double-escaped strings | Prefilter with rg -a (binary mode), or use jq -R for raw strings |
| Inconsistent JSON schemas across log lines | jq errors on lines missing expected fields | Use // operator for defaults: .field // "missing" and ? for optional: .arr[]? |
| Forgetting -c flag with jq on JSONL | jq pretty-prints each line, output is no longer valid JSONL | Always use jq -c when output feeds into another JSONL consumer |
| tail -f with jq buffering | Output appears delayed or not at all | Use jq --unbuffered or stdbuf -oL jq |
| Sorting JSONL by timestamp without slurp | sort command does lexicographic sort on whole lines, not by field | Either jq -sc 'sort_by(.timestamp)[]' or extract timestamp prefix first |
| Assuming log files are complete | Logs may be rotated, compressed, or still being written | Check for .gz rotated files: fd -e gz . /var/log/ -x zcat {} \| rg pattern |
| Single quotes in jq on Windows | PowerShell/cmd do not handle single quotes the same as bash | Use double quotes with escaped inner quotes, or write jq filter to a file |
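For the inconsistent-schema gotcha, a defensive extraction sketch (field names and fallbacks are illustrative):
# Normalize records that may be missing fields or use different key names
jq -c '{ts: (.timestamp // .time // "unknown"), level: (.level // "info"), msg: (.message // .msg // "")}' app.jsonl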
| File | Contents | Lines |
|---|---|---|
| references/jsonl-patterns.md | JSONL extraction, aggregation, transformation, comparison, and performance patterns | ~700 |
| references/analysis-workflows.md | Agent conversation analysis, application log analysis, benchmark result parsing, cross-directory workflows | ~600 |
| references/tool-setup.md | Installation and configuration for jq, lnav, angle-grinder, rg, awk, GNU parallel, Miller | ~450 |