From journalism-tools
Analyze preprocessed data for investigative journalism with full transparency. Use when a journalist has clean, preprocessed data ready for analysis and needs to identify patterns, anomalies, relationships, or statistical findings that support a story. Triggers include requests to analyze data, find patterns, identify outliers, cross-reference records, calculate statistics, or answer specific investigative questions. Complements the structured-data-preprocessing skill. Emphasizes simple, legible analyses over complex methods—every finding must be explainable to editors and defensible under scrutiny.
Install: `npx claudepluginhub nhagar/claude-plugins-journalism --plugin journalism-tools`
Slash command: `/journalism-tools:structured-data-analysis-journalism`
Analyze preprocessed data to surface findings that support investigative reporting. Every analysis must be simple enough to explain, transparent enough to verify, and documented enough to defend.
Before running any analysis, produce a brief proposal for the journalist to review. Save it as `analysis_proposal.md`.
Proposal Format:
# Analysis Proposal
**Investigation**: [Brief description]
**Data sources**: [List preprocessed files being analyzed]
**Date**: [Date]
---
## Proposed Analyses
### Analysis 1: [Descriptive title]
**Question**: What investigative question does this answer?
**Inputs**:
- File: `filename.csv`
- Columns: `col_a`, `col_b`, `col_c`
**Method**: [Plain-language description of what will be computed. Be specific but accessible.]
**Output**: [What will be produced—table, list, summary statistic, etc.]
**Supports claim**: [What the finding would allow the journalist to report; frame as "Evidence that..." or "Allows us to say..."]
**Assumptions**:
- [Assumption 1 and why it's reasonable]
- [Assumption 2 and why it's reasonable]
**Limitations**: [What this analysis cannot tell us]
**Open questions**: [Any decisions needed from journalist]
---
### Analysis 2: [Title]
[Same structure]
---
## Summary
| # | Analysis | Key output | Supports |
|---|----------|------------|----------|
| 1 | [Title] | [Output type] | [One-line claim] |
| 2 | [Title] | [Output type] | [One-line claim] |
---
**AWAITING YOUR REVIEW**
Please confirm which analyses to proceed with, answer any open questions, and flag concerns.
Proposal Guidelines:
STOP after generating the proposal. Do not proceed until the journalist explicitly approves.
After approval, execute each approved analysis with full documentation.
For each analysis:
- Write documented code
- Preserve verifiability: keep record identifiers such as `source_file` and `source_row` in all outputs
- Validate results
- Document findings
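A minimal Python sketch of the verifiability and validation steps, using only the standard library. It assumes the preprocessed data is a CSV with a header row; the function names, the `contract_id` column, and the sample rows are hypothetical. Each record is tagged with `source_file` and `source_row` as it is loaded, and a few sanity checks run before any result is trusted.

```python
import csv
import io


def load_with_provenance(source_file, csv_text):
    """Read CSV text and tag every record with where it came from.

    source_row treats the header as row 1, so the first data record is
    row 2, which matches the row number shown when the file is opened
    in a spreadsheet.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row_number, row in enumerate(reader, start=2):
        row["source_file"] = source_file
        row["source_row"] = row_number
        records.append(row)
    return records


def validate(records, expected_count, id_column):
    """Run basic sanity checks; return a list of problems (empty if none)."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(records)}")
    ids = [r[id_column] for r in records]
    if len(set(ids)) != len(ids):
        problems.append(f"duplicate values in {id_column}")
    blank_rows = [r["source_row"] for r in records if not r[id_column]]
    if blank_rows:
        problems.append(f"blank {id_column} at rows {blank_rows}")
    return problems


# Hypothetical sample; with a real file, pass open(path, newline="").read().
sample = "contract_id,vendor,amount\nC-1,Acme,1200\nC-2,Beta,300\n"
records = load_with_provenance("contracts_clean.csv", sample)
print(records[1]["source_row"])  # 3
print(validate(records, expected_count=2, id_column="contract_id"))  # []
```

Keeping the provenance columns attached from the first load means any number in a finding can be traced back to specific rows for spot-checking.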
After completing the approved analyses, produce `analysis_findings.md`:
# Analysis Findings
**Investigation**: [Title]
**Date**: [Date]
**Analyses completed**: [N of M proposed]
---
## Finding 1: [Headline-style summary]
**From Analysis**: [Which analysis produced this]
**Key result**: [The core finding in plain language]
**Supporting numbers**:
- [Statistic]: [Value] (N=[record count])
- [Statistic]: [Value] (N=[record count])
**Underlying records**: See `finding_1_records.csv` ([N] records)
**Verification examples**: [3-5 specific records the journalist should spot-check, with source file and row]
**Caveats**:
- [Important limitation or context]
**Story language**: [Draft sentence suitable for publication, appropriately hedged]
---
## Output Files
| File | Description | Records |
|------|-------------|---------|
| `finding_1_records.csv` | Records supporting Finding 1 | N |
| `summary_statistics.csv` | All computed statistics | N |
---
## Methodology Notes
[Brief, plain-language explanation of what was done, suitable for a methodology box or editor questions]
Analysis approaches:
Counting and aggregation: Frequencies, totals, averages by category. Simple, defensible, easy to verify.
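As an illustration of counting and aggregation, the sketch below (standard-library Python; the `vendor` and `amount` columns and the sample rows are hypothetical) computes a count, total, and average per category. Reporting `n_records` beside each average keeps the N visible, which the findings template asks for.

```python
from collections import defaultdict


def totals_by_category(records, category_column, amount_column):
    """Count, sum, and average an amount per category, keeping N visible."""
    groups = defaultdict(list)
    for r in records:
        groups[r[category_column]].append(float(r[amount_column]))
    summary = []
    for category, amounts in sorted(groups.items()):
        summary.append({
            category_column: category,
            "n_records": len(amounts),
            "total": sum(amounts),
            "mean": sum(amounts) / len(amounts),
        })
    return summary


# Hypothetical rows, already cleaned by preprocessing.
records = [
    {"vendor": "Acme", "amount": "1200"},
    {"vendor": "Acme", "amount": "800"},
    {"vendor": "Beta", "amount": "300"},
]
for row in totals_by_category(records, "vendor", "amount"):
    print(row)
```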
Filtering and flagging: Identify records meeting specific criteria (thresholds, date ranges, category matches).
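A possible shape for filtering and flagging, assuming hypothetical `amount` and ISO-format `date` columns and an illustrative 50,000 threshold. Each flagged record carries the exact criteria as text, so the threshold and date range are documented in the output itself rather than only in the code.

```python
from datetime import date


def flag_records(records, min_amount, start, end):
    """Return records with amount >= min_amount dated within [start, end].

    Each flagged record is a copy that carries a plain-language reason,
    so the exact criteria travel with the output file.
    """
    reason = f"amount >= {min_amount}, dated {start} to {end} inclusive"
    flagged = []
    for r in records:
        amount = float(r["amount"])
        signed = date.fromisoformat(r["date"])
        if amount >= min_amount and start <= signed <= end:
            flagged.append({**r, "flag_reason": reason})
    return flagged


# Hypothetical contracts.
records = [
    {"contract_id": "C-1", "amount": "52000", "date": "2023-03-14"},
    {"contract_id": "C-2", "amount": "9000", "date": "2023-04-02"},
    {"contract_id": "C-3", "amount": "75000", "date": "2022-11-30"},
]
hits = flag_records(records, 50000, date(2023, 1, 1), date(2023, 12, 31))
for h in hits:
    print(h["contract_id"], h["flag_reason"])
```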
Cross-referencing: Match records across datasets on shared identifiers. Document match rates and non-matches.
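A minimal sketch of cross-referencing on a shared identifier that reports match rates and keeps the non-matches from both sides. It assumes exact matching on identifiers already standardized during preprocessing (fuzzy name matching is not shown); the `vendors` and `donors` datasets and the `vendor_id` column are hypothetical.

```python
def cross_reference(left, right, key):
    """Exact-match two datasets on a shared identifier; report match rates.

    Returns matched (left, right) pairs, unmatched records from each side,
    and a summary. A left record matching several right records yields
    several pairs.
    """
    right_index = {}
    for rec in right:
        right_index.setdefault(rec[key], []).append(rec)
    left_keys = {rec[key] for rec in left}

    matched, left_unmatched = [], []
    for rec in left:
        hits = right_index.get(rec[key], [])
        if hits:
            matched.extend((rec, hit) for hit in hits)
        else:
            left_unmatched.append(rec)
    right_unmatched = [rec for rec in right if rec[key] not in left_keys]

    n_left_matched = len(left) - len(left_unmatched)
    summary = {
        "left_records": len(left),
        "left_matched": n_left_matched,
        "left_match_rate": n_left_matched / len(left) if left else 0.0,
        "right_records": len(right),
        "right_unmatched": len(right_unmatched),
    }
    return matched, left_unmatched, right_unmatched, summary


# Hypothetical datasets.
vendors = [{"vendor_id": "V1"}, {"vendor_id": "V2"}, {"vendor_id": "V3"}]
donors = [{"vendor_id": "V1", "donor": "X"}, {"vendor_id": "V9", "donor": "Y"}]
matched, left_only, right_only, summary = cross_reference(vendors, donors, "vendor_id")
print(summary)
```

Returning the unmatched lists, not just the counts, lets the journalist inspect why records failed to match before reporting the rate.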
Outlier identification: Flag statistical outliers using simple methods (percentiles, standard deviations). Always report the threshold used.
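Two simple outlier rules, sketched with Python's `statistics` module; the payment values are hypothetical. Both functions return the threshold alongside the flags so it can be reported. The sample also shows a known weakness of the standard-deviation rule: a single extreme value inflates the standard deviation, so with only 10 records nothing clears mean + 3 SD, while the percentile rule still flags the 95.

```python
import statistics


def flag_above_percentile(values, pct=95):
    """Flag values strictly above the pct-th percentile (pct from 1 to 99)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    threshold = cuts[pct - 1]
    return threshold, [v > threshold for v in values]


def flag_above_sd(values, k=3.0):
    """Flag values more than k sample standard deviations above the mean."""
    threshold = statistics.mean(values) + k * statistics.stdev(values)
    return threshold, [v > threshold for v in values]


# Hypothetical payment amounts.
payments = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
for label, (threshold, flags) in [
    ("95th percentile", flag_above_percentile(payments, 95)),
    ("mean + 2 SD", flag_above_sd(payments, 2)),
    ("mean + 3 SD", flag_above_sd(payments, 3)),
]:
    flagged = [v for v, f in zip(payments, flags) if f]
    print(f"{label}: threshold {threshold:.2f}, flagged {flagged}")
```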
Time-based patterns: Trends, seasonality, before/after comparisons. Clearly define time boundaries.
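A sketch of explicit time boundaries for trend and before/after work, assuming a hypothetical ISO-format `filed` date column. The window is inclusive at both ends, the cutoff date belongs to the "after" period, records outside the window are counted rather than silently dropped, and per-day rates keep periods of unequal length comparable.

```python
from collections import Counter
from datetime import date


def monthly_counts(records, date_column):
    """Count records per calendar month, keyed 'YYYY-MM'."""
    return Counter(
        date.fromisoformat(r[date_column]).strftime("%Y-%m") for r in records
    )


def before_after(records, date_column, window_start, cutoff, window_end):
    """Split records at cutoff inside an explicit window.

    before: window_start <= d < cutoff
    after:  cutoff <= d <= window_end
    Assumes window_start < cutoff <= window_end.
    """
    before = after = outside = 0
    for r in records:
        d = date.fromisoformat(r[date_column])
        if window_start <= d < cutoff:
            before += 1
        elif cutoff <= d <= window_end:
            after += 1
        else:
            outside += 1
    before_days = (cutoff - window_start).days
    after_days = (window_end - cutoff).days + 1
    return {
        "before": before, "before_days": before_days,
        "before_per_day": before / before_days,
        "after": after, "after_days": after_days,
        "after_per_day": after / after_days,
        "outside_window": outside,
    }


# Hypothetical complaint filings.
complaints = [
    {"filed": "2022-12-31"}, {"filed": "2023-01-15"}, {"filed": "2023-02-10"},
    {"filed": "2023-03-05"}, {"filed": "2023-03-20"},
]
print(monthly_counts(complaints, "filed"))
print(before_after(complaints, "filed", date(2023, 1, 1), date(2023, 3, 1), date(2023, 3, 31)))
```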
Network/relationship mapping: Who connects to whom through shared attributes. Keep visualizations simple.
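One simple way to map relationships through a shared attribute is an edge list of entity pairs. The sketch assumes hypothetical `company` and `address` columns and exact string matching on addresses standardized in preprocessing. The resulting table (who, whom, via what) is easy to verify row by row and can feed a simple visualization.

```python
from collections import defaultdict
from itertools import combinations


def shared_attribute_links(records, entity_column, attribute_column):
    """List pairs of entities that share an attribute value (e.g. an address).

    Blank attribute values are skipped; otherwise every record with a
    missing address would appear connected to every other one.
    """
    entities_by_value = defaultdict(set)
    for r in records:
        value = r[attribute_column].strip()
        if value:
            entities_by_value[value].add(r[entity_column])
    links = []
    for value, entities in sorted(entities_by_value.items()):
        for a, b in combinations(sorted(entities), 2):
            links.append((a, b, value))
    return links


# Hypothetical company registrations.
companies = [
    {"company": "Alpha LLC", "address": "1 Main St"},
    {"company": "Beta LLC", "address": "1 Main St"},
    {"company": "Gamma Inc", "address": "2 Oak Ave"},
    {"company": "Delta LLC", "address": "1 Main St"},
    {"company": "Echo Co", "address": ""},
]
for a, b, shared in shared_attribute_links(companies, "company", "address"):
    print(f"{a} <-> {b} via {shared}")
```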
Statistical inference: Significance tests, confidence intervals. Use only if the journalist understands and can explain p-values. Always report effect sizes alongside p-values.
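As one example of pairing a p-value with an effect size, the sketch below compares two rates with a two-sided two-proportion z-test (normal approximation) and reports the difference in percentage points with an approximate 95% confidence interval. The choice of test and the inspection counts are assumptions for illustration; the normal approximation is only reasonable when each group has a fair number of both hits and non-hits.

```python
import math


def compare_proportions(hits_a, n_a, hits_b, n_b):
    """Two-sided two-proportion z-test plus the effect size it should travel with."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    diff = p_a - p_b  # effect size: difference in proportions

    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("all or none of the records are hits; test undefined")
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

    # Unpooled (Wald) standard error for the interval around the difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci_95 = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
    return {"rate_a": p_a, "rate_b": p_b, "difference": diff,
            "ci_95": ci_95, "z": z, "p_value": p_value}


# Hypothetical: 30 of 100 inspections failed in district A, 15 of 100 in district B.
result = compare_proportions(30, 100, 15, 100)
print(f"Difference: {result['difference'] * 100:.1f} percentage points "
      f"(95% CI {result['ci_95'][0] * 100:.1f} to {result['ci_95'][1] * 100:.1f}), "
      f"p = {result['p_value']:.3f}")
```

Leading with the percentage-point gap and its interval keeps the story about magnitude; the p-value only indicates how surprising the gap would be if the underlying rates were equal.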
Predictive models: Rarely appropriate. If used, focus on feature importance over predictions. Never claim a model "proves" anything.
Text analysis: Keyword extraction, categorization. Be transparent about false positive/negative rates.
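A sketch of keyword categorization with measured error rates: flag texts by whole-word keyword match, then compare the flags with a hand-coded sample to get false positive and false negative rates. The keyword list, descriptions, and hand-coded labels are hypothetical. Note the miss on "Consultant fee", the kind of gap the rates are meant to expose.

```python
import re


def keyword_flags(texts, keywords):
    """Flag each text containing any keyword as a whole word (case-insensitive)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    return [pattern.search(t) is not None for t in texts]


def error_rates(predicted, actual):
    """Compare keyword flags against hand-coded labels for the same texts."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return {
        "true_pos": tp, "false_pos": fp, "false_neg": fn, "true_neg": tn,
        "false_positive_rate": fp / (fp + tn) if fp + tn else None,
        "false_negative_rate": fn / (fn + tp) if fn + tp else None,
    }


descriptions = [
    "Payment for consulting services",
    "Consultant fee",
    "Office chairs",
    "Consulting room rental",
    "Legal advice",
]
# Hand-coded by a reporter: is this actually a consulting payment?
hand_coded = [True, True, False, False, False]
flags = keyword_flags(descriptions, ["consulting"])
print(error_rates(flags, hand_coded))
```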
Black-box ML: No neural networks or methods that can't be fully explained.
Causal claims: Analysis shows correlation and patterns, not causation. Never use causal language.
Output files:
Stop and consult the journalist if you encounter: