From insight-blueprint
Overnight batch execution of queued analysis designs. Generates marimo notebooks (8-cell contract), executes headlessly, self-reviews results, records journal events, and produces a morning summary for human triage. Use when: running overnight analysis, processing batch queue, scheduling unattended analysis, reviewing morning results. Triggers: "batch", "バッチ実行して", "バッチ回して", "夜間実行", "バッチ分析", "一括実行", "overnight", "run overnight batch", "キューに入れて", "朝レビュー用に回して", "overnight analysis".
npx claudepluginhub etoyama/insight-blueprint --plugin insight-blueprintThis skill uses the workspace's default tool permissions.
Processes queued analysis designs overnight in Claude Code headless mode.
Generates design tokens/docs from CSS/Tailwind/styled-components codebases, audits visual consistency across 10 dimensions, detects AI slop in UI.
Records polished WebM UI demo videos of web apps using Playwright with cursor overlay, natural pacing, and three-phase scripting. Activates for demo, walkthrough, screen recording, or tutorial requests.
Delivers idiomatic Kotlin patterns for null safety, immutability, sealed classes, coroutines, Flows, extensions, DSL builders, and Gradle DSL. Use when writing, reviewing, refactoring, or designing Kotlin code.
Processes queued analysis designs overnight in Claude Code headless mode. For each design, generates a marimo notebook (8-cell contract), executes it, self-reviews the results, records journal events (observe / evidence / question), and produces a morning review summary.
Based on the Papermill (Netflix) "fixed structure, AI-generated content" approach adapted for hypothesis-driven EDA.
| File | Purpose | Consumed by |
|---|---|---|
SKILL.md (this file) | Skill definition, contracts, configuration | Claude Code (on skill activation) |
references/batch-prompt.md | Full orchestration prompt (902 lines) | claude -p "$(cat ${CLAUDE_SKILL_DIR}/references/batch-prompt.md)" (headless execution) |
Set next_action on the design to queue it for batch execution:
update_analysis_design(design_id, next_action={"type": "batch_execute"})
With priority (lower number = processed first):
update_analysis_design(design_id, next_action={"type": "batch_execute", "priority": 1})
| type | Purpose | Status |
|---|---|---|
batch_execute | Queue for overnight batch | Active |
human_review | Awaiting human review (FV-1 reserved) | Future |
After processing, next_action is reset to {} (empty dict; MCP tool cannot set to null).
Designs with terminal status (supported / rejected / inconclusive) are
skipped even if queued.
All generated notebooks follow a fixed 8-cell structure. Cell structure is fixed; cell content is AI-generated per design.
| Cell | Name | Input | Output | Responsibility |
|---|---|---|---|---|
| 0 | imports | -- | (pd, plt, np, LineageSession, export_lineage_as_mermaid, tracked_pipe) | Library imports + rcParams setup |
| 1 | meta | (mo,) | -- | Display design info (design_id, title, hypothesis, intent) |
| 2 | data_load | (pd, LineageSession) | (raw_df, session, mo) | CSV/DB load + LineageSession init + import marimo as mo |
| 3 | data_prep | (raw_df, session, tracked_pipe, mo) | (df_clean,) | Methodology-independent preprocessing. All ops via tracked_pipe |
| 4 | analysis | (df_clean, pd, session, tracked_pipe, mo) | (results,) | Methodology-dependent analysis + lineage tracking. Behavior varies by intent |
| 5 | viz | (df_clean, results, plt) | -- | Visualization. _ prefix mandatory. plt.gcf() last |
| 6 | verdict | (results, mo) | (verdict,) | Conclusion + evidence + open questions |
| 7 | lineage | (session, export_lineage_as_mermaid, mo) | -- | Mermaid lineage diagram display |
tracked_pipe
for lineage recording.session and
tracked_pipe to record methodology-dependent transformations in lineage,
extending coverage beyond preprocessing into the analysis pipeline.Both intents MUST include structured direction fields in results:
hypothesis_direction, observed_direction, confidence_level, decision_reason
exploratory:
results: discovered patterns + direction fieldsconfirmatory:
results: each metric's value + threshold + pass/fail + direction fieldsThe verdict variable must conform to this schema:
verdict = {
"conclusion": str, # One-line conclusion
"evidence_summary": list[str], # Evidence bullet points
"open_questions": list[str], # Unresolved questions
}
This schema is the interface between notebook execution and journal recording. Changes require updating journal extraction logic simultaneously.
These rules are mandatory for all generated notebooks. See also
.claude/rules/marimo-notebooks.md.
_ prefix for cell-local variables: _fig, _ax, _subset, etc.
Variables without _ are exported to notebook scope and will conflict
across cells (multiple-defs error).plt.gcf() as last expression in viz cell: marimo does NOT
auto-capture matplotlib figures.mo.mermaid() for Mermaid diagrams: mo.md() with ```mermaid
code blocks renders as raw text.mo.md(): Build string beforehand,
then pass to mo.md().import marimo as mo in Cell 2 only: Other cells receive mo as
an argument. Placing mo in Cell 0 causes circular dependency.return (df_clean,) -- not return df_clean.mo.md(): Dict returns alone do not produce
text output in session JSON. Always also render key values via mo.md().Where generated notebooks are saved.
Resolution order (highest priority first):
.insight/config.yaml key batch.notebook_dir.insight/runs/YYYYMMDD_HHmmss/{design_id}/YYYYMMDD_HHmmss is expanded to JST execution timestamp.
{design_id} is expanded per design.
Optional directory for shared utility functions across notebooks.
Resolution order (highest priority first):
.insight/config.yaml key batch.lib_dirWhen configured:
.py files -> generate/update lib_dir/CATALOG.mdsys.path.insert(0, lib_dir) + imports from catalog## data_utils.py
- `clean_revenue(df: pd.DataFrame) -> pd.DataFrame`: Standard revenue preprocessing
- `one_hot_time_slot(df: pd.DataFrame) -> pd.DataFrame`: One-hot encode time_slot
## viz_utils.py
- `plot_correlation_matrix(df: pd.DataFrame, columns: list[str]) -> None`: Correlation heatmap
.insight/runs/ # Batch execution root
YYYYMMDD_HHmmss/ # Per-execution directory (JST)
run.yaml # Run-level manifest (status, session_id, token)
events.jsonl # Claude Code stream-json NDJSON output
summary.md # Morning review summary
{design_id}/ # Per-design directory
manifest.yaml # Per-design execution manifest (atomic write)
notebook.py # Generated marimo notebook
__marimo__/session/ # marimo session JSON (auto-generated)
notebook.py.json # Session output
.insight/designs/
{design_id}_journal.yaml # Journal (appended to existing)
Directory naming uses YYYYMMDD_HHmmss format (e.g., 20260403_230000).
Multiple runs on the same day do not collide. Timestamps are JST.
Use launcher.sh for full pre-processing (token validation, crash recovery,
mode dispatch). The core claude invocation is:
RUN_DIR=".insight/runs/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$RUN_DIR"
claude -p "$(cat ${CLAUDE_SKILL_DIR}/references/batch-prompt.md)" \
--model sonnet \
--output-format stream-json \
--include-hook-events \
--fallback-model sonnet \
--max-turns ${BATCH_MAX_TURNS:-200} \
--allowedTools "mcp__insight-blueprint__list_analysis_designs,mcp__insight-blueprint__get_analysis_design,mcp__insight-blueprint__get_table_schema,mcp__insight-blueprint__update_analysis_design,mcp__insight-blueprint__transition_design_status,mcp__insight-blueprint__search_catalog,mcp__context7__resolve-library-id,mcp__context7__query-docs,Read,Write,Bash,Glob,Grep" \
--permission-mode bypassPermissions \
--max-budget-usd ${BATCH_MAX_BUDGET_USD:-10} \
> "$RUN_DIR/events.jsonl" 2>&1
Flag rationale:
--model sonnet: Quality-first for 30min/design self-review (DD-5)--output-format stream-json: NDJSON event stream for crash-safe persistence--include-hook-events: Capture hook lifecycle events in events.jsonl--fallback-model sonnet: Resilient fallback on transient API errors--max-turns: Configurable via BATCH_MAX_TURNS env var (default 200)--permission-mode bypassPermissions: Verified in V1e (no dangerouslySkipPermissions needed)--allowedTools: Minimum required tools (MCP + context7 + file tools)--max-budget-usd: Configurable via BATCH_MAX_BUDGET_USD env var (default 10). Cost safety valve (prevents runaway API billing). Does NOT limit tool calls or restrict dangerous operations — that is handled by --allowedTools and the trusted-analyst assumptionPrimary source: .insight/rules/package_allowlist.yaml.
Only these packages may be installed by the batch agent via uv add --dev:
| Alias | Import | pip/uv package |
|---|---|---|
| pandas | pandas | pandas |
| matplotlib | matplotlib | matplotlib |
| numpy | numpy | numpy |
| scipy | scipy | scipy |
| sklearn | sklearn | scikit-learn |
| statsmodels | statsmodels | statsmodels |
| seaborn | seaborn | seaborn |
| plotly | plotly | plotly |
To add a new package, update this allowlist and batch-prompt.md simultaneously.
bypassPermissions is acceptable under the trusted-analyst assumption..claude/rules/). Lessons go to {RUN_DIR}/lessons.md.Pre-launch: Use launcher.sh which handles directory creation, token
validation, crash recovery, and mode dispatch automatically:
bash ${CLAUDE_SKILL_DIR}/launcher.sh [--approved-by TOKEN]
If the overnight run crashes or the machine reboots mid-batch, re-run the same command. The launcher detects incomplete runs automatically and resumes the existing claude session — you do NOT need to re-issue a token or invoke any recovery command.
# Crash recovery is transparent: same command, same token.
bash ${CLAUDE_SKILL_DIR}/launcher.sh --approved-by <token_id>
What happens under the hood:
launcher.sh scans .insight/runs/*/run.yaml for non-completed runs.premortem_token is still
valid (TTL unexpired) and a session_id was recorded, the launcher
verifies each unfinished design's on-disk hash against the token
(pre-resume hash check) and re-invokes claude --resume <session_id>.status=skipped, skip_reason=hash_mismatch and not re-processed.status=incomplete, skip_reason=token_expired_or_crashed; you then
run /premortem again to issue a fresh token and start a new batch.The final state of run.yaml after the launcher exits is always one of
completed / timeout / incomplete — the status field is never left as
running. Inspect it with:
yq .status .insight/runs/*/run.yaml | tail -5
The core of the 30min/design time budget. After generation and execution, the agent critically reviews its own analysis results.
| Phase | Check | Action |
|---|---|---|
| Data processing (Cell 3) | Missing value handling appropriate? Filter bias? Required columns remain? | Fix notebook -> re-execute |
| Analysis method (Cell 4) | Methodology fits hypothesis? Assumptions met? | Fix notebook -> re-execute |
| Result interpretation (Cell 6) | Evidence-conclusion consistency? Effect size meaningful? | Fix verdict -> re-execute |
| Open questions | Overlooked confounders or alternative explanations? | Add question events |
Decision criteria:
Use [SELF-REVIEW] marker in output for traceability.
| Elapsed | Behavior |
|---|---|
| 0-20 min | Normal processing (generate + execute + full review) |
| 20-25 min | Continue review, but limit remaining error fix attempts to 1 |
| 25-30 min | Simplify review to "critical deficiency check only" |
| 30+ min | Complete current phase -> journal recording -> move to next design |
Batch estimate: 5 designs x 30 min = 2.5 hours max. Typical: 10-15 min/design (generate 5min + execute 0.5min + review 5-10min).
Attempt 1: Direct fix from error message
- Target: ImportError, SyntaxError, NameError
- Verify: marimo export session exit code == 0 and all cells have output
Attempt 2: context7 marimo docs reference
- Target: marimo-specific errors (multiple-defs, cell dependency)
- Query: mcp__context7__resolve-library-id("/marimo-team/marimo") -> query-docs
- Verify: diff is limited to the problem area
Attempt 3: Alternative approach (simplify method, change parameters)
- Target: RuntimeError, ValueError (analysis logic)
- Verify: fix does not deviate from hypothesis/methodology intent
-> 3 failures: skip + record in summary
| Error | Detection | Action |
|---|---|---|
| Package missing | ModuleNotFoundError | uv add --dev from allowlist only -> retry |
| marimo syntax | multiple-defs, syntax error | context7 + fix -> lessons.md on success |
| Data source missing | FileNotFoundError | question event + skip |
| Analysis logic | ValueError, LinAlgError | Fix (3 attempts) |
| MCP connection failure | Tool call timeout | Stop entire batch |
When a marimo-specific error is fixed during batch execution, record to
{RUN_DIR}/lessons.md (NOT .claude/rules/marimo-notebooks.md).
Overnight batch runs must not modify shared policy files. The human
reviewer promotes relevant lessons during morning review.
## {Brief problem description}
{Conditions that trigger the problem}
\```python
# Bad: {code that causes error}
...
# Good: {fixed code}
...
\```
| Type | When | metadata |
|---|---|---|
observe | Data characteristics from Cell 2, Cell 3 | -- |
evidence | Analysis results from Cell 4, Cell 6 | direction: supports | contradicts |
question | Open questions from Cell 6 | -- |
NEVER generate conclude events. Conclusions are always human-driven.
Direction is determined from structured results fields, not free-text comparison.
results.confidence_level. If "ambiguous" -> no direction, record question event.results.hypothesis_direction and results.observed_direction:
supports, "rejected" -> contradicts, "inconclusive" -> questionsupports, oppose -> contradicts, unclear -> questionresults.decision_reason provides the audit trail.Required results fields for direction: hypothesis_direction, observed_direction,
confidence_level, decision_reason.
If .insight/designs/{design_id}_journal.yaml exists, preserve all existing
events. New event IDs start from max existing ID + 1.
ID format: {design_id}-E{nn:02d}
| From | To | When |
|---|---|---|
| /analysis-design | -> /batch-analysis | After design creation: "夜間バッチに投入するなら update_analysis_design(id, next_action={...})" |
| /batch-analysis | -> /analysis-reflection | Morning review: summary.md suggests "/analysis-reflection {id}" for each design |
| /batch-analysis | -> /analysis-journal | Additional investigation needed: "/analysis-journal {id}" |
| Tool | Used for |
|---|---|
list_analysis_designs() | Queue retrieval: filter next_action.type == "batch_execute" |
get_analysis_design(design_id) | Read design fields (hypothesis, metrics, methodology, etc.) |
get_table_schema(source_id) | Get data source connection info and schema |
update_analysis_design(design_id, ...) | Reset next_action to {} after processing |
transition_design_status(design_id, new_status) | Transition to analyzing (from in_review only) |
search_catalog(query) | Fallback when source_ids is empty |