From ai-brain-starter
Extracts structured metadata from typed vault files (books, meetings, people, etc.), optionally runs knowledge-graph extraction, applies wikilinks, and surfaces cross-type insights. Use it to map your second brain or refresh the vault index.
npx claudepluginhub adelaidasofia/ai-brain-starter [--metadata-only | --insights-only | --dry-run | --sample [N] | --force | --type <name>]

This skill uses the workspace's default tool permissions.
Your vault is a database. This skill makes it queryable.
Ingests content from Confluence, Google Docs, GitHub repos, remote URLs, or local files (DOCX, PDF, etc.) into the Second Brain vault. Converts it to Markdown via docling, runs graphify extraction, and persists entities.
Explores Obsidian vaults via semantic entities (tags, wikilinks, folders), AI catalysts, and petri readouts to surface idea connections and latent patterns. Use it for conceptual note searches rather than keyword matching.
Runs 7-phase audit on Obsidian vault: structural scan, duplicate detection, link integrity check, frontmatter audit, MOC review, cross-agent integration, and health report.
Share bugs, ideas, or general feedback.
| Phase | Tool | LLM cost | Always runs? |
|---|---|---|---|
| 1 | vault-metadata-extract.py (dispatcher → type-specific extractors) | 0 tokens | Yes |
| 2 | /graphify (optional) | ~100k–1M tokens | No — asks first |
| 3 | Wikilink gaps + interactive apply | ~5k tokens | If graph exists |
| 4 | vault-insight-engine.py — cross-type surprise finder | 0 tokens | Yes |
Phases 1 and 4 are free. Phase 2 is expensive and opt-in. Phase 3 needs interactive approval so it skips gracefully in non-TTY contexts.
Run once after cloning ai-brain-starter:
/setup-vault-types
Interactive wizard asks which doc types you have (journal, book, article, meeting, person, project, podcast, client, etc.) and installs the matching extractors. You can add custom types later by editing scripts/extractors/schemas.yaml or running /setup-vault-types --add <name>.
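The exact schema format is defined by the shipped scripts/extractors/schemas.yaml; purely as an illustration (the keys and field names below are hypothetical, not the real format), a custom type entry might look like:

```yaml
# Hypothetical entry — check the shipped schemas.yaml for the real keys.
recipe:
  extractor: recipe.py
  fields:
    recipe_cuisine: string
    recipe_servings: int
    recipe_rating_1_5: int
```

The field-naming pattern (`<type>_<field>`) follows the `book_rating_1_5` / `person_priority` convention used elsewhere in this document.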
/second-brain-mapping # full pipeline
/second-brain-mapping --metadata-only # skip graphify + wikilinks, keep insights
/second-brain-mapping --insights-only # only run Phase 4 on existing metadata
/second-brain-mapping --type book # only process files with `type: book`
/second-brain-mapping --dry-run # preview without writes
/second-brain-mapping --sample # preview 1 file per configured type (cold-start safe)
/second-brain-mapping --sample 3 # preview 3 files per configured type
First-time cold-start? Run --sample first. It processes one file per registered type, shows you the actual extracted fields, and exits without writing anything. If the output looks right, re-run without --sample for the full pipeline.
Most PKM tools stop at search. This turns your vault into a queryable database:
Dataview handles the queries. This skill handles the structured fields that make Dataview precise.
Follow in order. Do not skip.
Run date for a timestamp. Parse any argument flags.
Precheck: was /setup-vault-types run? Before anything else, confirm the vault has at least one document-type extractor configured. Without this, Phase 1 runs silently on every file and reports "no extractor registered" for the user's entire vault — a classic cold-start bounce.
EXTRACTOR_DIR="$(pwd)/scripts/extractors"
EXTRACTOR_COUNT=0
if [[ -d "$EXTRACTOR_DIR" ]]; then
  EXTRACTOR_COUNT=$(find "$EXTRACTOR_DIR" -maxdepth 1 -name '*.py' -not -name '_*' 2>/dev/null | wc -l | tr -d ' ')
fi
if [[ "$EXTRACTOR_COUNT" -eq 0 ]]; then
  echo "No document-type extractors are configured yet."
  echo "Run /setup-vault-types first — that wizard asks which kinds of notes"
  echo "you take (journal, book, meeting, person, etc.) and installs the matching"
  echo "extractors. Then re-run /second-brain-mapping."
  exit 4
fi
If this check fails, stop. Do not proceed to any phase. Tell the user to run /setup-vault-types and offer to invoke it for them.
Read the state file to see what was done and when:
STATE_FILE="$(vault-root)/⚙️ Meta/.second-brain-mapping-state.json"
[[ -f "$STATE_FILE" ]] && cat "$STATE_FILE" || echo '{}'
State file format (JSON):
{
  "phase_1_metadata": "2026-04-21T10:02:00-05:00",
  "phase_2_graphify": "2026-04-21T09:04:26-05:00",
  "phase_3_wikilinks": null,
  "phase_4_insights": "2026-04-21T10:02:00-05:00"
}
null means never completed OR killed mid-run. A timestamp means last successful completion.
Decision rule:
--force flag: run everything regardless of stamps.

After each phase succeeds, update its stamp with the current ISO-8601 timestamp. If a phase is killed or errors, leave the stamp untouched so the next run sees it as incomplete.
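The decision rule above can be sketched as a small helper (a sketch, not shipped code; the phase keys match the state file format above):

```python
import json
import pathlib

PHASES = ["phase_1_metadata", "phase_2_graphify",
          "phase_3_wikilinks", "phase_4_insights"]

def phases_to_run(state_path, force=False):
    """Return the phase keys that still need to run.

    A null or missing stamp means the phase never completed (or was
    killed mid-run), so it runs again. A timestamp means the last run
    succeeded, so it is skipped — except Phase 2, which always asks
    interactively regardless of its stamp. --force runs everything.
    """
    if force:
        return list(PHASES)
    p = pathlib.Path(state_path)
    state = json.loads(p.read_text()) if p.exists() else {}
    return [key for key in PHASES if state.get(key) is None]
```

A missing state file is treated the same as an empty one: every phase is due.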
Helper to write stamp (use after each phase):
python3 -c "
import json, pathlib, datetime
p = pathlib.Path('$STATE_FILE')
d = json.loads(p.read_text()) if p.exists() else {}
d['$PHASE_KEY'] = datetime.datetime.now().astimezone().isoformat(timespec='seconds')
p.write_text(json.dumps(d, indent=2))
"
If Step 1 decided to skip, skip. Else:
python3 "$(vault-root)/scripts/vault-metadata-extract.py" $FLAGS
On success, stamp phase_1_metadata. Report: X files written, Y already tagged, types with no registered extractor.
Always ask, even if stamp is fresh. Graphify has its own internal staging and token cost varies wildly.
Before asking, compute a vault-specific cost estimate. A generic "~100k-1M tokens" warning is useless to a first-time user. Show them numbers tied to their actual corpus:
python3 <<'PY'
import os, glob, pathlib
vault = os.getcwd()
SKIP = {"⚙️ Meta", "Archive", ".git", ".obsidian", "graphify-out", "node_modules"}
total_files = 0
total_words = 0
for fp in glob.glob(os.path.join(vault, "**", "*.md"), recursive=True):
    parts = set(fp.split(os.sep))
    if parts & SKIP:
        continue
    total_files += 1
    try:
        with open(fp, "r", encoding="utf-8", errors="ignore") as f:
            total_words += len(f.read().split())
    except Exception:
        pass
# Rough estimate: 1 word ≈ 1.3 tokens input; graphify wrappers reduce ~85% for a full run.
# Output is typically 10-15% of input for extraction.
input_tok = int(total_words * 1.3 * 0.15)  # after dedupe + cache + preextract
output_tok = int(input_tok * 0.12)
# Sonnet 4.6 public pricing (as of 2025): $3/M input, $15/M output
cost_usd = (input_tok / 1_000_000) * 3 + (output_tok / 1_000_000) * 15
# Incremental run (cache warm): roughly 10% of cold-start
cold_cost = cost_usd
warm_cost = cost_usd * 0.10
existing = pathlib.Path("graphify-out/graph.json").exists()
mode = "incremental (cache warm)" if existing else "cold start (no cache yet)"
est_cost = warm_cost if existing else cold_cost
print(f"Corpus: {total_files:,} files · ~{total_words:,} words")
print(f"Mode: {mode}")
print(f"Tokens: ~{input_tok:,} input · ~{output_tok:,} output (estimate)")
print(f"Cost: ~${est_cost:.2f} at Sonnet 4.6 public pricing")
print(f"  (cold start would be ~${cold_cost:.2f}; incremental ~${warm_cost:.2f})")
PY
stat -f "%Sm" "$(pwd)/graphify-out/graph.json" 2>/dev/null \
  || stat -c "%y" "$(pwd)/graphify-out/graph.json" 2>/dev/null \
  || echo "Last graph: none yet"
Then ask: "Run graphify on this corpus? y/N"
If yes, invoke /graphify --update. Read ~/.claude/skills/graphify/SKILL.md first. On success, stamp phase_2_graphify.
Pricing caveat: the cost estimate uses public Sonnet rates and graphify's typical compression ratio. Actual cost depends on cache hit rate, chunk granularity, and whether --mode deep is used. Treat the number as an order-of-magnitude guide, not a quote.
If Step 1 decided to skip, skip. Else:
python3 "$(vault-root)/scripts/graphify_wikilink_gaps.py"
if [[ -t 0 ]]; then
  python3 "$(vault-root)/scripts/graphify_apply_wikilinks.py"
else
  echo "Non-interactive: wikilink apply skipped. Run manually to review."
fi
On success (both commands exit 0), stamp phase_3_wikilinks. If killed mid-run or errors, DO NOT stamp — next invocation will see it as null and re-run.
If Step 1 decided to skip, skip. Else:
python3 "$(vault-root)/scripts/vault-insight-engine.py" --top 5
On success, stamp phase_4_insights. Read the top 5 findings aloud. Don't summarize — paste the report section verbatim so the user sees the raw signal.
Scoping to a recent batch. When Phase 2 only processed a subset of files (e.g. a /graphify --update of 200 new files), the same vault-wide patterns dominate every run. To surface insights specific to the batch instead, pass --scope-files:
python3 "$(vault-root)/scripts/vault-insight-engine.py" \
--scope-files "$(vault-root)/path/to/file-list.txt" \
--scope-label "batch-YYYY-MM-DD" \
--top 5
File list = one path per line (relative to vault-root or absolute). Findings are restricted to those files; baselines still derive from the full vault, so "surprise" is measured against your whole history. Without the flag, behavior is unchanged.
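One way to build such a file list, as a sketch (this assumes the batch is simply your most recently modified notes; the 24-hour window and /tmp path are illustrative):

```shell
# Sketch: collect notes modified in the last 24h into a one-path-per-line
# list (paths relative to the vault root). VAULT_ROOT defaulting to $PWD
# is an assumption — adjust if you run this from outside the vault.
VAULT_ROOT="${VAULT_ROOT:-$PWD}"
cd "$VAULT_ROOT"
find . -name '*.md' -mtime -1 \
  -not -path './.git/*' -not -path './.obsidian/*' > /tmp/batch-files.txt
# Then scope the insight run to that batch:
#   python3 scripts/vault-insight-engine.py \
#     --scope-files /tmp/batch-files.txt \
#     --scope-label "batch-$(date +%Y-%m-%d)" --top 5
```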
Print a compact summary. Then suggest 2-3 concrete Dataview queries the user could now run based on what got extracted. Examples:
You now have 47 books and 264 people. Try these queries on any note:
// Books you loved that mention a concept:
TABLE book_author, book_rating_1_5
FROM "Notes/Books"
WHERE book_rating_1_5 >= 4
AND contains(book_themes, "<concept>")
// High-priority contacts going cold:
TABLE person_last_journal_iso, person_next_step
FROM "CRM"
WHERE person_priority = "high"
AND person_last_journal_iso < dateformat(date(today) - dur(60 days), "yyyy-MM-dd")
scripts/
vault-metadata-extract.py # entry point
vault-insight-engine.py # cross-type surprise finder
vault-classify-untyped.py # MiniMax-powered type suggester
second-brain-mapping.sh # orchestrator (all four phases)
extractors/
_base.py # shared helpers
_dispatcher.py # type → extractor routing
schemas.yaml # declares fields per type
journal.py book.py person.py concept.py article.py
business.py meeting.py ai_chat.py writing_draft.py
strategy.py negotiation_prep.py company.py
daily_log.py talk.py travel.py goal.py
playbook.py asset.py reference.py
# Add your own: extractors/<type>.py + entry in schemas.yaml
Each extractor module exports AUTO_FIELDS and extract(filepath, body, fm, context) -> ExtractionResult. The dispatcher auto-discovers any file in extractors/ that has an extract function.
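A minimal extractor satisfying that contract might look like the sketch below. Everything here is hypothetical except the contract itself: the real ExtractionResult lives in extractors/_base.py, and the stand-in dataclass only mirrors the shape implied above.

```python
# extractors/recipe.py — hypothetical custom type, following the contract
# described above. Import ExtractionResult from _base in a real extractor;
# this stand-in only illustrates the shape.
from dataclasses import dataclass, field

@dataclass
class ExtractionResult:  # stand-in for the _base.py class
    fields: dict = field(default_factory=dict)
    warnings: list = field(default_factory=list)

# Fields this extractor can populate automatically.
AUTO_FIELDS = ["recipe_cuisine", "recipe_servings"]

def extract(filepath, body, fm, context):
    """fm is the parsed frontmatter dict; body is the note text."""
    out = ExtractionResult()
    # Carry through anything already declared in frontmatter.
    for key in AUTO_FIELDS:
        if key in fm:
            out.fields[key] = fm[key]
    # Cheap zero-token heuristic: look for a "Serves N" line in the body.
    for line in body.splitlines():
        tokens = line.split()
        if line.lower().startswith("serves ") and len(tokens) > 1:
            if tokens[1].isdigit():
                out.fields.setdefault("recipe_servings", int(tokens[1]))
    return out
```

The dispatcher auto-discovers the module, so dropping a file like this into extractors/ plus a schemas.yaml entry is the whole registration step.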
1. Edit extractors/schemas.yaml to declare your fields
2. Copy an existing extractor (e.g. book.py) as a template
3. Save it as extractors/<your_type>.py
4. Add type: <your_type> to any doc that qualifies
5. Run /second-brain-mapping --type <your_type> to verify

The framework doesn't care what types exist. It cares that each type declares its fields and ships an extractor.