From ai-brain-starter
Generates draft hookify rules, skills, and CLAUDE.md from Slack exports, Notion, GDocs, or markdown vaults for ai-brain-starter onboarding.
```
npx claudepluginhub adelaidasofia/ai-brain-starter
```

Usage: `<dump-path> [--out <output-dir>] [--max-files N] [--dry-run]`

This skill uses the workspace's default tool permissions.
Turn an existing company's tribal knowledge into a starter rule layer in one pass. Output is a draft, not a finished install. The founder reviews and accepts before anything ships into a live `CLAUDE.md`.
Every install of this system today starts from a blank CLAUDE.md and an empty hookify/ directory. The founder writes rules over weeks as Claude makes mistakes the founder corrects. That works, but it leaves the first three weeks unguarded. Companies that already have a Slack history, a Notion workspace, or a Drive of meeting notes have most of the rules already written down somewhere. They just are not in a format the agent can read.
This skill reads what is there, finds the recurring patterns, and writes drafts the founder can accept, edit, or reject. Three weeks of guard-rail accumulation, compressed into one synthesis pass.
| Input shape | Detected by | What we extract |
|---|---|---|
| Slack export `.zip` | `users.json` + `channels.json` at root | channel taxonomy, user roles, recurring decision phrases, owner-of-process patterns |
| Notion export `.zip` | nested `.md` with parent-folder `_index.md` shape | page hierarchy, recurring headings, RACI/template patterns |
| GDocs export `.zip` (Takeout) | flat `.docx`/`.html` siblings | doc title patterns, recurring labels, folder semantics |
| Markdown folder (Obsidian, Logseq, plain) | .md files with optional frontmatter | frontmatter fields, wikilink density, folder semantics, heading patterns |
| Mixed folder | fall-through | best-effort markdown pass |
If the dump is not one of the above, fall through to the markdown pass on whatever .md files exist.
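The detection column above amounts to a short deterministic check, run in order of specificity. A minimal sketch (function name and traversal details are illustrative, not the skill's actual implementation):

```python
from pathlib import Path

def detect_input_type(dump: Path) -> str:
    """Classify a dump folder using the fingerprints from the table above."""
    files = [f for f in dump.rglob("*") if f.is_file()]
    # Slack export: users.json + channels.json at the root
    if (dump / "users.json").exists() and (dump / "channels.json").exists():
        return "slack"
    # Notion export: nested .md with parent-folder _index.md
    if any(f.name == "_index.md" for f in files):
        return "notion"
    # GDocs/Takeout export: flat .docx / .html siblings
    if any(f.suffix in {".docx", ".html"} for f in files):
        return "gdocs"
    # Markdown vault (Obsidian, Logseq, plain)
    if any(f.suffix == ".md" for f in files):
        return "markdown"
    # Fall-through: best-effort markdown pass
    return "mixed"
```

The ordering matters: a Notion export also contains `.md` files, so the more specific fingerprints must be tested first.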
Written to `<output-dir>/` (default: `./extract-rules-output/`):

```
extract-rules-output/
  CLAUDE.md.draft         # proposed memory file
  hookify-rules/          # one .md per candidate rule
  skills/                 # one folder per candidate skill
  signals.json            # raw structured signals (audit trail)
  extraction-report.md    # plain-English summary of what was found
  REVIEW.md               # prompts the founder must answer before accepting
```
Nothing in here is auto-applied. The skill prints the path and stops.
Before any extraction:
```sh
ls "<dump-path>" || (echo "dump path does not exist"; exit 1)
```
If the path is a `.zip`, do NOT auto-unzip into a system temp dir without telling the user where. Ask once: "This is a zipped export. Unzip into `<output-dir>/_unzipped/` and proceed?"

Refuse if the dump appears to contain credentials. Run a quick grep for `BEGIN PRIVATE KEY`, `AWS_SECRET_ACCESS_KEY`, `password`, and `api_key` across the first 100 files. If any hit, stop and surface the path. The founder confirms it is fine to proceed or scrubs the dump first.
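The credential pre-flight scan can be sketched as follows (a minimal version, assuming plain-text files; the pattern list is copied from above and the 100-file cap keeps it fast):

```python
import re
from pathlib import Path

# Patterns named in the pre-flight check above
CREDENTIAL_PATTERNS = re.compile(
    r"BEGIN PRIVATE KEY|AWS_SECRET_ACCESS_KEY|password|api_key"
)

def scan_for_credentials(dump: Path, max_files: int = 100) -> list[str]:
    """Return paths of files among the first max_files that look like secrets."""
    hits = []
    for i, f in enumerate(sorted(p for p in dump.rglob("*") if p.is_file())):
        if i >= max_files:
            break
        try:
            text = f.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than crash the pre-flight
        if CREDENTIAL_PATTERNS.search(text):
            hits.append(str(f))
    return hits
```

If the returned list is non-empty, the skill stops and surfaces those paths instead of proceeding to extraction.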
This is the load-bearing step. Per Build Standards Rule 4a (structured-signal-first), do not call the model on every file. Run the deterministic parser first.
```sh
python3 "${SKILL_DIR}/../../scripts/extract_rules/extract_rules_from_dump.py" \
  --dump "<dump-path>" \
  --out "<output-dir>" \
  ${MAX_FILES:+--max-files $MAX_FILES}
```
The script writes signals.json containing:
- `input_type`: `slack`, `notion`, `gdocs`, `markdown`, or `mixed`
- `entities`: people, channels/folders, recurring titles
- `frequencies`: how often each entity appears
- `decision_phrases`: extracted "we don't do X", "rule is X", "always X", "never X" hits
- `process_signals`: recurring channel-name patterns (`#refunds`, `#incidents`, `#pricing-questions`)
- `template_candidates`: doc heading patterns that recur >= 3 times
- `unresolved`: residual ambiguous items the model should look at

Read `signals.json` after the script finishes. Do NOT proceed to Step 2 until structured-signal coverage is verified.
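The coverage gate before Step 2 can be sketched as a key check on the loaded file (a minimal version; the key names follow the list above, and deeper per-key validation is omitted):

```python
import json
from pathlib import Path

# Top-level keys the extractor is expected to write, per the list above
REQUIRED_KEYS = {
    "input_type", "entities", "frequencies", "decision_phrases",
    "process_signals", "template_candidates", "unresolved",
}

def load_signals(out_dir: Path) -> dict:
    """Load signals.json and verify structured-signal coverage before Step 2."""
    signals = json.loads((out_dir / "signals.json").read_text())
    missing = REQUIRED_KEYS - signals.keys()
    if missing:
        raise ValueError(f"signals.json incomplete, missing: {sorted(missing)}")
    return signals
```

A `ValueError` here means the deterministic pass did not finish cleanly; the fix is to re-run the script, not to let the model improvise the missing signals.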
For each high-frequency signal, draft a rule. Use these templates:
Hookify rule candidate — when a decision_phrase repeats:
- Source phrase: "never commit secrets to the public repo" (8 occurrences in #engineering, 3 in #security)
- Drafted rule: block Write/Edit on any file in `**/public/**` containing `BEGIN PRIVATE KEY|AWS_SECRET|api_key=`
- File: `hookify-rules/no-secrets-in-public-paths.md`
CLAUDE.md authority entry — when an owner_of_process signal repeats:
- Source: 14 of 17 refund decisions in #refunds were resolved by user `@laura` over 6 months
- Drafted entry: "Refund authority: Laura. Any refund over $500 routes to her before issuing. Refunds under $500 follow the standard credit policy."
Skill candidate — when a process_signal shows a recurring multi-step ritual:
- Source: every Friday a thread in #engineering titled "Week wrap" contains the same 4 sections (shipped, in-flight, blocked, next-week)
- Drafted skill: `/week-wrap` that prompts for those four sections and files to `📓 Journals/team/`
For each draft, include in the front of the file:
- `confidence: high|medium|low` — based on signal frequency and clarity
- `evidence:` — at least one quoted phrase or path the rule was inferred from
- `review_questions:` — the 1-3 questions the founder must answer to accept

Low-confidence drafts go in `hookify-rules/_low-confidence/` so the high-signal stack stays clean.
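Writing a draft with that front matter and routing low-confidence candidates aside can be sketched like this (function name and YAML layout are illustrative, not the skill's actual writer):

```python
from pathlib import Path

def write_draft(out_dir: Path, name: str, confidence: str,
                evidence: list[str], review_questions: list[str],
                body: str) -> Path:
    """Write one hookify-rule draft; low-confidence drafts are routed aside."""
    folder = out_dir / "hookify-rules"
    if confidence == "low":
        folder = folder / "_low-confidence"   # keep the high-signal stack clean
    folder.mkdir(parents=True, exist_ok=True)
    # Front matter with the three required fields
    lines = ["---", f"confidence: {confidence}", "evidence:"]
    lines += [f'  - "{e}"' for e in evidence]
    lines.append("review_questions:")
    lines += [f'  - "{q}"' for q in review_questions]
    lines += ["---", "", body, ""]
    path = folder / f"{name}.md"
    path.write_text("\n".join(lines))
    return path
```

Because `evidence` is a required list, a draft with no quoted phrase or path simply cannot be written, which enforces the provenance rule mechanically.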
Compose CLAUDE.md.draft from these sections, in this order:
`[role: REVIEW]`.

Critical: every claim must trace to a signal in `signals.json`. No model-invented facts. If the model wants to assert something that is not in `signals.json`, it goes under "What this draft does NOT know" instead.
extraction-report.md is the plain-English version a non-technical founder can read in 5 minutes:
This is what the founder reads first. The drafts are the deliverable; the report is the map.
The founder cannot accept all drafts blind. REVIEW.md is the gate:
```markdown
# Review before accepting

For each section below, mark Accept / Edit / Reject and (if Edit) the change.

## Authority claims (5)

- [ ] Refunds → Laura. Source: 14/17 refund threads resolved by @laura.
  - Accept / Edit: ____ / Reject

## Hookify rules (12)

- [ ] no-secrets-in-public-paths (high). Source: 11 mentions of "never commit secrets".
  - Accept / Edit / Reject

## Skill candidates (3)

- [ ] /week-wrap (high). Source: 22 weekly wrap threads in #engineering.
  - Accept / Edit / Reject

## Open questions (7)

1. Who decides pricing exceptions over $10K? (signals are ambiguous between Sales lead and CEO)
   Answer: ____
```
Generate one checkbox per drafted artifact, plus the unresolved questions surfaced in signals.json.
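Generating that file is a mechanical render over the drafted artifacts plus the unresolved list. A minimal sketch (the section titles and line shapes mirror the example above; the function name is illustrative):

```python
def render_review(sections: dict[str, list[str]], open_questions: list[str]) -> str:
    """Render REVIEW.md: one checkbox per drafted artifact plus open questions."""
    out = ["# Review before accepting", "",
           "For each section below, mark Accept / Edit / Reject "
           "and (if Edit) the change.", ""]
    for title, items in sections.items():
        out.append(f"## {title} ({len(items)})")
        for item in items:
            out.append(f"- [ ] {item}")
            out.append("  - Accept / Edit / Reject")
        out.append("")
    # Unresolved items from signals.json become numbered open questions
    out.append(f"## Open questions ({len(open_questions)})")
    for i, q in enumerate(open_questions, 1):
        out.append(f"{i}. {q}")
        out.append("   Answer: ____")
    return "\n".join(out)
```

One checkbox per artifact means the founder's accept/edit/reject decisions map one-to-one onto files in the output directory.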
End the skill output with:
```
Drafts written to <output-dir>.

Next:
1. Read extraction-report.md (5 min)
2. Walk through REVIEW.md with the founder (15-30 min)
3. For each accepted draft, copy it into the live install:
   - hookify-rules/* → ~/.claude/hookify/ (or vault .claude/hookify/)
   - skills/* → ~/.claude/skills/ (or vault .claude/skills/)
   - CLAUDE.md.draft → merge into your CLAUDE.md
4. Re-run /diagnose to confirm everything loads.
```
Do not auto-merge. Do not auto-install. Always print the path and stop.
- If `signals.json.entities` has fewer than 5 people OR fewer than 10 distinct files, abort with: "Dump too small to extract patterns. Need at least 10 documents and 5 distinct authors. Reconnect a longer history."
- If the dump is bilingual, note that in `CLAUDE.md.draft` so a future Claude session knows the team operates bilingually.
- If `signals.json.entities` includes more than 50 people, the model is at risk of misattributing rules. Cap the authority section at the top 10 by frequency and put the rest under "Other contributors (review needed)".
- This skill does not persist the extracted entities to a graph; run `/graphify` after this if you want one.

Every drafted rule has a quoted source phrase or file path. Drafts without provenance are a bug. If the model wants to write a rule it cannot ground in `signals.json`, it must instead surface that gap in `REVIEW.md` open questions.
This is the reliability primitive that distinguishes this skill from generic "ask an LLM to summarize the company" approaches. Drift is prevented at extraction time, not detected after the fact.
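The size and attribution guardrails can be sketched as two small checks. This is a minimal version under stated assumptions: the `entities.people` and `distinct_files` field names are guesses about the `signals.json` shape, not confirmed keys.

```python
def check_dump_size(signals: dict) -> None:
    """Abort when the dump is too small to support pattern extraction."""
    people = len(signals["entities"].get("people", []))   # assumed field name
    files = signals.get("distinct_files", 0)              # assumed field name
    if people < 5 or files < 10:
        raise SystemExit(
            "Dump too small to extract patterns. Need at least 10 documents "
            "and 5 distinct authors. Reconnect a longer history."
        )

def cap_authority(people_by_freq: list[str], cap: int = 10) -> tuple[list[str], list[str]]:
    """Keep the top-N contributors by frequency; route the rest to review."""
    return people_by_freq[:cap], people_by_freq[cap:]
```

The second return value of `cap_authority` is what lands under "Other contributors (review needed)" rather than in the authority section.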