Crawls Notion and Google Drive to build a structured catalog of knowledge sources with classification and freshness tracking. Activates when the user wants to index the knowledge base, refresh the source catalog, scan for new documents, or asks 'update the knowledge base index.' Uses a 9-type taxonomy with metadata extraction.
This skill uses the workspace's default tool permissions.
Crawl the Notion workspace and Google Drive to build a structured catalog of all available knowledge sources. For each discovered source, extract metadata (title, URL, type, classification, keywords, word count, last edited, parent location), assign a content classification from a 9-type taxonomy, compute a freshness tier, and write the result to the "[FOS] Knowledge Base" Notion database with Type="Source" (falls back to "Founder OS HQ - Knowledge Base", then legacy "Knowledge Base Q&A - Sources" if the consolidated database is not found). The index serves as the lookup layer for /founder-os:kb:ask and /founder-os:kb:find commands, enabling fast retrieval by classification, keywords, and freshness without re-scanning the entire workspace on every query.
This skill handles source discovery and cataloging only. Content retrieval and answer synthesis are handled by the knowledge-retrieval and answer-synthesis skills respectively.
Execute the discovery pipeline in two phases: Notion first (required), then Google Drive (optional). Each phase produces a list of source records that are merged and written to the Sources DB.
Scan the Notion workspace for all accessible pages and databases using the Notion MCP server.
Call notion-search with an empty query and filter.property: "object", filter.value: "page" to retrieve all accessible pages. Paginate through all results using the start_cursor from each response until has_more is false. Cap at 500 pages per index run to prevent runaway scans on large workspaces. When the cap is reached, warn: "Reached 500-page limit. Re-run with specific parent filters to index additional pages."
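The cursor-pagination loop above can be sketched as follows. This is a minimal sketch, not the MCP wiring itself: `call_tool` is a hypothetical stand-in for the notion-search MCP call, and the `results` / `has_more` / `next_cursor` key names are assumptions about its response shape.

```python
def paginate_search(call_tool, filter_value, cap=500):
    """Collect results from a cursor-paginated search tool.

    `call_tool` stands in for the notion-search MCP call: it accepts an
    optional start_cursor and returns a dict with `results`, `has_more`,
    and `next_cursor` keys (names assumed here).
    """
    results, cursor = [], None
    while True:
        page = call_tool(filter_value=filter_value, start_cursor=cursor)
        results.extend(page["results"])
        if len(results) >= cap:
            # Runaway-scan guard: stop and tell the user how to continue.
            print("Reached 500-page limit. Re-run with specific "
                  "parent filters to index additional pages.")
            return results[:cap]
        if not page["has_more"]:
            return results
        cursor = page["next_cursor"]
```

Injecting the tool call as a parameter keeps the loop testable without a live Notion workspace.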
Call notion-search with filter.property: "object", filter.value: "database" to retrieve all accessible databases. Databases are indexed as sources themselves (not their contents). Record the database title, URL, property schema summary, and item count when available.
For each discovered page, extract:
- URL: constructed as https://www.notion.so/[page_id_without_hyphens].
- Content preview: fetched via notion-fetch for classification and keyword extraction. Limit content retrieval to the first 3000 characters to control API usage.
- Last edited: the last_edited_time property on the page object.
- Parent location: the parent property -- resolve to the parent page title or workspace root. Format as a breadcrumb path: "Workspace > Parent Page > Current Page".

Process pages sequentially. After every 10 pages, emit a progress update: "Indexed [N] of [total] Notion pages..." to keep the user informed during long scans.
When gws CLI is available (check with which gws), scan for documents to supplement the Notion index.
Use the gws CLI via Bash to search for documents. Execute three targeted searches:
- mimeType = 'application/vnd.google-apps.document' -- Google Docs
- mimeType = 'application/pdf' -- PDFs
- mimeType = 'application/vnd.google-apps.spreadsheet' -- Google Sheets

Cap at 200 files total across all searches. Skip files in Trash.
For each discovered file, extract:
- URL: the webViewLink property.
- Last modified: the modifiedTime property.

Content retrieval from Drive files is limited to the title and folder context for classification purposes. Full content retrieval happens at query time via the knowledge-retrieval skill.
When gws CLI is unavailable or not authenticated, skip Phase 2 entirely. Log: "Google Drive unavailable -- indexing Notion sources only." Do not treat Drive unavailability as an error. The index is fully functional with Notion sources alone.
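The Phase 2 gate can be sketched as a small guard. This mirrors the `which gws` check described above; the injectable `which` parameter is an assumption added here so the guard can be exercised without the CLI installed.

```python
import shutil

def drive_phase_available(which=shutil.which):
    """Run the Drive scan only when the gws CLI is on PATH.

    Unavailability is logged and skipped, never treated as an error --
    the index is fully functional with Notion sources alone.
    """
    if which("gws") is None:
        print("Google Drive unavailable -- indexing Notion sources only.")
        return False
    return True
```

An authentication failure at call time should take the same path: log the skip message and proceed with Notion sources only.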
Assign exactly one classification from the 9-type taxonomy to each source. Use a first-match-wins priority system: evaluate classification rules in the order listed and stop at the first match.
| Priority | Classification | Summary |
|---|---|---|
| 1 | wiki | Long-form knowledge articles, documentation hubs, team wikis |
| 2 | meeting-notes | Records of meetings, standups, retrospectives |
| 3 | project-docs | Project plans, briefs, specs, roadmaps, PRDs |
| 4 | process | SOPs, how-tos, step-by-step workflows, runbooks |
| 5 | reference | Lookup tables, glossaries, FAQs, config references |
| 6 | template | Reusable scaffolds, form templates, starter docs |
| 7 | database | Notion databases and structured data collections |
| 8 | archive | Deprecated, outdated, or explicitly archived content |
| 9 | other | Default when no classification matches |
Apply three detection layers in order. Each layer can produce a classification. Accept the first classification that reaches sufficient confidence:
Title pattern matching: Match the source title against known patterns for each type. This is the fastest and most reliable signal. See ${CLAUDE_PLUGIN_ROOT}/skills/kb/source-indexing/references/content-classification.md for the full pattern list.
Content structure analysis: When title matching produces no result, analyze the first 3000 characters of content for structural signals (headings, list formats, Q&A patterns, step numbering). See the reference file for signal definitions per type.
Parent location heuristics: When content analysis is inconclusive, use the parent page or folder name as a classification hint. A page under a "Wiki" parent inherits the wiki classification. See the reference file for parent-to-classification mappings.
When all three layers are inconclusive, assign other.
For Notion databases, always assign database regardless of title or parent -- the object type itself is the classification signal.
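The three-layer, first-match-wins scheme can be sketched as below. The title patterns, structural regex, and parent hints here are illustrative placeholders only; the authoritative definitions live in references/content-classification.md.

```python
import re

# Illustrative title patterns -- list order encodes taxonomy priority.
TITLE_PATTERNS = [
    ("wiki",          re.compile(r"wiki|handbook|knowledge base", re.I)),
    ("meeting-notes", re.compile(r"standup|retro|meeting|1:1|sync", re.I)),
    ("project-docs",  re.compile(r"\bPRD\b|roadmap|spec|brief", re.I)),
    ("process",       re.compile(r"\bSOP\b|how[- ]to|runbook|playbook", re.I)),
    ("template",      re.compile(r"template", re.I)),
]

# Assumed parent-name hints; real mappings are in the reference file.
PARENT_HINTS = {"wiki": "wiki", "meetings": "meeting-notes"}

def classify(title, content="", parent="", is_database=False):
    """First-match-wins classification across the three detection layers."""
    if is_database:
        return "database"  # object type is the signal, regardless of title
    for label, pattern in TITLE_PATTERNS:       # layer 1: title patterns
        if pattern.search(title):
            return label
    if content and re.search(r"^\s*(\d+\.\s|Step \d)", content, re.M):
        return "process"                        # layer 2: structural signal
    hint = PARENT_HINTS.get(parent.strip().lower())
    if hint:
        return hint                             # layer 3: parent heuristic
    return "other"                              # all layers inconclusive
```

Evaluating the layers in a fixed order keeps classification deterministic: the cheapest, most reliable signal (title) always wins when it fires.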
Extract these fields for every discovered source:
| Field | Extraction Method |
|---|---|
| Source Title | Page title property or file name. Max 200 characters, truncate with "..." |
| URL | Notion page URL or Drive webViewLink. This is the idempotent key |
| Source Type | "Notion Page", "Notion Database", or "Google Drive" |
| Classification | From the 9-type taxonomy (see above) |
| Topic Keywords | Top 5-8 keywords extracted from content (see keyword extraction below) |
| Word Count | Approximate word count from content body. For databases, use 0 |
| Last Edited | last_edited_time (Notion) or modifiedTime (Drive) as ISO 8601 |
| Freshness | Computed tier based on Last Edited date (see freshness tracking below) |
| Parent Location | Breadcrumb path: "Workspace > Parent > Page" or Drive folder path |
| Status | "Active" for accessible sources, "Archived" for archived pages, "Error" for fetch failures |
| Indexed At | Current timestamp at time of index write |
Extract 5-8 topic keywords from the source content to enable keyword-based search in the /founder-os:kb:ask and /founder-os:kb:find commands.
For pages with insufficient content (fewer than 50 words), extract keywords from the title only. For databases, extract keywords from the database title and property names.
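A minimal keyword extractor, including the title-only fallback, might look like this. The frequency-counting approach and stopword list are assumptions for illustration; the actual methodology is defined in the reference file.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
             "is", "are", "with", "this", "that", "it", "as", "by", "at"}

def extract_keywords(title, content, limit=8):
    """Frequency-based keyword sketch. Falls back to title-only terms
    when the body has fewer than 50 words, per the rule above."""
    words = re.findall(r"[a-z][a-z0-9-]+", content.lower())
    if len(words) < 50:  # insufficient content: use the title instead
        words = re.findall(r"[a-z][a-z0-9-]+", title.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(limit)]
```

For databases, the same function can be fed the database title plus its property names as the "title" input.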
Compute a freshness tier for each source based on the elapsed time between its Last Edited date and the current date.
| Tier | Days Since Edit | Meaning |
|---|---|---|
| Fresh | 0-29 days | Recently updated, highly reliable |
| Current | 30-89 days | Reasonably up to date |
| Aging | 90-179 days | May contain outdated information |
| Stale | 180+ days | Likely outdated, use with caution |
days_since_edit = (current_date - last_edited_date).days
if days_since_edit < 30: freshness = "Fresh"
elif days_since_edit < 90: freshness = "Current"
elif days_since_edit < 180: freshness = "Aging"
else: freshness = "Stale"
Recalculate freshness on every index run. A source that was "Fresh" last month may now be "Current" or "Aging". The freshness tier is always relative to the current date, never cached from a previous run.
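Since Last Edited is stored as ISO 8601, the tier computation above can be made concrete like this (a sketch; the `Z`-suffix handling assumes Notion/Drive-style UTC timestamps):

```python
from datetime import datetime, timezone

def freshness_tier(last_edited_iso, now=None):
    """Map an ISO 8601 Last Edited timestamp to a freshness tier.

    Always computed against the current date -- never cached from a
    previous run.
    """
    # fromisoformat on older Pythons rejects a trailing "Z"; normalize it.
    edited = datetime.fromisoformat(last_edited_iso.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    days = (now - edited).days
    if days < 30:
        return "Fresh"
    if days < 90:
        return "Current"
    if days < 180:
        return "Aging"
    return "Stale"
```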
The knowledge-retrieval skill uses freshness tiers to weight search results: Fresh sources rank higher than Stale sources for the same keyword match. The answer-synthesis skill includes freshness caveats when citing Aging or Stale sources (e.g., "Note: this source was last updated 5 months ago").
Write all indexed sources to the Notion database.
Discover the target database by name, trying the consolidated name first and falling back to legacy names:

- "[FOS] Knowledge Base" (consolidated, preferred)
- "Founder OS HQ - Knowledge Base" (fallback)
- "Knowledge Base Q&A - Sources" (legacy)

Set Type="Source" on every record written. On subsequent runs, reuse the discovered database.
Match existing records by the URL property. For each source in the current scan:

- If a record with the same URL already exists, update its metadata fields and Indexed At timestamp.
- If no record matches the URL, create a new entry.
This ensures re-running /founder-os:kb:index updates existing entries and adds new sources without creating duplicates.
Sources that existed in a previous index but are no longer discoverable (deleted pages, revoked Drive access) are not automatically removed from the Sources DB. Instead, when a previously indexed URL returns a 404 or access error during a re-index, set the record's Status to "Archived" and preserve the existing metadata. This maintains a historical record while signaling that the source is no longer accessible.
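The upsert-by-URL and archive-on-disappearance rules can be sketched together as one in-memory pass (the actual writes go through the Notion MCP tools; the dict-based record shape here is an assumption for illustration):

```python
def upsert_sources(existing_by_url, scanned, failed_urls, now_iso):
    """Idempotent write pass keyed on URL.

    - scanned records are updated in place or created by URL
    - previously indexed URLs that failed this run (404 / revoked
      access) flip to Status="Archived", metadata preserved
    Returns (new_count, updated_count).
    """
    new = updated = 0
    for record in scanned:
        url = record["URL"]
        record["Indexed At"] = now_iso
        if url in existing_by_url:
            existing_by_url[url].update(record)  # refresh, no duplicate
            updated += 1
        else:
            existing_by_url[url] = record        # first time seen
            new += 1
    for url in failed_urls:
        if url in existing_by_url:
            existing_by_url[url]["Status"] = "Archived"  # keep history
    return new, updated
```

Keying strictly on URL is what makes repeated /founder-os:kb:index runs safe: the same scan applied twice produces updates, never duplicates.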
After completing the full pipeline, present a summary to the user:
Index Complete
--------------
Notion Pages: [N] indexed
Notion Databases: [N] indexed
Google Drive: [N] indexed (or "skipped -- Drive unavailable")
Total Sources: [N]
By Classification:
wiki: [N]
meeting-notes: [N]
project-docs: [N]
process: [N]
reference: [N]
template: [N]
database: [N]
archive: [N]
other: [N]
By Freshness:
Fresh: [N]
Current: [N]
Aging: [N]
Stale: [N]
New: [N] sources added
Updated: [N] sources refreshed
Errors: [N] sources failed (see details below)
Include error details for any sources that failed metadata extraction, listing the source title and error reason.
When a Notion page exists but has an empty body: index it anyway. Classify from the title and parent location, extract keywords from the title only (per the insufficient-content rule above), and record a Word Count of 0.
When a page has no title property: index it with "Untitled" as the Source Title so the entry is still discoverable by URL and classification.
When indexing Notion databases: index the database itself, not its contents, and always classify it as database.
When a page or file returns a permission error during metadata extraction: record the source with Status="Error" and the error reason, then continue with the next source.
When the Notion search returns more than 500 pages: stop at the cap and warn: "Reached 500-page limit. Re-run with specific parent filters to index additional pages."
For the complete 9-type classification taxonomy with detection signals, title patterns, content structure indicators, parent location mappings, keyword extraction methodology, and example index outputs, consult:
${CLAUDE_PLUGIN_ROOT}/skills/kb/source-indexing/references/content-classification.md