Extract portfolio entities and structured context from uploaded documents (uploads/ folder). Use whenever the user mentions uploading files, importing documents, ingesting data, "I have some files", "parse these docs", "use these docs as input", "internal documents", "background material", "here's our strategy deck", processing uploads, or wants to populate their portfolio from existing material — even if they don't say "ingest".
Extract portfolio entities and institutional context from user-provided documents in the project's uploads/ folder. Supported file types: .md, .docx, .pptx, .xlsx, .pdf.
Most users already have product information scattered across decks, spreadsheets, and documents. Ingestion bridges the gap between existing material and a structured portfolio in two ways:
Entity extraction gives the user a head start on portfolio structure. Context extraction ensures the intelligence buried in strategy decks, pricing models, and win/loss reports doesn't get lost — it flows into propositions, solutions, competitor analysis, and every other downstream skill automatically.
Prerequisites:

- an initialized portfolio (`portfolio.json` must exist)
- an `uploads/` directory
- the document-skills plugin for non-markdown file extraction (docx, pptx, xlsx, pdf)

Find the active portfolio project by searching for `portfolio.json` under a `cogni-portfolio/` path (same approach as the portfolio-resume skill). If multiple projects exist, ask which one to use.
Scan uploads/ for supported files, excluding the processed/ subdirectory. If no files are found, tell the user the folder is empty and list the supported file types.
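The scan step can be sketched as follows, assuming a POSIX shell; the `list_uploads` helper name is illustrative, not part of the skill:

```shell
# list_uploads DIR: print supported files under DIR, skipping the
# processed/ subdirectory so already-ingested files are not re-scanned.
list_uploads() {
  find "$1" -type d -name processed -prune -o \
    -type f \( -name '*.md' -o -name '*.docx' -o -name '*.pptx' \
               -o -name '*.xlsx' -o -name '*.pdf' \) -print
}
```

If the function prints nothing, report that the folder is empty and list the supported extensions.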
If non-markdown files are present (.docx, .pptx, .xlsx, .pdf), verify document-skills availability. If unavailable, inform the user which files cannot be processed, process only the .md files, and leave the binary files in uploads/ for later.
Process each file based on its type:
- `.docx`: use the document-skills:docx skill
- `.pptx`: use the document-skills:pptx skill
- `.xlsx`: use the document-skills:xlsx skill
- `.pdf`: use the document-skills:pdf skill

For large documents (PDFs over 20 pages, Excel workbooks with many sheets), process in segments. For PDFs, use the pages parameter to read 10-20 pages at a time. For Excel, process one sheet at a time. Present extracted entities per segment so the user can confirm incrementally rather than reviewing dozens of entities at once.
Read portfolio.json to understand the company context.
Before classifying content, check if any of the current upload files match previously ingested sources. If source-registry.json exists, run:
```bash
bash $CLAUDE_PLUGIN_ROOT/scripts/source-registry.sh "<project-dir>" check-docs
```
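The internals of source-registry.sh are not shown here, but a change check of this kind typically fingerprints each file by content hash and compares it to the value stored at ingestion time. A minimal sketch of the fingerprinting step (the actual check-docs implementation may differ):

```shell
# fingerprint FILE: print a stable content hash for change detection.
# A re-upload counts as "changed" when the filename matches a registry
# entry but the fingerprint differs from the stored one.
fingerprint() {
  sha256sum "$1" | awk '{print $1}'
}
```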
If the result shows changed documents (same filename, different hash), alert the user:
"This file was previously ingested and created N features and M context entries. The content has changed since last ingestion. Would you like to:
- Re-ingest and refresh — extract new entities/context, mark old linked entities as stale for downstream refresh
- Re-ingest fresh — extract without linking to previous entities (treat as new source)
- Skip — leave this file for later"
If the user chooses option 1, note the previously linked entities for staleness flagging in Step 8b.
Then analyze extracted content for both entities and context.
Identify potential entities:
| Entity Type | What to Look For |
|---|---|
| Products | Named offerings, product lines, service packages |
| Features | Capabilities, specifications, functions, technical components |
| Markets | Target segments, customer groups, geographic regions, verticals |
Competitive intelligence or buyer persona data found in documents is worth noting, but don't create competitor or customer entities during ingestion. These types require propositions or markets as parents and are handled by the compete and customers skills after prerequisite entities exist.
Cross-reference with existing entities in products/, features/, and markets/ directories to avoid duplicates.
In the same pass, identify intelligence snippets — self-contained insights that provide institutional knowledge for downstream skills. Classify each snippet into one of six categories:
| Category | What to Look For | Primary Downstream Skills |
|---|---|---|
| competitive | Win/loss reports, competitor mentions, battlecards, RFP outcomes | compete, propositions |
| market | Market research, TAM analyses, customer segmentation, industry reports | markets, propositions |
| pricing | Pricing models, rate cards, discount structures, margin targets, cost benchmarks | solutions, packages |
| customer | Interview transcripts, CRM summaries, buyer persona research, NPS data | customers, propositions |
| technical | Architecture docs, technical specs, product roadmaps, integration guides | features |
| strategic | Strategy decks, positioning documents, differentiation analyses, board presentations | propositions, solutions |
Each snippet should be a self-contained insight (one fact, one benchmark, one positioning statement) with enough surrounding context to be useful. Aim for 3-10 context entries per document, depending on richness. Do not extract trivially obvious information — focus on intelligence that would be hard to re-derive from scratch (specific numbers, internal decisions, competitive observations, customer quotes).
When a snippet relates to specific portfolio entities (a pricing benchmark for a particular product, a competitive insight about a specific market), note those entity slugs for linking.
Group by source file. Present entities first, then context entries.
From: product-overview.pdf
Entities:
| Type | Slug | Name | Key Fields |
|---|---|---|---|
| Product | cloud-platform | Cloud Platform | description, positioning |
| Feature | auto-scaling | Auto-Scaling | product: cloud-platform, category: infrastructure |
Context:
| # | Category | Summary | Linked Entities / Skills | Confidence |
|---|---|---|---|---|
| 1 | strategic | Company positions as "sovereign cloud" differentiator in DACH region | products/cloud-platform | high |
| 2 | competitive | Main competitor Datadog weak in mid-market due to per-host pricing | compete | medium |
| 3 | pricing | Target margin 35% with blended rate 1,400 EUR/day for DACH | solutions, packages | high |
Show enough detail for the user to judge accuracy. Mark entities that may overlap with existing ones.
Allow the user to confirm, edit, or skip each proposed entity and context entry.
Not every document will produce both entities and context. A strategy deck might yield mostly context with no new entities. A product spec might yield mostly entities with little context. Present only what was found.
For each confirmed entity, write a JSON file following the schemas in $CLAUDE_PLUGIN_ROOT/references/data-model.md:
- products: `products/{slug}.json`
- features: `features/{slug}.json`
- markets: `markets/{slug}.json`

Set `created` to today's date. Include `"source_file": "<filename>"` in each entity to enable tracing origins back to the uploaded document.
For features, ensure product_slug references a valid product. If a referenced product doesn't exist yet, propose creating it first or ask the user to assign a different product.
For each feature, draft a purpose field (5-12 words): a customer-readable statement answering "what is this feature FOR?" — the problem it solves or capability it provides. Derive purpose from the source document's context (e.g., section headings, executive summaries, or capability descriptions that frame the feature's value).
Assign sort_order to each feature following the value-to-utility spectrum: customer-facing value features get low numbers (10, 20, 30...), infrastructure/utility features get high numbers (70+). Use increments of 10 to leave room for insertions. This controls display ordering in the dashboard and reports.
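Pulling these rules together, a confirmed feature file might look like the sketch below. Field names follow the patterns described above and the example entities from earlier in this skill; the authoritative shape is the schema in $CLAUDE_PLUGIN_ROOT/references/data-model.md, so treat this as illustrative:

```json
{
  "slug": "auto-scaling",
  "name": "Auto-Scaling",
  "product_slug": "cloud-platform",
  "purpose": "Automatically match compute capacity to demand",
  "sort_order": 20,
  "created": "2026-04-03",
  "source_file": "product-overview.pdf"
}
```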
For each confirmed context entry, write a JSON file to context/{source-slug}--{seq}.json following the context entry schema in $CLAUDE_PLUGIN_ROOT/references/data-model.md.
The slug is derived from the source filename (kebab-case, without extension) plus a zero-padded sequence number: e.g., pricing-strategy-2025--001, pricing-strategy-2025--002.
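A sketch of the slug derivation, assuming simple filenames (the `context_slug` helper is illustrative; the skill may normalize more aggressively):

```shell
# context_slug FILE SEQ: derive "<kebab-case basename>--<zero-padded seq>".
context_slug() {
  base=${1##*/}     # strip directory
  base=${base%.*}   # strip extension
  # lowercase; map spaces and underscores to hyphens
  slug=$(printf '%s' "$base" | tr '[:upper:] _' '[:lower:]--')
  printf '%s--%03d' "$slug" "$2"
}
```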
Set created to today's date. Set confidence based on how directly the insight comes from the document:
- high — verbatim fact or number from the document
- medium — reasonable inference from document content
- low — interpretation that would benefit from user validation

Create the `context/` directory if it doesn't exist.
After writing all context entries, rebuild context/context-index.json by scanning all .json files in context/ (excluding context-index.json itself). The index has three lookup maps:
- `by_category` — category string -> array of context slugs
- `by_relevance` — skill name -> array of context slugs
- `by_entity` — entity path (e.g., `products/cloud-platform`) -> array of context slugs

Include `version`, `entry_count`, and `updated` fields. See $CLAUDE_PLUGIN_ROOT/references/data-model.md for the full index schema.
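Assembled from the three lookup maps described above, a rebuilt index might look roughly like this (entry slugs are illustrative; the full schema is in data-model.md):

```json
{
  "version": 1,
  "entry_count": 3,
  "updated": "2026-04-03",
  "by_category": {
    "strategic": ["product-overview--001"],
    "pricing": ["pricing-strategy-2025--001", "pricing-strategy-2025--002"]
  },
  "by_relevance": {
    "solutions": ["pricing-strategy-2025--001", "pricing-strategy-2025--002"],
    "propositions": ["product-overview--001"]
  },
  "by_entity": {
    "products/cloud-platform": ["product-overview--001"]
  }
}
```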
After all confirmed items are written, move processed files to uploads/processed/. Create the directory if it doesn't exist. Only move files that were successfully processed. If a file yielded no usable entities or context (user skipped everything), still move it to avoid re-processing on the next run.
After moving files, update the source lineage registry for each processed file:
If source-registry.json does not exist, initialize it:
```bash
bash $CLAUDE_PLUGIN_ROOT/scripts/source-registry.sh "<project-dir>" init
```
For each processed file, register it with its fingerprint:
```bash
bash $CLAUDE_PLUGIN_ROOT/scripts/source-registry.sh "<project-dir>" register-doc "<project-dir>/uploads/processed/<filename>"
```
After registration, update the registry entry's entities and context_entries arrays to include all entities and context entries created from this file. Read source-registry.json, find the entry by source_id, and add:
- each created entity path (e.g., "features/cloud-monitoring", "products/cloud-platform") to `entities`
- each created context slug (e.g., "pricing-strategy-2025--001") to `context_entries`

Write `source_refs` on each created entity, pointing to the registry `source_id`. This supplements the existing `source_file` field for richer lineage tracking:
```json
{
  "source_file": "pricing-strategy-2025.pdf",
  "source_refs": ["doc--pricing-strategy-2025"]
}
```
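After these updates, a registry entry might look roughly like the sketch below. Only `source_id`, `entities`, and `context_entries` are named in this skill; the other fields are assumptions about the registry's shape:

```json
{
  "source_id": "doc--pricing-strategy-2025",
  "filename": "pricing-strategy-2025.pdf",
  "status": "active",
  "entities": ["products/cloud-platform", "features/cloud-monitoring"],
  "context_entries": ["pricing-strategy-2025--001"]
}
```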
If this is a re-upload (detected in Step 4 as a changed document) and the user chose "Re-ingest and refresh":
- set the old registry entry's `status` to "superseded"
- set the new entry's `supersedes` field to the old `source_id`
- flag each previously linked entity (noted in Step 4) with a `lineage_status` field:
```json
{
  "lineage_status": {
    "status": "stale",
    "flagged_at": "2026-04-03",
    "reasons": ["source doc--pricing-strategy-2025 re-uploaded with changes"]
  }
}
```
The portfolio-resume and portfolio-lineage skills will surface these stale entities.

If any products were created during ingestion, run the centralized sync script:
```bash
$CLAUDE_PLUGIN_ROOT/scripts/sync-portfolio.sh <project-dir>
```
Skip this step if no products were created.
Show a summary of what was created:
| Type | Created | Skipped |
|---|---|---|
| Products | 2 | 0 |
| Features | 5 | 1 |
| Markets | 3 | 0 |
| Context | 8 | 2 |
Suggest the logical next step based on what was ingested:
- the markets skill
- the propositions skill
- the compete or customers skills, once prerequisite entities are in place
- the markets skill to add sizing estimates

Tell the user that extracted context entries will automatically flow into the solutions, compete, propositions, and customers skills when they run them.

Notes:

- The `uploads/processed/` subdirectory is not scanned by project-status.sh.
- If pricing context was extracted, suggest the solutions skill to design pricing tiers; the new context gives that skill better inputs to work from.
- The rebuilt context index covers all entries in `context/`, not just the current batch.
- Check `portfolio.json` in the project root. If a language field is present, communicate with the user in that language (status messages, instructions, recommendations, questions). Technical terms, skill names, and CLI commands remain in English. If no language field is present, default to English.
- See $CLAUDE_PLUGIN_ROOT/references/data-model.md for complete entity and context schemas.