Finds VCs for startups from product URL/description: detects industry/stage, identifies 5 funded comparables, researches backers (Track A) and thesis publishers (Track B), ranks with sourced deep-dives and outreach hooks.
npx claudepluginhub varnan-tech/opendirectory --plugin opendirectory-gtm-skills

This skill uses the workspace's default tool permissions.
Take a product URL or description. Detect industry and stage. Find 5 comparable funded companies. Run two research tracks: who invested in those comparables (Track A), and which VCs publish theses about this space (Track B). Return a sourced, ranked investor list with outreach hooks.
Zero-hallucination policy: every fact in the output must be traceable to a specific Tavily search result or the fetched product page. The anti-patterns below show where this rule is most often violated:
| The agent will want to... | Why that's wrong |
|---|---|
| Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product. Name-dropping wastes the founder's time. |
| Generate comparable companies from training knowledge | Comparables must come from Tavily search results (Step 6). AI knowledge of companies is not evidence -- a company suggested from memory may have wrong funding status or may not be a true comparable. |
| Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, re-run Step 6 with broader search queries, and retry. |
| Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
| Fill in fund overview from training knowledge | Fund overviews must come from Tavily snippet text only. If the snippets don't describe the fund, write "not found in search data". |
| Detect stage from website aesthetics | Stage must come from the specific CTA signals detected in Step 4. |
| Write generic outreach hooks | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal or thesis quote from the search data. |
| Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals that the user's description omits. |
echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-not set, Tavily extract will be used as fallback}"
If TAVILY_API_KEY is missing: Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com -- free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
If only FIRECRAWL_API_KEY is missing: Continue silently. Tavily extract will be used for the URL fetch.
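The stop/continue rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill; the function name and messages are invented for the example, and it takes a plain dict so it can be demonstrated without touching the live environment:

```python
def preflight(env: dict) -> tuple:
    """Mirror the key-gating rules: Tavily is required, Firecrawl is optional."""
    if not env.get("TAVILY_API_KEY"):
        return False, "TAVILY_API_KEY missing: stop and ask the user to add it"
    if not env.get("FIRECRAWL_API_KEY"):
        return True, "FIRECRAWL_API_KEY missing: continue, use Tavily extract for the URL fetch"
    return True, "both keys set: use Firecrawl for the URL fetch"

# Demonstration with sample environments (not the live os.environ):
print(preflight({}))
print(preflight({"TAVILY_API_KEY": "tvly-x"}))
print(preflight({"TAVILY_API_KEY": "tvly-x", "FIRECRAWL_API_KEY": "fc-x"}))
```

In the real run the same checks are driven by the echo commands above; the helper only makes the three branches explicit.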
You need either a product URL or a pasted product description.
If the user provides only a pasted description (no URL): Skip Steps 3-4. Go directly to Step 5 with the pasted text as product_content. Set stage_source to user_description.
If neither URL nor description is provided: Ask: "What is the URL of your product or startup? Or paste a short description: what it does, who it is for, and what stage you are at (pre-seed, seed, Series A)."
Derive product slug from URL for the output filename:
PRODUCT_SLUG=$(python3 -c "
from urllib.parse import urlparse
url = 'URL_HERE'
host = urlparse(url).netloc.replace('www.', '')
print(host.split('.')[0])
")
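The slug derivation above assumes a URL. When the user provides only a pasted description (Steps 3-4 skipped), the output filename still needs a slug; one hedged option, not prescribed by the skill, is to derive it from the product name instead:

```python
import re

def slug_from_name(name: str) -> str:
    """Fallback slug when no URL is available: lowercase, alphanumeric, hyphen-joined."""
    slug = re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')
    return slug or 'product'  # never return an empty filename component

print(slug_from_name('Acme Routing, Inc.'))  # -> acme-routing-inc
```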
Primary: Firecrawl (if FIRECRAWL_API_KEY is set)
curl -s -X POST https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "URL_HERE", "formats": ["markdown"], "onlyMainContent": true}' \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
content = d.get('data', {}).get('markdown', '') or d.get('markdown', '')
print(f'Fetched: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
Fallback: Tavily extract (if FIRECRAWL_API_KEY is not set)
curl -s -X POST https://api.tavily.com/extract \
-H "Content-Type: application/json" \
-d "{\"api_key\": \"$TAVILY_API_KEY\", \"urls\": [\"URL_HERE\"]}" \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
content = (d.get('results') or [{}])[0].get('raw_content') or ''  # guard against an empty results list or null raw_content
print(f'Fetched via Tavily extract: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
Step-level checkpoint:
python3 -c "
content = open('/tmp/vc-product-raw.md').read()
if len(content) < 200:
print('ERROR: Page returned fewer than 200 characters.')
else:
print(f'Content OK: {len(content)} characters')
"
If content < 200 characters: Stop fetching. Tell the user: "The product page returned no readable content. This usually means the site is JavaScript-rendered and requires a browser. Please paste your product description directly: what it does, who it is for, and what stage you are at."
Proceed to Step 5 using the pasted description as product_content.
Parse the fetched markdown with regex before the analysis step.
python3 << 'PYEOF'
import re, json
content = open('/tmp/vc-product-raw.md').read().lower()
stage_signals = []
if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})
if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+a?\s+demo|book\s+a?\s+demo|schedule\s+a?\s+demo', content):
stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})
if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})
if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})
if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})
funding_match = re.search(
r'raised\s+\$[\d,.]+\s*[mk]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mk]?\s+(?:seed|series\s+[abc]))',
content
)
if funding_match:
stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})
if not stage_signals:
dominant = 'unknown'
elif any(s['stage_hint'] == 'announced' for s in stage_signals):
dominant = 'announced'
elif any(s['stage_hint'] == 'series-a-or-b' for s in stage_signals):
dominant = 'series-a'
elif any(s['stage_hint'] == 'series-a' for s in stage_signals):
dominant = 'series-a'
elif any(s['stage_hint'] == 'seed' for s in stage_signals):
dominant = 'seed'
else:
dominant = 'pre-seed'
confidence = 'high' if len(stage_signals) >= 2 else ('medium' if len(stage_signals) == 1 else 'low')
result = {'signals': stage_signals, 'dominant_stage': dominant, 'confidence': confidence}
json.dump(result, open('/tmp/vc-stage-signals.json', 'w'), indent=2)
print(f'Stage: {dominant} ({confidence} confidence) from {len(stage_signals)} signal(s)')
for s in stage_signals:
print(f' - {s["signal"]} -> {s["stage_hint"]}')
PYEOF
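The precedence and confidence rules in the block above (announced > series-a-or-b > series-a > seed > pre-seed; two or more signals = high confidence) can be checked in isolation. A minimal re-statement of that logic, for illustration only:

```python
def resolve_stage(hints: list) -> tuple:
    """Strongest stage hint wins, matching the if/elif chain in the main block."""
    order = ['announced', 'series-a-or-b', 'series-a', 'seed', 'pre-seed']
    dominant = 'unknown'
    for h in order:
        if h in hints:
            # series-a-or-b collapses to series-a, as in the main block
            dominant = 'series-a' if h == 'series-a-or-b' else h
            break
    confidence = 'high' if len(hints) >= 2 else ('medium' if len(hints) == 1 else 'low')
    return dominant, confidence

print(resolve_stage(['pre-seed', 'seed']))   # -> ('seed', 'high')
print(resolve_stage(['series-a-or-b']))      # -> ('series-a', 'medium')
print(resolve_stage([]))                     # -> ('unknown', 'low')
```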
Print the product content and stage signals:
python3 -c "
import json
content = open('/tmp/vc-product-raw.md').read()[:6000]
signals = json.load(open('/tmp/vc-stage-signals.json'))
print('=== PRODUCT PAGE (first 6000 chars) ===')
print(content)
print()
print('=== DETECTED STAGE SIGNALS ===')
print(json.dumps(signals, indent=2))
"
AI instructions: Analyze the product page content above. Generate the taxonomy, ICP, and stage classification only -- do NOT generate comparable companies yet (that is done via live search in Step 6).
Write to /tmp/vc-product-analysis.json:
- product_name: from the page
- one_line_description: what it does, for whom, core value prop. Under 20 words. No marketing language.
- industry_taxonomy: l1 (top-level: fintech / healthtech / developer tools / consumer / etc.), l2 (sector: sales technology / logistics software / etc.), l3 (specific niche: outbound prospecting / last-mile routing / etc.). Vague labels like "technology" or "software" alone are not acceptable.
- icp: buyer_persona (job title), company_type, company_size
- detected_stage: pre-seed / seed / series-a / series-b / unknown
- stage_confidence: high / medium / low
- stage_evidence: one sentence citing exactly which CTA or text on the page drove this. Write "no clear signals found" if unknown.
- geography_bias: US / Europe / global / unclear
- comparable_companies: leave as empty array [] -- will be filled in Step 6

python3 << 'PYEOF'
import json
analysis = {
# FILL from your analysis above
"comparable_companies": []
}
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Product analysis written.')
PYEOF
Verify:
python3 -c "
import json
a = json.load(open('/tmp/vc-product-analysis.json'))
print('Product:', a['product_name'])
print('Industry:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
print('Stage:', a['detected_stage'], '(' + a['stage_confidence'] + ' confidence)')
"
Run the product taxonomy against a curated dataset of 25 verified VC funds (sourced from fund websites). Produces zero-hallucination fund matches and seed comparables for Track A -- no Tavily credits consumed.
Print product analysis for tag mapping:
python3 -c "
import json
a = json.load(open('/tmp/vc-product-analysis.json'))
print('Taxonomy:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
print('Stage:', a['detected_stage'])
print('Geography:', a['geography_bias'])
"
AI instructions: Map the product taxonomy to the standard tags used in the fund dataset. Available tags:
DevTools, Infrastructure, Open Source, B2B SaaS, AI, Data, FinTech, HealthTech, Enterprise, Consumer, Marketplaces, E-commerce, Crypto, DeepTech, Cybersecurity, Generalist
Pick 2-4 tags that describe this product. Map detected_stage to: Pre-seed, Seed, Series A, or Growth. Map geography_bias to: US, Europe, India, or Global.
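Before writing the product context, the AI's choices can be validated against the allowed vocabularies. A hedged sketch (the helper name and messages are illustrative; the tag, stage, and geography sets are copied from the lists above):

```python
ALLOWED_TAGS = {
    "DevTools", "Infrastructure", "Open Source", "B2B SaaS", "AI", "Data",
    "FinTech", "HealthTech", "Enterprise", "Consumer", "Marketplaces",
    "E-commerce", "Crypto", "DeepTech", "Cybersecurity", "Generalist",
}
ALLOWED_STAGES = {"Pre-seed", "Seed", "Series A", "Growth"}
ALLOWED_GEOS = {"US", "Europe", "India", "Global"}

def validate_context(ctx: dict) -> list:
    """Return a list of problems; an empty list means the context is safe to write."""
    problems = []
    tags = ctx.get("extracted_tags", [])
    if not 2 <= len(tags) <= 4:
        problems.append(f"expected 2-4 tags, got {len(tags)}")
    for t in tags:
        if t not in ALLOWED_TAGS:
            problems.append(f"unknown tag: {t}")
    if ctx.get("stage_hint") not in ALLOWED_STAGES:
        problems.append(f"bad stage_hint: {ctx.get('stage_hint')}")
    if ctx.get("geography_hint") not in ALLOWED_GEOS:
        problems.append(f"bad geography_hint: {ctx.get('geography_hint')}")
    return problems

print(validate_context({"extracted_tags": ["DevTools", "AI"],
                        "stage_hint": "Seed", "geography_hint": "US"}))  # -> []
print(validate_context({"extracted_tags": ["SaaSy"],
                        "stage_hint": "seed", "geography_hint": "US"}))
```

Note that stage and geography values are case-sensitive here: "seed" is rejected where "Seed" is expected, which matches the exact strings the scoring script keys on.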
Write product context:
python3 << 'PYEOF'
import json
# FILL based on taxonomy analysis above
context = {
"extracted_tags": ["TagA", "TagB"], # 2-4 tags from the list above
"stage_hint": "Seed", # Pre-seed / Seed / Series A / Growth
"geography_hint": "US" # US / Europe / India / Global
}
json.dump(context, open('/tmp/vc-product-context.json', 'w'), indent=2)
print('Product context:', context)
PYEOF
Run scoring against the embedded curated dataset:
python3 << 'PYEOF'
import json
context = json.load(open('/tmp/vc-product-context.json'))
VC_FUNDS = [
{"fund_name":"Y Combinator","thesis":"We provide seed funding for startups. We invest in deeply technical teams building massive companies across all domains.","check_size":"$500k","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","DevTools","AI"],"geography_focus":["Global"],"notable_portfolio":["Stripe","Airbnb","GitLab"],"website":"https://www.ycombinator.com"},
{"fund_name":"boldstart ventures","thesis":"Day one partner for developer first, crypto, and SaaS founders. We love deeply technical founders solving hard infrastructure problems.","check_size":"$1M - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["DevTools","Infrastructure","Crypto"],"geography_focus":["Global","US"],"notable_portfolio":["Snyk","Blockdaemon","Superhuman"],"website":"https://boldstart.vc"},
{"fund_name":"Heavybit","thesis":"The leading investor in developer-first startups. We help technical founders launch, gain traction, and build enterprise-ready companies.","check_size":"$1M - $5M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","Open Source"],"geography_focus":["Global","US"],"notable_portfolio":["PagerDuty","Sanity","Netlify"],"website":"https://www.heavybit.com"},
{"fund_name":"Amplify Partners","thesis":"We invest in technical founders building the next generation of IT infrastructure, developer tools, and data platforms.","check_size":"$2M - $8M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","AI","Data"],"geography_focus":["US"],"notable_portfolio":["Datadog","OCTO","dbt Labs"],"website":"https://www.amplifypartners.com"},
{"fund_name":"OSS Capital","thesis":"We exclusively back early-stage founders building Commercial Open Source Software (COSS) companies.","check_size":"$500k - $2M","stage_focus":["Pre-seed","Seed","Series A"],"industry_tags":["Open Source","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Cal.com","Appsmith","Hoppscotch"],"website":"https://oss.capital"},
{"fund_name":"Sequoia Capital","thesis":"We help the daring build legendary companies, from idea to IPO and beyond. Sequoia is an early-stage and growth-stage investor.","check_size":"$1M - $10M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","AI"],"geography_focus":["Global"],"notable_portfolio":["Apple","Google","WhatsApp"],"website":"https://www.sequoiacap.com"},
{"fund_name":"Andreessen Horowitz (a16z)","thesis":"We invest in software eating the world. We back bold entrepreneurs building the future through technology.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Crypto","Enterprise","Consumer","AI"],"geography_focus":["Global","US"],"notable_portfolio":["Facebook","Coinbase","Figma"],"website":"https://a16z.com"},
{"fund_name":"Point Nine Capital","thesis":"We are a seed-stage venture capital firm focused on B2B SaaS and B2B marketplaces globally.","check_size":"$1M - $3M","stage_focus":["Seed"],"industry_tags":["B2B SaaS","Marketplaces"],"geography_focus":["Europe","Global"],"notable_portfolio":["Zendesk","Typeform","Docplanner"],"website":"https://www.pointnine.com"},
{"fund_name":"Cherry Ventures","thesis":"We champion founders in Europe from their earliest days. We are generalist seed investors.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["Europe"],"notable_portfolio":["FlixBus","Auto1 Group","Forto"],"website":"https://www.cherry.vc"},
{"fund_name":"First Round Capital","thesis":"We are the seed-stage firm that builds the most supportive community for founders.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer"],"geography_focus":["US"],"notable_portfolio":["Uber","Notion","Roblox"],"website":"https://firstround.com"},
{"fund_name":"Bessemer Venture Partners","thesis":"BVP helps entrepreneurs lay strong foundations to build and forge long-standing companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["LinkedIn","Twilio","Shopify"],"website":"https://www.bvp.com"},
{"fund_name":"Index Ventures","thesis":"We back the best and most ambitious entrepreneurs across all stages to build category-defining businesses.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","FinTech","Consumer","B2B SaaS"],"geography_focus":["Europe","US","Global"],"notable_portfolio":["Dropbox","Slack","Figma"],"website":"https://www.indexventures.com"},
{"fund_name":"Lightspeed Venture Partners","thesis":"We invest globally in enterprise, consumer, and health founders who are shaping the future.","check_size":"$1M - $25M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["Snap","Rippling","MuleSoft"],"website":"https://lsvp.com"},
{"fund_name":"Accel","thesis":"We partner with exceptional founders from inception through all phases of private company growth.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Facebook","Atlassian","Spotify"],"website":"https://www.accel.com"},
{"fund_name":"Bain Capital Ventures","thesis":"From seed to growth, we back founders building legendary infrastructure, fintech, application, and commerce companies.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Infrastructure","FinTech","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["DocuSign","SendGrid","Redis"],"website":"https://www.baincapitalventures.com"},
{"fund_name":"Greylock Partners","thesis":"We partner with early-stage founders to build enterprise and consumer software companies that define new categories.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Enterprise","Consumer","Cybersecurity","AI"],"geography_focus":["US"],"notable_portfolio":["Workday","Palo Alto Networks","LinkedIn"],"website":"https://greylock.com"},
{"fund_name":"Unusual Ventures","thesis":"We provide a breakthrough level of support for early-stage founders building enterprise tech.","check_size":"$1M - $5M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Enterprise","DevTools","B2B SaaS"],"geography_focus":["US"],"notable_portfolio":["Arctic Wolf","Harness","Vivun"],"website":"https://www.unusual.vc"},
{"fund_name":"Crane Venture Partners","thesis":"We back deep tech and enterprise founders in Europe solving hard problems with data and code.","check_size":"$1M - $4M","stage_focus":["Seed"],"industry_tags":["Enterprise","DeepTech","Data","AI"],"geography_focus":["Europe"],"notable_portfolio":["Onfido","Tessian","Forto"],"website":"https://crane.vc"},
{"fund_name":"Founder Collective","thesis":"We are a seed-stage venture capital fund, built by founders, for founders. We back weird, wonderful, and wild startups.","check_size":"$500k - $2M","stage_focus":["Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Airtable","BuzzFeed"],"website":"https://www.foundercollective.com"},
{"fund_name":"Benchmark","thesis":"We are a partnership of equal partners. We back mission-driven founders at the earliest stages and walk beside them for the long haul.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Marketplaces","Enterprise","Consumer"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Twitter","eBay","Snapchat"],"website":"https://www.benchmark.com"},
{"fund_name":"Accel India","thesis":"We partner with exceptional founders from inception through all phases of private company growth in the Indian ecosystem.","check_size":"$1M - $15M","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","FinTech","E-commerce"],"geography_focus":["India"],"notable_portfolio":["Flipkart","Swiggy","Freshworks"],"website":"https://www.accel.com/india"},
{"fund_name":"Blume Ventures","thesis":"We are a seed and pre-seed venture fund that backs startups with both funding and active mentoring.","check_size":"$500k - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer","DeepTech","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Unacademy","Purplle","GreyOrange"],"website":"https://blume.vc"},
{"fund_name":"Elevation Capital","thesis":"We partner with visionary founders in India across early stages to help them build category-defining businesses.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Paytm","Swiggy","Meesho"],"website":"https://elevationcapital.com"},
{"fund_name":"Peak XV Partners","thesis":"Formerly Sequoia India & SEA, we partner with founders across early, growth, and public stages to build enduring companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","DevTools","AI"],"geography_focus":["India","South Asia"],"notable_portfolio":["Zomato","Pine Labs","Cred"],"website":"https://www.peakxv.com"},
{"fund_name":"Nexus Venture Partners","thesis":"We are a US-India venture capital firm backing extraordinary founders building product-first companies.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["B2B SaaS","Enterprise","DevTools","Consumer"],"geography_focus":["India","US"],"notable_portfolio":["Postman","Hasura","Zepto"],"website":"https://nexusvp.com"}
]
STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}
def score_fund(fund, ctx):
score = 0
fund_tags = fund.get("industry_tags", [])
extracted_tags = ctx.get("extracted_tags", ["Generalist"])
tag_points = 0
matched_tags = []
for tag in extracted_tags:
if tag in fund_tags:
tag_points += 5 if tag == "Generalist" else 20
matched_tags.append(tag)
tag_points = min(tag_points, 60)
score += tag_points
stage_hint = ctx.get("stage_hint")
fund_stages = fund.get("stage_focus", [])
if not stage_hint:
score += 10
elif fund_stages:
if stage_hint in fund_stages:
score += 20
elif stage_hint in STAGE_ORDER:
hint_idx = STAGE_ORDER[stage_hint]
if any(f in STAGE_ORDER and abs(STAGE_ORDER[f] - hint_idx) == 1 for f in fund_stages):
score += 10
geo_hint = ctx.get("geography_hint")
fund_geo = fund.get("geography_focus", ["Global"])
if not geo_hint or geo_hint == "Global":
score += 10
elif fund_geo == ["India"] and geo_hint == "US":
pass
elif geo_hint in fund_geo:
score += 20
elif "Global" in fund_geo:
score += 15
if geo_hint == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
score = max(0, score - 30)
if fund_tags and extracted_tags and fund_tags[0] not in extracted_tags and tag_points <= 20:
score = max(0, score - 15)
return score, matched_tags
scored = []
for fund in VC_FUNDS:
score, matched_tags = score_fund(fund, context)
tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
scored.append({
"fund_name": fund["fund_name"],
"thesis": fund["thesis"],
"check_size": fund["check_size"],
"stage_focus": fund["stage_focus"],
"industry_tags": fund["industry_tags"],
"geography_focus": fund["geography_focus"],
"notable_portfolio": fund["notable_portfolio"],
"website": fund["website"],
"source": "verified (fund website)",
"score": score,
"confidence": tier,
"matched_tags": matched_tags
})
scored.sort(key=lambda x: (-x["score"], x["fund_name"]))
relevant = [m for m in scored if m["confidence"] in ("High", "Medium")]
curated_comparables = []
for m in relevant:
for company in m.get("notable_portfolio", []):
if company not in curated_comparables:
curated_comparables.append(company)
output = {
"high_medium_matches": relevant,
"curated_comparables": curated_comparables[:6]
}
json.dump(output, open('/tmp/vc-curated-matches.json', 'w'), indent=2)
print(f'Curated matches: {len(relevant)} High/Medium confidence funds')
for m in relevant[:8]:
print(f' {m["confidence"]:6} ({m["score"]:3}) {m["fund_name"]}')
print(f'Seed comparables from portfolio: {curated_comparables[:6]}')
PYEOF
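As a sanity check on the scoring above, here is the arithmetic worked by hand for one plausible pairing: Amplify Partners (tags DevTools/Infrastructure/AI/Data, stages Seed/Series A, geography US) against a hypothetical Seed-stage, US-based product tagged DevTools + AI. This example only restates the weights in score_fund; it is not part of the skill:

```python
# Two specific (non-Generalist) tag matches, exact stage match, exact geography match.
tag_points = min(20 + 20, 60)   # DevTools + AI, 20 points each, capped at 60
stage_points = 20               # "Seed" is in the fund's stage_focus
geo_points = 20                 # "US" is in the fund's geography_focus
score = tag_points + stage_points + geo_points
tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
print(score, tier)  # -> 80 High
```

The first-tag penalty does not fire here because the fund's lead tag (DevTools) is among the extracted tags, so the fund lands comfortably in the High tier.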
Load curated portfolio companies from Step 5b as seed comparables:
python3 -c "
import json
matches = json.load(open('/tmp/vc-curated-matches.json'))
curated = matches.get('curated_comparables', [])
print(f'Curated portfolio comparables ({len(curated)}): {curated}')
need = max(0, 5 - len(curated))
print(f'Tavily will supplement with up to {need} more')
"
Do not use AI training knowledge to generate comparable companies. Curated portfolio companies (above) are already zero-hallucination comparables from verified fund data. Tavily supplements with L3-niche-specific companies.
python3 << 'PYEOF'
import json, os, urllib.request
analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
queries = [
f'"{l3}" startup raised funding venture capital seed series',
f'"{l2}" companies venture backed funded startup'
]
all_results = []
for query in queries:
payload = json.dumps({
"api_key": tavily_key,
"query": query,
"search_depth": "advanced",
"max_results": 8,
"include_answer": True
}).encode()
req = urllib.request.Request(
'https://api.tavily.com/search',
data=payload,
headers={'Content-Type': 'application/json'},
method='POST'
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
result = json.loads(resp.read())
all_results.append({
'query': query,
'answer': result.get('answer', ''),
'results': [
{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:500]}
for r in result.get('results', [])
]
})
print(f'Comparable search: {len(result.get("results", []))} results for "{query[:60]}"')
except Exception as e:
print(f'Comparable search FAILED: {e}')
all_results.append({'query': query, 'answer': '', 'results': [], 'error': str(e)})
json.dump(all_results, open('/tmp/vc-comparable-search.json', 'w'), indent=2)
PYEOF
Print results for AI selection:
python3 -c "
import json
results = json.load(open('/tmp/vc-comparable-search.json'))
for r in results:
print(f'Query: {r[\"query\"]}')
print(f'Answer: {r.get(\"answer\",\"\")[:400]}')
for item in r.get('results', []):
print(f' - {item[\"title\"]} | {item[\"url\"]}')
print(f' {item[\"content\"][:200]}')
print()
"
AI instructions: Combine the curated portfolio companies from /tmp/vc-curated-matches.json with the Tavily search results above. Pick exactly 5 comparable companies. Prioritize curated portfolio companies (already verified -- they are real portfolio companies of matched VC funds). Supplement with Tavily-discovered companies to reach 5 if needed.
For each comparable write:
- name: company name
- similarity_reason: one sentence explaining the fit (for curated: reference the fund that backed them; for Tavily: cite the snippet)
- source_url: portfolio fund website for curated companies, Tavily result URL for discovered ones
- estimated_stage: from curated data or snippet text -- write "not in search data" if unknown
- source_type: "curated_portfolio" or "tavily_discovered"

Update /tmp/vc-product-analysis.json with the comparable_companies array:
python3 << 'PYEOF'
import json
analysis = json.load(open('/tmp/vc-product-analysis.json'))
analysis['comparable_companies'] = [
# FILL 5 companies -- curated_portfolio first, then tavily_discovered
# Each: {"name": str, "similarity_reason": str, "source_url": str, "estimated_stage": str, "source_type": str}
]
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Comparables written:', ', '.join(c['name'] for c in analysis['comparable_companies']))
PYEOF
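Before spending Track A credits, the written comparables can be checked against the sourcing rules (exactly 5 entries, a valid source_type, a source_url for every company). A minimal sketch, demonstrated on inline sample data rather than the live /tmp file; the helper name is illustrative:

```python
def check_comparables(comps: list) -> list:
    """Flag comparables that break the sourcing rules before moving to Track A."""
    problems = []
    if len(comps) != 5:
        problems.append(f"expected 5 comparables, got {len(comps)}")
    for c in comps:
        if c.get("source_type") not in ("curated_portfolio", "tavily_discovered"):
            problems.append(f"{c.get('name')}: bad source_type")
        if not c.get("source_url"):
            problems.append(f"{c.get('name')}: missing source_url")
    return problems

sample = [{"name": "Postman", "source_type": "curated_portfolio",
           "source_url": "https://nexusvp.com"}] * 5
print(check_comparables(sample))       # -> []
print(check_comparables(sample[:4]))   # wrong count is flagged
```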
If fewer than 3 comparable companies appear in the search results: Broaden the queries. Run a third search: "[l1] startup" funding round venture capital. If still thin, proceed with what is available and flag in data_quality_flags.
Run 5 Tavily searches, one per comparable.
python3 << 'PYEOF'
import json, os, urllib.request
analysis = json.load(open('/tmp/vc-product-analysis.json'))
comparables = analysis['comparable_companies']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
all_track_a = []
for comp in comparables:
company = comp['name']
query = f'"{company}" investors funding venture capital backed seed series'
payload = json.dumps({
"api_key": tavily_key,
"query": query,
"search_depth": "advanced",
"max_results": 5,
"include_answer": True
}).encode()
req = urllib.request.Request(
'https://api.tavily.com/search',
data=payload,
headers={'Content-Type': 'application/json'},
method='POST'
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
result = json.loads(resp.read())
all_track_a.append({
'comparable_company': company,
'similarity_reason': comp['similarity_reason'],
'query': query,
'answer': result.get('answer', ''),
'results': result.get('results', [])
})
print(f'Track A - {company}: {len(result.get("results", []))} results')
except Exception as e:
print(f'Track A - {company}: FAILED ({e})')
all_track_a.append({
'comparable_company': company,
'similarity_reason': comp['similarity_reason'],
'query': query,
'answer': '',
'results': [],
'error': str(e)
})
json.dump(all_track_a, open('/tmp/vc-tracka-results.json', 'w'), indent=2)
print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a if r.get("results"))}')
PYEOF
If all 5 Track A searches return 0 results: Re-run Step 6 with broader queries. Retry with well-covered companies (those with significant press coverage). If still 0: proceed to Track B only and flag in data_quality_flags.
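The retry rule above distinguishes three situations: every search empty (retry Step 6), some empty (proceed and flag), all populated (proceed). A hedged classifier, shown on sample data rather than the live /tmp/vc-tracka-results.json:

```python
def track_a_status(track_a: list) -> str:
    """Classify Track A coverage per the retry rule: retry / partial / ok."""
    hits = sum(1 for item in track_a if item.get("results"))
    if hits == 0:
        return "retry"    # all searches empty: re-run Step 6 with broader queries
    if hits < len(track_a):
        return "partial"  # proceed, but note the gaps in data_quality_flags
    return "ok"

sample = [
    {"comparable_company": "A", "results": [{"url": "https://example.com"}]},
    {"comparable_company": "B", "results": []},
]
print(track_a_status(sample))             # -> partial
print(track_a_status([{"results": []}]))  # -> retry
```

The same shape of check applies to Track B, where zero results across all three searches means proceeding with Track A only.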
Run 3 Tavily searches using L2 and L3 taxonomy from Step 5.
python3 << 'PYEOF'
import json, os, urllib.request
analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
stage = analysis['detected_stage']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
queries = [
{'name': 'thesis_l3', 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'},
{'name': 'thesis_l2', 'query': f'VC fund "{l2}" investment thesis portfolio companies'},
{'name': 'stage_space', 'query': f'{stage} investors "{l3}" startup venture capital fund'}
]
all_track_b = []
for q in queries:
payload = json.dumps({
"api_key": tavily_key,
"query": q['query'],
"search_depth": "advanced",
"max_results": 7,
"include_answer": True
}).encode()
req = urllib.request.Request(
'https://api.tavily.com/search',
data=payload,
headers={'Content-Type': 'application/json'},
method='POST'
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
result = json.loads(resp.read())
all_track_b.append({
'query_name': q['name'],
'query': q['query'],
'answer': result.get('answer', ''),
'results': result.get('results', [])
})
print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
except Exception as e:
print(f"Track B - {q['name']}: FAILED ({e})")
all_track_b.append({'query_name': q['name'], 'query': q['query'], 'answer': '', 'results': [], 'error': str(e)})
json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
PYEOF
If all 3 Track B searches return 0 results: Proceed with Track A results only. Note in data_quality_flags: "No thesis-led investors found via public search."
Print the research data:
python3 -c "
import json
analysis = json.load(open('/tmp/vc-product-analysis.json'))
track_a = json.load(open('/tmp/vc-tracka-results.json'))
track_b = json.load(open('/tmp/vc-trackb-results.json'))
curated = json.load(open('/tmp/vc-curated-matches.json'))
track_a_summary = []
for item in track_a:
snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
for r in item.get('results', [])[:3]]
track_a_summary.append({
'comparable_company': item['comparable_company'],
'similarity_reason': item['similarity_reason'],
'answer': item.get('answer', '')[:500],
'top_results': snippets
})
track_b_summary = []
for item in track_b:
snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
for r in item.get('results', [])[:4]]
track_b_summary.append({
'query_name': item['query_name'],
'answer': item.get('answer', '')[:500],
'top_results': snippets
})
curated_summary = []
for m in curated.get('high_medium_matches', []):
curated_summary.append({
'fund_name': m['fund_name'],
'confidence': m['confidence'],
'score': m['score'],
'matched_tags': m['matched_tags'],
'thesis': m['thesis'],
'check_size': m['check_size'],
'stage_focus': m['stage_focus'],
'notable_portfolio': m['notable_portfolio'],
'website': m['website'],
'source': 'verified (fund website)'
})
print(json.dumps({
'product': {
'name': analysis['product_name'],
'description': analysis['one_line_description'],
'industry': analysis['industry_taxonomy'],
'icp': analysis['icp'],
'stage': analysis['detected_stage'],
'stage_confidence': analysis['stage_confidence'],
'geography': analysis['geography_bias']
},
'curated_matches': curated_summary,
'track_a_research': track_a_summary,
'track_b_research': track_b_summary
}, indent=2))
"
AI instructions -- zero-hallucination rules:
Every field in the output must be traceable to the printed data above. Rules:
- Use the curated_matches data directly. These are pre-verified -- no Tavily evidence required. fund_overview comes from the thesis field in the curated data. check_size and stage_focus come from the curated data fields. Do NOT fill from training knowledge even for these funds.

Write to /tmp/vc-final-list.json with these fields:
- product_summary: name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (names only)
- curated_vcs: fund_name, confidence ("High"/"Medium"), matched_tags, fund_overview (from thesis field), check_size, stage_focus, website, source ("verified (fund website)"), stage_fit_score, space_fit_score
- track_a_vcs: fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
- track_b_vcs: fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
- top_5_deep_dives: fund_name, track ("Curated"/"A"/"B"), fund_overview, why_fit, portfolio_in_space, how_to_approach (min 30 chars), outreach_hook
- outreach_hooks: 3 objects -- hook_type, hook_text (2-3 sentences), best_for
- data_quality_flags: gaps, missing fields, low-confidence areas

python3 << 'PYEOF'
import json
result = {
# FILL from synthesis above
# Must include: product_summary, curated_vcs, track_a_vcs, track_b_vcs, top_5_deep_dives, outreach_hooks, data_quality_flags
}
json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'Synthesis written. Curated: {len(result.get("curated_vcs", []))} VCs. Track A: {len(result.get("track_a_vcs", []))} VCs. Track B: {len(result.get("track_b_vcs", []))} VCs.')
PYEOF
python3 << 'PYEOF'
import json
result = json.load(open('/tmp/vc-final-list.json'))
failures = []
# Remove Track A VCs missing evidence_company
original_a = len(result.get('track_a_vcs', []))
result['track_a_vcs'] = [v for v in result.get('track_a_vcs', []) if v.get('evidence_company')]
removed_a = original_a - len(result['track_a_vcs'])
if removed_a > 0:
failures.append(f'Removed {removed_a} Track A VC(s) missing evidence_company')
# Remove Track B VCs missing thesis_source_title
original_b = len(result.get('track_b_vcs', []))
result['track_b_vcs'] = [v for v in result.get('track_b_vcs', []) if v.get('thesis_source_title')]
removed_b = original_b - len(result['track_b_vcs'])
if removed_b > 0:
failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')
# Remove deep dives for VCs that were stripped from all tracks
valid_funds = (
{v['fund_name'] for v in result.get('curated_vcs', [])} |
{v['fund_name'] for v in result.get('track_a_vcs', [])} |
{v['fund_name'] for v in result.get('track_b_vcs', [])}
)
original_dives = len(result.get('top_5_deep_dives', []))
result['top_5_deep_dives'] = [d for d in result.get('top_5_deep_dives', []) if d.get('fund_name') in valid_funds]
removed_dives = original_dives - len(result['top_5_deep_dives'])
if removed_dives > 0:
failures.append(f'Removed {removed_dives} deep dive(s) for funds stripped during QA')
# Check top 5 deep dives
dives = result.get('top_5_deep_dives', [])
if len(dives) < 5:
failures.append(f'Only {len(dives)} deep dives (expected 5) -- insufficient search data')
for dd in dives:
if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")
        if not dd.get('fund_overview'):
dd['fund_overview'] = 'not found in search data'
# Check outreach hooks count
if len(result.get('outreach_hooks', [])) != 3:
failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")
# Check for em dashes
full_text = json.dumps(result)
if '—' in full_text:
result = json.loads(full_text.replace('—', '-'))
failures.append('Fixed: em dash characters replaced with hyphens')
# Check for forbidden words
forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing', 'streamline', 'leverage', 'transform']
full_text_lower = json.dumps(result).lower()
for word in forbidden:
if word in full_text_lower:
failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")
# Flag any "not found in search data" entries so user knows coverage is incomplete
not_found_count = json.dumps(result).count('not found in search data')
if not_found_count > 0:
failures.append(f'INFO: {not_found_count} field(s) marked "not found in search data" -- verify directly before outreach')
if 'data_quality_flags' not in result:
result['data_quality_flags'] = []
result['data_quality_flags'].extend(failures)
json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'QA complete. Issues addressed: {len(failures)}')
for f in failures:
print(f' - {f}')
if not failures:
print('All QA checks passed.')
PYEOF
DATE=$(date +%Y-%m-%d)
OUTPUT_FILE="docs/vc-intel/${PRODUCT_SLUG}-${DATE}.md"
mkdir -p docs/vc-intel
Present the final output:
## VC Finder: [product_name]
Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geography: [geography_bias]
---
### Product Analysis
What it does: [one_line_description]
Industry: [l1] > [l2] > [l3]
Buyer: [buyer_persona] at [company_type], [company_size]
Comparable companies used: [comma-separated list, noting source_type for each]
---
### Curated Matches (Verified)
*Funds matched from a verified dataset of 25 VC funds sourced from fund websites. Zero hallucination -- details come directly from the dataset.*
| Fund | Confidence | Stage Focus | Check Size | Matched Tags |
|---|---|---|---|---|
[one row per curated VC, sorted by confidence then score]
---
### Track A: VCs Who Backed Similar Companies
*These investors have already written a check in this space. Evidence from live Tavily search.*
| Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track A VC, sorted by space_fit_score descending]
---
### Track B: Thesis-Led Investors
*These investors are actively publishing about this space.*
| Fund | Thesis Source | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track B VC, sorted by space_fit_score descending]
---
### Top 5 Deep Dives
#### [N]. [Fund Name] (Track [Curated/A/B])
Overview: [fund_overview -- from dataset or search data only]
Why it fits: [why_fit]
Portfolio in this space: [from dataset or search data, or "not found in search data"]
How to approach: [how_to_approach]
Outreach hook: "[outreach_hook]"
[repeat for all available deep dives]
---
### 3 Outreach Hooks for This Product Type
**1. [hook_type]**
[hook_text]
Best for: [best_for]
[repeat for all 3]
---
Data quality notes: [data_quality_flags, or "None"]
Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
Clean up temp files:
rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-product-analysis.json \
/tmp/vc-product-context.json /tmp/vc-curated-matches.json /tmp/vc-comparable-search.json \
/tmp/vc-tracka-results.json /tmp/vc-trackb-results.json /tmp/vc-final-list.json