Help us improve
Share bugs, ideas, or general feedback.
From awesome-skills
Scores link prospects by Ahrefs authority, identifies pitch angles, generates outreach emails, and proposes in-article link placements. Use for SEO link building.
npx claudepluginhub apify/awesome-skills --plugin apify-influencer-brand-collabsHow this skill is triggered — by the user, by Claude, or both
Slash command
/awesome-skills:apify-link-prospecting-outreachThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Turn a goal + a target keyword + a URL the user wants to promote into a tiered, ready-to-send outreach list: SERP-ranking prospects with Ahrefs-scored authority, the strongest pitch angle per prospect, an outreach-type-matched email draft, and a copy-paste-ready link placement.
campaign.json.exampleexamples/example-email-abc.mdexamples/example-email-resource-page.mdexamples/example-input.mdreference/apify-actor-usage.mdreference/email-templates.mdreference/output-formats.mdreference/troubleshooting.mdscripts/build_prospects.pyscripts/enrich_prospects.pyscripts/fetch_run_artifacts.jsscripts/fetch_subactors_from_log.jsscripts/merge_subagent_outputs.pyscripts/package.jsonscripts/run_actor.jsscripts/write_xlsx.pyFinds referring domains linking to competitors but not to you, ranked by outreach-priority score combining DR, link-overlap, traffic, and topical relevance. Use for link-building campaigns, digital-PR qualification, or quarterly backlink-gap audits.
Designs sustainable link acquisition strategies to improve domain authority and organic search rankings through ethical content-driven outreach.
Analyzes competitor backlinks, identifies toxic links, discovers unlinked brand mentions, and generates outreach plans and monthly SEO reports.
Share bugs, ideas, or general feedback.
Turn a goal + a target keyword + a URL the user wants to promote into a tiered, ready-to-send outreach list: SERP-ranking prospects with Ahrefs-scored authority, the strongest pitch angle per prospect, an outreach-type-matched email draft, and a copy-paste-ready link placement.
(No need to check it upfront)
.env file with APIFY_TOKENmcp__claude_ai_Ahrefs__* tools for prospect scoring)--env-file support)scripts/ folder: npm installAfter Step 1–2 inputs are collected, write them to a single campaign.json (schema in campaign.json.example). Every downstream script reads --config campaign.json, so the agent doesn't fork per-campaign copies. Sequence:
# 1. Run the Actor (writes {base}.json + sub-Actor sidecars when --fetch-sub-datasets)
node --env-file=.env scripts/run_actor.js --actor "apify/link-prospecting-tool" --input '<json>' --timeout 1800 --fetch-sub-datasets --output {base}.json --format json
# 2. Build unified prospect table from the sidecars
python3 scripts/build_prospects.py --config campaign.json
# 3. (After Step 5 Ahrefs MCP calls → save to {base}_ahrefs_domain.json + {base}_ahrefs_page.json)
python3 scripts/enrich_prospects.py --config campaign.json
# 4. (After Step 8 sub-agents write outputs to /tmp/placement_outputs/row_*.json)
python3 scripts/merge_subagent_outputs.py --config campaign.json --outputs-dir /tmp/placement_outputs
# 5. Write the final xlsx + metadata sidecar
python3 scripts/write_xlsx.py --config campaign.json
If the runner's client-side wait elapses with the Actor still running on Apify, use scripts/fetch_run_artifacts.js --run-id <id> --output {base}.json instead of restarting. If the parent run is missing SUB_ACTOR_RESULTS (post-2026-05-20 Actor schema), scripts/fetch_subactors_from_log.js resolves sub-Actor runIds from the parent log.
Copy this checklist and track progress:
Task Progress:
- [ ] Step 1: Collect required anchor inputs incl. goal (block on these)
- [ ] Step 2: Collect brand voice, partnership type, output format
- [ ] Step 3: Run apify/link-prospecting-tool
- [ ] Step 4: Pull leads, mentions, authors, and sub-Actor datasets
- [ ] Step 5: Enrich every domain with Ahrefs metrics, assign Prospect Tier
- [ ] Step 6: Run skip pass — flag rows to drop before drafting
- [ ] Step 7: Compute "Why This Prospect" tag per surviving row
- [ ] Step 8: Compose per-row 3-artifact placement + outreach-type-aware email
- [ ] Step 9: Render output in chosen format
Do NOT proceed to Step 2 until every required input is answered. Surface them as the very first interaction. The dedup input (#7) is optional but must still be explicitly asked.
Concrete goal for this campaign — pick one preset or supply custom text. The goal drives skip-pass filtering, outreach-type template selection, and Prospect Tier thresholds. Required.
| Preset | Effect downstream |
|---|---|
Recover unlinked brand mentions | Skip pass drops every row where brand_mentioned_in_source is false. Default outreach type = unlinked-mention-claim. |
Replace competitor links | Skip pass drops every row not tagged Links to competitor. Default outreach type = competitor-link-replacement. |
Topical authority links to specific URL | No filter. Tier thresholds tighten (DR ≥ 50 for tier A). Default outreach type chosen per-row from Why This Prospect. |
Maximum link volume from any relevant site | No filter. Tier thresholds relax (DR ≥ 30 for tier A). Default outreach type chosen per-row. |
Custom | User-supplied paragraph; biases email tone and tier weights. No automatic skip filter. |
Target keyword(s) — one or more keywords the user wants their link to appear next to. The skill prospects the SERP for each. At least one required.
Brand name — the user's brand or product name. The Actor will not run without this (it is the brand input field).
Product/category description — one or two sentences describing what the user sells, who they sell to, and what category their product fits in. Example: "Apify — web scraping platform that runs serverless scrapers as APIs. We sell to developers and data teams who need scraped data without managing infrastructure." Required. Used in Step 6 (topical-fit gate) and Step 7 (adversarial-mention detection) to recognise prospects who are in the same product category — those won't link no matter the pitch. Without this, the skill cannot distinguish a genuine editorial opportunity from a competitor's blog.
URL of content to link to — the destination URL that will be inserted into partner articles. Required.
Competitors — anyone in the user's product category who would publish a "ours vs theirs" comparison page on their own site. Frame the ask this way explicitly: "List every company that would write an X-vs-YourBrand comparison page. These won't link to you no matter what — small competitors count too." Encourage 10+ entries; most users default to listing 3–5 obvious ones and miss the long tail. Mapped to competitorDomains on the Actor and reused in Steps 6 (adversarial-mention skip) and 7 (Links to competitor Why-tag).
After the user answers, offer (do not push) an Ahrefs auto-pull of organic competitors: "Want me to pull your top organic competitors from Ahrefs and add them to this list? Adds ~50 API units and surfaces smaller competitors you may have missed." If the user says yes and Ahrefs MCP is available, call mcp__claude_ai_Ahrefs__site-explorer-organic-competitors on the user's domain (extracted from input #5) and merge results into competitorDomains. If Ahrefs is unavailable or the user declines, proceed with the user-supplied list only.
Already-pitched domains (optional) — domains the user has already contacted in past campaigns. Accept a comma-separated list, a CSV/Sheet path, or "none". The skill drops these in the skip pass so the user doesn't double-pitch. Not required to proceed.
Number of organic results per keyword — how many Google organic SERP results to prospect per keyword. Default 10 if the user is unsure, but ask the question so the user knows the lever exists. Mapped to organicResult.
LLM sources to track — multi-select. Each enabled engine queries an additional AI search/chat surface and adds Google Search Scraper sub-Actor cost per result fetched. Default: all enabled. Mapping to Actor input flags:
| Option | Actor flag | Cost impact |
|---|---|---|
| ChatGPT Search | enableChatGpt | Per-result Google Search Scraper cost |
| Gemini | enableGemini | Per-result Google Search Scraper cost |
| Copilot (Microsoft / Bing) | enableCopilot | Per-result Google Search Scraper cost |
| Perplexity | enablePerplexity | Per-result Google Search Scraper cost |
| Google AI Mode | enableAiMode | Per-result Google Search Scraper cost |
| Google AI Overviews | enableAiOverviews | Free — parsed from the SERP already fetched. Keep on regardless of budget. |
Surface the multi-select to the user with all six pre-checked. Disabling individual engines is the main cost-cutting lever short of dropping organicResult — recommend keeping ChatGPT + Gemini on at minimum (they capture the largest share of LLM-driven discovery traffic in 2026).
Run email verification? — boolean. Default: yes. Mapped to enableEmailVerification on the Actor. When enabled, the Actor verifies every email returned by the Contact Details Scraper sub-Actor and tags each lead with a verification status (verified / catch-all / risky / invalid / unknown). The skill uses the status in Step 6 (invalid emails get auto-skipped) and surfaces it as the Email Verification column in the output. Disable only if the user is rate-limited on verification quota or running cost-tight smoke tests.
Once 1–6 and 8–10 are captured (7 is optional), move on.
Ask these next:
Brand info and voice — a short paragraph describing the product/brand and the tone for outreach (e.g., "casual and helpful", "formal B2B", "founder-led"). Used verbatim to shape every generated email.
Partnership type — the offer the user is willing to make. Determines the offer paragraph substituted into the per-row email. Outreach-type template selection happens separately, per-row, in Step 8.
| Option | What it offers |
|---|---|
| ABC link exchange | Three-way link swap: partner links to user, user links to a third party, third party links to partner. |
| Direct A B link exchange | Two-way link swap: partner links to user, user links to partner. |
| Resource page / list inclusion | Ask to be added to an existing curated list or roundup. No reciprocal link offered. |
| Unilateral ask (no reciprocal) | User asks for the link without offering anything in return — appropriate for unlinked-mention claims and broken-link replacements. |
| Other | User types their own offer (paid placement, free product, co-authored content, etc.). |
Output format:
| Format | Behavior |
|---|---|
xlsx | run_actor.js writes a styled spreadsheet to disk. |
markdown | Agent renders the table inline in chat with email drafts beneath each row. |
The Actor ID is apify/link-prospecting-tool. Full input schema lives in reference/apify-actor-usage.md.
Recommended call payload for this skill (defaults chosen for outreach-first workflow):
{
"queries": "<keyword 1>\n<keyword 2>",
"brand": "<user's brand name>",
"ownDomains": ["<user-domain.com>"],
"competitorDomains": [],
"ignoreDomains": [
"wikipedia.org", "github.com", "stackoverflow.com", "stackexchange.com",
"reddit.com", "quora.com", "youtube.com", "twitter.com", "x.com",
"linkedin.com", "facebook.com", "medium.com", "archive.org",
"chromewebstore.google.com", "addons.mozilla.org", "apps.apple.com",
"play.google.com", "microsoftedge.microsoft.com", "marketplace.visualstudio.com"
],
"organicResult": 10,
"maxContactsPerDomain": 3,
"department": ["marketing"],
"searchAuthorName": true,
"includeMention": true,
"enableChatGpt": true,
"enableGemini": true,
"enableCopilot": true,
"enablePerplexity": true,
"enableAiMode": true,
"enableAiOverviews": true,
"enableEmailVerification": true
}
The six enable* LLM-source flags map 1:1 to the user's Step 1 input #9 multi-select. Pass false for any engine the user deselected. enableEmailVerification maps to Step 1 input #10.
The ignoreDomains default includes two groups:
Do NOT auto-add to ignoreDomains (let the user decide):
kaggle.com, dev.to, substack.com, producthunt.com, g2.com, capterra.com, trustpilot.com — some users get real value pitching these.rapidapi.com, programmableweb.com, publicapis.dev — relevant for some products (especially developer-tool brands), irrelevant for others. Surface these as candidates only if the user wants to add them.The URL-pattern skip rules in Step 6 catch the per-row noise (subdomain prefixes, path patterns) that ignoreDomains can't express.
department defaults to ["marketing"] only. The skill prioritises editorial-leaning contacts within the returned marketing department during row composition (see Step 8). Only add sales if the user explicitly wants BD-style partnership pitches. Only add c_suite if the prospect domains are very small (1–5 person shops) where the founder may also be the editor.
Call the runner script:
node --env-file=.env ${CLAUDE_PLUGIN_ROOT}/scripts/run_actor.js \
--actor "apify/link-prospecting-tool" \
--input 'JSON_INPUT' \
--timeout 1800 \
--fetch-sub-datasets \
--output YYYY-MM-DD_outreach.json \
--format json
Notes:
--timeout 1800 is the recommended client-side wait. The Actor itself runs 15-50+ min depending on keyword count, LLM-engine fan-out, and enableEmailVerification. Past calibration runs land in the 20–55 min range. Bumping the default avoids the partial-result situation where the runner gives up but the Actor keeps going.RUNNING or READY when the runner exits), do not restart the Actor. Use scripts/fetch_run_artifacts.js --run-id <id> --output <file> to poll the existing run and download all artifacts — same output shape as run_actor.js --fetch-sub-datasets.--fetch-sub-datasets downloads sibling files alongside the main output: *_mentions.json, *_authors.json, *_serp.json, *_wcc.json. You need all of them to populate every output column.The Actor's output schema changed on or before 2026-05-20. The build_prospects script must handle the new shape; older skill versions that joined a separate MENTIONS dataset are broken.
Current schema (verified 2026-05-20):
| File written by runner / fetcher | Source | Populates |
|---|---|---|
*_output.json (main) | "All leads" dataset | Contact Full Name, Contact Job Title, Department, Seniority, Contact Email, Email Verification (when enableEmailVerification: true), Contact LinkedIn, Company, Domain. Each lead's source_url[] array contains the article URLs that produced this contact, each with a brand_mentioned_in_source boolean — this is the new home of the per-(URL, contact) mention data. |
*_serp.json | Google Search Results Scraper sub-Actor (one item per (query × engine) combination) | SERP Position, Article Title, Publish Date (via organicResults[]), and engine attribution per URL (Google Organic, ChatGPT, Gemini, Copilot, Perplexity, Google AI Mode) by joining aiModeResult.sources[], perplexitySearchResult.sources[], chatGptSearchResult.sources[], geminiSearchResult.sources[], copilotSearchResult.sources[]. URLs from ChatGPT carry a ?utm_source=chatgpt.com query suffix — normalise URLs (strip tracking params) before joining. |
*_wcc.json | Website Content Crawler sub-Actor | Placement Source Sentence, Placement With Link, Placement New Insertion, Article Author cross-check, outbound-link inspection for Links to competitor and Resource / roundup page tags. Canonical URL list for building rows — every URL that got body-crawled appears here, including ones that didn't yield a lead. |
*_authors.json | AI Web Scraper sub-Actor (when searchAuthorName: true) | Article Author, Author Source (set to searchAuthorName). Note: this sub-Actor frequently TIMES-OUT at its 300s default — partial results are still saved. |
What changed (vs. pre-2026-05-20 runs):
MENTIONS / AUTHORS / DOMAINS_WITH_LEADS named datasets — mention info is folded into main_leads[i].source_url[].SUB_ACTOR_RESULTS record in the parent run's key-value store. Sub-Actor runIds are now only discoverable from the parent run log via regex \[apify\.<slug> runId:([A-Za-z0-9]+)\]. The runner script's --fetch-sub-datasets flag now falls back to log-parsing when the KV index is missing; the standalone scripts/fetch_subactors_from_log.js does the same for runs whose runner already exited.source_url[i] carries only {domain, brand_mentioned_in_source, url} — no per-engine flags like the old ChatGPT_mention / Perplexity_mention. Engine attribution must be reconstructed from the SERP sub-dataset's LLM-result sub-fields (see SERP row above).If a column's source is missing, write "Not found" and add a manual-lookup hint in Notes. Never fabricate.
For every unique domain that survived the Actor's filtering, fetch authority and traffic metrics via Ahrefs MCP. Call all three tools in parallel per domain (and across domains — batch parallelise to keep this step under a minute for typical 20–50 prospect lists):
| Ahrefs tool | Used for | Column it populates |
|---|---|---|
mcp__claude_ai_Ahrefs__site-explorer-domain-rating (target = domain) | Domain Rating | Domain DR |
mcp__claude_ai_Ahrefs__site-explorer-metrics (target = article URL, mode = exact) | Page-level organic traffic (last 30 days) | Page Traffic |
mcp__claude_ai_Ahrefs__site-explorer-backlinks-stats (target = domain) | Referring domains count | Referring Domains |
If Ahrefs returns no data (domain not indexed, page too new), set the column to "-" and add a Notes hint "Ahrefs has no data — verify manually before pitching". Do not fabricate values.
Assign Prospect Tier using the thresholds matching the user's goal:
| Goal | Tier A | Tier B | Tier C |
|---|---|---|---|
Topical authority links to specific URL | DR ≥ 50 AND Page Traffic ≥ 300/mo | DR 30–49 OR Page Traffic 50–299 | everything below |
Maximum link volume from any relevant site | DR ≥ 30 AND Page Traffic ≥ 100/mo | DR 15–29 OR Page Traffic 20–99 | everything below |
Recover unlinked brand mentions | irrelevant — every mention is worth claiming; tier by DR alone (≥ 40 = A, 20–39 = B, < 20 = C) | ||
Replace competitor links | tier by DR (≥ 50 = A, 30–49 = B, < 30 = C) | ||
Custom | use the Topical authority thresholds |
Surface tier breakdown to the user before Step 8 — let them confirm whether to draft emails for all tiers or only A/B.
Before drafting any email, walk every row and apply skip rules. Skipped rows get Outreach Status = "Skip", a one-line reason in Notes, and no email or placement is generated (saves tokens and user review time).
Skip rules (in order):
Goal mismatch. If the goal is Recover unlinked brand mentions and the row's Mentions data shows brand_mentioned_in_source: false, skip. If the goal is Replace competitor links and the row's WCC body has no outbound link to any competitorDomains entry, skip.
Already pitched. If the row's domain matches an entry in the optional already-pitched list from Step 1 input #7, skip.
Own / competitor domain leak. The Actor should already filter these, but double-check — if the row's domain matches ownDomains or competitorDomains, skip.
Stale content. If Publish Date is older than 5 years, skip (low chance the editor will update the post).
URL-pattern skip. Skip rows whose URL matches any of these patterns:
developers.*, docs.*, support.*, helpcenter.*, legacy.*, dsarequests.*, connectivity.*, community.*, dev.* (when used as a doc subdomain — e.g. dev.example.com/api/), api.* only when followed by a path that's clearly documentation (/reference/, /docs/, /spec/). Do NOT skip rapidapi.com or other API-directory domains by this rule alone — api.* is a subdomain check, not a substring check./api-docs/, /reference/, /marketplace/, /extensions/, /profile/, /users/, /free-tools/, /spec/, /content/privacy, /content/terms, /content/dma, /content/how_we_work, /legal/, /_redirects, /sitemap.-scraper.php, -scraping.php, contains -data-scraper., -data-scraping., /bots/, /extension/, /detail/ (extension detail pages).Non-editorial page type. Inspect the WCC page body. Skip vendor product pages, pricing pages, login walls, sign-up pages, terms/legal pages, and pages with fewer than 400 words of body text. Word count <400 is the threshold — most editorial articles are 800+ words.
UGC slipped through. If the page URL contains /forum/, /thread/, /comments/, /answers/, /q/, /topic/, /discussion/, or the WCC body is structured as discussion replies, skip.
Category-fit gate (loose). Extract 4–6 category keywords from the user's product description (Step 1 input #4) — these describe the product category, not the specific subject of the user's URL. Examples for a web-scraping product: scrape, scraping, scraper, crawl, extract, data extraction. For a CMS product: cms, headless, content, editorial. The row's WCC body must contain at least 1 of these category keywords. If not, skip with reason Article isn't in user's product category (no '<kw>' match) — kills recipe blogs, finance articles, and other off-category content that slipped through SERP filtering.
For non-English campaigns, include both source-language and English keywords in the category set — many Czech/German/French articles cite English brand names and product categories inline. Example for a Czech water-filtration brand: {filtr, filtrace, vod, voda, filter, filtration, water}. A pure-Czech keyword set would miss articles by Czech authors who write in mixed CS/EN.
Known false negatives this rule can't catch (the per-row sub-agent in Step 8 must catch them):
/eshop/, embedded product cards with prices in body) and pre-skip.The category gate is intentionally loose. It is a category check, not a subject check — fine-grained "does this specific article fit my specific URL?" is delegated to the per-row sub-agent in Step 8. Example: for a user URL specifically about scraping a single travel site, a general "python web scraping" guide that never mentions that travel site passes this gate because it's in the user's category. The Step 8 sub-agent then decides whether to draft a placement (e.g., an additive line that names the specific travel site) or to recommend a content-based skip.
Surface the extracted category keyword list to the user at the start of Step 6 and let them add/remove before the pass runs.
Adversarial-mention detection. When brand_mentioned_in_source: true, scan ±100 characters around the brand mention in the WCC body for negative-context tokens: vs, versus, alternative to, alternatives to, compared to, compared with, instead of, better than, worse than, pros and cons, comparison, review of. If any of those appear within the window, skip with reason Adversarial mention (likely competitor comparison page) — won't link. This catches "ScrapeHero vs Apify" footer mentions, "alternatives to YourBrand" listicles, and similar non-link contexts. Critical — without this rule, the unlinked-mention-claim outreach type fires on dozens of false positives.
No contact AND no editorial path. If Contact Email = "Not found" AND no Article Author AND no domain-level contact page found in WCC outbound links, skip — there is no one to pitch.
Invalid email (from verification). When enableEmailVerification: true ran, inspect each row's Email Verification status. If the primary contact's status is invalid, try the alternate contacts first before skipping the row — past runs have lost Tier A candidates because the primary contact's email was invalid but a verified alternate existed on the same domain. Only skip with reason Email failed verification (invalid address) when no alternate has a verified or unchecked email. Statuses catch-all, risky, unknown are informational only (not auto-skipped) — surface them in the Email Verification column. Status verified ships as-is. If verification didn't run for this campaign, the column shows - and this rule is a no-op.
Never suggest external lookup services or workaround tools in Notes — no hunter.io, no LinkedIn search, no third-party verification services. The skill's job is to surface what we found, factually. When information is missing (no email, no author, etc.), state the gap and stop. The user knows where to look; suggesting their tools back at them is condescending and clutters the output.
A row failing any rule above is skipped before Step 7. Skipped rows still appear in the final output (so the user can see what was filtered) but with empty placement and email cells and Outreach Status = "Skip" plus the reason in Notes.
For every surviving row, compute one or two Why This Prospect tags, prioritised by which makes the strongest pitch. These tags drive the outreach-type template selection in Step 8.
| Tag | Trigger | Source of truth |
|---|---|---|
Mentions brand, no backlink | brand_mentioned_in_source: true AND backlink_in_source: false in Mentions dataset | Mentions dataset |
Links to competitor [domain] | WCC page body contains an outbound link whose host matches any competitorDomains entry | WCC dataset |
Top-3 SERP for [keyword] | SERP Position is 1, 2, or 3 for any keyword | Google Search Scraper sub-dataset |
Resource / roundup page | WCC page body has 10+ outbound links AND the page title or H1 matches /(best|top|list|roundup|tools|resources|guide to)/i | WCC dataset |
Outdated content | Publish Date is older than 24 months AND newer than 5 years (5+ years was already skipped) | Google Search Scraper / WCC |
A row may carry up to two tags. Order them by pitch strength using this priority: Mentions brand, no backlink > Links to competitor > Resource / roundup page > Top-3 SERP > Outdated content. If no tag fits, leave the column as "-" — the row still gets pitched, just without a special angle.
Each surviving row gets three placement artifacts plus one email draft. Apply these quality rules without exception:
No fabrication. If the article author or contact email is unknown, set the field to "Not found" and leave a one-line factual note (e.g., "No email found for this contact" or "No author detected"). Do not suggest external lookup tools or workarounds in Notes — see Step 6 rule 11 for the rationale. Just state the fact and stop.
Prioritise editorial-leaning contacts. When the All leads dataset returned multiple contacts for the same domain, prefer the one whose jobTitle matches /editor|content|writer|managing|editorial|blog|copy/i, demote anyone whose jobTitle matches /ceo|cfo|cto|founder|chief|vp\b|president/i unless the company is a 1–5 person shop. Surface the chosen contact in the row; keep alternates in Notes as "Alternate contacts: <name1> (<title>), <name2> (<title>)".
Three placement artifacts — try strategies in this priority order. Use the WCC sub-dataset's page text. Try strategies 1 → 2 → 3 in order; stop at the first one that produces a clean fit. Record which strategy was used by prepending the Notes field with Placement: drop-in / Placement: additive / Placement: new insertion.
Strategy 1 — drop-in (preferred). Find a sentence in the article where the user's URL can be added to existing words without changing any of the surrounding prose. The link goes on an existing word or short phrase the author already wrote. Output:
Placement Source Sentence = the verbatim sentence as it appears in the article.Placement With Link = the same sentence with the link inserted on an existing word/phrase. No new prose, no rewording, no deletions. Example: source = "Tools like Octoparse and BeautifulSoup work well for hotel data." → with-link = "Tools like Octoparse and **[BeautifulSoup](URL)** work well for hotel data." (link added to existing word). The editor doesn't have to approve any new wording — just a hyperlink.Placement New Insertion = "-".Drop-in works when the article already names a brand, tool, or technique that maps cleanly to the user's URL. It's the lowest-friction ask of any outreach pattern: "could you add a hyperlink to a word you already wrote?"
Strategy 2 — additive (second choice). When no drop-in target exists but the article has a sentence the user's URL would naturally follow, keep the original sentence intact and add one new sentence after it. The new sentence introduces an adjacent reader-need that the article doesn't already cover and that the user's URL addresses. Output:
Placement Source Sentence = the verbatim original sentence.Placement With Link = original sentence kept verbatim, followed by → and a one-sentence follow-on containing the link. Example: source = "By integrating with Acme Travel's APIs, developers can enrich their platforms with hotel data." → with-link = "By integrating with Acme Travel's APIs, developers can enrich their platforms with hotel data. → ...with hotel data. In need of competitor pricing data the API doesn't expose? Then you need a [hotel-data scraper](URL)." Keep the original sentence verbatim; the follow-on is the only new prose.Placement New Insertion = "-".The follow-on must (a) raise a reader need the existing sentence doesn't address, (b) connect that need to the user's URL, (c) be one sentence, ≤25 words, written in the article's voice.
Strategy 3 — new insertion (last resort). Only when neither drop-in nor additive works (e.g., no relevant sentence exists in the article body). Draft a fully new 1–2 sentence paragraph in the article's voice with a precise insertion location:
Placement Source Sentence = "-".Placement With Link = "-".Placement New Insertion = the drafted paragraph + the exact anchor ("insert as a new paragraph immediately after the sentence ending in '…X.' in the section under H2 'Y'.").If even a new insertion can't be drafted (the article is the wrong topic for the user's URL), set Outreach Status = "Skip" and add Notes: "No natural placement — article topic mismatch". However, the topical-fit gate in Step 6 rule 8 should have caught this case already; if a row makes it to Step 8 and can't get a placement, treat that as a hint that the gate needs more keywords.
Every surviving row goes through a sub-agent — not just Tier A/B/mention-only. The mechanical skip pass (Step 6) cuts the obviously bad prospects (competitor domains, doc subdomains, policy pages, dead-contact rows, off-category articles). Everything that passes is by definition a candidate worth real consideration, and the sub-agent makes the final fit call: read the article, attempt a placement (drop-in → additive → new insertion), and either draft email or return placement_strategy = "skip" with a content-specific reason. Python templates / regex / keyword scoring are not acceptable for the final draft — they produce mechanical splices that read awkward in context (we've seen this fail in practice on real campaigns).
Spawn sub-agents in parallel: one per surviving row, each given the WCC text, user URL context, contact info, brand voice, and partnership offer. The output schema (placement strategy, the three placement column values, email subject + body, skip recommendation, notes) is what gets merged back into the spreadsheet row.
A row that the sub-agent decides to skip after content review gets Outreach Status = "Skip" and the agent's reason in Notes — same shape as a Step 6 mechanical skip, just with a more nuanced rationale.
Determine Outreach Type per row from the Why This Prospect tags + user goal:
| Trigger | Outreach Type |
|---|---|
Tag Mentions brand, no backlink present | unlinked-mention-claim |
Tag Links to competitor present | competitor-link-replacement |
Tag Resource / roundup page present | resource-page-inclusion |
Tag Outdated content present | outdated-content-replacement |
None of the above OR only Top-3 SERP tag | topical-niche-edit |
Pull the matching template from reference/email-templates.md. The user's Step 2 Partnership type answer substitutes into the {{offer_paragraph}} placeholder inside the template — the outreach type determines structure and opening hook, the partnership type determines the offer.
Suggested Email Copy must use the user's brand voice. Apply the voice paragraph verbatim per the voice substitution rules in reference/email-templates.md. If voice input was skipped, use the generic-professional default and note this in Notes.
5a. The email MUST include the exact placement wording — verbatim. The recipient should never have to ask "what's the wording you're suggesting?" or click through to a separate cell to see the proposed text. Embed the proposal directly:
"In your line 'Tools like X and Y work well for Z', would you turn 'Y' into a hyperlink to <URL>?""Right after your sentence 'X happens because Y.', would you add: 'For the Z case specifically, see <URL>.'?""In the 'Honorable mentions' section, after 'each platform has its own trade-offs.', would you add this paragraph: '<full paragraph with link>'?"The email is the ask. If the wording isn't in the email, the ask is incomplete. Vague phrasing like "happy to draft it for you" / "happy to send exact wording" / "a follow-on sentence linking to..." is a content-skill bug — always rewrite to include the verbatim proposal.
Word cap: emails are 150 words or less (subject + body combined).
Personalisation is mandatory. Every email must open with a concrete reference to the specific article (title + a one-line takeaway from its content). No generic "I loved your article" openers.
Markdown format — agent renders inline in chat:
A: 8, B: 15, C: 7, Skipped: 12).xlsx format — scripts/write_xlsx.py --config campaign.json writes a 2-sheet workbook after Steps 5–8 finish.
xlsx is written as two sheets:
Outreach — active rows only, full 30 columns. This is the send-ready deliverable. Sorted by Prospect Tier ascending (A first), then by Domain DR descending, then by SERP Position ascending.Skipped — skipped rows with reduced columns: Domain, Article URL, Article Title, Skip Reason (extracted from Notes), Source Engines, Why This Prospect. This sheet exists for auditing what was filtered without cluttering the main view. Missing Ahrefs columns aren't visible here, so the empty-data confusion goes away.The user opens the file and lands on Outreach by default — only actionable prospects. They can switch to Skipped to audit. This pattern replaces the older single-sheet-with-red-rows approach.
Both formats also produce a sidecar run_metadata.json: { runId, actorId, startedAt, finishedAt, inputs, datasetIds, tierCounts, skipCounts }. Drop it next to the main output file.
SERP Position, Source Engines, Keyword, Article Title, Article URL, Domain, Domain DR, Page Traffic, Referring Domains, Prospect Tier, Why This Prospect, Article Author, Author Source, Publish Date, Contact Full Name, Contact Job Title, Department, Seniority, Contact Email, Email Verification (one of verified / catch-all / risky / invalid / unknown / -), Contact LinkedIn, Company, Outreach Type, Partnership Offer, Placement Source Sentence, Placement With Link, Placement New Insertion, Suggested Email Copy, Outreach Status (default "Not started", "Skip" for skipped rows), Notes (auto-flags + skip reason + manual hints + alternate contacts).
Full schema with types and source datasets per column is in reference/output-formats.md.
| Error / symptom | What to do |
|---|---|
APIFY_TOKEN not found | Ask user to create .env with APIFY_TOKEN=your_token. Get one at console.apify.com/account/integrations. |
| Ahrefs MCP unavailable | Skip Step 5. Set Domain DR, Page Traffic, Referring Domains, Prospect Tier to "-" and add a one-line note in the output header explaining tiers were not computed. Continue with the rest of the workflow. |
Cannot find module 'xlsx' | Run npm install inside the skill's scripts/ folder. |
Error: 'brand' is required | The user skipped Step 1 anchor #3. Re-ask brand name. |
Actor run TIMED-OUT (client-side, Actor still running on Apify) | Do not restart. Use node --env-file=.env scripts/fetch_run_artifacts.js --run-id <runId> --output YYYY-MM-DD_outreach.json --timeout 1800 to poll the existing run and download all datasets when it terminates. Same output shape as the runner. |
Actor run TIMED-OUT (Actor itself ran past its timeoutSecs) | See reference/troubleshooting.md. Lower organicResult, cut keywords, or disable some LLM engines. Almost never the cause — usually it's the client-side wait. |
Author = Not found | Expected for ~30% of pages without bylines. Skill writes "Not found" and adds a manual-lookup hint to Notes. Do not fabricate. The AI Web Scraper sub-Actor frequently TIMES-OUT at its 300s default mid-crawl (past campaigns have lost author data for ~7 high-DR sites at a time this way); when this happens the _authors.json dataset still contains the partial results that finished before the timeout. WCC metadata.author / openGraph article:author / JSON-LD Person.name are the fallbacks the agent should try before writing "Not found". |
Sub-Actor datasets missing (no _mentions.json / _authors.json / _serp.json / _wcc.json after --fetch-sub-datasets) | Actor's output schema changed on or before 2026-05-20: SUB_ACTOR_RESULTS is no longer in the KV store, and MENTIONS/AUTHORS/DOMAINS_WITH_LEADS named datasets are gone. The runner script now falls back to log-parsing automatically — but if it didn't, run node --env-file=.env scripts/fetch_subactors_from_log.js --run-id <id> --base <prefix> to populate _serp.json, _wcc.json, _authors.json from the sub-Actor runIds visible in the parent log. Mention data is now embedded in main_leads[i].source_url[], not a separate file. |
Contact Email = Not found | Contact Details Scraper sub-Actor missed the site. Set the field to "Not found" and leave a one-line factual Notes entry ("No email found for this contact"). Do not fabricate, do not suggest external tools. If both contact and author are missing, the skip pass (Step 6, rule 7) will drop the row. |
| All rows skipped by goal filter | The user's goal is too narrow for the SERP results. Suggest broadening the goal (e.g., Topical authority instead of Recover unlinked brand mentions) or expanding keywords. |
0 leads returned | Keyword too narrow, or ownDomains / competitorDomains filtered out all SERP results. Broaden keyword, narrow exclusions, raise organicResult. |
| Costs higher than expected | Sub-Actor fan-out (Google Search Scraper, WCC, Contact Details Scraper, AI Web Scraper) stacks. See cost section in reference/apify-actor-usage.md. To shrink: drop searchAuthorName, disable AI platforms (enableChatGpt: false, etc.), lower maxContactsPerDomain to 1, lower organicResult to 5. |