Help us improve
Share bugs, ideas, or general feedback.
From rampstack-skills
Designs and audits programmatic SEO programs that produce durable traffic. Covers data sources, template design, quality control, internal linking, crawl budget, and deciding whether pSEO fits.
npx claudepluginhub rampstackco/claude-skills --plugin rampstack-skillsHow this skill is triggered — by the user, by Claude, or both
Slash command
/rampstack-skills:programmatic-seoThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A senior SEO strategist's playbook for designing and running programmatic SEO programs that produce durable traffic instead of penalty-bait.
references/aeo-geo-for-programmatic-pages.mdreferences/common-pseo-failures.mdreferences/crawl-budget-management.mdreferences/data-source-identification-patterns.mdreferences/internal-linking-at-scale.mdreferences/quality-control-at-scale.mdreferences/refresh-at-scale.mdreferences/schema-design-patterns.mdreferences/template-design-patterns.mdreferences/when-pseo-works-decision.mdDesigns and evaluates programmatic SEO strategies for scaling SEO-driven pages using templates and structured data. Calculates feasibility index to avoid thin content and index bloat.
Plans and audits programmatic SEO for data-generated pages at scale, covering template engines, URL patterns, internal linking, thin content safeguards, and index bloat prevention.
Plans and audits programmatic SEO pages from structured data sources like databases, APIs, CSV/JSON. Designs templates, URL patterns, internal links, sitemaps, and thin content safeguards.
Share bugs, ideas, or general feedback.
A senior SEO strategist's playbook for designing and running programmatic SEO programs that produce durable traffic instead of penalty-bait.
Programmatic SEO has a complicated reputation. Sites like Zillow, Airbnb, TripAdvisor, Indeed, and Yelp have built billion-dollar traffic engines on pSEO. Other sites have built pSEO programs that got hit by Google's helpful-content updates and lost 80% of their traffic in a week. The difference is rarely the technique; it is the underlying data quality and the quality control discipline at scale.
This skill is the playbook for getting that distinction right. It assumes you have decided what keyword space to target (see seo-keyword) and how the broader content program is shaped (see content-strategy). It does not write individual editorial pieces (see content-and-copy for that) and does not architect editorial topic hubs (see pillar-content-architecture for that). What it does is teach the discipline of generating high-volume pages programmatically from structured data sources without producing thin content, duplicate content, or scale-without-substance pages that get penalized.
When to use this skill: deciding whether pSEO is a fit for the program at all (the most important question), designing a new pSEO system, auditing an existing pSEO set that is not ranking or has been hit by an algorithm update, or building quality-control discipline for a pSEO program that grew faster than its quality processes.
This skill spans scaled-content programs from data source through quality control. It composes with five sister and adjacent skills, and the distinction between them is what keeps each one sharp.
content-strategy is program-scope: editorial pillars, calendar, governance. Decides whether pSEO fits the program at all.pillar-content-architecture is hub-scope: one topic with 10 to 15 intentional editorial pieces. Editorial in nature.content-brief-authoring is per-piece scope: brief for one editorial artifact.content-and-copy is execution scope: writing individual editorial pieces.The clean reading order: content-strategy decides whether pSEO is a fit, seo-keyword surfaces the long-tail keyword space, this skill designs the pSEO system, editorial-qa (forthcoming) provides the QA discipline for sampled quality control across the set. content-and-copy and content-brief-authoring are not in the loop for pSEO at scale; those skills are for editorial pieces, not data-driven generated pages.
The audience: SEO content strategists, content engineers, agencies running pSEO programs, in-house teams considering pSEO as a growth lever. The voice is senior SEO strategist to junior PM or marketer. Specific, opinionated, honest. The reputation problem is not pSEO; it is pSEO without underlying value.
The keystone question. pSEO works when all of the following are true.
1. Real underlying data. A genuine structured data source with depth: 10+ fields per record, ideally 20+, with first-party data, expert-curated data, or licensed datasets. Not just scraped data or AI-generated facts dressed as structured records.
2. Long-tail query volume justifies the effort. The queries the program would target have meaningful aggregate volume even if individual queries are small. Real estate "homes for sale in {neighborhood}" works at scale. "Blue widget reviews 2026" generated for every adjective times widget combination does not, because the queries are not actually searched.
3. User intent is queryable. The user's question can be answered through structured data presented well. Not through narrative explanation, judgment, or analysis the data cannot supply.
4. Update cadence aligns with query volatility. Real estate listings update daily and the data refresh aligns. "Best [thing] for [year]" pages get stale annually and need a refresh discipline budgeted in.
5. Quality control is operationally feasible. The team has the capacity to sample-audit the set, fix failures, and maintain quality as the set grows. Not aspirationally; budgeted in headcount.
pSEO does NOT work when:
The honest framing. Most teams that ask "should we do pSEO?" should hear "probably not, unless the underlying data is unique or first-party expertise makes the pages actually useful." The reputation problem is not the technique; it is the technique applied without underlying value.
Detail in references/when-pseo-works-decision.md.
The data source is the pSEO program. Common sources with different defensibility profiles.
First-party data. Customer transactions, content database, user-generated content. Defensible because nobody else has it. Examples: Glassdoor's employee reviews, Yelp's user ratings, TripAdvisor's traveler reviews.
Licensed datasets. Industry databases, regulatory data, government datasets, licensed third-party feeds. Defensible by license terms and integration depth. Examples: Zillow's MLS partnerships, real estate brokerage feeds, sports statistics licenses.
Aggregated public data. Scraped, cleaned, enriched. Judgment call on legality (often gray area depending on robots.txt, terms of service, jurisdictional rules). Defensibility depends on the cleaning and enrichment work. Easy to copy if the cleaning is shallow.
Expert-curated content. The dataset is built by hiring experts to populate it. Slow, high-quality, defensible. Examples: Wirecutter's product testing, expert-reviewed medical content, curated editorial databases.
Synthesized data. Combining multiple sources into a unique view. "Neighborhoods times schools times prices" combining three datasets into a comparison view. Defensibility comes from the synthesis logic and the ongoing maintenance of multi-source pipelines.
The "moat" question. Would a competitor be able to replicate the data source? If yes, the pSEO program has no defensibility; anyone can copy. If no, the data source becomes a moat that compounds. Zillow's MLS partnerships, Glassdoor's employee reviews, Crunchbase's funding data, are moats. "Scraped Wikipedia plus AI rewrite" is not.
Detail in references/data-source-identification-patterns.md.
The template is the structure that data fills. Design principles.
Above-the-fold answer. The user's specific question answered in the first 200 words of the rendered page, structured for both human reading and AI extraction. The answer is what gets cited; the rest is supporting depth.
Variable density. The template accommodates records with sparse fields (some have 5 data points, others have 50) without looking broken. Sparse pages need fallback patterns; dense pages need progressive disclosure to avoid overwhelming.
Heading hierarchy that reflects data. H2s and H3s map to data sections (overview, details, comparisons, related). The heading structure is not decorative; it tells crawlers and AI engines what the page covers.
Schema markup as part of template. Structured data (JSON-LD, microdata, or RDFa depending on the stack) embedded in the template renders machine-readable signals at scale across the entire set. Without schema, the set is invisible to the structured-data extraction layer search and answer engines run.
Internal linking placeholders. The template includes link slots for related-record cross-references, parent-category links, and sibling-record links. 5 to 15 internal links per page is typical for a well-linked set.
Distinctive value per page. Each generated page must offer something the user could not get by going up to the parent or sideways to a sibling. If the page is just a re-pivot of the parent's data, the page is filler.
The template's quality bar. A randomly sampled page from the set, viewed in isolation, should answer the user's likely query competently. If a randomly sampled page is thin, the entire set is thin.
Detail in references/template-design-patterns.md.
The data shape that drives the template. Design principles.
Field count signals depth. 5 fields per record is thin. 15 to 20 is competent. 30+ is deep. The field count is not a vanity metric; it determines what the template can render at depth versus what falls back to filler.
Required vs optional fields. Which fields must populate for a page to ship; templates need graceful degradation for optional gaps. A page with 3 of 30 optional fields populated should not ship; a page with 25 of 30 should.
Computed fields. Derived from source data. Average price per square foot computed from listings adds depth without requiring more source data. Computed fields are the pSEO equivalent of the writer adding analysis: the data does the analysis once, every page benefits.
Cross-record fields. Fields that reference other records in the set enable internal linking and comparison. The "5 most similar neighborhoods" field on every neighborhood page powers the sibling-link section without manual curation.
Update frequency tags. Which fields are static (geographic features, founding date) versus which need refresh tracking (current pricing, availability, current statistics). Without tagging, refresh becomes "audit everything" instead of "refresh the volatile fields."
The "schema-as-product" principle. The schema that drives pSEO IS the product surface. Treat it with the same rigor as a database schema for a customer-facing application. Versioned, reviewed, documented, breaking-change-aware.
Detail in references/schema-design-patterns.md.
Auditing all 100,000 pages individually is infeasible. The discipline.
Sampling strategy. Random sample 50 to 200 pages per audit cycle, balanced across data shape (sparse, dense, recent, old, popular categories, niche categories). The sample is not "the latest 50 pages"; it is a stratified sample that exposes problems specific to particular data shapes.
Automated checks. Heading structure, schema validity, internal link count, word count thresholds, duplicate-content checks, broken-link checks. Run on every page on a cadence; surface failures as a queue.
Manual review checklist. For sampled pages, check the top-200-word answer quality, check whether the page would satisfy the user's query, check whether the page reads as distinctive versus templated. The manual review is the layer automated checks miss.
Failure thresholds. If more than 5% of sampled pages fail the manual review, halt new generation and fix the template or data before scaling further. The threshold is not negotiable; without it, quality drift compounds invisibly until the algorithm update reveals it.
Cohort tracking. Pages generated in one period rank differently from pages generated in another? That signals a template change or data quality drift; investigate. Cohort tracking is the pSEO equivalent of the experimentation team's drift detection.
The team budget question. A 10,000-page pSEO set requires roughly 0.5 to 1.0 FTE of ongoing quality control discipline. If that is not budgeted, the program will decay. The set will look fine on launch and degrade over 6 to 12 months as data drifts, templates ship without QC, and the gap between the QC plan and the actual cadence widens.
Detail in references/quality-control-at-scale.md.
pSEO sets need intentional internal linking architecture to compound.
Hub pages. Parent-level pages (e.g., "homes for sale in Denver") that link to all child pages (specific neighborhoods). Hub pages are linked from main navigation; they are the entry point for the set's traffic.
Sibling linking. Each page links to 5 to 15 related sibling pages: similar neighborhoods, similar price ranges, similar features. Sibling links are computed from cross-record fields in the schema; they are not manual curation at scale.
Reverse-direction linking. Child pages link up to hub; sibling links go both directions. The graph is bidirectional; PageRank flows both ways.
Anchor text variation. Avoid every page using the same anchor text. Vary by record characteristic (the record name, a descriptive phrase, the parent category). Uniform anchor text at scale signals manipulation to ranking systems.
Crawl-friendly architecture. Hub pages are linked from main navigation. Child pages are reachable via hub or via segmented sitemaps. No orphan child pages.
The PageRank flow principle. Internal linking is what makes pSEO sets compound. A well-linked set distributes ranking signal across all pages; a poorly-linked set has hub pages ranking and child pages orphaned. The link architecture is half the program; the data is the other half.
Detail in references/internal-linking-at-scale.md.
Search engines have crawl budgets. Large pSEO sets can exhaust them.
Sitemap segmentation. Split sitemaps by category or recency so crawlers prioritize active pages. A single sitemap with 100,000 entries is harder for crawlers to process than 10 segmented sitemaps with 10,000 entries each, organized by category or last-update.
Noindex on thin pages. Pages with insufficient data should noindex rather than ship as bait. The discipline: pages that fail the schema's required-field threshold do not get a public URL; they sit in the system as drafts until their data populates.
Canonical handling. Comparison pages (X vs Y) often have inverse pages (Y vs X). Canonicalize to one. Comparison-pivot duplicates without canonicalization split ranking signal.
Crawl rate signals. Monitor Google Search Console crawl stats. If crawl rate is plateauing while page count grows, the set is hitting budget limits. The fix is not "publish more pages"; it is "noindex thin pages and improve internal linking so crawlers prioritize better."
Pruning criteria. Pages that fail to attract clicks or impressions for 6+ months should noindex or 410. Pruning is hygiene, not failure. A 100,000-page set with 60% earning traffic is healthier than the same set with 40% earning traffic plus 60% dead weight crawlers waste budget on.
Detail in references/crawl-budget-management.md.
Answer engines treat programmatic pages differently from editorial pages.
Direct-answer extraction. AI engines extract the top-200-word answer; templates that lead with the answer get cited. The opening of the page is the citation candidate; everything after it is supporting depth.
Structured data signals. JSON-LD with comprehensive schema is read by AI engines as authority signal at scale. Programmatic pages with minimal schema lose to programmatic pages with deep schema even when the underlying data is similar.
Citation patterns. AI engines tend to cite programmatic data pages for factual queries (prices, statistics, comparisons, factual details) and editorial pages for analytical queries (why, how, what should I think). Design templates to fit the factual-query lane; do not try to make programmatic pages compete for analytical queries that editorial content owns.
Quality crackdown sensitivity. AI engines have their own quality signals beyond search engines. Thin programmatic content gets deprioritized faster in AI surfaces than in traditional search. The pSEO penalty risk has migrated to AI surfaces; the same quality-at-scale discipline applies, with less tolerance.
The "two-engine optimization" framing applies. pSEO pages should serve both search engines and answer engines. Most pSEO programs were designed pre-AEO and need template updates for the new surface: stronger top-200-word answers, deeper schema markup, FAQPage schema where the page contains FAQ structure.
Detail in references/aeo-geo-for-programmatic-pages.md.
pSEO sets decay if not maintained.
Data refresh cadence. Quarterly minimum for most datasets. Daily or weekly for time-sensitive data (real estate listings, prices, availability, current statistics). The refresh cadence is set when the program launches; retrofitting is expensive.
Template version migration. When the template improves, all existing pages need migration. Cohort-by-cohort migration with monitoring beats big-bang migration. Each cohort gets the new template, sits for 30 days under monitoring, then the next cohort migrates.
Pruning lifecycle. Pages generated 2+ years ago that have not earned traffic should be evaluated for noindex or removal. The 24-month checkpoint is the standard cutoff; some categories warrant earlier or later pruning.
Set-level refresh. Occasionally the entire set's template needs a refresh as user expectations evolve, AI engine signals shift, or category conventions change. Set-level refreshes are 6-month projects, not weekend cleanups.
Detail in references/refresh-at-scale.md.
Rapid-fire. Diagnoses in references/common-pseo-failures.md.
When designing or auditing a programmatic SEO program, walk these 12 considerations.
The output of the framework is a pSEO design document the team can reference at every stage: data source named, schema versioned, template specified, QC discipline budgeted, internal linking architecture mapped, refresh cadence set.
references/when-pseo-works-decision.md - Five-criterion decision framework with worked examples of yes, no, and maybe.references/data-source-identification-patterns.md - First-party, licensed, public, expert-curated, synthesized; defensibility analysis.references/template-design-patterns.md - Variable density, above-the-fold answer, heading hierarchy, schema integration, link slots.references/schema-design-patterns.md - Field count, computed fields, cross-record fields, update tags, schema-as-product.references/quality-control-at-scale.md - Sampling strategy, automated checks, manual review checklist, failure thresholds, cohort tracking.references/internal-linking-at-scale.md - Hub-and-spoke, sibling linking, anchor text variation, orphan prevention.references/crawl-budget-management.md - Sitemap segmentation, noindex patterns, canonicalization, crawl rate monitoring, pruning criteria.references/aeo-geo-for-programmatic-pages.md - Direct-answer extraction, structured data signals, citation patterns, two-engine optimization.references/refresh-at-scale.md - Data refresh cadence, template versioning, cohort migration, pruning lifecycle.references/common-pseo-failures.md - Eleven-plus failure patterns with diagnoses and fixes.The only durable pSEO programs are the ones that hold quality at scale. The temptation to scale quantity at the cost of quality has produced the entire bad reputation pSEO carries. Sites that resist that temptation, that maintain real data depth, real quality control, real refresh discipline, keep producing meaningful traffic for years. Sites that do not get penalized within months of an algorithm update.
The discipline is not optional and it is not free. Budget for it before generating the first page, or do not start at all.
When in doubt about whether a pSEO program is ready, ask: is the data source actually unique, is the schema deep, does the template lead with the answer, is the internal linking architecture mapped, is QC budgeted in headcount, is the refresh cadence set? If yes to all of those, ship the design document and let the engineering work begin. If no to any of them, fix the gap before any pages generate.