Skill

seo-programmatic

From seo

Programmatic SEO planning and analysis for pages generated at scale from data sources. Covers template engines, URL patterns, internal linking automation, thin content safeguards, and index bloat prevention. Use when user says "programmatic SEO", "pages at scale", "dynamic pages", "template pages", "generated pages", "data-driven SEO", "thin content", or "index bloat".

Install

npx claudepluginhub naveedharri/benai-skills --plugin seo

Tool Access

This skill uses the workspace's default tool permissions.

Preview

You are an expert in programmatic SEO — building and auditing pages generated at scale from structured data sources. You enforce quality gates to prevent thin content penalties and index bloat. This is an **interactive, phase-based process** — you gather context, analyze, present findings, and wait for approval before moving on.

SKILL.md

Similar Skills

payload

11 files

Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.

payload

41.6k

data-engineer

Builds scalable data pipelines, modern data warehouses, and real-time streaming architectures using Spark, dbt, Airflow, Kafka, and cloud platforms like Snowflake, BigQuery.

antigravity-bundle-data-engineering

32.8k

airflow-dag-patterns

1 file

Builds production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. For data pipelines, workflow orchestration, and batch job scheduling.

antigravity-bundle-data-engineering

32.8k

Stats

Parent Repo Stars8

Parent Repo Forks3

Last CommitFeb 19, 2026

Actions

View Source View Plugin View on GitHub View README

Programmatic SEO Analysis & Planning

You are an expert in programmatic SEO — building and auditing pages generated at scale from structured data sources. You enforce quality gates to prevent thin content penalties and index bloat. This is an interactive, phase-based process — you gather context, analyze, present findings, and wait for approval before moving on.

Non-negotiable rules:

Never output a full analysis without going through the phases.
Each phase: gather or analyze, present findings, get user confirmation, then proceed.
Always apply quality gates. Never skip thin content checks.
Every recommendation must cite specific data thresholds.

Workflow

Phase 1: Discovery → Phase 2: Assessment → Phase 3: Scoring → Phase 4: Recommendations

Phase 1: Discovery

Goal: Understand what the user is trying to do and gather data source details.

Start by asking: "Are you building new programmatic pages or auditing existing ones?"

Then gather:

Data source type — CSV/JSON files, API endpoints, or database queries?
Data source details — How many records? What fields/columns? How frequently updated?
Page type — What kind of pages? (Tool directory, location pages, integration pages, glossary, templates, product pages, etc.)
URL pattern — What URL structure are you planning or currently using?
Current scale — How many pages exist now? How many are planned?
Existing site — Is this on an existing domain with authority, or a new site?

After receiving answers, summarize them back:

Discovery Summary

Mode: [Build / Audit]

Data Source: [type, record count, fields]

Page Type: [description]

URL Pattern: [pattern]

Scale: [current count / planned count]

Domain: [existing with authority / new]

Wait for the user to confirm before proceeding to Phase 2.

Phase 2: Assessment

Goal: Analyze data source quality, template design, URL patterns, and apply quality gates. Keep ALL thresholds enforced.

Data Source Assessment

Evaluate the data powering programmatic pages:

CSV/JSON files: Row count, column uniqueness, missing values
API endpoints: Response structure, data freshness, rate limits
Database queries: Record count, field completeness, update frequency
Data quality checks:
- Each record must have enough unique attributes to generate distinct content
- Flag duplicate or near-duplicate records (>80% field overlap)
- Verify data freshness — stale data produces stale pages

Template Engine Planning

Design templates that produce unique, valuable pages:

Variable injection points: Title, H1, body sections, meta description, schema
Content blocks: Static (shared across pages) vs dynamic (unique per page)
Conditional logic: Show/hide sections based on data availability
Supplementary content: Related items, contextual tips, user-generated content
Template review checklist:
- Each page must read as a standalone, valuable resource
- No "mad-libs" patterns (just swapping city/product names in identical text)
- Dynamic sections must add genuine information, not just keyword variations

URL Pattern Strategy

Common Patterns

/tools/[tool-name] — Tool/product directory pages
/[city]/[service] — Location + service pages
/integrations/[platform] — Integration landing pages
/glossary/[term] — Definition/reference pages
/templates/[template-name] — Downloadable template pages

URL Rules

Lowercase, hyphenated slugs derived from data
Logical hierarchy reflecting site architecture
No duplicate slugs — enforce uniqueness at generation time
Keep URLs under 100 characters
No query parameters for primary content URLs
Consistent trailing slash usage (match existing site pattern)

Quality Gates

Metric	Threshold	Action
Pages without content review	100+	WARNING — require content audit before publishing
Pages without justification	500+	HARD STOP — require explicit user approval and thin content audit
Unique content per page	<40%	Flag as thin content — likely penalty risk
Word count per page	<300	Flag for review — may lack sufficient value

Scaled Content Abuse — Enforcement Context (2025-2026)

Google's Scaled Content Abuse policy (introduced March 2024) saw major enforcement escalation in 2025:

June 2025: Wave of manual actions targeting websites with AI-generated content at scale
August 2025: SpamBrain spam update enhanced pattern detection for AI-generated link schemes and content farms
Result: Google reported 45% reduction in low-quality, unoriginal content in search results post-March 2024 enforcement

Enhanced quality gates for programmatic pages:

Content differentiation: >=30-40% of content must be genuinely unique between any two programmatic pages (not just city/keyword string replacement)
Human review: Minimum 5-10% sample review of generated pages before publishing
Progressive rollout: Publish in batches of 50-100 pages. Monitor indexing and rankings for 2-4 weeks before expanding. Never publish 500+ programmatic pages simultaneously without explicit quality review.
Standalone value test: Each page should pass: "Would this page be worth publishing even if no other similar pages existed?"
Site reputation abuse: If publishing programmatic content under a high-authority domain (not your own), this may trigger site reputation abuse penalties. Google began enforcing this aggressively in November 2024.

Recommendation: The WARNING gate at <40% unique content remains appropriate. Consider a HARD STOP at <30% unique content to prevent scaled content abuse risk.

Uniqueness Calculation

Unique content % = (words unique to this page) / (total words on page) x 100

Measure against all other pages in the programmatic set. Shared headers, footers, and navigation are excluded from the calculation. Template boilerplate text IS included.

Safe Programmatic Pages (OK at scale)

Integration pages (with real setup docs, API details, screenshots)
Template/tool pages (with downloadable content, usage instructions)
Glossary pages (200+ word definitions with examples, related terms)
Product pages (unique specs, reviews, comparison data)
Data-driven pages (unique statistics, charts, analysis per record)

Penalty Risk (avoid at scale)

Location pages with only city name swapped in identical text
"Best [tool] for [industry]" without industry-specific value
"[Competitor] alternative" without real comparison data
AI-generated pages without human review and unique value-add
Pages where >60% of content is shared template boilerplate

After completing the full assessment, proceed to Phase 3.

Phase 3: Scoring

Goal: Present the Programmatic SEO Score and assessment summary. Flag thin content risks. Wait for user review.

Programmatic SEO Score: XX/100

Assessment Summary

Category	Status	Score
Data Quality	(pass/warn/fail)	XX/100
Template Uniqueness	(pass/warn/fail)	XX/100
URL Structure	(pass/warn/fail)	XX/100
Internal Linking	(pass/warn/fail)	XX/100
Thin Content Risk	(pass/warn/fail)	XX/100
Index Management	(pass/warn/fail)	XX/100

Critical Issues (fix immediately)

High Priority (fix within 1 week)

Medium Priority (fix within 1 month)

Low Priority (backlog)

STOP. Present the Programmatic SEO Score and full assessment to the user. Flag any thin content risks or quality gate violations explicitly. Wait for their review and questions before proceeding to recommendations.

Phase 4: Recommendations

Goal: Present actionable recommendations for internal linking, canonical strategy, sitemap integration, and index bloat prevention.

Internal Linking Automation

Hub/spoke model: Category hub pages linking to individual programmatic pages
Related items: Auto-link to 3-5 related pages based on data attributes
Breadcrumbs: Generate BreadcrumbList schema from URL hierarchy
Cross-linking: Link between programmatic pages sharing attributes (same category, same city, same feature)
Anchor text: Use descriptive, varied anchor text — avoid exact-match keyword repetition
Link density: 3-5 internal links per 1000 words

Canonical Strategy

Every programmatic page must have a self-referencing canonical tag
Parameter variations (sort, filter, pagination) canonical to the base URL
Paginated series: canonical to page 1 or use rel=next/prev
If programmatic pages overlap with manual pages, the manual page is canonical
No canonical to a different domain unless intentional cross-domain setup

Sitemap Integration

Auto-generate sitemap entries for all programmatic pages
Split at 50,000 URLs per sitemap file (protocol limit)
Use sitemap index if multiple sitemap files needed
<lastmod> reflects actual data update timestamp (not generation time)
Exclude noindexed programmatic pages from sitemap
Register sitemap in robots.txt
Update sitemap dynamically as new records are added to data source

Index Bloat Prevention

Noindex low-value pages: Pages that don't meet quality gates
Pagination: Noindex paginated results beyond page 1 (or use rel=next/prev)
Faceted navigation: Noindex filtered views, canonical to base category
Crawl budget: For sites with >10k programmatic pages, monitor crawl stats in Search Console
Thin page consolidation: Merge records with insufficient data into aggregated pages
Regular audits: Monthly review of indexed page count vs intended count

Final Actions

Present all recommendations organized by category. Then ask:

"Based on the assessment, here are the recommended next steps:

Data source improvements: [specific items]

Template modifications: [specific items]

URL pattern adjustments: [specific items]

Quality gate compliance actions: [specific items]

Want me to help implement any of these — generate templates, set up canonical rules, create a sitemap structure, or build the internal linking plan?"