From zyte-web-data
Generates web-poet page object code by synthesizing per-page extraction analyses into a domain-wide page object class. Used for building robust web scrapers in Python.
npx claudepluginhub zytedata/claude-skills --plugin zyte-web-data
How this skill is triggered (by the user, by Claude, or both)
Slash command
/zyte-web-data:scrape-codegen-generate [work-path] [output-path] [spec-path]
This skill is limited to the following tools:
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
You are generating web-poet page object code. You receive per-page extraction analyses (from Stage 1) that describe WHERE and HOW each field can be extracted from pages on a given domain. Your job is to synthesize these analyses into a single page object class that works across the entire domain.
The raw argument string is $ARGUMENTS. Split it into 3 whitespace-separated positional arguments:
{work_path}: .scrape/.work/spec
{output_path}: .scrape/spec/page_object.py
{spec_path}: .scrape/spec/spec.json
Plus, taken from the surrounding prompt text (not from the argument string):
@field methods for those fields. When not set, generate all fields found in the analyses.
Read the web-poet API reference:
../scrape-codegen/references/web-poet-reference.md
Read the schema from {spec_path} — use the properties object inside schema.
Read all Stage 1 analysis files from {work_path}/codegen-analyze/:
{work_path}/codegen-analyze/detail-1.json
{work_path}/codegen-analyze/detail-2.json
...
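The input steps above (splitting the argument string, reading the schema, and collecting the Stage 1 files) can be sketched with the standard library alone. This is a minimal sketch, not the skill's actual implementation: the function names are hypothetical, rejecting anything other than exactly three arguments is an assumption, and so is loading every `*.json` file in the analysis directory.

```python
from __future__ import annotations

import json
from pathlib import Path


def parse_arguments(raw: str) -> tuple[str, str, str]:
    """Split the raw argument string into (work_path, output_path, spec_path)."""
    parts = raw.split()
    # Assumption: exactly three arguments are required.
    if len(parts) != 3:
        raise ValueError(f"expected 3 positional arguments, got {len(parts)}")
    work_path, output_path, spec_path = parts
    return work_path, output_path, spec_path


def load_schema_properties(spec_path: str) -> dict:
    """Return the `properties` object nested inside `schema` in the spec file."""
    spec = json.loads(Path(spec_path).read_text(encoding="utf-8"))
    return spec["schema"]["properties"]


def load_analyses(work_path: str) -> list[dict]:
    """Load every JSON file under codegen-analyze/, in sorted filename order."""
    analysis_dir = Path(work_path) / "codegen-analyze"
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(analysis_dir.glob("*.json"))
    ]
```

Sorting is lexicographic, so `detail-10.json` would come before `detail-2.json`. That is harmless here because the analyses are reviewed together rather than in sequence.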
For each field in the schema, review all per-page analyses together:
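One way to review the analyses together is to count how often each extraction strategy is proposed for a field and rank the results. The sketch below is illustrative only. The analysis shape it reads (`{"fields": {name: {"method": ..., "selector": ...}}}`) is an assumption, since the Stage 1 file format is not shown here, and `summarize_field` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def summarize_field(field_name: str, analyses: list[dict]) -> dict:
    """Rank the extraction strategies proposed for one field across all pages."""
    votes: Counter = Counter()
    for analysis in analyses:
        # Assumed shape: analysis["fields"][field_name] = {"method": ..., "selector": ...}
        entry = analysis.get("fields", {}).get(field_name)
        if entry:  # the field may be absent on some pages
            votes[(entry["method"], entry["selector"])] += 1

    ranked = votes.most_common()
    top_count = ranked[0][1] if ranked else 0
    return {
        "primary": ranked[0][0] if ranked else None,
        "fallbacks": [strategy for strategy, _ in ranked[1:]],
        # Share of pages on which the most common strategy was proposed.
        "coverage": top_count / len(analyses) if analyses else 0.0,
    }
```

A low `coverage` value is a reasonable signal that consensus was difficult and that the field deserves a note in the final summary.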
Generate a complete, self-contained Python module following the web-poet reference. The code must:
Return None when a field is not present (never empty string or []).
Use extruct for JSON-LD/microdata, price_parser for prices, jmespath for JSON queries.
Structure:
from web_poet import WebPage, field
# ... other imports as needed

class PageObject(WebPage[dict]):
    # shared helpers as @cached_property if multiple fields need them

    @field
    def field_name(self) -> type | None:
        # extraction logic
        ...
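The rule that a missing field yields None, never an empty string or empty list, is easy to break when several extraction sources are chained. The helpers below are a minimal pure-Python sketch of how a @field method might normalise its candidates. All three names are hypothetical, and plain dict traversal stands in for jmespath so the example has no third-party dependencies.

```python
from __future__ import annotations

from typing import Any


def none_if_empty(value: Any) -> Any:
    """Map blank strings and empty containers to None; pass other values through."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.strip()
        return value or None
    if isinstance(value, (list, tuple, dict, set)) and not value:
        return None
    return value  # note: 0 and False are kept, they are real values


def dig(data: Any, *keys: str) -> Any:
    """Follow a chain of dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def first_present(*candidates: Any) -> Any:
    """Return the first candidate that is non-empty after normalisation, else None."""
    for candidate in candidates:
        value = none_if_empty(candidate)
        if value is not None:
            return value
    return None
```

Inside a page object, a price field could then be written as `first_present(dig(json_ld, "offers", "price"), css_text)`, where `json_ld` and `css_text` are placeholders for values the method has already extracted.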
Save the generated code to {output_path}.
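Before writing the file, it can help to confirm that the generated source at least parses. The sketch below adds that check as an assumption. It is not a step the instructions require, and `save_module` is a hypothetical helper name.

```python
from __future__ import annotations

import ast
from pathlib import Path


def save_module(code: str, output_path: str) -> Path:
    """Write generated source to output_path after checking that it parses."""
    ast.parse(code)  # raises SyntaxError if the generated code is malformed
    target = Path(output_path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
    target.write_text(code, encoding="utf-8")
    return target
```

Parsing catches syntax errors only. It does not import the module, so a missing dependency or a wrong selector would still go unnoticed.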
Return a summary of what was generated:
Generated page object with N fields:
name: CSS h1.product-title::text
price: JSON-LD offers.price, fallback to CSS span.price
description: CSS div.product-description (text join)
rating: JSON-LD aggregateRating.ratingValue
image_url: CSS img.product-image::attr(src) + urljoin
Include notes on any fields where consensus was difficult or where extraction may be fragile.
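The summary format shown above can be produced from a simple mapping of field names to extraction sources. This is a minimal sketch with hypothetical names. The "Notes:" heading is an assumption, since the source only asks that notes be included and does not say how they should be laid out.

```python
from __future__ import annotations


def format_summary(sources: dict[str, str], notes: dict[str, str] | None = None) -> str:
    """Render the field-by-field summary, with optional notes on fragile fields."""
    lines = [f"Generated page object with {len(sources)} fields:"]
    lines += [f"{name}: {source}" for name, source in sources.items()]
    if notes:
        lines.append("Notes:")  # assumed heading, not prescribed by the source
        lines += [f"{name}: {note}" for name, note in notes.items()]
    return "\n".join(lines)
```

For example, `format_summary({"name": "CSS h1.product-title::text"})` returns the header line followed by one line for the `name` field.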