From data-liberation
Extracts content from closed platforms (GoDaddy, Hostinger, HubSpot, Shopify, Squarespace, Webflow, Wix) into WordPress WXR files. Exports products to WooCommerce CSV; supports discovery, verification, resume, and import.
`npx claudepluginhub automattic/data-liberation-agent --plugin data-liberation`

This skill uses the workspace's default tool permissions.
Help the user extract their content from a closed web platform.
1. Call `liberate_detect` to identify the platform.
2. Call `liberate_discover` to inventory the site — show the counts and platform features to the user.
3. Review `platformFeatures` — flags for stores, bookings, forms, members areas, scheduling, forums, and events:
   - Features with `transferable: true` (like stores) are handled during extraction.
   - Features with `transferable: false` include a `wpRecommendation` with a suggested WordPress plugin.
4. Call `liberate_extract` with an appropriate `outputDir`.
5. Call `liberate_verify` on the `outputDir` to check the extraction quality — report stale CDN URLs, failed pages, failed media, and quality scores.
6. If the environment provides an import skill or tool (e.g. the `import-liberated-data` skill, or the `wp_cli` tool): call `liberate_setup` with `delegate: true`, then call `liberate_import` with `delegate: true` to get a structured import manifest. Hand off to the environment's import skill/tool.
7. Otherwise: call `liberate_setup` with site/username/token to validate the REST API connection, then call `liberate_import` with REST API credentials.

If the user asks to resume a previous extraction (e.g. "resume", "continue where I left off", "it crashed"):
- Call `liberate_extract` with `resume: true` — this skips already-processed URLs.
- If discovery has already completed (`discovery-complete` exists), skip straight to reporting results and offer to import.

The resume flag causes the extraction to skip any URL already recorded in the extraction log (`extraction-log.jsonl`).

Any platform may have e-commerce products. When products are detected during extraction:
- Products are captured to `products.jsonl` during extraction, then compiled into `products.csv` (WooCommerce import format) alongside the WXR.

If you encounter something notable during extraction — a new API endpoint, a platform quirk, a workaround for blocked content, a better extraction technique — add an entry to DISCOVERIES.md at the top of the repo. Follow the format in the existing entries. This is how the tool gets smarter over time.
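The JSONL-to-CSV compilation step can be sketched like this (stdlib only; the JSONL field names and the subset of WooCommerce columns here are illustrative assumptions — the adapter's real schema is richer):

```python
import csv
import io
import json

# Hypothetical products.jsonl content — the real field names are adapter-defined.
JSONL = "\n".join([
    json.dumps({"name": "Mug", "sku": "MUG-1", "price": "12.00", "description": "A mug"}),
    json.dumps({"name": "Tee", "sku": "TEE-1", "price": "25.00", "description": "A shirt"}),
])

# A small subset of the WooCommerce product CSV importer's column headers.
COLUMNS = ["Type", "SKU", "Name", "Regular price", "Description"]

def compile_products_csv(jsonl_text: str) -> str:
    """Compile one-product-per-line JSONL into a WooCommerce-importable CSV."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        p = json.loads(line)
        writer.writerow({
            "Type": "simple",
            "SKU": p.get("sku", ""),
            "Name": p.get("name", ""),
            "Regular price": p.get("price", ""),
            "Description": p.get("description", ""),
        })
    return out.getvalue()

print(compile_products_csv(JSONL))
```

The CSV lands alongside the WXR so the user can run the standard WooCommerce product importer after the content import.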
After extraction completes, always run `liberate_verify` on the output directory. This checks for stale CDN URLs, failed pages, failed media, and quality scores.
Show the user the verification report and flag anything that needs attention before importing.
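The stale-CDN check can be sketched as a scan of the extracted HTML for source-platform CDN hosts (the hostname list here is illustrative — `img1.wsimg.com` is GoDaddy's, the others are assumptions about other platforms; the real verifier keeps its own list):

```python
import re

# Example platform CDN hosts; any URL still pointing at one of these after
# extraction means the media was not re-hosted.
CDN_HOSTS = ("img1.wsimg.com", "static1.squarespace.com", "static.wixstatic.com")

def find_stale_cdn_urls(html: str) -> list:
    """Return URLs in the content that still point at the source platform's CDN."""
    urls = re.findall(r'https?://[^\s"\'<>]+', html)
    return [u for u in urls if any(host in u for host in CDN_HOSTS)]

sample = ('<img src="https://img1.wsimg.com/isteam/ip/abc/photo.jpg">'
          ' <a href="https://example.com/page">ok</a>')
print(find_stale_cdn_urls(sample))
```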
If the environment provides an import skill (e.g. `import-liberated-data` in WordPress Studio), use `delegate: true` with both `liberate_setup` and `liberate_import`. The setup call returns requirements; the import call returns a structured manifest with file paths. Hand off to the environment's import skill to execute the actual import.
If no environment import skill is available, use the built-in REST API import. Validate the WordPress connection with liberate_setup first:
Then call `liberate_import`.

Ask the user about authors:

- Pass `importAuthors: true` to `liberate_import` — this creates WordPress user accounts for each author found in the WXR and assigns posts to them.
- `importAuthors: false` (default) — all content is owned by the authenticated user.

If the user doesn't have a WordPress site yet, guide them to set one up, then call `liberate_setup` to validate the connection.

Squarespace sites benefit significantly from admin extraction via CDP. Without it, you only get public content — no drafts, no unlisted pages, and Squarespace 7.1 fluid-engine sites often return empty content from the ?format=json API.
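The fluid-engine symptom can be probed like this (a sketch: the `looks_empty` heuristic and the `body` field are assumptions about Squarespace's JSON view, not the adapter's actual logic):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def json_view_url(page_url: str) -> str:
    """Build the ?format=json view of a Squarespace page URL, preserving
    any existing query parameters."""
    parts = urlsplit(page_url)
    query = dict(parse_qsl(parts.query))
    query["format"] = "json"
    return urlunsplit(parts._replace(query=urlencode(query)))

def looks_empty(payload: dict) -> bool:
    """Heuristic (assumption): a fluid-engine page's JSON view often carries
    an item with no usable body text."""
    item = payload.get("item") or payload.get("collection") or {}
    return not (item.get("body") or "").strip()

print(json_view_url("https://example.com/blog/post?ref=1"))
```

When `looks_empty` is true for most pages, that is the signal to push the user toward CDP-based admin extraction instead.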
Guide the user through admin setup:
1. Launch Chrome with remote debugging enabled: `google-chrome --remote-debugging-port=9222` (on macOS: `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222`).
2. Have the user log into their Squarespace admin in that Chrome instance.
3. Pass `--cdp-port 9222` (CLI) or `cdpPort: 9222` (MCP).

The admin session gives the adapter access to:

- Drafts and unlisted pages that never appear on the public site.
- `__NEXT_DATA__` hydration payloads on 7.1 sites.

Always offer CDP-based extraction for Squarespace. Public-only extraction works but produces lower-quality results.
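Chrome's DevTools HTTP endpoint lists open tabs as JSON, which is how a CDP client can find the admin session. A minimal sketch (matching on `/config` is an assumption about the Squarespace admin URL shape):

```python
import json
from typing import Optional
from urllib.request import urlopen

def list_targets(port: int = 9222) -> list:
    """Fetch the open tabs from Chrome's DevTools HTTP endpoint."""
    with urlopen(f"http://127.0.0.1:{port}/json") as resp:
        return json.load(resp)

def pick_admin_tab(targets: list) -> Optional[dict]:
    """Return the tab where the user is logged into the Squarespace admin.
    The '/config' path check is an assumption about the admin URL."""
    for tab in targets:
        if tab.get("type") == "page" and "/config" in tab.get("url", ""):
            return tab
    return None

# With Chrome started via --remote-debugging-port=9222, the admin tab can be
# located with: pick_admin_tab(list_targets())
```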
Extraction uses Playwright (headless browser) to intercept Wix's internal API calls and extract window globals. This is slower but captures content that isn't available via HTTP alone. Large sites may take several minutes.
Webflow requires a Webflow API token. Ask the user for their token and pass it via --token (CLI) or the token parameter (MCP).
Shopify has two extraction tiers. Always offer the richer one first and fall back only if the user can't produce an Admin API token.
Tier 1 — Public JSON API (no credentials)
Works for any public Shopify storefront. Pulls pages, blog posts, and products via the public /pages.json, /blogs.json, and /products.json endpoints plus HTML fallback for theme-rendered content. No token needed. Product data is limited to what the public API exposes — you lose compareAtPrice sale semantics, real stock policy, cost of goods, variant images, and collections.
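The Tier 1 product mapping can be sketched as follows (the WooCommerce column subset and first-variant-only handling are simplifying assumptions, not the adapter's full behavior):

```python
def public_product_to_row(p: dict) -> dict:
    """Map one product from Shopify's public /products.json payload to a
    minimal WooCommerce CSV row (simple product, first variant only)."""
    variant = (p.get("variants") or [{}])[0]
    return {
        "Type": "simple",
        "Name": p.get("title", ""),
        "SKU": variant.get("sku", ""),
        "Regular price": variant.get("price", ""),
        "Description": p.get("body_html", ""),
        "Images": ", ".join(img.get("src", "") for img in p.get("images", [])),
    }

sample = {"title": "Mug", "body_html": "<p>A mug</p>",
          "variants": [{"sku": "MUG-1", "price": "12.00"}],
          "images": [{"src": "https://cdn.shopify.com/s/files/mug.jpg"}]}
row = public_product_to_row(sample)
print(row["Name"], row["Regular price"])
```

Note what is missing here compared to Tier 2: no sale-price semantics, no stock policy, no cost of goods — exactly the gaps the Admin GraphQL path fills.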
Tier 2 — Admin GraphQL (richer product data)
When the user has admin access to their store, offer to use Shopify's Admin GraphQL API. This yields:
- `compareAtPrice` → proper sale/regular price mapping on simple + variable products
- `inventoryPolicy` + `inventoryItem.tracked` → real stock status (oversell-aware)
- `inventoryItem.unitCost` → cost of goods written to `meta:_wc_cog_cost`
- `inventoryItem.measurement.weight` → unit-normalized weight (kg)
- SEO title and description (written to `meta:_yoast_wpseo_title` / `_yoast_wpseo_metadesc`)

Guide the user through admin setup:
The token needs these scopes:

- `read_products` (required)
- `read_inventory` (for cost-of-goods + stock)
- `read_online_store_pages` / `read_online_store_navigation` (for pages)
- `read_content` (for blog articles)

Pass the token as `adminToken` (MCP) or via the adapter opts. You do not need to ask the user for the shop domain — `liberate_discover` auto-detects the `*.myshopify.com` hostname from the storefront HTML (the `Shopify.shop` JS global) and stores it as `inventory.shopDomain`, even for sites served on custom domains.

When to use which tier:
- The user can produce an Admin API token → Tier 2.
- No admin token → Tier 1 (public JSON only).
- Custom domain (e.g. shop.brand.com) → Tier 2 still works because of auto-detection; do NOT ask them for the myshopify.com subdomain manually unless the detector failed.

If `liberate_discover` did not populate `inventory.shopDomain` (rare — the site may be behind Cloudflare or heavy bot protection that blocks HTML fetch), ask the user directly:
"I couldn't auto-detect the myshopify.com subdomain. Can you paste the URL you see when you log into your Shopify admin? It looks like https://admin.shopify.com/store/<name> — the <name> is what I need."
Pass the admin-resolved value as shopDomain alongside adminToken.
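The auto-detection can be sketched as a regex over the storefront HTML (the exact pattern here is illustrative, not the adapter's implementation):

```python
import re

def detect_shop_domain(storefront_html: str):
    """Pull the *.myshopify.com hostname from the Shopify.shop JS global
    embedded in storefront HTML, as liberate_discover does. Returns None
    when the global is absent (e.g. the HTML fetch was blocked)."""
    m = re.search(r'Shopify\.shop\s*=\s*["\']([\w.-]+\.myshopify\.com)["\']',
                  storefront_html)
    return m.group(1) if m else None

page = ('<script>window.Shopify = window.Shopify || {};'
        ' Shopify.shop = "acme.myshopify.com";</script>')
print(detect_shop_domain(page))
```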
GraphQL failures fall back to Tier 1 automatically — if the token is wrong or the scopes are insufficient, the adapter logs a warning and continues with the public JSON path, so the user's extraction still produces output.
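The Tier 2 price and stock semantics can be sketched like this (field names follow Shopify's Admin GraphQL `ProductVariant` type; the exact mapping rules are an interpretation of the behavior described above, not the adapter's code):

```python
def map_prices(variant: dict) -> dict:
    """Sale/regular mapping: when compareAtPrice exceeds price, compareAtPrice
    is the WooCommerce regular price and price is the sale price."""
    price = variant.get("price")
    compare_at = variant.get("compareAtPrice")
    if compare_at and float(compare_at) > float(price):
        return {"Regular price": compare_at, "Sale price": price}
    return {"Regular price": price, "Sale price": ""}

def map_stock(variant: dict) -> str:
    """Oversell-aware stock status: inventoryPolicy CONTINUE allows selling at
    zero quantity, so the product stays in stock; DENY + tracked inventory
    means the real quantity decides."""
    tracked = variant.get("inventoryItem", {}).get("tracked", False)
    if not tracked or variant.get("inventoryPolicy") == "CONTINUE":
        return "instock"
    return "instock" if variant.get("inventoryQuantity", 0) > 0 else "outofstock"

print(map_prices({"price": "20.00", "compareAtPrice": "25.00"}))
```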
Public-crawl adapter for GoDaddy's legacy Websites & Marketing platform (also called "Go Daddy Website Builder" in page sources). Not to be confused with the newer Airo AI Builder.
GoDaddy offers no data export from W+M — this adapter rescues content by crawling the public site. Detection looks for the Go Daddy Website Builder generator meta tag, the img1.wsimg.com/isteam/ CDN pattern, and the X-SiteId header.
Discovery fetches the three standard W+M sub-sitemaps individually so blog posts can be tagged precisely (W+M's /news,-updates/f/<slug> URL shape doesn't match the generic classifier):
- `sitemap.website.xml` — pages
- `sitemap.blog.xml` — blog posts
- `sitemap.ols.xml` — products (v1.1, not yet implemented)

Blog post bodies are hydrated client-side from a `window._BLOG_DATA` JSON blob. The adapter parses this blob and converts the Draft.js ContentState (`post.fullContent`) into HTML — preserving paragraphs, headings, lists, blockquotes, code blocks, links, and images. Title, publish date, categories, and featured image are also pulled from `_BLOG_DATA` rather than HTML meta tags (higher fidelity).
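A minimal sketch of the Draft.js conversion (block types only — the real adapter also handles inline style ranges, link and image entities, and list nesting depth):

```python
import html

def draftjs_to_html(content_state: dict) -> str:
    """Convert a Draft.js ContentState (as found in window._BLOG_DATA's
    post.fullContent) to HTML, wrapping consecutive list items in ul/ol."""
    tags = {"unstyled": "p", "header-one": "h1", "header-two": "h2",
            "blockquote": "blockquote", "code-block": "pre",
            "unordered-list-item": "li", "ordered-list-item": "li"}
    out, in_list = [], None
    for block in content_state.get("blocks", []):
        btype = block.get("type", "unstyled")
        list_tag = {"unordered-list-item": "ul", "ordered-list-item": "ol"}.get(btype)
        if in_list and in_list != list_tag:   # close a finished list run
            out.append(f"</{in_list}>")
            in_list = None
        if list_tag and in_list is None:      # open a new list run
            out.append(f"<{list_tag}>")
            in_list = list_tag
        tag = tags.get(btype, "p")
        out.append(f"<{tag}>{html.escape(block.get('text', ''))}</{tag}>")
    if in_list:
        out.append(f"</{in_list}>")
    return "\n".join(out)

state = {"blocks": [
    {"type": "header-one", "text": "Hello"},
    {"type": "unstyled", "text": "First paragraph."},
    {"type": "unordered-list-item", "text": "one"},
    {"type": "unordered-list-item", "text": "two"},
]}
print(draftjs_to_html(state))
```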
Pages use DOM-based extraction: strip `HEADER_SECTION`, `FOOTER_*`, cookie banners, and the first-section title/image widgets (`*_SECTION_TITLE_RENDERED`, `*_IMAGE_RENDERED0`), which would otherwise duplicate the `<wp:post_title>` and media attachment.
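The widget-stripping rules can be expressed as glob patterns over DOM node ids (this list mirrors the ids named above; treating `HEADER_SECTION` as a prefix glob is an assumption):

```python
from fnmatch import fnmatch

# Widget-id patterns stripped from W+M pages before WXR conversion;
# the exact glob list is illustrative.
STRIP_PATTERNS = ["HEADER_SECTION*", "FOOTER_*",
                  "*_SECTION_TITLE_RENDERED", "*_IMAGE_RENDERED0"]

def should_strip(widget_id: str) -> bool:
    """True when a node's id matches a site-chrome or duplicate-content pattern."""
    return any(fnmatch(widget_id, pat) for pat in STRIP_PATTERNS)

kept = [wid for wid in ["HEADER_SECTION", "ABOUT_SECTION_TITLE_RENDERED",
                        "CONTENT_BLOCK_3"] if not should_strip(wid)]
print(kept)
```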
v1 limitations: No GoDaddy Online Store (OLS) product extraction yet — sites with a store are flagged, but products need a real store URL for testing before v1.1 ships.
`products.csv` (WooCommerce format) and `products.jsonl` are also produced.