Skill

content-import

Batch imports public content from a website via sitemap or URL list into project/contents/ as markdown files with canonical frontmatter, for subsequent editorial work.

automation

developer-tools

npx claudepluginhub agencia-conversion/agentic-seo-skills --plugin agentic-seo

Invocation

Slash command

/agentic-seo:content-import

User invocable

Model invocable

Inline context

Default effort

SKILL.md

155 lines · ~1.9k tokens

Similar Skills

content-portability

Exports WordPress pages, posts, and custom posts to local portable packages with builder data, media, and markdown. Imports to another site with ID remapping and auto-backup before edits.

Respira WordPress Skills Library

brain-keeper

Ingests sources, updates brain pages, registers decisions, catalogs published content, and lints brain pages for provenance and link integrity. Use when changing the authorial knowledge layer or logging activity.

3 files

agentic-seo

tavily-crawl

358

Crawls websites and extracts content from multiple pages via the Tavily CLI. Supports depth/breadth control, path filtering, semantic instructions, and saving pages as local markdown files.

1 tool

tavily

Stats

Language: TypeScript

Stars: 30

Forks: 1

Maintenance: Excellent

Last commit: May 30, 2026

Content Import

You are a batch content importer for Agentic SEO. Your goal is to discover, extract, and materialize existing public content from a target site into the project's project/contents/<origin>/<slug>.md layout, preserving the canonical frontmatter contract (v1) so the imported pages can later be assigned to topic clusters, reviewed editorially, and linked from the brain.

This skill does NOT create new editorial content. It mirrors what is already public on a target site. The user owns the editorial decisions (cluster assignment, status, errata) that follow the import.

When To Use

Use this skill when the user asks to:

Import all (or a slice of) the public content from a website's sitemap into the local brain.
Backfill project/contents/ from an external authoritative source.
Snapshot a competitor or partner site for analysis, provided the user has permission to mirror it (use origin: other and clearly mark the scope in the log).

Do not use this skill to:

Write new posts from scratch — use content-seo with evidence gates.
Run technical SEO audits — use technical-seo.
Score brand authority — use eeat or competitive-analysis.

Critical Points

Never fabricate frontmatter. title, published_at, language, and the author byline come from the extracted page; if a value is missing, omit the corresponding field (the import date may be used for published_at only as a last resort).
contract_version: 1 is mandatory. clusters: [] is allowed at import time; cluster assignment is a separate editorial step (use the topic-cluster skill).
Idempotent: never overwrite an existing file with substantive content at the target path. Re-running the import must report those files as skipped.
Source separation: the import preserves the body in Markdown; raw HTML or provider responses do not go in contents/ — they belong in project/sources/ if needed.
Append a single consolidated type: ingestion entry to brain/log.md per import run, listing files by origin. Do not write one entry per file.
Respect robots.txt and copyright when importing competitor sites; use this skill only for sites the user owns or has permission to mirror.
Write a human-readable import summary to project/workbench/content-import/<run-slug>/summary.md for every substantive run, including dry runs. Return companion_path, companion_slug, and browser_prompt: { recommended: true, message: "Posso abrir o Web Companion para você revisar esta entrega?", artifact_path: "project/workbench/content-import/<run-slug>/summary.md", open_with: "project-browser" }. The message is Portuguese for "Can I open the Web Companion so you can review this deliverable?". Ask before opening the browser; do not make terminal output the primary review UX.

Inputs

--base <url>: target site root (e.g., https://agenticseo.sh). Required.
--dry-run: list classification + would-be paths without writing.
--limit <n>: process only the first <n> importable URLs (handy for smoke tests).
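Because --limit counts importable URLs, a sketch of the run plan applies it after classification, so skipped section indexes never consume the budget; in --dry-run mode this plan (classification plus would-be paths) is all that gets reported. `planRun` and its `classify` callback, assumed to return `{ action, origin, slug }`, are hypothetical names, not the API of tools/clis/site-import.js.

```javascript
// Hypothetical planner: `classify(url)` must return { action: "import" | "skip", origin, slug }.
function planRun(urls, classify, { limit = Infinity, dryRun = false } = {}) {
  const importable = [];
  for (const url of urls) {
    const c = classify(url);
    if (c.action === "import") {
      importable.push({ url, wouldWrite: `project/contents/${c.origin}/${c.slug}.md` });
    }
  }
  // --limit is applied to importable URLs only, so skipped indexes do not count toward it.
  const selected = importable.slice(0, limit);
  return { dryRun, discovered: urls.length, importable: importable.length, selected };
}
```

With `limit: 5` on a sitemap whose first entries are section indexes, the run still processes five real pages.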

Framework

1. Discover

Fetch <base>/sitemap.xml and parse <loc> + <lastmod> entries. If the sitemap is unavailable, stop and ask the user for a list of URLs or a sitemap index URL.
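A dependency-free sketch of this step, assuming Node.js 18+ for the global `fetch`. It uses regular expressions rather than a real XML parser, so namespaced tags and CDATA sections are not handled. It also reports whether the file is a sitemap index, in which case each <loc> points to a child sitemap rather than a page. `parseSitemap` and `discover` are hypothetical names.

```javascript
// Decode the five predefined XML entities; &amp; goes last so "&amp;lt;" is not double-decoded.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Returns { kind: "urlset" | "sitemapindex", entries: [{ loc, lastmod }] }.
function parseSitemap(xml) {
  const kind = /<sitemapindex[\s>]/.test(xml) ? "sitemapindex" : "urlset";
  const tag = kind === "sitemapindex" ? "sitemap" : "url";
  const blockRe = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, "g");
  const entries = [];
  for (const [, block] of xml.matchAll(blockRe)) {
    const loc = block.match(/<loc>\s*([\s\S]*?)\s*<\/loc>/);
    if (!loc) continue; // <loc> is required; ignore malformed entries
    const lastmod = block.match(/<lastmod>\s*([\s\S]*?)\s*<\/lastmod>/);
    entries.push({ loc: decodeXmlEntities(loc[1]), lastmod: lastmod ? lastmod[1] : undefined });
  }
  return { kind, entries };
}

// Stops (throws) when the sitemap is unavailable so the agent can ask the user for URLs instead.
async function discover(base) {
  const sitemapUrl = `${base.replace(/\/+$/, "")}/sitemap.xml`;
  const res = await fetch(sitemapUrl);
  if (!res.ok) {
    throw new Error(`Sitemap unavailable (HTTP ${res.status}) at ${sitemapUrl}; ask the user for a URL list or a sitemap index URL.`);
  }
  return parseSitemap(await res.text());
}
```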

2. Classify

For each URL, derive origin from the path:

/blog/<slug> → origin: blog, write to contents/blog/<slug>.md.
/podcast/<slug> → origin: podcast.
LinkedIn URLs from the user's authoritative profile → origin: linkedin.
Anything else relevant (tools, courses, landing pages, ai-metrics, etc.) → origin: other.
Section indexes (/, /blog, /tools, /cursos) → skip.

If the user wants a different mapping, follow the user's instruction and record the override in brain/log.md as type: decision.
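The default mapping can be sketched as a pure function of the URL. Two choices here are assumptions, not the tool's documented behavior: slugs for origin: other join all path segments with "-" to avoid collisions, and the first segment becomes the optional category. Deciding whether an "other" URL is relevant at all is left to the agent. `classify` and its `linkedinProfile` option (a URL prefix for the user's authoritative profile) are hypothetical.

```javascript
// Section indexes named in this step are skipped rather than imported.
const SECTION_INDEXES = new Set(["/", "/blog", "/tools", "/cursos"]);

function classify(rawUrl, { linkedinProfile } = {}) {
  const url = new URL(rawUrl);
  if (linkedinProfile && rawUrl.startsWith(linkedinProfile)) {
    const parts = url.pathname.split("/").filter(Boolean);
    return { action: "import", origin: "linkedin", slug: parts[parts.length - 1] };
  }
  // Drop trailing slashes so "/blog/" and "/blog" both match the index set.
  const pathname = url.pathname.replace(/\/+$/, "") || "/";
  if (SECTION_INDEXES.has(pathname)) return { action: "skip", reason: "section index" };

  const segments = pathname.split("/").filter(Boolean);
  if (segments.length === 2 && (segments[0] === "blog" || segments[0] === "podcast")) {
    return { action: "import", origin: segments[0], slug: segments[1] };
  }
  // Everything else lands in "other" (hypothetical slug and category rules).
  const result = { action: "import", origin: "other", slug: segments.join("-") };
  if (segments.length > 1) result.category = segments[0];
  return result;
}
```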

3. Extract

Call node tools/clis/extract.js --url <url> --timeout 60000 for each importable URL. Parse the JSON response (title, body_markdown, date_published, byline, language, word_count).

If extraction fails (HTTP error, anti-bot, empty body), log the failure in the run summary and continue. Do not silently skip — the human needs to know which URLs are missing.
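A minimal sketch of the extract loop. It assumes extract.js prints one JSON object with the listed fields at the top level of stdout and exits non-zero on HTTP or anti-bot failures; the real CLI may wrap its output in an envelope. The runner is injectable so the failure bookkeeping can be exercised without the network. `extractAll` and `runExtractCli` are hypothetical names.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const execFileAsync = promisify(execFile);

// Default runner: shells out with the exact arguments from this step.
async function runExtractCli(url) {
  const { stdout } = await execFileAsync(
    "node",
    ["tools/clis/extract.js", "--url", url, "--timeout", "60000"],
    { maxBuffer: 20 * 1024 * 1024 },
  );
  return JSON.parse(stdout);
}

// Never skips silently: every URL ends up in either `succeeded` or `failed` (with a reason).
async function extractAll(urls, runExtractor = runExtractCli) {
  const succeeded = [];
  const failed = [];
  for (const url of urls) {
    try {
      const page = await runExtractor(url);
      if (!page || String(page.body_markdown ?? "").trim() === "") {
        failed.push({ url, reason: "empty body" });
        continue;
      }
      succeeded.push({ url, page });
    } catch (err) {
      failed.push({ url, reason: err.message });
    }
  }
  return { succeeded, failed };
}
```

The `failed` list feeds the run summary directly, which is what keeps missing URLs visible to the human.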

4. Write

For each successful extraction, write project/contents/<origin>/<slug>.md with frontmatter:

```yaml
---
contract_version: 1
title: "<title>"
slug: "<slug>"
published_at: "<YYYY-MM-DD>"
source_url: "<url>"
origin: "<origin>"
clusters: []
# role: { <cluster-slug>: pillar | satellite }   # left commented; editorial decision later
---
```

Optional fields, written only when extracted: author (from the byline), language, and category (a free string for other subtypes such as tools or cursos).

Append the page body as Markdown, followed by a ## Importação block listing importado_em (imported on), fonte (source), método (method), and palavras (word count) for traceability.
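One way to render a file that honors the frontmatter contract and the never-fabricate rule: optional fields are emitted only when the extractor returned them, and the import date stands in for published_at only when no date was extracted. The `- key: value` layout of the ## Importação block and the método value are assumptions; `renderContentFile` is a hypothetical helper.

```javascript
// Keep only a leading ISO date (YYYY-MM-DD); anything else counts as "not extracted".
function toIsoDate(value) {
  const match = typeof value === "string" ? value.match(/^\d{4}-\d{2}-\d{2}/) : null;
  return match ? match[0] : undefined;
}

// JSON strings work as YAML double-quoted scalars, and JSON.stringify keeps pt-BR accents as-is.
const yamlString = (value) => JSON.stringify(String(value));

function renderContentFile({ url, origin, slug, category, page, importDate }) {
  const fm = ["---", "contract_version: 1"];
  if (page.title) fm.push(`title: ${yamlString(page.title)}`);
  fm.push(`slug: ${yamlString(slug)}`);
  // Import date is used for published_at only as a last resort.
  const publishedAt = toIsoDate(page.date_published) ?? importDate;
  if (publishedAt) fm.push(`published_at: ${yamlString(publishedAt)}`);
  fm.push(`source_url: ${yamlString(url)}`, `origin: ${yamlString(origin)}`, "clusters: []");
  // Optional fields appear only when the extractor (or classifier) supplied them.
  if (page.byline) fm.push(`author: ${yamlString(page.byline)}`);
  if (page.language) fm.push(`language: ${yamlString(page.language)}`);
  if (category) fm.push(`category: ${yamlString(category)}`);
  fm.push("---", "");

  const importBlock = ["## Importação", "", `- importado_em: ${importDate}`, `- fonte: ${url}`, "- método: tools/clis/extract.js"];
  if (page.word_count != null) importBlock.push(`- palavras: ${page.word_count}`);

  return [...fm, String(page.body_markdown ?? "").trim(), "", ...importBlock, ""].join("\n");
}
```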

Skip the file if it already exists with non-template content.
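The skip rule can be sketched as a guarded write. This treats an existing file as substantive when it is non-empty after trimming whitespace, a stand-in for the tool's actual notion of "non-template content"; `writeIfAbsent` is a hypothetical helper.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Writes <contentsDir>/<origin>/<slug>.md unless a substantive file is already there.
function writeIfAbsent(contentsDir, origin, slug, content) {
  const target = path.join(contentsDir, origin, `${slug}.md`);
  if (fs.existsSync(target) && fs.readFileSync(target, "utf8").trim() !== "") {
    return { result: "skipped", target }; // re-runs report this instead of overwriting
  }
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, content, "utf8");
  return { result: "wrote", target };
}
```

Called as `writeIfAbsent("project/contents", "blog", slug, text)`, a second run over the same sitemap reports every previously imported page as skipped.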

5. Log

Append a single consolidated entry to brain/log.md:

```markdown
## YYYY-MM-DD - Import <base> (content-import)

- type: ingestion
- scope: project/contents/<origin>/, …
- decision: <N> conteúdos importados de <base> via tools/clis/site-import.js. Distribuição: …
- evidence: <base>/sitemap.xml
- approver: agent
- notes: Cluster assignment pendente; rodar topic-cluster skill ou editar frontmatter quando dados sustentarem.
```
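A sketch that builds the single consolidated entry from the files written in the run, grouped by origin, and appends it once. Listing one scope directory per origin and rendering Distribuição as `origin: count` pairs is one hypothetical way to fill the template's placeholders; `renderIngestionEntry` and `appendIngestionEntry` are illustrative names.

```javascript
const fs = require("node:fs");

// filesByOrigin: { blog: ["slug-a", ...], other: [...] }; empty origins are left out.
function renderIngestionEntry({ date, base, filesByOrigin }) {
  const origins = Object.keys(filesByOrigin).filter((o) => filesByOrigin[o].length > 0);
  const total = origins.reduce((sum, o) => sum + filesByOrigin[o].length, 0);
  const scope = origins.map((o) => `project/contents/${o}/`).join(", ");
  const distribution = origins.map((o) => `${o}: ${filesByOrigin[o].length}`).join(", ");
  return [
    `## ${date} - Import ${base} (content-import)`,
    "",
    "- type: ingestion",
    `- scope: ${scope}`,
    `- decision: ${total} conteúdos importados de ${base} via tools/clis/site-import.js. Distribuição: ${distribution}`,
    `- evidence: ${base}/sitemap.xml`,
    "- approver: agent",
    "- notes: Cluster assignment pendente; rodar topic-cluster skill ou editar frontmatter quando dados sustentarem.",
    "",
  ].join("\n");
}

// Exactly one append per run, never one per file.
function appendIngestionEntry(logPath, entry) {
  fs.appendFileSync(logPath, `\n${entry}`, "utf8");
}
```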

6. Next Steps

After the import, suggest:

Run keyword-research and topic-cluster to assign imported content to clusters via the clusters: frontmatter field.
Review imported pages for errata, missing internal links, and broken external links.
Optionally re-extract pages where extraction quality was poor (e.g., interactive tools that render via JS — use --no-fallback to debug).

7. Companion Summary

Create project/workbench/content-import/<run-slug>/summary.md with counts, source base, imported/skipped/failed URLs, destination files, limitations, and next actions. This summary is the primary delivery surface in the Web Companion. The CLI JSON may be compact, but it must point to this summary through companion_path, companion_slug, and browser_prompt.
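A minimal sketch of the summary renderer, covering the sections this step lists: counts, source base, imported/skipped/failed URLs with destination files, limitations, and next actions. The heading names and item shapes are assumptions, and `renderSummary` is a hypothetical helper.

```javascript
// Items: { url, file } for wrote/skipped, { url, reason } for failed; strings for the last two lists.
function renderSummary({ base, date, wrote, skipped, failed, limitations = [], nextActions = [] }) {
  const list = (items, format) => (items.length > 0 ? items.map(format) : ["- (none)"]);
  return [
    `# Content import: ${base}`,
    "",
    `- Date: ${date}`,
    `- Source base: ${base}`,
    `- Counts: wrote ${wrote.length}, skipped ${skipped.length}, failed ${failed.length}`,
    "",
    `## Imported (${wrote.length})`,
    "",
    ...list(wrote, (i) => `- ${i.url} → ${i.file}`),
    "",
    `## Skipped (${skipped.length})`,
    "",
    ...list(skipped, (i) => `- ${i.url} (existing file: ${i.file})`),
    "",
    `## Failed (${failed.length})`,
    "",
    ...list(failed, (i) => `- ${i.url}: ${i.reason}`),
    "",
    "## Limitations",
    "",
    ...list(limitations, (t) => `- ${t}`),
    "",
    "## Next actions",
    "",
    ...list(nextActions, (t) => `- ${t}`),
    "",
  ].join("\n");
}
```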

Tooling

This skill is a thin orchestrator. The deterministic work happens in:

tools/clis/site-import.js — sitemap discovery + classification + extract loop + idempotent write. Stable CLI with JSON envelope (--json). Reuses tools/clis/extract.js.
scripts/import-site.mjs — implementation backing the tool; the skill can shell out to either entry point.

Output Format

```yaml
status: complete | partial | failed
base: "<url>"
discovered: <n>
importable: <n>
wrote: <n>
skipped: <n>
failed: <n>
files:
  blog: [<slug>, …]
  other: [<slug>, …]
log_appended: true
summary_markdown: project/workbench/content-import/<run-slug>/summary.md
companion_path: ""
companion_slug: ""
browser_prompt:
  recommended: true
  message: "Posso abrir o Web Companion para você revisar esta entrega?"
  artifact_path: project/workbench/content-import/<run-slug>/summary.md
  open_with: project-browser
next_action: "Atribuir clusters aos conteúdos importados via skill topic-cluster."
```
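A sketch that assembles this envelope. The status rule is an assumption (complete when nothing failed, failed when nothing was written or skipped, partial otherwise), as is leaving companion_path and companion_slug to the caller; `buildResult` is a hypothetical name.

```javascript
function buildResult({ base, runSlug, discovered, importable, wrote, skipped, failed, filesByOrigin, logAppended, companionPath = "", companionSlug = "" }) {
  // Hypothetical status rule; the source only names the three values.
  let status = "partial";
  if (failed === 0) status = "complete";
  else if (wrote + skipped === 0) status = "failed";

  const summaryPath = `project/workbench/content-import/${runSlug}/summary.md`;
  return {
    status,
    base,
    discovered,
    importable,
    wrote,
    skipped,
    failed,
    files: filesByOrigin,
    log_appended: logAppended,
    summary_markdown: summaryPath,
    companion_path: companionPath,
    companion_slug: companionSlug,
    browser_prompt: {
      recommended: true,
      message: "Posso abrir o Web Companion para você revisar esta entrega?",
      artifact_path: summaryPath, // same file as summary_markdown
      open_with: "project-browser",
    },
    next_action: "Atribuir clusters aos conteúdos importados via skill topic-cluster.",
  };
}
```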

Done Criteria

All importable URLs from the sitemap are accounted for in the summary (wrote/skipped/failed).
Every written file has frontmatter contract_version: 1 + clusters: [].
Brain log carries one consolidated type: ingestion entry for the run.
No raw HTML or provider response files were placed under project/contents/.
pt-BR accents preserved in titles, bylines, and the imported body.
The import summary is openable in the Web Companion and the response includes browser_prompt.
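Two of these criteria, the frontmatter contract and the absence of raw files under project/contents/, are mechanical enough to check with a script. A minimal sketch, assuming frontmatter is delimited by `---` lines and approximating "raw HTML or provider response files" as any non-.md file; `checkWrittenFiles` and `findNonMarkdown` are hypothetical helpers.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns the text between the opening and closing "---" lines, or null when absent.
function readFrontmatter(file) {
  const match = fs.readFileSync(file, "utf8").match(/^---\r?\n([\s\S]*?)\r?\n---/);
  return match ? match[1] : null;
}

// Checks the files written in this run for contract_version: 1 and clusters: [].
function checkWrittenFiles(files) {
  const problems = [];
  for (const file of files) {
    const fm = readFrontmatter(file);
    if (fm === null) {
      problems.push(`${file}: missing frontmatter`);
      continue;
    }
    if (!/^contract_version: 1\s*$/m.test(fm)) problems.push(`${file}: contract_version is not 1`);
    if (!/^clusters: \[\]\s*$/m.test(fm)) problems.push(`${file}: clusters is not []`);
  }
  return problems;
}

// Recursively lists every non-Markdown file under the contents directory.
function findNonMarkdown(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) found.push(...findNonMarkdown(full));
    else if (!entry.name.endsWith(".md")) found.push(full);
  }
  return found;
}
```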
