From spotlight
Archives web pages via Wayback Machine, Archive.today, and local scrape for evidence preservation with chain of custody. For investigators and fact-checkers.
How this skill is triggered — by the user, by Claude, or both
Slash command
/spotlight:web-archivingThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
> **Adapted from** [jamditis/claude-skills-journalism](https://github.com/jamditis/claude-skills-journalism) by Jay Amditis (MIT License).
Adapted from jamditis/claude-skills-journalism by Jay Amditis (MIT License).
Archive evidence sources as you find them, before they disappear. This skill is for investigators and fact-checkers who need to preserve sources for editorial accountability, legal defensibility, and reproducibility.
Try in order. Stop when you have a confirmed archived copy.
Before any execute-shell call that uses a URL, timestamp, filename, or path, invoke shell-safety and validate values with scripts/spotlight_safe.py. Use curl --get --data-urlencode for URL parameters; do not interpolate untrusted values into shell strings.
Check if already archived:
execute-shell: python3 scripts/spotlight_safe.py validate-url "{URL}"
execute-shell: curl -s --get "https://archive.org/wayback/available" --data-urlencode "url={URL}"
Submit for archiving:
execute-shell: python3 scripts/spotlight_safe.py validate-url "{URL}"
execute-shell: curl -s -I "https://web.archive.org/save/{URL}"
The response Location header contains the new snapshot URL.
Find all snapshots (CDX API):
execute-shell: python3 scripts/spotlight_safe.py validate-url "{URL}"
execute-shell: curl --get "http://web.archive.org/cdx/search/cdx" \
--data-urlencode "url={URL}" \
--data-urlencode "output=json" \
--data-urlencode "limit=5" \
--data-urlencode "fl=timestamp,statuscode,original"
Submit via form:
execute-shell: curl -s -L -X POST "https://archive.ph/submit/" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "url={URL}" \
-d "anyway=1"
Follow redirects — the final URL is the archived copy.
Check for existing copy:
execute-shell: python3 scripts/spotlight_safe.py validate-url "{URL}"
execute-shell: curl -s "https://archive.ph/newest/" --get --data-urlencode "url={URL}"
When archive services are rate-limited or unavailable, scrape directly and save to the evidence store:
fetch: url={URL}, output_path={CASE_DIR}/research/archived/{domain}-{slug}-archived-{YYYYMMDD}.md
Treat local scrapes as lower-confidence preservation — they are not independently verifiable.
All archived copies go to {CASE_DIR}/research/archived/.
Naming convention: {domain}-{slug}-archived-{YYYYMMDD}.md
Examples:
cases/project-name/research/archived/reuters-ukraine-ceasefire-archived-20260315.md
cases/project-name/research/archived/companyreg-gov-uk-filing-archived-20260315.md
Embed this header in every archived file. It is the provenance record.
---
archived_at: 2026-03-15T14:22:00Z
original_url: https://example.com/article/path
archive_url: https://web.archive.org/web/20260315142200/https://example.com/article/path
archive_service: Wayback Machine
archived_by: investigator | fact-checker
case: {project}
---
Without this block, the file is not a valid archived source — it is just a local copy.
Write the archived file with the chain of custody block:
write-file: path={CASE_DIR}/research/archived/{domain}-{slug}-archived-{YYYYMMDD}.md, content=<chain of custody block + page content>
If the original URL returns 404 or is otherwise gone, check Wayback Machine before marking the source as unavailable:
execute-shell: python3 scripts/spotlight_safe.py validate-url "{URL}"
execute-shell: curl --get "http://web.archive.org/cdx/search/cdx" \
--data-urlencode "url={URL}" \
--data-urlencode "output=json" \
--data-urlencode "limit=1" \
--data-urlencode "fl=timestamp,statuscode"
If a snapshot exists, retrieve it:
execute-shell: python3 scripts/spotlight_safe.py validate-timestamp "{TIMESTAMP}"
execute-shell: python3 scripts/spotlight_safe.py validate-url "{URL}"
fetch: url=https://web.archive.org/web/{TIMESTAMP}/{URL}, output_path={CASE_DIR}/research/archived/{filename}.md
Only mark a source as unavailable after checking all three services.
Add archive_url to every source entry in findings.json and fact-check.json:
{
"url": "https://example.com/article",
"type": "news",
"accessed": "2026-03-15T14:20:00Z",
"archive_url": "https://web.archive.org/web/20260315142200/https://example.com/article",
"local_file": "{CASE_DIR}/research/archived/example-article-archived-20260315.md"
}
If the page could not be archived, set archive_url to null and note why in the finding's confidence_rationale.
Adapted from claude-skills-journalism by Jay Amditis, released under MIT License.
npx claudepluginhub buriedsignals/spotlight --plugin spotlightArchives web pages via Wayback Machine, Archive.today, or local scrape for fact-checking and investigations, with chain-of-custody Markdown files.
Generates a step-by-step plan to preserve web pages using Wayback Machine and Archive.today with multi-archive redundancy and legal evidence guidance.
Ingests sources like URLs, YouTube videos, GitHub repos/PRs/issues, PDFs, or text into a vault, creating SHA-256 hashed raw files and Markdown summaries with key claims.