HTTP/HTTPS handler for external documentation fetching. Fetches content from web URLs with safety checks and metadata extraction.
Fetches content from HTTP/HTTPS URLs with safety checks, timeout handling, and metadata extraction. Used when external documentation needs to be retrieved from web URLs.
/plugin marketplace add fractary/claude-plugins/plugin install fractary-codex@fractaryThis skill inherits all available tools. When active, it can use any tool Claude has access to.
scripts/fetch-url.shYour responsibility is to fetch content from HTTP/HTTPS URLs with proper safety checks, timeout handling, and metadata extraction. You implement the handler interface for external URL sources.
You are part of the multi-source architecture (Phase 2 - SPEC-00030-03). </CONTEXT>
<CRITICAL_RULES> URL Validation:
Content Safety:
Error Handling:
Security:
IF reference starts with @codex/external/:
IF protocol validation fails:
USE SCRIPT: ./scripts/fetch-url.sh Arguments: { url: validated URL timeout: source_config.handler_config.timeout || 30 max_size_mb: source_config.handler_config.max_size_mb || 10 }
OUTPUT to stdout: Content OUTPUT to stderr: Metadata JSON
IF fetch fails:
Extract from HTTP headers (captured by fetch-url.sh):
IF content_type is text/markdown or contains YAML frontmatter: USE SCRIPT: ../document-fetcher/scripts/parse-frontmatter.sh Arguments: {content} OUTPUT: Frontmatter JSON ELSE: frontmatter = {}
Return structured response:
{
"success": true,
"content": "<fetched content>",
"metadata": {
"content_type": "text/html",
"content_length": 12543,
"last_modified": "2025-01-15T10:00:00Z",
"etag": "\"abc123\"",
"final_url": "https://...",
"frontmatter": {...}
}
}
</WORKFLOW>
<COMPLETION_CRITERIA> Operation is complete when:
Success Response:
{
"success": true,
"content": "document content...",
"metadata": {
"content_type": "text/markdown",
"content_length": 8192,
"last_modified": "2025-01-15T10:00:00Z",
"etag": "\"def456\"",
"final_url": "https://docs.example.com/guide.md",
"frontmatter": {
"title": "API Guide",
"codex_sync_include": ["*"]
}
}
}
Error Response:
{
"success": false,
"error": "HTTP 404: Not found",
"url": "https://docs.example.com/missing.md",
"http_code": 404
}
</OUTPUTS>
<ERROR_HANDLING>
<INVALID_PROTOCOL> If URL uses non-HTTP(S) protocol:
<HTTP_ERROR_404> If HTTP status 404:
<HTTP_ERROR_403> If HTTP status 403:
<HTTP_ERROR_5XX> If HTTP status 500-599:
<SIZE_LIMIT> If content exceeds max_size_mb:
</ERROR_HANDLING>
<DOCUMENTATION> Upon completion, output:🎯 STARTING: handler-http
URL: https://docs.example.com/guide.md
Max size: 10MB | Timeout: 30s
───────────────────────────────────────
✓ URL validated
✓ Content fetched (8.2 KB)
✓ Metadata extracted
✓ Frontmatter parsed
✅ COMPLETED: handler-http
Source: External URL
Content-Type: text/markdown
Size: 8.2 KB
Fetch time: 1.2s
───────────────────────────────────────
Ready for caching and permission check
</DOCUMENTATION>
<NOTES>
## Retry Logic
Server errors (5xx) are retried with exponential backoff:
Client errors (4xx) are NOT retried (they won't succeed).
To prevent repeated fetches of non-existent URLs:
Supported content types:
Phase 3 and beyond: