Automated content discovery workflow for scanning multiple data sources (ArXiv, GitHub, HuggingFace), applying AI-powered semantic filtering and deduplication, analyzing content quality, downloading images, and publishing to multiple channels (Hexo blog, Telegram, Discord).
Use this skill when:
- Executing the /discover command with task configurations
- Scanning academic papers from ArXiv (via MCP or WebSearch)
- Finding GitHub repositories or HuggingFace models
- Applying semantic AI filtering to avoid duplicate content
- Generating high-quality Chinese summaries and insights
- Publishing structured content with metadata tracking
This skill handles the complete end-to-end discovery pipeline, including data source detection, content filtering, quality validation, image extraction, and multi-channel publishing.
Automates end-to-end content discovery by scanning sources like ArXiv and GitHub, applying AI semantic filtering and deduplication, then publishing to multiple channels with metadata tracking. Use the `/discover` command to trigger this complete pipeline from search to multi-channel publishing.
/plugin marketplace add longkeyy/claude-discover
/plugin install longkeyy-content-discovery@longkeyy/claude-discover
This skill inherits all available tools. When active, it can use any tool Claude has access to.
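Automated content discovery workflow: scan data sources → intelligent analysis → publish content → optimize keywords
Each discovery task requires the following configuration:
Read the task configuration (config/tasks/{task_id}.md)
Detect available tools
Read the keywords (config/keywords/{task_id}.json)
Run the search
Apply filtering rules
AI semantic deduplication
Extract metadata
Data source tracking (important)
A metadata field must be added for data provenance and later updates: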
{
  "metadata": {
    "source_url": "<original URL the data was collected from>",
    "source_type": "<auto-detected: github/huggingface/arxiv/web>",
    "collected_at": "<current time, ISO 8601>",
    "updated_at": "<current time, ISO 8601>",
    "task_id": "<current task ID>",
    "source_details": {
      // Add source-specific information according to source_type
      // GitHub: {"repo": "...", "branch": "main"}
      // HuggingFace: {"repo_type": "model", "repo_id": "..."}
      // ArXiv: {"arxiv_id": "...", "version": "v1"}
    }
  }
}
Automatic type detection rules:
github.com → source_type: "github"
huggingface.co → source_type: "huggingface"
arxiv.org → source_type: "arxiv"
Generate content summaries and extract key insights
Quality standards check (by task type)
Foundation Models:
MCP Servers:
Prompt Papers:
Scoring and filtering
Note: see docs/DATA_SOURCE_TRACKING.md and docs/CONTENT_QUALITY_STANDARDS.md for details
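The automatic type detection rules and the metadata template above can be sketched in Bash as follows. This is only an illustration under stated assumptions: the function names detect_source_type and build_metadata are hypothetical and not part of the plugin's scripts, subdomains are assumed to count as the same source, anything unrecognized falls back to "web" (the fourth value listed in the template), and source_details is left out for brevity.

```shell
# Hypothetical helper: map a source URL to a source_type value.
# Assumption: subdomains (e.g. export.arxiv.org) count as the same source.
detect_source_type() {
  local url="$1" host
  host="${url#*://}"   # drop the scheme, if any
  host="${host%%/*}"   # keep only the host part
  case "$host" in
    github.com|*.github.com)           echo "github" ;;
    huggingface.co|*.huggingface.co)   echo "huggingface" ;;
    arxiv.org|*.arxiv.org)             echo "arxiv" ;;
    *)                                 echo "web" ;;
  esac
}

# Hypothetical helper: print a metadata object for one collected item.
# Assumption: the URL and task ID contain no characters that need JSON escaping.
build_metadata() {
  local url="$1" task_id="$2" now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # current UTC time in ISO 8601 form
  printf '{"metadata": {"source_url": "%s", "source_type": "%s", "collected_at": "%s", "updated_at": "%s", "task_id": "%s"}}\n' \
    "$url" "$(detect_source_type "$url")" "$now" "$now" "$task_id"
}

# Example call with placeholder values.
build_metadata "https://github.com/owner/repo" "example-task"
```

Matching on the host rather than on the whole URL avoids misclassifying a page that merely mentions github.com in its path.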
Handled directly by the AI Agent; no standalone script is needed
Image discovery (AI analysis)
Smart selection (AI decision)
Download locally (Bash tool)
# The Agent invokes the Bash tool
source .claude-plugin/scripts/discover/image_utils.sh
# Create the directory
IMAGE_DIR=$(create_image_dir "{task_id}" "{slug}")
# Download the cover image
download_image "https://example.com/cover.png" \
  "$IMAGE_DIR/cover.png"
# Download a screenshot
download_image "https://example.com/screenshot-1.png" \
  "$IMAGE_DIR/screenshot-1.png"
# Download the architecture diagram
download_image "https://example.com/architecture.svg" \
  "$IMAGE_DIR/diagram-1.svg"
# Get the image dimensions
SIZE=$(get_image_size "$IMAGE_DIR/cover.png")
Update the JSON data (AI Agent)
{
  "images": {
    "cover": {
      "original_url": "https://...",
      "local_path": "images/{task_id}/{slug}/cover.png",
      "alt": "AI-generated descriptive text",
      "width": 1200,
      "height": 630,
      "downloaded": true
    },
    "screenshots": [
      {
        "original_url": "https://...",
        "local_path": "images/{task_id}/{slug}/screenshot-1.png",
        "alt": "...",
        "caption": "AI-generated caption",
        "downloaded": true
      }
    ],
    "diagrams": [...]
  },
  "featured_image": "images/{task_id}/{slug}/cover.png"
}
Error handling
downloaded: false, keep the original URL
Tools:
.claude-plugin/scripts/discover/image_utils.sh - Bash helper functions
Note: see docs/IMAGE_MANAGEMENT.md for details
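The error-handling rule above (mark the image as downloaded: false and keep its original URL) might look like the sketch below. The helper image_entry is hypothetical. It takes the download command as trailing arguments, so it does not rely on how download_image reports failure, which the source does not document; it assumes only that a failed download leaves a non-zero exit status or an empty or missing file. Omitting local_path from the failed entry is also an assumption.

```shell
# Hypothetical helper: print the JSON entry for one image.
# Usage: image_entry <url> <local_path> <download command...>
# Assumption: URL and path contain no characters that need JSON escaping.
image_entry() {
  local url="$1" path="$2"
  shift 2
  # Run the supplied download command; require a non-empty file as well.
  if "$@" >/dev/null 2>&1 && [ -s "$path" ]; then
    printf '{"original_url": "%s", "local_path": "%s", "downloaded": true}\n' "$url" "$path"
  else
    # Download failed: keep only the original URL and mark it as not downloaded.
    printf '{"original_url": "%s", "downloaded": false}\n' "$url"
  fi
}

# Example with a command that always fails, standing in for a broken download.
image_entry "https://example.com/cover.png" "/nonexistent/cover.png" false
```

In the workflow above, the trailing command would presumably be download_image "$URL" "$IMAGE_DIR/cover.png".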
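Save the raw data
Read the publishing configuration
Publish to the enabled channels
Hexo blog (if hexo.enabled: true):
featured_image is used as the post's header image
/images/{task_id}/{slug}/xxx.png
Telegram channel (if telegram.enabled: true):
Discord channel (if discord.enabled: true):
Verify publication
Content completeness check
Quality standards verification (by task type)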
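Foundation Models:
# Check required fields
- metadata.source_url exists
- release_date is not equal to collected_at
- technical_report field (if applicable)
- Content is in Chinese and ≥ 1500 characters
MCP Servers:
# Check tool documentation
- tools_resources.tools is a non-empty array
- Each tool includes name and description
- config_example exists
- Content is in Chinese and ≥ 1500 characters
Prompt Papers:
# Check the paper link
- arxiv_url exists and is valid
- Content is in Chinese and ≥ 1000 characters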
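As a rough illustration of the length thresholds in the checklists above, the sketch below maps a task type to its minimum character count and tests a content file against it. The task-type identifiers (foundation-models, mcp-servers, prompt-papers) and both function names are hypothetical. The check covers length only; it does not verify that the content is Chinese or that the required fields exist.

```shell
# Hypothetical task-type identifiers; the thresholds come from the checklists above.
min_length_for() {
  case "$1" in
    foundation-models|mcp-servers) echo 1500 ;;
    prompt-papers)                 echo 1000 ;;
    *)                             return 1 ;;   # unknown task type
  esac
}

# Succeeds when the text in file $2 meets the minimum length for task type $1.
# Note: wc -m counts characters only under a UTF-8 locale (under the C locale
# it counts bytes), and a trailing newline is counted as one character.
meets_length() {
  local min count
  min="$(min_length_for "$1")" || return 2
  count="$(wc -m < "$2" | tr -d ' ')"
  [ "$count" -ge "$min" ]
}
```

A caller could run meets_length prompt-papers content.txt and archive the item when the check fails.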
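Failure handling
Archive content that fails the checks
Analyze new content
Keyword discovery
Update the keyword file
After each task completes, output an execution summary: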
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Task execution complete: {task_id}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
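Data source: {tools used}
Search results: {total}
After filtering: {count}
After deduplication: {count}
Quality check: {passed/failed}
Published successfully: {count}
Quality statistics:
• Chinese content: {count}/{total} (100% required)
• Meets length requirement: {count}/{total}
• Includes required fields: {count}/{total}
Task-specific metrics:
• [Foundation Models] Has technical report: {count}/{total}
• [MCP Servers] Has tool documentation: {count}/{total}
• [Prompt Papers] Has arxiv link: {count}/{total}
Publishing channels:
• Raw JSON data: {count} files (including metadata.source_url)
• Hexo blog: {published/skipped/failed} (if enabled)
• Telegram: {sent/skipped/failed} (if enabled)
• Discord: {sent/skipped/failed} (if enabled)
Image downloads:
• Covers: {count}
• Screenshots: {count}
• Architecture diagrams: {count}
Newly discovered keywords: {count}
Failed quality checks (skipped):
• No tool documentation: {count}
• Content too short: {count}
• Non-Chinese content: {count}
• Missing required fields: {count}
Save locations:
• JSON: posts/{task_id}/
• Images: blog/source/images/{task_id}/
• Hexo: {HEXO_PATH}/{post_dir}/ (if enabled)
• Archive: config/.archived/ (if any content failed the checks)
Session log: temp/sessions/{task_id}/{session_id}/
💡 Later updates: every JSON file includes metadata.source_url, which can be used for future updates
⚠️ Quality first: content that does not meet the standards has been skipped or archived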
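If execution fails:
Available helper scripts:
.claude-plugin/scripts/discover/image_utils.sh - image download and processing
.claude-plugin/scripts/discover/update_index.sh - update the task index
.claude-plugin/scripts/discover/check.sh - pre-flight environment check
.claude-plugin/scripts/discover/parse_tasks.sh - task parsing
This Skill supports processing multiple tasks in parallel. When several tasks run at the same time: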