AI/ML and LLM engineering workflows. HuggingFace ecosystem (CLI, datasets, evaluation, jobs, model training, paper publishing, tool building, Trackio). PyTorch patterns. RAG architecture. Claude API best practices. Cost-aware LLM pipelines with intelligent model routing. AI regression testing. Regex-vs-LLM decision framework. Evaluation harness. Data engineering and ML engineering agents. Depends on atum-core.
npx claudepluginhub arnwaldn/atum-plugins-collection --plugin atum-ai-ml
LLMOps specialist for productionizing Large Language Model applications — covers the LLM-specific operational layer that traditional MLOps doesn't address: prompt versioning and registries (PromptLayer, Langfuse, LangSmith Hub, Helicone Prompts, custom Git-based registries), prompt evaluation frameworks (Promptfoo with assertions and red teaming, RAGAS for RAG eval with faithfulness/relevancy/context precision, LangSmith datasets, Arize Phoenix LLM eval, OpenAI Evals, DeepEval), LLM observability platforms (Helicone for OpenAI-compatible APIs, Langfuse for tracing + cost, LangSmith for LangChain apps, Arize Phoenix open source, WhyLabs LangKit, Datadog LLM Observability), cost tracking at token level (per-prompt, per-user, per-feature, budget alerts, cost attribution to business value), latency optimization (semantic caching via GPTCache / Vercel AI SDK cache, prompt compression via LLMLingua / SelectiveContext, parallel completion, streaming), gateway and routing (Portkey for fallback + load balancing, LiteLLM as universal gateway, OpenRouter for model selection by cost/perf, Helicone Router, custom routing logic), jailbreak and prompt injection defense (input validation, output classification via Llama Guard 2 / OpenAI Moderation / Lakera Guard, Rebuff for injection detection), structured output enforcement (Outlines, Instructor, BAML, OpenAI structured output mode, Anthropic tool_use, function calling reliability), red teaming (Garak LLM scanner, Promptfoo red team, Lakera Pint, manual adversarial testing), eval-driven development (test-time scaling, eval-as-CI, eval-as-deployment-gate), and LLM-specific drift concerns (prompt drift, model deprecation, behavior changes between model versions).
Use when productionizing LLM applications, debugging hallucinations or quality drops in prod, optimizing LLM costs above $1k/month, designing eval pipelines for LLM features, implementing prompt registries, or migrating between LLM providers. Differentiates from mlops-engineer (classical ML lifecycle: training, serving, drift) by exclusive focus on LLM-as-API consumer concerns (no training, no GPU management) and LLM-specific patterns (prompts, RAG, agents, tool use). Differentiates from prompt-engineer (designing individual prompts) by focus on the operational and infrastructure layer (versioning, monitoring, gateway, eval pipelines).
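Token-level cost tracking per feature, as described above, can be sketched in a few lines. This is a minimal illustration with made-up model names and per-million-token prices (real prices vary by provider and model), not any platform's actual API:

```python
from collections import defaultdict

# Illustrative per-1M-token prices; real pricing differs by provider and model.
PRICES = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 3.00, "output": 15.00},
}

class CostTracker:
    """Accumulates token-level cost per feature for budget alerts."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spend = defaultdict(float)  # feature -> cumulative USD

    def record(self, feature: str, model: str, in_tokens: int, out_tokens: int) -> float:
        p = PRICES[model]
        cost = (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000
        self.spend[feature] += cost
        return cost

    def over_budget(self) -> bool:
        return sum(self.spend.values()) > self.budget_usd

tracker = CostTracker(budget_usd=100.0)
tracker.record("summarize", "large-model", in_tokens=2_000, out_tokens=500)
```

Platforms like Langfuse or Helicone do this attribution automatically; the sketch shows the underlying arithmetic and the per-feature aggregation that budget alerts hang off.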
Machine Learning, Deep Learning, and MLOps specialist for the full ML lifecycle. Covers data engineering (Pandas, Polars, DVC, Hugging Face Datasets), deep learning with PyTorch (CNN/RNN/Transformer/GAN architectures, mixed precision, DDP/FSDP distributed training, custom losses), NLP and LLMs (LoRA/QLoRA fine-tuning, RAG with Chroma/FAISS/Pinecone/Qdrant, prompt engineering, BLEU/ROUGE evaluation), classical ML (scikit-learn), and MLOps (model serving, monitoring, experiment tracking). Use when training models, fine-tuning LLMs, building RAG systems, or designing ML pipelines.
MLOps specialist for productionizing machine learning models — designs end-to-end ML pipelines from data ingestion to model serving with experiment tracking (MLflow, Weights&Biases, Neptune.ai, Comet.ml, ClearML), feature stores (Feast, Tecton, Hopsworks, AWS SageMaker Feature Store), data versioning (DVC, LakeFS, Pachyderm, Delta Lake), model registries (MLflow Model Registry, SageMaker Model Registry, Vertex AI Model Registry, Hugging Face Hub), training orchestration (Kubeflow Pipelines, Metaflow, Flyte, Airflow ML, Prefect, Dagster), distributed training infrastructure (PyTorch DDP, DeepSpeed, FairScale, Horovod, Ray Train), model serving (BentoML, KServe, Seldon Core, NVIDIA Triton Inference Server, TorchServe, TensorFlow Serving, vLLM for LLMs), continuous training pipelines (data drift detection via Evidently / Whylabs / Fiddler / Arize, retraining triggers, A/B testing models, shadow deployments, canary releases), CI/CD for ML (GitHub Actions ML workflows, Jenkins ML, Argo Workflows, deployment via Vertex AI / SageMaker / Azure ML / Databricks ML), GPU cluster management (Kubernetes with NVIDIA operator, Run.ai, SLURM, Ray Clusters, Lambda Cloud), and ML observability (model performance monitoring, prediction logging, business metrics tracking). Use when productionizing ML models, building reproducible training pipelines, setting up experiment tracking, designing feature stores, deploying models to production with monitoring, or troubleshooting drift / degradation. Differentiates from ml-engineer (model architectures, training, fine-tuning) by focus on operational concerns: reproducibility, deployment, monitoring, lifecycle management. Differentiates from devops-expert by ML-specific concerns: data versioning, model versioning, experiment tracking, drift detection.
Prompt engineering specialist for LLM optimization — designs production-grade prompts using systematic techniques (zero-shot, few-shot with curated examples, chain-of-thought CoT, tree-of-thoughts ToT, ReAct reasoning + acting, self-consistency, self-refine, least-to-most decomposition, plan-and-solve, system 2 attention, structured output via JSON schema / XML tags / Pydantic), handles model-specific quirks (Claude prefers XML tags + thinking blocks, GPT-4 prefers Markdown + numbered steps, Gemini prefers structured JSON, open-source models like Llama 3 / Qwen / DeepSeek prefer specific chat templates), evaluates prompts via Promptfoo / LangSmith / Helicone / Weights&Biases Weave / Arize Phoenix / RAGAS, manages prompt versioning (PromptLayer, LangFuse, custom Git-based registries), implements jailbreak prevention (input validation, output filtering, prompt injection detection), optimizes for cost/latency (token counting via tiktoken, context compression via LLMLingua, semantic cache via GPTCache), handles structured output enforcement (Outlines, Instructor, OpenAI structured output mode, Anthropic tool_use, function calling, BAML), and the prompt patterns from Anthropic's prompt engineering guide. Use when designing system prompts, optimizing existing prompts that hallucinate or perform poorly, building prompt evaluation pipelines, A/B testing prompt variants, or migrating prompts between models. Differentiates from ml-engineer (training/fine-tuning/MLOps) and rag-architect (retrieval pipelines) by exclusive focus on the prompt layer — the highest-leverage optimization point in any LLM app.
Retrieval-Augmented Generation (RAG) architect — designs production-grade RAG pipelines from ingestion to retrieval to generation, choosing between vector databases (Pinecone managed serverless, Weaviate hybrid BM25+vector, Qdrant high-throughput open source, Postgres pgvector for SQL+vectors, Milvus for massive scale, ChromaDB for prototyping, MongoDB Atlas Vector Search, Redis Search), embedding models (OpenAI text-embedding-3-large, Cohere embed-v3, BGE-large-en-v1.5, Voyage AI, Mistral Embed, custom fine-tuned), chunking strategies (fixed-size with overlap, semantic chunking, recursive character splitter, document-structure-aware, late chunking), retrieval techniques (dense vector, sparse BM25, hybrid fusion with reciprocal rank fusion RRF, multi-query expansion, HyDE hypothetical document embeddings, query rewriting, parent-child retrieval), reranking (Cohere Rerank v3, BGE reranker, ColBERT v2, cross-encoders), context window management (compression, summarization, map-reduce), retrieval evaluation (recall@k, MRR, NDCG, hit rate, RAGAS framework, faithfulness + answer relevancy + context precision), latency optimization (vector cache, embedding cache, semantic cache, parallel retrieval), and the agentic RAG patterns (ReAct, multi-hop reasoning, query routing, fallback to web search). Use when designing a RAG system from scratch, debugging poor retrieval quality, optimizing latency or costs, choosing between vector DBs, or evaluating an existing RAG implementation. Has access to the official `pinecone` MCP server declared in this plugin's .mcp.json. Differentiates from generic ml-engineer by deep specialization in retrieval architecture, vector indexes, and the entire RAG ingestion-retrieval-generation lifecycle that requires its own expertise distinct from training/fine-tuning.
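The reciprocal rank fusion (RRF) mentioned above for hybrid retrieval is simple enough to show directly. A minimal sketch, assuming each input is a ranked list of document IDs (the `k=60` constant is the value commonly used in the RRF literature):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]    # e.g. vector-search ranking
sparse = ["d3", "d5", "d1"]   # e.g. BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents ranked highly by both retrievers dominate the fused list without any score normalization, which is why RRF is a robust default for combining dense and sparse results.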
Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.
CodeAct (Code-as-Action) agent pattern library — implementation of the CodeAct paradigm by Wang et al. 2024 (Executable Code Actions Elicit Better LLM Agents, ICML 2024) where an LLM agent uses Python code execution as its universal action space instead of structured JSON tool calls. Covers the core CodeAct insight (Python is Turing-complete and composable, while JSON tool calls limit each action to one function call), the architecture (LLM generates Python code, code is executed in a sandbox, output is fed back as observation, loop continues until task done), the key advantages over JSON function calling (composability — chain operations in one action, control flow — if/for/while in one action, error recovery — try/except in one action, math/data manipulation natively, unlimited action space without redefining tools), benchmark gains reported in the paper (CodeAct outperforms JSON tool use by 20% on average across multiple benchmarks like MINT and ToolBench), the sandbox requirement (E2B for cloud sandbox, Daytona for local Docker, Modal for serverless, Pyodide for browser, Jupyter kernel for notebook environments), the security model (sandbox isolation, network restrictions, filesystem restrictions, resource limits, package whitelist), comparison with ReAct (ReAct uses string actions parsed by regex, CodeAct uses Python directly), comparison with function calling (function calling has structured outputs and validation, CodeAct has flexibility and composability), production frameworks (OpenDevin uses CodeAct, Smolagents from Hugging Face has CodeAgent class native, AutoGen has Code Executor agents, LangChain has PythonREPLTool but not full CodeAct), use cases where CodeAct shines (data analysis, math problems, multi-step transformations, web scraping, file operations, scientific computing), and the limitations (security risk if sandbox misconfigured, harder to enforce structured outputs, requires execution environment). 
Use when an agent needs to perform complex multi-step computations, data manipulations, or operations that don't fit into discrete tool calls. Differentiates from generic ReAct by using executable code as the action layer.
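The CodeAct loop described above can be sketched with a stub in place of the LLM and of the sandbox. The `exec`-based executor here is a stand-in only — a real deployment must use an isolated sandbox (E2B, Docker, etc.) as the entry stresses; `fake_llm` is a hypothetical stub:

```python
import contextlib
import io

def run_in_sandbox(code: str) -> str:
    """Stand-in for a real sandbox -- NOT safe for untrusted code."""
    buf = io.StringIO()
    env = {}
    with contextlib.redirect_stdout(buf):
        exec(code, env)  # a production system must isolate this execution
    return buf.getvalue()

def codeact_step(history, generate_code):
    """One CodeAct iteration: model emits Python, we execute it, the output becomes the observation."""
    code = generate_code(history)       # LLM call in a real agent
    observation = run_in_sandbox(code)
    history.append({"action": code, "observation": observation})
    return observation

# Stub "LLM" composing two operations (sort + sum) in a single action --
# the core advantage over one-function-per-call JSON tool use.
fake_llm = lambda history: "nums = [3, 1, 2]\nprint(sum(sorted(nums)[:2]))"
obs = codeact_step([], fake_llm)
```

In the full pattern this step repeats, with `history` fed back into the prompt, until the model signals the task is done.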
Corrective RAG (CRAG) pattern library — implementation of the Corrective Retrieval Augmented Generation paradigm by Yan et al. 2024 (Corrective Retrieval Augmented Generation, ICLR 2024) which improves classical RAG by adding a retrieval evaluator that grades the relevance of retrieved documents and triggers fallback mechanisms when the retrieval is judged insufficient. Covers the core CRAG flow (retrieve documents from vector store, grade each document via lightweight T5 evaluator or LLM-as-judge with categories Correct/Incorrect/Ambiguous, when Correct use as-is, when Ambiguous combine knowledge refinement with web search, when Incorrect discard and rely on web search), the knowledge refinement step (decompose retrieved documents into strips, filter strips by relevance, re-compose into clean context), the web search fallback (typically using Google Search API, Brave Search, Tavily, Firecrawl, Exa to fetch fresh sources when internal knowledge base fails), benchmark gains reported in the paper (PopQA +20%, Biography +25%, PubHealth +10% over standard RAG), comparison with alternative RAG variants (HyDE for hypothetical document embedding, Self-RAG with self-reflection tokens, Adaptive RAG that decides when to retrieve), implementation strategies (lightweight evaluator vs LLM-as-judge trade-off, web fallback cost management, hybrid local+web context fusion), production considerations (latency added by evaluator step, web API costs, hallucination risk reduction, compliance for web fetching), use cases where CRAG dominates (open-domain QA with risk of stale knowledge base, fact-checking applications, customer support with both static KB and dynamic web sources), and the limitations (overhead of evaluator, dependency on web search quality, complexity vs simple RAG). Use when standard RAG hallucinates due to poor retrieval, when knowledge base coverage is incomplete and web augmentation is acceptable, or when you need a robust fallback mechanism. 
Differentiates from generic RAG by deep focus on retrieval quality grading and fallback orchestration.
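The Correct/Ambiguous/Incorrect routing at the heart of CRAG reduces to a small control-flow skeleton. A sketch with stub callables standing in for the vector store, the evaluator, the web search API, and the generator LLM (all names here are hypothetical):

```python
def corrective_rag(query, retrieve, grade, web_search, generate):
    """CRAG-style routing: grade the retrieval, fall back to web search when it fails."""
    docs = retrieve(query)
    verdict = grade(query, docs)            # "correct" | "ambiguous" | "incorrect"
    if verdict == "correct":
        context = docs                      # use internal KB as-is
    elif verdict == "ambiguous":
        context = docs + web_search(query)  # fuse refined KB strips with fresh web sources
    else:
        context = web_search(query)         # discard retrieval entirely
    return generate(query, context)

# Stub components exercising the "incorrect" branch.
answer = corrective_rag(
    "capital of France?",
    retrieve=lambda q: ["stale doc"],
    grade=lambda q, d: "incorrect",
    web_search=lambda q: ["Paris is the capital of France."],
    generate=lambda q, ctx: ctx[0],
)
```

The paper's knowledge-refinement step (decompose, filter, recompose) would sit inside the "correct" and "ambiguous" branches before generation.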
Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.
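Model routing by task complexity can be as simple as a heuristic gate in front of the API client. A toy sketch — the model names, the chars-per-token estimate, and the keyword heuristic are all illustrative assumptions; production routers typically use a trained classifier or historical eval scores:

```python
def route_model(prompt: str, max_cheap_tokens: int = 200) -> str:
    """Toy complexity heuristic: short prompts without reasoning cues go to the cheap model."""
    approx_tokens = len(prompt) // 4  # rough chars-per-token estimate
    needs_reasoning = any(
        w in prompt.lower() for w in ("prove", "derive", "step by step")
    )
    if approx_tokens <= max_cheap_tokens and not needs_reasoning:
        return "cheap-model"
    return "frontier-model"

cheap = route_model("What is the capital of France?")
costly = route_model("Prove the inequality step by step for all n.")
```

Even a crude gate like this can cut spend substantially when most traffic is simple lookups, while budget tracking catches the cases the heuristic misroutes.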
Formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
GraphRAG (Graph-based Retrieval Augmented Generation) pattern library — implementation of the GraphRAG paradigm popularized by Microsoft Research in 2024 (From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Edge et al.) which constructs a knowledge graph from a corpus by extracting entities and relationships with an LLM, runs community detection on the graph (Leiden algorithm), generates hierarchical community summaries, and then routes queries to either local search (around specific entities) or global search (across community summaries) depending on the query intent. Covers the core GraphRAG pipeline (LLM entity extraction with custom prompts, relationship extraction with type classification, graph construction in Neo4j or NetworkX, Leiden community detection at multiple resolutions, hierarchical summarization of communities by LLM, query classification as local or global, local search via entity-centric subgraph retrieval, global search via community summary aggregation), the production frameworks (Microsoft GraphRAG official implementation in Python, LightRAG by HKU as a faster alternative, FalkorDB for graph storage with claimed -90% hallucinations vs vector RAG, Neo4j GraphRAG library, ms-graphrag-mcp for Claude Code integration), benchmark gains for query-focused summarization (Microsoft paper shows comprehensiveness +72%, diversity +62% over vector RAG baseline), comparison with vector RAG (vector RAG is best for fact lookup with dense semantic similarity, GraphRAG is best for global understanding requiring relationship traversal), comparison with hybrid RAG (vector + sparse), use cases where GraphRAG dominates (research synthesis, executive summaries, narrative understanding, multi-document reasoning, scientific literature review, legal case law), use cases where it underperforms (single-fact lookup, simple Q&A, low-latency requirements), implementation considerations (extraction quality is critical, LLM cost for entity extraction is high, graph maintenance as corpus evolves, query routing accuracy), and the limitations (cost of graph construction, latency, graph schema design challenges, less mature than vector RAG ecosystem). Use when standard RAG fails on global summarization queries, when relationships between entities matter more than semantic similarity, or when building research/analytics applications. Differentiates from rag-architect (vector-centric) by deep focus on graph-based knowledge representation.
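The graph-construction half of the pipeline can be illustrated with stdlib Python. The triples are hypothetical stand-ins for LLM-extracted (entity, relation, entity) output, and connected components stand in for Leiden community detection (the real algorithm partitions much more finely and at multiple resolutions):

```python
from collections import defaultdict

# Hypothetical LLM-extracted (entity, relation, entity) triples.
triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Pierre Curie", "married", "Marie Curie"),
    ("Einstein", "developed", "Relativity"),
]

graph = defaultdict(set)
for a, _, b in triples:            # build an undirected adjacency map
    graph[a].add(b)
    graph[b].add(a)

def components(graph):
    """Connected components as a crude stand-in for Leiden community detection."""
    seen, comms = set(), []
    for node in list(graph):
        if node in seen:
            continue
        stack, comm = [node], set()
        while stack:
            n = stack.pop()
            if n in comm:
                continue
            comm.add(n)
            stack.extend(graph[n] - comm)
        seen |= comm
        comms.append(comm)
    return comms

# Each community would then be summarized by an LLM to support global search.
communities = components(graph)
```

The expensive parts in practice are the LLM extraction pass over every chunk and keeping the graph in sync as the corpus evolves, exactly as the considerations above note.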
Execute Hugging Face Hub operations using the `hf` CLI. Use when the user needs to download models/datasets/spaces, upload files to Hub repositories, create repos, manage local cache, or run compute jobs on HF infrastructure. Covers authentication, file transfers, repository creation, cache operations, and cloud compute.
Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.
This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.
Use this skill when the user wants to build tools or scripts, or to accomplish a task where data from the Hugging Face API would help. Especially useful when chaining or combining API calls, or when the task will be repeated or automated. This skill creates a reusable script to fetch, enrich, or process data.
Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API) or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, HF Space syncing, and JSON output for automation.
Pinecone vector database pattern library — leverages the official Pinecone MCP server (declared in this plugin's .mcp.json via @pinecone-database/mcp) for index management (Serverless vs Pod-based, dimension choice, metric cosine/dotproduct/euclidean), namespace isolation for multi-tenancy, metadata filtering with hybrid search, sparse-dense vectors for hybrid BM25+vector retrieval, integrated inference (Pinecone-hosted embedding models for one-step upsert), Pinecone Assistants (managed RAG pipelines without writing code), batch upsert patterns + parallel writes, query patterns (top_k tuning, includeValues, includeMetadata), reranking with Pinecone Rerank API, monitoring + alerts via Pinecone console, capacity planning (read/write units for serverless), backup + replication strategies, and migration from Pod-based to Serverless. Use when building any RAG system on Pinecone, migrating from another vector DB to Pinecone, debugging slow queries, optimizing index costs, or implementing multi-tenant isolation. Mentions the official `pinecone` MCP server — Claude Code can directly create indexes, upsert records, search, and rerank documents at runtime via the official Pinecone MCP.
PyTorch deep learning patterns and best practices for building robust, efficient, and reproducible training pipelines, model architectures, and data loading.
Qdrant vector database pattern library — open-source vector DB written in Rust for high throughput and low latency, self-hosted or Qdrant Cloud, collections + points + payloads (metadata) architecture, HNSW indexing with tunable parameters (m, ef_construct, ef), exact vs approximate search, payload indexing for fast filtering (keyword, integer, float, geo, datetime, text), distance metrics (Cosine, Dot, Euclidean, Manhattan), quantization (Scalar / Product / Binary) for 4-32x storage reduction, named vectors for multi-perspective embeddings, sparse vectors for hybrid retrieval, multi-tenancy via payload isolation or separate collections, snapshots for backup, replication factor for HA, sharding for horizontal scale, gRPC + HTTP REST APIs, Qdrant Web UI for visual exploration, integrations with LangChain / LlamaIndex / Haystack, and migration from Pinecone or Weaviate. Use when needing maximum throughput on commodity hardware, self-hosting requirement (data sovereignty, on-prem), tight cost control (open source vs Pinecone managed), or building Rust-native applications. Differentiates from Pinecone by self-hostable + open source + lower cost at scale, and from Weaviate by raw performance + simpler API + no GraphQL overhead.
Use when building RAG systems, vector databases, or knowledge-grounded AI applications requiring semantic search, document retrieval, or context augmentation.
ReAct (Reasoning + Acting) agent pattern library — implementation of the ReAct paradigm by Yao et al. 2022 (ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023) where an agent alternates between Thought (reasoning step in natural language), Action (tool call or environment interaction), and Observation (result feedback) in an explicit loop. Covers the core loop structure (Thought→Action→Observation→Thought→...), prompt template design (system prompt with ReAct format instructions, scratchpad accumulation, action parser), action space definition (tool registry, tool descriptions in JSON Schema or natural language, action validation), observation handling (tool output parsing, error recovery, observation truncation for long outputs), termination conditions (final answer detection, max iterations, confidence threshold), comparison with alternative agent patterns (CoT pure for non-tool tasks, function calling JSON for structured tool use, CodeAct for code-as-action, Reflexion for self-correction), production frameworks that implement ReAct (LangChain AgentExecutor, LlamaIndex ReActAgent, Haystack Agents, Smolagents from Hugging Face, AutoGen ReAct, CrewAI), Claude/GPT-specific ReAct prompt patterns, debugging ReAct loops (loop detection, hallucinated tools, infinite loops), and the limitations of ReAct (no parallelism, latency cost per step, error propagation, prompt verbosity). Use when implementing tool-using agents, building autonomous research agents, debugging existing ReAct implementations, or choosing between agent patterns. Differentiates from generic agent skills by deep focus on the ReAct-specific loop mechanics and the prompt templates that make it work reliably.
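The Thought→Action→Observation loop above can be sketched end to end. The `Action: tool[input]` format, the stub model, and the `lookup` tool are illustrative assumptions — real frameworks vary the format, but the scratchpad accumulation and the action parser are the load-bearing pieces:

```python
import re

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react_loop(llm, tools, question, max_iters=5):
    """Minimal ReAct loop: accumulate Thought/Action/Observation in a scratchpad."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iters):
        step = llm(scratchpad)          # model emits "Thought: ...\nAction: tool[input]"
        scratchpad += step + "\n"
        if "Final Answer:" in step:     # termination condition
            return step.split("Final Answer:")[-1].strip()
        m = ACTION_RE.search(step)
        if not m:
            continue                    # malformed action: let the model retry
        name, arg = m.groups()
        obs = tools.get(name, lambda a: f"Unknown tool: {name}")(arg)
        scratchpad += f"Observation: {obs}\n"
    return None                         # hit the iteration limit

# Stub model: one tool call, then a final answer.
steps = iter([
    "Thought: I need the population.\nAction: lookup[France]",
    "Thought: I have it now.\nFinal Answer: ~68 million",
])
answer = react_loop(
    lambda scratchpad: next(steps),
    {"lookup": lambda arg: "68 million"},
    "Population of France?",
)
```

The `max_iters` cap and the unknown-tool fallback correspond directly to the debugging concerns the entry lists: infinite loops and hallucinated tools.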
Reflexion (verbal reinforcement learning) pattern library — implementation of the Reflexion paradigm by Shinn et al. 2023 (Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023) where an LLM agent improves iteratively by reflecting on its own failures in natural language and storing those reflections in a memory buffer for the next attempt. Covers the core Reflexion architecture (Actor that generates trajectories, Evaluator that scores outcomes binary or scalar, Self-Reflection module that converts failures into verbal lessons, Memory buffer that persists reflections across trials), the trial loop (Generate trajectory → Evaluate → Reflect → Store → Retry with reflections in context), comparison with classical RL (verbal feedback instead of gradient updates, no model weight changes, instant feedback loop), comparison with self-correction (Reflexion uses persistent memory across trials, simple self-correction is single-shot), benchmark gains reported in the paper (HumanEval coding 91% vs 80% baseline, AlfWorld decision-making 85% vs 75%, HotPotQA QA 56% vs 50%), implementation strategies (binary reward vs scalar reward, reflection prompt design, memory consolidation when buffer fills, max trials limit), use cases where Reflexion excels (coding tasks with test feedback, multi-step tool use with eval signal, agentic workflows with success/failure outcomes), use cases where it fails (no clear success signal, single-turn tasks, creative tasks without ground truth), production frameworks (LangChain Reflexion templates, custom implementations on top of any agent framework), evaluation methodology (track improvement curve across trials, measure reflection quality, detect divergence), and the limitations (cost multiplied by N trials, latency, divergence risk, context overflow as memory grows). Use when an agent fails on tasks but has access to feedback signal, when iterative refinement could help, or when classical fine-tuning is too expensive. 
Differentiates from CoT or ReAct (single-pass reasoning) by explicit multi-trial loop with verbal memory.
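The trial loop is the whole trick, and it fits in one function. A sketch with stub actor/evaluator/reflector callables (all hypothetical); in a real setup the actor is an LLM agent, the evaluator is a test suite or judge, and the reflector is another LLM call:

```python
def reflexion(actor, evaluator, reflect, task, max_trials=3):
    """Reflexion trial loop: failed attempts become verbal lessons in persistent memory."""
    memory = []                                 # reflections persist across trials
    for trial in range(max_trials):
        attempt = actor(task, memory)           # memory is injected into the prompt
        if evaluator(attempt):                  # binary success signal, e.g. tests pass
            return attempt, trial + 1
        memory.append(reflect(task, attempt))   # convert the failure into a lesson
    return None, max_trials

# Stub actor that only succeeds once at least one reflection is in memory.
result, trials = reflexion(
    actor=lambda task, mem: "good" if mem else "bad",
    evaluator=lambda attempt: attempt == "good",
    reflect=lambda task, attempt: f"attempt {attempt!r} failed; try another approach",
    task="demo",
)
```

The cost multiplier the entry warns about is visible here: total LLM spend scales with the number of trials, plus the growing memory appended to every prompt.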
Decision framework for choosing between regex and LLM when parsing structured text — start with regex, add LLM only for low-confidence edge cases.
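The regex-first, LLM-fallback decision above looks like this in practice. Date parsing is just an illustrative domain, and `llm_fallback` is a stub standing in for an actual model call:

```python
import re

DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def parse_date(text, llm_fallback=None):
    """Regex first; escalate to the (slow, expensive) LLM only when regex finds nothing."""
    m = DATE_RE.search(text)
    if m:
        return m.groups(), "regex"
    if llm_fallback is not None:
        return llm_fallback(text), "llm"  # low-confidence edge case
    return None, "none"

# The common case is handled for free by regex; odd phrasing falls through to the stub LLM.
hit = parse_date("shipped 2024-05-01")
miss = parse_date("the first of May, '24", llm_fallback=lambda t: ("2024", "05", "01"))
```

Logging which path each input takes tells you when the LLM fallback fires often enough to justify extending the regex instead.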
Setup Sentry AI Agent Monitoring in any project. Use this when asked to add AI monitoring, track LLM calls, monitor AI agents, or instrument OpenAI/Anthropic/Vercel AI/LangChain/Google GenAI. Automatically detects installed AI SDKs and configures the appropriate Sentry integration.
Tree-of-Thoughts (ToT) reasoning pattern library — implementation of the Tree-of-Thoughts paradigm by Yao et al. 2023 (Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023) where an LLM explores multiple reasoning paths in parallel as a search tree, evaluates each branch, and uses BFS or DFS with backtracking to find the best solution. Covers the core ToT structure (problem decomposition into thought steps, multiple thought generation per step via temperature sampling or distinct prompts, state evaluator that scores partial solutions, search algorithm BFS/DFS/beam search), comparison with Chain-of-Thought (CoT generates one linear chain, ToT explores a tree), comparison with Self-Consistency (Self-Consistency samples multiple chains and votes, ToT actively prunes bad branches), use cases where ToT shines (Game of 24, creative writing with constraints, mini crosswords, math word problems, code generation with multiple approaches), use cases where ToT is overkill (simple Q&A, factual lookup, single-step reasoning), implementation strategies (manual tree expansion + LLM scoring, recursive function with memoization, integration with LangGraph for graph-based agent flows), evaluation metrics (success rate, branches explored, total LLM calls, latency), benchmark gains reported in the paper (74% success on Game of 24 vs 4% for CoT, 60% on creative writing vs 28%), production considerations (cost explosion with deep trees, latency, branch pruning heuristics), and the variants (Graph-of-Thoughts by Besta et al. 2023, Algorithm-of-Thoughts by Sel et al. 2023, Skeleton-of-Thoughts by Ning et al. 2023). Use when facing complex multi-step reasoning problems where Chain-of-Thought fails, when multiple solution paths exist and you need to explore them, or when you need backtracking from dead ends. Differentiates from generic prompt engineering by deep focus on tree search structures applied to LLM reasoning.
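The generate-score-prune core of ToT reduces to beam search over partial solutions. A sketch with stub `expand`/`score` callables; the toy "reach 10 by adding 1-3" state space stands in for LLM thought generation and LLM-based state evaluation:

```python
def tree_of_thoughts(expand, score, root, beam_width=2, depth=3):
    """Beam-search sketch of ToT: expand each frontier state, keep the best branches."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        # Prune: keep only top-scoring partial solutions (unlike CoT's single chain).
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

# Toy search: states are running sums; the goal is to get close to 10 in 3 steps.
best = tree_of_thoughts(
    expand=lambda s: [s + step for step in (1, 2, 3)],
    score=lambda s: -abs(10 - s),
    root=0,
)
```

The cost-explosion caveat in the entry is visible in the structure: LLM calls grow with `beam_width * branching * depth`, which is why pruning heuristics matter.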
Weaviate vector database pattern library — open-source vector database with native hybrid BM25 + vector retrieval (the most architecturally coherent hybrid in 2026), GraphQL + REST APIs, schema-first design with classes + properties + cross-references (graph-like relations between vectors), modular vectorizer architecture (text2vec-openai, text2vec-cohere, text2vec-huggingface, text2vec-transformers self-hosted, multi2vec-clip for multimodal text+image), generative search modules (generative-openai, generative-cohere, generative-anthropic for in-database LLM calls without round-trip), multi-tenancy with isolated tenant data + per-tenant indexes, named vectors (multiple embeddings per object for different fields/perspectives), backup + restore via S3 / GCS / Azure Blob, replication for HA, sharding for horizontal scale, Weaviate Cloud Services (WCS) managed vs self-hosted Docker / Helm, and migration from Pinecone or Qdrant. Use when building RAG with hybrid retrieval as core requirement, implementing multimodal search (text + images), needing in-database generation calls, or running on-prem with hybrid search built-in. Differentiates from Pinecone by hybrid BM25 + vector being native (not bolted-on) and from Qdrant by GraphQL API + generative modules.
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
Requires secrets
Needs API keys or credentials to function
Uses power tools
Uses Bash, Write, or Edit tools
Upstash Context7 MCP server for up-to-date documentation lookup. Pull version-specific documentation and code examples directly from source repositories into your LLM context.
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification
Tools to maintain and improve CLAUDE.md files - audit quality, capture session learnings, and keep project memory current.
Manus-style persistent markdown files for planning, progress tracking, and knowledge storage. Works with Claude Code, Kiro, Clawd CLI, Gemini CLI, Cursor, Continue, and 16+ AI coding assistants. Now with Arabic, German, Spanish, and Chinese (Simplified & Traditional) support.
Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques