ML Workbench for Claude Code. Full ML lifecycle: search papers across 7 academic sources, discover and download datasets from 5 repositories, explore and clean data, engineer features, train models (Naive Bayes, KNN, LDA/QDA, SVM, Decision Trees, Ensembles, GLM, Gaussian Process, Neural Networks), run autonomous experiments, build AI apps with LLMs and RAG, build MCP servers, deploy models with Docker and CI/CD, detect drift, explain predictions with SHAP, generate podcasts from papers, manage notebooks, extract YouTube content, and learn ML interactively with 3 university-grade courses (CS229, Applied ML, ML Engineering). 11 agents, 16 skills, 3 CLI tools (mlx-exp, mlx-search, mlx-status), 1 MCP server, 3 output styles, Python LSP via pyright.
npx claudepluginhub damionrashford/mlx --plugin mlx
Builds AI-powered applications using pre-trained models, LLM APIs, embeddings, RAG pipelines, and agent architectures. Knows the Claude Agent SDK, OpenAI Agents SDK, Vercel AI SDK, and DSPy, and fetches their live docs before scaffolding agent code. Use proactively when the user wants to build an AI application, set up a RAG system, do prompt engineering, integrate LLM APIs, build an agent with any framework, work with embeddings/vector stores, optimize prompts with DSPy, or evaluate LLM outputs.
Answers business questions with data through descriptive statistics, hypothesis testing, segmentation, trend analysis, and visualization. Use proactively when the user wants to understand what happened in the data, compare groups, find trends, create charts or dashboards, run A/B test analysis, segment customers, calculate KPIs, or build data reports for stakeholders.
Builds and maintains data pipelines, warehouses, and lakehouses. Use proactively when the user needs to build an ETL/ELT pipeline, set up dbt transformations, implement incremental loading, orchestrate workflows with Airflow or Prefect, process data at scale with Spark, Polars, or DuckDB, design a data lakehouse (Delta Lake, Iceberg, Hudi), validate data quality with Great Expectations or Soda, or set up production data infrastructure feeding ML systems. Distinct from data-scientist (exploratory modeling) and data-analyst (BI and reporting).
Full-pipeline data science agent: dataset discovery, EDA, cleaning, feature engineering, training, and evaluation. Use proactively when the user needs the COMPLETE workflow from finding data to trained model, or has a dataset and needs exploration through modeling. Always starts with data understanding.
Specializes in neural network architecture design, training dynamics, and GPU optimization. Use proactively when the user needs to design or debug a neural network architecture (CNNs, Transformers, RNNs, SSMs, diffusion models), troubleshoot loss curves or gradient pathologies (vanishing gradients, exploding gradients, dead ReLU), implement distributed training (DDP, FSDP, DeepSpeed), optimize GPU memory and throughput (mixed precision, gradient checkpointing, torch.compile), or run systematic architecture search experiments. Distinct from ml-engineer (tabular/classical ML) and ai-engineer (pre-trained model integration).
Specialized model optimization agent for deep, systematic experimentation. Use proactively when the user already has explored and cleaned data and wants focused iteration: feature engineering, model selection, hyperparameter tuning, ablation studies.
Handles everything after model training: serialization, serving code, containerization, CI/CD pipelines, monitoring, model cards, and reproducibility packaging. Use proactively when the user has a trained model and wants to deploy it, serve it, containerize it, create an inference pipeline, write a model card, set up monitoring, or package for reproducibility.
Searches, fetches, synthesizes, and reviews ML/AI research papers, discovers and downloads datasets, extracts YouTube content, generates podcasts and media from papers, then optionally prototypes algorithms. Use proactively when the user wants to find papers, survey a research topic, compare methods, review a paper's methodology, critique experimental design, turn a paper into code, find and download datasets, generate a podcast from a paper, extract a YouTube transcript, or create audio/video summaries of research.
Reviews ML code and experiments for data leakage, reproducibility issues, train/eval separation, hardcoded paths, results.tsv hygiene, and deployment safety. Use when you want a rigorous ML code review, audit for data leakage, check for reproducibility violations, or verify an experiment is sound before promoting it.
Interactive ML education agent that teaches, quizzes, and evaluates understanding across three course tracks (CS229, Applied ML, ML Engineering). Use when the user wants to learn ML concepts, study a topic interactively, get quizzed, practice explaining concepts, run mock system design interviews, debug broken ML scenarios, or check their learning progress and next steps.
ML-aware main session agent. Active when the MLX plugin is enabled. Knows the full ML lifecycle, all MLX skills, results.tsv experiment tracking, and when to delegate to specialized subagents for deep multi-step work.
Statistical analysis, hypothesis testing, A/B testing, cohort analysis, segmentation, trend detection, business metrics, pre-delivery validation, and data visualization. Use when the user asks to "analyze this data", "run a statistical test", "compare groups", "find trends", "do A/B test analysis", "segment customers", "calculate KPIs", "validate this analysis", "check my work", "sanity check", "review my numbers", "make a chart", "create a dashboard", "plot the data", "visualize results", or mentions hypothesis testing, cohort analysis, business analytics, data validation, bar charts, line charts, heatmaps, scatter plots, or data storytelling.
Autonomous time-budget experiment loop. Modify a training script, train for a fixed wall-clock budget, evaluate, record, repeat. Inspired by karpathy/autoresearch. Use for overnight architecture search, systematic hyperparameter sweeps, or any iterative model improvement workflow.
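The loop described above (train for a fixed wall-clock budget, evaluate, record, repeat) can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `run_budgeted_trial`, `experiment_loop`, and `make_step` are hypothetical names, and the "training step" is assumed to be any callable that returns a loss.

```python
import time
from typing import Callable, List, Tuple


def run_budgeted_trial(train_step: Callable[[], float], budget_s: float) -> Tuple[int, float]:
    """Call train_step repeatedly until the wall-clock budget is spent.

    Returns (number of steps completed, last loss reported by train_step).
    """
    deadline = time.monotonic() + budget_s  # monotonic clock is immune to system clock changes
    steps, loss = 0, float("inf")
    while time.monotonic() < deadline:
        loss = train_step()
        steps += 1
    return steps, loss


def experiment_loop(
    candidates: List[str],
    make_step: Callable[[str], Callable[[], float]],
    budget_s: float,
) -> List[Tuple[str, int, float]]:
    """Give every candidate the same time budget, then rank by final loss (lowest first)."""
    results = []
    for name in candidates:
        steps, loss = run_budgeted_trial(make_step(name), budget_s)
        results.append((name, steps, loss))  # record: one row per trial
    return sorted(results, key=lambda r: r[2])
```

Fixing the budget rather than the step count makes trials comparable in cost: a slower architecture simply completes fewer steps within the same window.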
Context engineering for building production LLM applications: context window management, degradation patterns, optimization strategies, memory system selection, multi-agent architecture, filesystem context patterns, and tool design principles. Use when building LLM apps, RAG pipelines, AI agents, multi-agent systems, or when designing memory, tool APIs, or context strategies for any language model application.
Explore, clean, and engineer datasets end-to-end: statistical profiling, distribution checks, missing value analysis, duplicate detection, outlier removal, type fixing, categorical encoding, column transforms, rolling windows, interaction terms, and other feature engineering. Supports pandas, polars, and PySpark. Use when the user wants to explore data, profile columns, understand a dataset, clean data, handle missing values, remove duplicates, fix data types, preprocess a dataset before modeling, create features, encode categories, transform columns, add rolling windows, build interaction terms, or do feature engineering.
Detect data drift, concept drift, and model performance degradation in production. Uses PSI, KS-test, and chi-squared for statistical drift, plus evidently and nannyml for automated reports. Use when monitoring a deployed model or comparing training vs production data distributions.
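As an illustration of one of the statistics named above, here is a minimal, dependency-free sketch of the Population Stability Index (PSI) for a single numeric feature. It assumes equal-width bins derived from the reference (training) sample and a small floor `eps` to avoid taking the log of zero. The function name `psi` and its parameters are hypothetical and are not taken from the plugin.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a new sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        return [max(c / len(values), eps) for c in counts]  # floor empty bins at eps

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [i / 100 for i in range(100)]         # reference distribution
prod = [0.5 + i / 200 for i in range(100)]    # shifted production sample
print(psi(train, train), psi(train, prod))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the feature and the bin count.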
Systematic evaluation of ML models, experiments, and AI system outputs. Multi-dimensional rubrics, LLM-as-judge, bias detection, and structured comparison frameworks. Use when the user asks to "evaluate model performance", "compare models", "build evaluation rubrics", "assess output quality", "detect model bias", or mentions evaluation frameworks, LLM-as-judge, model comparison, or quality assessment.
Explain model predictions with SHAP, LIME, integrated gradients, and permutation importance. Generates summary plots, waterfall charts, and force plots. Use when debugging predictions, auditing for bias, or communicating model behavior to stakeholders.
Fine-tune language models with LoRA, QLoRA, or full fine-tuning. Covers unsloth (4x memory reduction), PEFT, trl SFTTrainer, DPO, instruction tuning with chat templates, dataset preparation, and evaluation. Use when fine-tuning any HuggingFace model on custom data.
Interactive ML education with 3 university-grade courses (CS229 Stanford, Applied ML Python, ML Engineering), 36+ structured lessons, decision frameworks, and interview prep. Supports study, quiz, explain, design, debug, and progress modes. Use when the user wants to learn ML concepts, study for interviews, understand a topic deeply, or get quizzed on material.
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
Extract content from YouTube videos and generate podcasts, video overviews, quizzes, flashcards, reports, and slide decks from research papers using Google NotebookLM. Use when the user wants to extract a YouTube transcript, analyze a video, turn a paper into a podcast, generate an audio summary, create a quiz from a paper, make slides from research, or automate any NotebookLM workflow.
On-demand ML/data science library expert. Use when the user asks how to use any function, class, or method from NumPy, Pandas, scikit-learn, Matplotlib, TensorFlow, Keras, PyTorch, Seaborn, SciPy, statsmodels, XGBoost, LightGBM, Hugging Face Transformers, OpenCV, NLTK, spaCy, Plotly, Dask, PySpark, SQLAlchemy, or Jupyter. Fetches and synthesizes official API docs, parameter reference, and working code examples. Also use when the user asks "how do I do X in pandas/numpy/torch/sklearn", needs to understand a deep learning layer or training loop, asks about NLP pipelines, computer vision transforms, statistical tests, SQL ORM patterns, or big data ops.
Create, clean, organize, optimize, and convert Jupyter notebooks. Build new notebooks from scratch with proper cell structure, cell IDs, and Colab compatibility. Extract reusable functions, add documentation, generate requirements.txt, and convert to scripts. Use when the user wants to create a notebook, clean a notebook, organize cells, extract functions, convert to script, or optimize a notebook for production.
Search, fetch, download, and extract ML/AI research papers from 7 free academic sources. Find and download ML datasets from 5 free sources (HuggingFace, OpenML, UCI, Papers with Code, Kaggle). Review a paper, critique methodology, assess reproducibility, evaluate experimental design. Convert research papers, articles, or technical documents into working code prototypes. Use when the user wants to find papers, search arxiv, get citations, download a PDF, extract text from a paper, find/download datasets, review/critique a research paper, implement a paper, prototype an algorithm, or convert research to working code.
Compress, deploy, and serve trained ML models in production. Covers model compression (quantization, pruning, distillation, ONNX export), inference APIs, containerization, CI/CD pipelines, monitoring, health endpoints, model versioning, and reproducibility packaging. Use when the user has a trained model and wants to reduce its size, deploy it, serve it, containerize it, build an inference API, set up monitoring, write a model card, create a CI/CD pipeline, or package for reproducibility.
Train ML models and iterate systematically with experiment tracking. Full coverage of supervised learning: Naive Bayes, KNN, Discriminant Analysis (LDA/QDA), SVM/SVR, Decision Trees, Ensemble Methods (Random Forest, XGBoost, LightGBM), GLM (Poisson, Gamma, Tweedie), Gaussian Process, Ridge/Lasso/ElasticNet, and Neural Networks (PyTorch). Covers data splitting, cross-validation, metrics, persistence, hyperparameter search, and TSV-based experiment tracking. Use when the user wants to train a model, fit a classifier or regressor, evaluate performance, do cross-validation, run experiments, tune hyperparameters, or compare runs.
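TSV-based experiment tracking can be as simple as appending one row per run to a tab-separated file and reading it back to compare runs. The sketch below uses only the standard library. The column names in `FIELDS` and the helpers `log_run` and `best_run` are assumptions for illustration, since the actual results.tsv schema is not given here.

```python
import csv
from pathlib import Path
from typing import Dict, Optional

# Hypothetical columns; the plugin's real results.tsv layout may differ.
FIELDS = ["run_id", "model", "metric", "value"]


def log_run(path: str, row: Dict[str, object]) -> None:
    """Append one run to the TSV file, writing the header if the file is new or empty."""
    p = Path(path)
    is_new = not p.exists() or p.stat().st_size == 0
    with p.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def best_run(path: str, metric: str, higher_is_better: bool = True) -> Optional[Dict[str, str]]:
    """Return the row with the best value for the given metric, or None if there is none."""
    with Path(path).open(newline="") as f:
        rows = [r for r in csv.DictReader(f, delimiter="\t") if r["metric"] == metric]
    if not rows:
        return None
    pick = max if higher_is_better else min
    return pick(rows, key=lambda r: float(r["value"]))
```

An append-only text file keeps the history diffable and easy to inspect with ordinary shell tools, at the cost of having no schema enforcement.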
Matches all tools: hooks run on every tool call, not just specific ones.
Admin access level: server config contains admin-level keywords.
Executes bash commands: hook triggers when the Bash tool is used.
Modifies files: hook triggers on file write and edit operations.
External network access: connects to servers outside your machine.
Uses power tools: uses the Bash, Write, or Edit tools.