Expertise in data engineering, machine learning, AI, and MLOps, from data pipelines to production ML systems and generative AI.
Builds end-to-end data pipelines, trains ML models, and deploys LLM applications with RAG systems.

    /plugin marketplace add pluginagentmarketplace/custom-plugin-cloudflare
    /plugin install custom-plugin-cloudflare@pluginagentmarketplace-cloudflare

Model: sonnet

| Attribute | Value |
|---|---|
| Role | Data engineering, ML, and AI systems expertise |
| DO | Data pipelines, ML models, LLM applications, MLOps |
| DON'T | General backend (→ core-developer), Cloud infra (→ cloud-engineer) |

| Role | Focus | Timeline | Entry From |
|---|---|---|---|
| Data Engineer | Pipelines, Infra | 1-2 yr | Backend dev |
| ML Engineer | Models, Features | 1-2 yr | Data Science |
| AI Engineer | LLMs, Agents | 6-12 mo | Any dev |

Data Engineer path: SQL Mastery (1-2 mo) → Python (1-2 mo) → ETL/Pipelines (2-3 mo) → Big Data with Spark (2-3 mo) → Data Warehouse (1-2 mo) → Orchestration (1-2 mo)

2025 Stack: Python + Spark + Airflow/Prefect + dbt + Snowflake/BigQuery

ML Engineer path: Python (1-2 mo) → Math (1-2 mo) → ML Algorithms (2-3 mo) → Deep Learning (2-3 mo) → MLOps (2-3 mo) → Production (ongoing)

2025 Stack: Python + PyTorch + scikit-learn + MLflow + Weights & Biases

AI Engineer path: LLM Fundamentals (2-3 wk) → Prompt Engineering (2-3 wk) → RAG Systems (3-4 wk) → AI Agents (4-6 wk) → Production (ongoing)

2025 Stack: Python + LangChain/LlamaIndex + OpenAI/Anthropic + ChromaDB/Pinecone

| Tool | Use Case | Scale |
|---|---|---|
| Pandas | Small data, prototyping | <10GB |
| Polars | Fast local processing | <100GB |
| Spark | Distributed processing | >100GB |
| dbt | Data transformations | Any |

| Framework | Best For | Complexity |
|---|---|---|
| scikit-learn | Classical ML | Low |
| XGBoost/LightGBM | Tabular data | Low |
| PyTorch | Research, flexibility | Medium |
| TensorFlow | Production, mobile | Medium |

| Tool | Use Case | Learning |
|---|---|---|
| LangChain | LLM orchestration | Medium |
| LlamaIndex | RAG systems | Medium |
| Anthropic Claude | Advanced reasoning | Easy API |
| OpenAI | General purpose | Easy API |

    ┌───────────────────────────────────────────────────────────────────┐
    │                           ML LIFECYCLE                            │
    ├───────────────────────────────────────────────────────────────────┤
    │  Problem Definition → Data Collection → Preprocessing → EDA       │
    │          │                                                        │
    │          ▼                                                        │
    │  Feature Engineering → Model Selection → Training → Evaluation    │
    │          │                                                        │
    │          ▼                                                        │
    │  Hyperparameter Tuning → Validation → Deployment → Monitoring     │
    │          │                                                        │
    │          ▼                                                        │
    │  Retraining (continuous improvement loop)                         │
    └───────────────────────────────────────────────────────────────────┘
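To make the lifecycle stages concrete, here is a minimal sketch that walks through data collection, preprocessing, training, and evaluation using only the Python standard library. Everything in it is an assumption chosen for illustration: the synthetic dataset (`y ≈ 3x + 2`), the function names, and the one-variable least-squares model. A real project would typically use scikit-learn or PyTorch from the stacks listed earlier.

```python
import random
from statistics import mean


def collect_data(n=200, seed=0):
    """Data collection: synthetic (x, y) pairs, y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3 * x + 2 + rng.gauss(0, 0.5)))
    return rows


def preprocess(rows):
    """Preprocessing: drop rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, train_fraction=0.8):
    """Hold out the tail of the data for validation."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train(rows):
    """Training: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def evaluate(model, rows):
    """Evaluation: mean squared error on held-out rows."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in rows)


def run_lifecycle():
    rows = preprocess(collect_data())
    train_rows, val_rows = split(rows)
    model = train(train_rows)
    return model, evaluate(model, val_rows)
```

Calling `run_lifecycle()` returns the fitted `(slope, intercept)` pair and the validation error. Deployment, monitoring, and retraining are left out here; in practice they wrap this same function in a scheduled job that re-runs when monitored metrics degrade.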
| Type | Algorithms | Use Case |
|---|---|---|
| Regression | Linear, Ridge, Lasso | Continuous prediction |
| Classification | Logistic, SVM, KNN | Category prediction |
| Ensemble | Random Forest, XGBoost, LightGBM | Tabular data |

| Architecture | Use Case |
|---|---|
| CNN | Images, vision |
| RNN/LSTM | Sequences, time series |
| Transformer | NLP, LLMs |
| Diffusion | Image generation |

    ┌─────────────────────────────────────────────┐
    │                AI AGENT LOOP                │
    ├─────────────────────────────────────────────┤
    │  1. PERCEIVE: Observe state, get context    │
    │         │                                   │
    │         ▼                                   │
    │  2. REASON: LLM decides next action         │
    │         │                                   │
    │         ▼                                   │
    │  3. ACT: Execute tools/APIs                 │
    │         │                                   │
    │         ▼                                   │
    │  4. EVALUATE: Check goal completion         │
    │         │                                   │
    │         └─► Loop until goal or max_turns    │
    └─────────────────────────────────────────────┘
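The four steps in the loop can be sketched in plain Python. This is an illustration under stated assumptions, not a real agent framework: the `reason` function is a hard-coded stand-in for the LLM call, the two tools (`add_one`, `double`) are hypothetical, and the goal is simply to reach a positive target number.

```python
from dataclasses import dataclass, field

# Hypothetical tools the agent may call; a real agent would wrap APIs here.
TOOLS = {
    "add_one": lambda value: value + 1,
    "double": lambda value: value * 2,
}


@dataclass
class AgentState:
    goal: int
    value: int = 0
    history: list = field(default_factory=list)


def reason(observation, goal):
    """Stand-in for the LLM: choose the next tool from the observed state."""
    if observation > 0 and observation * 2 <= goal:
        return "double"
    return "add_one"


def run_agent(goal, max_turns=20):
    """Run the perceive/reason/act/evaluate loop until the goal or max_turns."""
    state = AgentState(goal=goal)
    for _ in range(max_turns):
        observation = state.value                  # 1. PERCEIVE
        action = reason(observation, state.goal)   # 2. REASON
        state.value = TOOLS[action](observation)   # 3. ACT
        state.history.append(action)
        if state.value == state.goal:              # 4. EVALUATE
            break
    return state
```

For example, `run_agent(10)` ends with `state.value == 10` after six tool calls, while `run_agent(10, max_turns=2)` stops early with the goal unmet. The `max_turns` cap matters in real agents for the same reason: it bounds cost when the model never reaches the goal.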
Patterns: ReAct, Reflection, Planning, Tool Use, Multi-Agent

    Which path to choose?
    ├─► Love building infrastructure? → Data Engineer
    ├─► Love algorithms and math? → ML Engineer
    ├─► Want quickest entry to AI? → AI Engineer
    └─► Uncertain? → Start with Python + SQL foundations

    Model not performing?
    ├─► Check: Data quality issues? → Clean data first
    ├─► Check: Feature engineering? → Create better features
    ├─► Check: Model selection? → Try different algorithms
    └─► Check: Hyperparameters? → Use grid/random search
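The last branch mentions grid and random search. The sketch below shows both in standard-library Python, assuming a hypothetical `validation_loss` function that stands in for "train a model with these parameters and score it on validation data". The parameter names `learning_rate` and `max_depth` and the shape of the loss are made up for illustration.

```python
import itertools
import random


def validation_loss(params):
    """Hypothetical objective: lowest at learning_rate=0.1, max_depth=6."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6) ** 2 / 100


def grid_search(grid):
    """Try every combination of the listed values and keep the best one."""
    keys = list(grid)
    candidates = [
        dict(zip(keys, values)) for values in itertools.product(*grid.values())
    ]
    return min(candidates, key=validation_loss)


def random_search(space, n_trials=50, seed=0):
    """Sample n_trials random points from (low, high) ranges and keep the best."""
    rng = random.Random(seed)
    candidates = [
        {
            "learning_rate": rng.uniform(*space["learning_rate"]),
            "max_depth": rng.randint(*space["max_depth"]),
        }
        for _ in range(n_trials)
    ]
    return min(candidates, key=validation_loss)


GRID = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 6, 9]}
SPACE = {"learning_rate": (0.001, 0.5), "max_depth": (2, 10)}
```

Grid search costs one training run per combination (nine here), so it scales poorly as parameters are added. Random search keeps the budget fixed at `n_trials` and tends to be the better default when only a few parameters matter.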
| Symptom | Root Cause | Recovery |
|---|---|---|
| "Model works locally, fails in prod" | Data distribution shift | Monitor data drift |
| "Training takes forever" | Wrong framework/hardware | Use GPU, optimize batches |
| "LLM gives wrong answers" | Poor prompt engineering | Improve prompts, add context |
| "RAG not finding relevant docs" | Bad chunking/embeddings | Tune chunk size, try different embeddings |
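For the RAG row, "tune chunk size" usually means changing how documents are split before embedding. A minimal sketch of fixed-size chunking with overlap follows. It assumes character-based splitting for simplicity; production systems often split on tokens or sentence boundaries instead, and the function name and defaults here are illustrative only.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With `chunk_text("abcdefghij", chunk_size=4, overlap=2)` the result is `["abcd", "cdef", "efgh", "ghij"]`. Smaller chunks make retrieval more precise but give the LLM less context per hit, so chunk size and overlap are worth tuning against a set of known question and document pairs.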