From MLflow Skills
Instruments Python and TypeScript code with MLflow Tracing for observability. Useful when adding tracing to agents, LLM apps, or specific frameworks like LangChain, OpenAI, Gemini, DSPy, CrewAI, or AutoGen.
npx claudepluginhub mlflow/skills
Slash command: /mlflow:instrumenting-with-mlflow-tracing
Based on the user's project, load the appropriate guide:
- references/python.md
- references/typescript.md

If unclear, check for package.json (TypeScript) or requirements.txt/pyproject.toml (Python) in the project.
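The marker-file check can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function name `pick_guide` is hypothetical, and returning `None` when both or neither kind of marker is present is an assumption about how to handle the ambiguous case.

```python
from pathlib import Path
from typing import Optional

# Marker files named in the text above.
TYPESCRIPT_MARKERS = ("package.json",)
PYTHON_MARKERS = ("requirements.txt", "pyproject.toml")


def pick_guide(project_root: str) -> Optional[str]:
    """Return the reference guide for the project, or None if it is still unclear.

    Hypothetical helper: only inspects the project root for marker files.
    """
    root = Path(project_root)
    has_ts = any((root / name).is_file() for name in TYPESCRIPT_MARKERS)
    has_py = any((root / name).is_file() for name in PYTHON_MARKERS)
    if has_py and not has_ts:
        return "references/python.md"
    if has_ts and not has_py:
        return "references/typescript.md"
    # Both or neither marker present: ask the user rather than guess.
    return None
```

A mixed repository (for example, a Python backend with a TypeScript frontend) falls into the `None` branch, where asking which part of the project needs tracing is safer than guessing.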
Trace these operations (high debugging/observability value):
| Operation Type | Examples | Why Trace |
|---|---|---|
| Root operations | Main entry points, top-level pipelines, workflow steps | End-to-end latency, input/output logging |
| LLM calls | Chat completions, embeddings | Token usage, latency, prompt/response inspection |
| Retrieval | Vector DB queries, document fetches, search | Relevance debugging, retrieval quality |
| Tool/function calls | API calls, database queries, web search | External dependency monitoring, error tracking |
| Agent decisions | Routing, planning, tool selection | Understand agent reasoning and choices |
| External services | HTTP APIs, file I/O, message queues | Dependency failures, timeout tracking |
Skip tracing these (too granular, adds noise):
Rule of thumb: Trace operations that are important for debugging and identifying issues in your application.
After instrumenting the code, always verify that tracing is working.
Planning to evaluate your agent? Tracing must be working before you run
agent-evaluation. Complete verification below first.
Use mlflow.search_traces() or MlflowClient().search_traces() to check that traces appear in the experiment:

```python
import mlflow

# Replace <experiment_id> with the ID of the experiment the code logged to.
traces = mlflow.search_traces(experiment_ids=["<experiment_id>"])
print(f"Found {len(traces)} trace(s)")
assert len(traces) > 0, "No traces were logged — check tracking URI and experiment settings"

# Inspect the first returned trace and list its spans.
trace = traces.iloc[0]
spans = mlflow.get_trace(trace.trace_id).data.spans
print(f"Trace has {len(spans)} span(s)")
for span in spans:
    print(f"  - {span.name} ({span.span_type})")
```
Check these in order:
1. Was mlflow.set_tracking_uri(...) called before the agent run? Without this, traces go to a local ./mlruns directory instead of the configured server.
2. Did mlflow.autolog() or framework-specific mlflow.<framework>.autolog() raise any warnings during setup? Check stderr for patching failures.
3. Does the experiment passed to search_traces() match the experiment that was active when the code ran? Use mlflow.get_experiment_by_name(...) to confirm.

For automated validation, use agent-evaluation/scripts/validate_tracing_runtime.py.
Log user feedback on traces for evaluation, debugging, and fine-tuning. Essential for identifying quality issues in production.
See references/feedback-collection.md for:
- mlflow.log_feedback()

See references/production.md for:
- mlflow-tracing)

See references/advanced-patterns.md for:
See references/distributed-tracing.md for: