Instruments Python and TypeScript code with MLflow Tracing for observability in LLM apps, agents, retrieval, tools, and frameworks like LangChain, OpenAI, and LangGraph.
```
npx claudepluginhub mlflow/skills
```

This skill uses the workspace's default tool permissions.
Based on the user's project, load the appropriate guide:
- `references/python.md`
- `references/typescript.md`

If unclear, check for `package.json` (TypeScript) or `requirements.txt`/`pyproject.toml` (Python) in the project.
Trace these operations (high debugging/observability value):
| Operation Type | Examples | Why Trace |
|---|---|---|
| Root operations | Main entry points, top-level pipelines, workflow steps | End-to-end latency, input/output logging |
| LLM calls | Chat completions, embeddings | Token usage, latency, prompt/response inspection |
| Retrieval | Vector DB queries, document fetches, search | Relevance debugging, retrieval quality |
| Tool/function calls | API calls, database queries, web search | External dependency monitoring, error tracking |
| Agent decisions | Routing, planning, tool selection | Understand agent reasoning and choices |
| External services | HTTP APIs, file I/O, message queues | Dependency failures, timeout tracking |
Skip tracing these (too granular, adds noise):
Rule of thumb: Trace operations that are important for debugging and identifying issues in your application.
After instrumenting the code, always verify that tracing is working.
Planning to evaluate your agent? Tracing must be working before you run agent-evaluation; complete the verification below first.
Use `mlflow.search_traces()` or `MlflowClient().search_traces()` to check that traces appear in the experiment:

```python
import mlflow

traces = mlflow.search_traces(experiment_ids=["<experiment_id>"])
print(f"Found {len(traces)} trace(s)")
assert len(traces) > 0, "No traces were logged — check tracking URI and experiment settings"

trace = traces.iloc[0]
spans = mlflow.get_trace(trace.trace_id).data.spans
print(f"Trace has {len(spans)} span(s)")
for span in spans:
    print(f"  - {span.name} ({span.span_type})")
```
Check these in order:

1. Was `mlflow.set_tracking_uri(...)` called before the agent ran? Without it, traces go to a local `./mlruns` directory instead of the configured server.
2. Did `mlflow.autolog()` or the framework-specific `mlflow.<framework>.autolog()` raise any warnings during setup? Check stderr for patching failures.
3. Does the experiment passed to `search_traces()` match the experiment that was active when the code ran? Use `mlflow.get_experiment_by_name(...)` to confirm.

For automated validation, use `agent-evaluation/scripts/validate_tracing_runtime.py`.
Log user feedback on traces for evaluation, debugging, and fine-tuning. Essential for identifying quality issues in production.
See `references/feedback-collection.md` for `mlflow.log_feedback()` usage.

See `references/production.md` for production setups (including the lightweight `mlflow-tracing` package).

See `references/advanced-patterns.md` for advanced patterns.

See `references/distributed-tracing.md` for distributed tracing.