Instruments Python AI agents (LangChain, LangGraph, CrewAI, LlamaIndex, Google ADK) with OpenTelemetry to send traces, logs, metrics to DataRobot for monitoring.
```shell
npx claudepluginhub datarobot-oss/datarobot-agent-skills --plugin datarobot-agent-skills
```

This skill uses the workspace's default tool permissions.
This skill helps you instrument any AI agent — regardless of framework or deployment environment — to send OpenTelemetry telemetry (traces, logs, metrics) to DataRobot. It also creates a shell deployment in DataRobot as the telemetry routing target.
Most common use case: Instrument an existing agent project for DataRobot monitoring
Example: "Instrument my agent in ./my_agent for DataRobot monitoring"
Use this skill when you need to instrument an AI agent so that its traces, logs, and metrics flow to DataRobot. Framework detection and instrumentation strategies:
| Framework | Detection | OTel Strategy |
|---|---|---|
| Google ADK | google-adk in deps or google.adk in imports | Lazy trace injection via callback (ADK overwrites TracerProvider) |
| LangChain / LangGraph | langchain or langgraph in deps/imports | Auto-instrumentor + standard setup |
| CrewAI | crewai in deps/imports | Auto-instrumentor + standard setup |
| LlamaIndex | llama-index or llama_index in deps/imports | Auto-instrumentor + standard setup |
| PydanticAI | pydantic-ai or pydantic_ai in deps/imports | Standard setup (respects global TracerProvider) |
| Generic Python | None of the above detected | Manual span instrumentation |
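The detection pass in the table above can be sketched as a simple dependency scan. This is an illustrative approximation only, not the skill's actual detection code; `FRAMEWORK_MARKERS` and `detect_framework` are made-up names:

```python
# Map dependency-name markers to the frameworks in the table above.
# Order matters: more specific markers are checked first.
FRAMEWORK_MARKERS = {
    "google-adk": "Google ADK",
    "langgraph": "LangChain / LangGraph",
    "langchain": "LangChain / LangGraph",
    "crewai": "CrewAI",
    "llama-index": "LlamaIndex",
    "llama_index": "LlamaIndex",
    "pydantic-ai": "PydanticAI",
    "pydantic_ai": "PydanticAI",
}

def detect_framework(requirements_text: str) -> str:
    """Return the framework name for a requirements.txt-style dependency list."""
    deps = [line.split("==")[0].strip().lower() for line in requirements_text.splitlines()]
    for marker, framework in FRAMEWORK_MARKERS.items():
        if any(marker in dep for dep in deps):
            return framework
    return "Generic Python"

print(detect_framework("langgraph==0.2.0\nrequests"))  # LangChain / LangGraph
print(detect_framework("requests\nnumpy"))             # Generic Python
```

A real implementation would also parse pyproject.toml and scan imports, as described in the workflow below.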
Follow these steps in order. Present the plan to the user and wait for approval before executing.
Analyze the project:
- Locate the dependency file (requirements.txt, pyproject.toml, setup.py, poetry.lock, or uv.lock)
- Check for existing OTel setup (opentelemetry imports, existing TracerProvider/LoggerProvider/MeterProvider configuration)
- Identify the framework using the reference files in the frameworks/ directory next to this SKILL.md:
  - frameworks/google-adk.md
  - frameworks/langchain-langgraph.md
  - frameworks/crewai.md
  - frameworks/llamaindex.md
  - frameworks/pydantic-ai.md
  - frameworks/generic-python.md

Check credentials and environment:
- Verify the DATAROBOT_API_TOKEN env var is set. If not, ask the user to provide it.
- Verify the DATAROBOT_ENDPOINT env var is set. If not, ask the user (default: https://app.datarobot.com/api/v2).
- Derive DATAROBOT_OTEL_ENDPOINT automatically: if DATAROBOT_ENDPOINT ends with /api/v2, strip it and append /otel (e.g., https://app.datarobot.com/api/v2 → https://app.datarobot.com/otel).
- Verify the datarobot Python SDK is available. If not, install it: pip install datarobot.

Security note: Never echo API tokens or .env file contents into chat transcripts or logs. Use environment variables or CI secrets for credential management. If credentials are accidentally exposed, rotate them immediately.
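The DATAROBOT_OTEL_ENDPOINT derivation rule can also be expressed directly in shell, if the user prefers to set the variable themselves (the endpoint value here is just the documented default):

```shell
export DATAROBOT_ENDPOINT="https://app.datarobot.com/api/v2"
# Strip a trailing /api/v2 suffix, then append /otel
export DATAROBOT_OTEL_ENDPOINT="${DATAROBOT_ENDPOINT%/api/v2}/otel"
echo "$DATAROBOT_OTEL_ENDPOINT"   # https://app.datarobot.com/otel
```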
Tell the user what you detected and present the changes you will make:
- New files to create: dr_otel_config.py, and optionally dr_agent_metrics.py for frameworks with custom metrics

Wait for user approval before executing. If the user has already given explicit consent to implement or deploy, that counts as approval — no need to re-ask.
Add dependencies to the project's dependency file:
- opentelemetry-sdk
- opentelemetry-api
- opentelemetry-exporter-otlp-proto-http

Generate dr_otel_config.py using the generic pattern below, adapted per the framework reference file.
Wire into agent entrypoint: Add import and call to configure_otel() at startup. Follow the framework reference file for specific wiring instructions (auto-instrumentors, callbacks, etc.).
Generate dr_agent_metrics.py if the framework reference file specifies custom metrics callbacks.
Create shell deployment: Run the helper script:
```shell
python <skill_scripts_dir>/create_shell_deployment.py \
  --name "<project_name> Monitoring" \
  --description "OTel telemetry sink for <framework> agent"
```
The script automatically enables prediction row storage and automatic association ID generation on the deployment.
Report results: Show the deployment ID and a copy-paste env var block for the user's runtime:
```shell
export DATAROBOT_API_TOKEN="<token>"
export DATAROBOT_ENTITY_ID="deployment-<id>"
export DATAROBOT_OTEL_ENDPOINT="<otel_endpoint>"
```
Optionally run the verification script:
```shell
DATAROBOT_API_TOKEN=<token> \
DATAROBOT_ENTITY_ID=deployment-<id> \
DATAROBOT_OTEL_ENDPOINT=<endpoint>/otel \
python <skill_scripts_dir>/verify_otel_connection.py
```
Provide the user with the env vars to set in their deployment environment:
- DATAROBOT_API_TOKEN — DataRobot API key
- DATAROBOT_ENTITY_ID — deployment-<id> (from shell deployment creation)
- DATAROBOT_OTEL_ENDPOINT — {DATAROBOT_ENDPOINT}/otel

Explain what they'll see in DataRobot: traces in the tracing UI (Data Exploration > Traces), plus exported logs and metrics associated with the shell deployment.
This is the core configure_otel() function to generate for every project. Framework-specific files layer additional setup on top.
Critical rules:
- Pass endpoint= and headers= directly to exporters — NEVER use OTEL_EXPORTER_OTLP_* env vars (some frameworks detect these and create conflicting providers)
- Use SimpleSpanProcessor (not Batch) to avoid flush-before-shutdown issues

Generated dr_otel_config.py template:
"""DataRobot OpenTelemetry configuration.
Configures traces, logs, and metrics export to DataRobot's OTel endpoint.
Call configure_otel() at application startup, before any agent code runs.
Required env vars at runtime:
DATAROBOT_API_TOKEN - DataRobot API key
DATAROBOT_ENTITY_ID - deployment-<deployment_id>
DATAROBOT_OTEL_ENDPOINT - https://<your-instance>.datarobot.com/otel
"""
import logging
import os
from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import SimpleLogRecordProcessor
from opentelemetry.sdk.metrics import Counter, Histogram, MeterProvider, ObservableCounter
from opentelemetry.sdk.metrics.export import (
AggregationTemporality,
PeriodicExportingMetricReader,
)
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
def _build_dr_headers():
"""Build DataRobot authentication headers for OTel exporters."""
api_key = os.environ.get("DATAROBOT_API_TOKEN", "")
entity_id = os.environ.get("DATAROBOT_ENTITY_ID", "")
if not api_key:
logging.warning("DATAROBOT_API_TOKEN not set — OTel export to DataRobot will fail")
if not entity_id:
logging.warning("DATAROBOT_ENTITY_ID not set — OTel export to DataRobot will fail")
return {
"X-DataRobot-Entity-Id": entity_id,
"X-DataRobot-Api-Key": api_key,
}
def _get_endpoint():
"""Get DataRobot OTel endpoint, auto-deriving from DATAROBOT_ENDPOINT if needed."""
endpoint = os.environ.get("DATAROBOT_OTEL_ENDPOINT", "")
if endpoint:
return endpoint.rstrip("/")
# Auto-derive from DATAROBOT_ENDPOINT (e.g. https://app.datarobot.com/api/v2 → .../otel)
api_endpoint = os.environ.get("DATAROBOT_ENDPOINT", "")
if api_endpoint:
base = api_endpoint.rstrip("/")
if base.endswith("/api/v2"):
base = base[: -len("/api/v2")]
return f"{base}/otel"
return ""
def configure_otel():
"""Configure OpenTelemetry to export traces, logs, and metrics to DataRobot.
This function is additive — it adds DataRobot as an additional exporter
alongside any existing OTel setup. It does not replace existing providers.
"""
headers = _build_dr_headers()
endpoint = _get_endpoint()
if not endpoint:
logging.warning("DATAROBOT_OTEL_ENDPOINT not set — skipping OTel configuration")
return
resource = Resource.create()
# --- Traces ---
dr_span_processor = SimpleSpanProcessor(
OTLPSpanExporter(endpoint=f"{endpoint}/v1/traces", headers=headers)
)
existing_provider = trace.get_tracer_provider()
if hasattr(existing_provider, "add_span_processor"):
existing_provider.add_span_processor(dr_span_processor)
else:
provider = TracerProvider(resource=resource)
provider.add_span_processor(dr_span_processor)
trace.set_tracer_provider(provider)
# --- Logs ---
log_exporter = OTLPLogExporter(endpoint=f"{endpoint}/v1/logs", headers=headers)
logger_provider = LoggerProvider(resource=resource)
set_logger_provider(logger_provider)
logger_provider.add_log_record_processor(SimpleLogRecordProcessor(log_exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)
# Custom formatter ensures OTLP log bodies are never empty
# (some libraries emit records with empty getMessage())
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logging.getLogger().addHandler(handler)
# --- Metrics ---
preferred_temporality = {
Counter: AggregationTemporality.DELTA,
Histogram: AggregationTemporality.DELTA,
ObservableCounter: AggregationTemporality.DELTA,
}
metric_exporter = OTLPMetricExporter(
endpoint=f"{endpoint}/v1/metrics",
headers=headers,
preferred_temporality=preferred_temporality,
)
meter_provider = MeterProvider(
metric_readers=[PeriodicExportingMetricReader(metric_exporter)],
resource=resource,
)
metrics.set_meter_provider(meter_provider)
OTel provider initialization order warning:
Some frameworks override the global TracerProvider at startup (notably Google ADK). When this happens, the standard trace setup above will lose the DataRobot exporter. The framework reference files document which frameworks have this issue and provide alternative patterns (e.g., lazy injection via callbacks). Always check the framework reference file.
Existing OTel setups (e.g., exporters to Jaeger, Datadog, Google Cloud Trace) are preserved when possible — DataRobot is added alongside, not replacing. However, note that OTel has a single global provider per signal. Whoever calls set_tracer_provider() last wins. The additive pattern above avoids calling set_tracer_provider() when a provider already exists, instead adding a processor to the existing one.
DataRobot's tracing UI (Data Exploration > Traces) maps specific span attributes to table columns. Using the correct attribute names is critical for data to appear in the dashboard.
| Tracing Table Column | Span Attribute | Aggregation Rule |
|---|---|---|
| Prompt | gen_ai.prompt | First span with this attribute wins |
| Completion | gen_ai.completion | Last span with this attribute wins |
| Tools | tool_name | Lists all unique values across all spans in the trace |
| Cost | datarobot.moderation.cost | Summed across all spans in the trace |
Important: DataRobot looks for tool_name (underscore), NOT tool.name (dot). Some frameworks (e.g., LangGraph) do not set tool_name by default — you must add it manually as a span attribute inside each tool call.
| Attribute | Description | Example |
|---|---|---|
| gen_ai.prompt | User input / prompt text | "Analyze policy XYZ" |
| gen_ai.completion | Model output / response | "Policy matched..." |
| gen_ai.request.model | Model used for the call | "gpt-4o" |
| gen_ai.usage.prompt_tokens | Input token count | 150 |
| gen_ai.usage.completion_tokens | Output token count | 320 |
| tool_name | Name of tool/function called (required for Tools column) | "search_database" |
| tool.parameters | Tool call parameters (JSON string) | '{"query": "..."}' |
| datarobot.moderation.cost | Cost of this span (summed for trace total) | 0.0023 |
Creates a shell deployment in DataRobot as a telemetry routing target.
```shell
python <scripts_dir>/create_shell_deployment.py \
  --name "My Agent Monitoring" \
  --description "OTel telemetry sink for my agent"
```
Requires env vars: DATAROBOT_API_TOKEN, DATAROBOT_ENDPOINT
Returns JSON:
```json
{
  "deployment_id": "abc123",
  "entity_id": "deployment-abc123",
  "otel_endpoint": "https://app.datarobot.com/otel"
}
```
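If you want to turn that JSON output into the env var block used in the report step, a small helper could look like this (an illustrative sketch, not part of the skill's scripts; the JSON literal mirrors the example above):

```python
import json

# Stand-in for the captured stdout of create_shell_deployment.py
output = '''{
    "deployment_id": "abc123",
    "entity_id": "deployment-abc123",
    "otel_endpoint": "https://app.datarobot.com/otel"
}'''

info = json.loads(output)
# Emit the copy-paste export block for the user's runtime
print(f'export DATAROBOT_ENTITY_ID="{info["entity_id"]}"')
print(f'export DATAROBOT_OTEL_ENDPOINT="{info["otel_endpoint"]}"')
```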
Sends test telemetry to verify the OTel pipeline is working.
```shell
python <scripts_dir>/verify_otel_connection.py
```
Requires env vars: DATAROBOT_API_TOKEN, DATAROBOT_ENTITY_ID, DATAROBOT_OTEL_ENDPOINT
Returns JSON:
```json
{
  "status": "success",
  "traces": "sent",
  "logs": "sent",
  "metrics": "sent"
}
```
Required for instrumentation (added to user's project):
```
opentelemetry-sdk
opentelemetry-api
opentelemetry-exporter-otlp-proto-http
```
Required for shell deployment creation (available in the skill's script environment):
datarobot
- Call configure_otel() before any agent/framework initialization — some frameworks capture the provider at import time
- Never set OTEL_EXPORTER_OTLP_* env vars — pass endpoint and headers directly to exporters to avoid conflicts
- Prefer SimpleSpanProcessor over BatchSpanProcessor — avoids flush issues on short-lived processes

Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| Traces not appearing in DataRobot | Framework overwrites TracerProvider | Use lazy injection pattern (see framework reference) |
| 401 Unauthorized from OTel endpoint | Invalid API token | Verify DATAROBOT_API_TOKEN is correct |
| 404 from OTel endpoint | Wrong endpoint URL | Ensure DATAROBOT_OTEL_ENDPOINT ends with /otel |
| Metrics not appearing | OTEL_EXPORTER_OTLP_* env vars set | Remove env vars, use direct exporter config |
| DATAROBOT_ENTITY_ID format error | Missing deployment- prefix | Must be deployment-<id>, not just <id> |
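The last error in the table can be caught before any telemetry is sent with a simple format check. This is a hypothetical helper, not part of the skill's scripts:

```python
def is_valid_entity_id(value: str) -> bool:
    """Check the DATAROBOT_ENTITY_ID format: must be "deployment-<id>", not a bare id."""
    prefix = "deployment-"
    return value.startswith(prefix) and len(value) > len(prefix)

print(is_valid_entity_id("deployment-abc123"))  # True
print(is_valid_entity_id("abc123"))             # False
```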