Instrument LLM API calls with proper spans, tokens, and latency
Automatically instruments LLM API calls to track model, latency, tokens, and costs. Claude uses this whenever calling LLM APIs to capture performance metrics without logging sensitive prompt/response content.
/plugin marketplace add nexus-labs-automation/agent-observability/plugin install nexus-labs-automation-agent-observability@nexus-labs-automation/agent-observabilityThis skill inherits all available tools. When active, it can use any tool Claude has access to.
Instrument LLM API calls to track latency, tokens, costs, and errors.
Every LLM call should capture:
# Required (P0)
span.set_attribute("llm.model", "claude-3-opus-20240229")
span.set_attribute("llm.provider", "anthropic")
span.set_attribute("llm.latency_ms", 2340)
span.set_attribute("llm.success", True)
# Token tracking (P1)
span.set_attribute("llm.tokens.input", 1500)
span.set_attribute("llm.tokens.output", 350)
span.set_attribute("llm.tokens.total", 1850)
# Cost (P1)
span.set_attribute("llm.cost_usd", 0.025)
# Configuration (P2)
span.set_attribute("llm.temperature", 0.7)
span.set_attribute("llm.max_tokens", 4096)
span.set_attribute("llm.stop_reason", "end_turn")
# Error context (when applicable)
span.set_attribute("llm.error.type", "rate_limit")
span.set_attribute("llm.error.message", "Rate limit exceeded")
span.set_attribute("llm.retry_count", 2)
Never log full prompts/responses:
# BAD - PII risk, storage explosion
span.set_attribute("llm.prompt", messages)
span.set_attribute("llm.response", completion.content)
# GOOD - Safe metadata
span.set_attribute("llm.prompt.message_count", len(messages))
span.set_attribute("llm.prompt.system_length", len(system_prompt))
span.set_attribute("llm.response.length", len(completion.content))
For streaming responses:
span.set_attribute("llm.streaming", True)
span.set_attribute("llm.ttft_ms", 145) # Time to first token
span.set_attribute("llm.chunks", 47) # Number of chunks
Calculate cost from tokens and model pricing:
PRICING = {
"claude-3-opus": {"input": 15.00, "output": 75.00}, # per 1M tokens
"claude-3-sonnet": {"input": 3.00, "output": 15.00},
"claude-3-haiku": {"input": 0.25, "output": 1.25},
"gpt-4-turbo": {"input": 10.00, "output": 30.00},
"gpt-4o": {"input": 5.00, "output": 15.00},
}
def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
pricing = PRICING.get(model, {"input": 0, "output": 0})
input_cost = (input_tokens / 1_000_000) * pricing["input"]
output_cost = (output_tokens / 1_000_000) * pricing["output"]
return round(input_cost + output_cost, 6)
from langfuse.callback import CallbackHandler
handler = CallbackHandler()
chain.invoke(input, config={"callbacks": [handler]})
from langfuse.decorators import observe
@observe(as_type="generation")
def call_claude(messages):
response = client.messages.create(...)
return response
from langfuse.openai import openai
# Automatic instrumentation
client = openai.OpenAI()
Capture errors with context:
try:
response = client.messages.create(...)
except RateLimitError as e:
span.set_attribute("llm.error.type", "rate_limit")
span.set_attribute("llm.error.retry_after", e.retry_after)
raise
except APIError as e:
span.set_attribute("llm.error.type", "api_error")
span.set_attribute("llm.error.status", e.status_code)
raise
See references/anti-patterns/llm-tracing.md:
token-cost-tracking - Detailed cost attributionerror-retry-tracking - Error handling patternsCreating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art using code, generative art, algorithmic art, flow fields, or particle systems. Create original algorithmic art rather than copying existing artists' work to avoid copyright violations.
Applies Anthropic's official brand colors and typography to any sort of artifact that may benefit from having Anthropic's look-and-feel. Use it when brand colors or style guidelines, visual formatting, or company design standards apply.
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a poster, piece of art, design, or other static piece. Create original visual designs, never copying existing artists' work to avoid copyright violations.