From dspy-api-skills
Sets up LangWatch for DSPy auto-tracing and real-time optimizer progress tracking. Use when you need live optimizer scores, cost, and predictor state during long optimization passes.
npx claudepluginhub lebsral/dspy-programming-not-prompting-lms-skills --plugin dspy-build-skills
How this skill is triggered (by the user, by Claude, or both): slash command /dspy-api-skills:dspy-langwatch
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Guide the user through setting up LangWatch for automatic DSPy tracing and live optimizer progress tracking.
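LangWatch is an open-source LLMOps platform with two distinct DSPy integrations: automatic tracing of DSPy calls during inference, via @langwatch.trace() and autotrack_dspy(), and live optimizer progress tracking, via langwatch.dspy.init().
Of the observability tools compared here (Langtrace, Phoenix, Weave, MLflow), none patches DSPy optimizers to stream live progress.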
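Use LangWatch when:
- You need live optimizer scores, cost, and predictor state during long optimization passes.
- You want automatic tracing of DSPy calls during inference.
Do NOT use LangWatch when:
- You only want the easiest auto-tracing setup: see /dspy-langtrace.
- You want tracing plus local evals: see /dspy-phoenix.
- You want tracing plus cloud experiment tracking: see /dspy-weave.
- You need the full ML lifecycle and a model registry: see /dspy-mlflow.
pip install langwatch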
# Or pin DSPy version compatibility (quoted so shells such as zsh do not expand the brackets):
pip install "langwatch[dspy]"
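Set your API key so the SDK can authenticate:
export LANGWATCH_API_KEY="your-key"
To self-host with Docker instead of using the cloud app:
git clone https://github.com/langwatch/langwatch.git
cd langwatch
docker compose up -d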
Then point your SDK at your local instance:
export LANGWATCH_ENDPOINT="http://localhost:5560"
LangWatch provides a Helm chart for production Kubernetes deployments. See the LangWatch docs for Helm values and configuration.
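Before running anything, it can help to confirm which endpoint and key a process will actually pick up from the two environment variables above. The following is a minimal sketch using only the standard library; `resolve_langwatch_settings` is a hypothetical helper, and the fallback to `https://app.langwatch.ai` when `LANGWATCH_ENDPOINT` is unset is an assumption based on the cloud URL mentioned in this guide, not a documented SDK default.

```python
import os
from typing import Mapping, Optional

# Assumption: the hosted app URL from this guide stands in for "no endpoint set".
# The SDK's real default may differ.
ASSUMED_CLOUD_ENDPOINT = "https://app.langwatch.ai"


def resolve_langwatch_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: summarize the LangWatch-related environment settings."""
    env = os.environ if env is None else env
    api_key = env.get("LANGWATCH_API_KEY", "").strip()
    endpoint = env.get("LANGWATCH_ENDPOINT", "").strip() or ASSUMED_CLOUD_ENDPOINT
    endpoint = endpoint.rstrip("/")  # normalize a trailing slash
    return {
        "endpoint": endpoint,
        "has_api_key": bool(api_key),
        "self_hosted": endpoint != ASSUMED_CLOUD_ENDPOINT,
    }


if __name__ == "__main__":
    settings = resolve_langwatch_settings()
    print(settings)
    if not settings["has_api_key"]:
        print("LANGWATCH_API_KEY is not set; export it before tracing.")
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise in a test or a CI step.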
Use @langwatch.trace() and autotrack_dspy() to automatically capture all DSPy calls during inference.
| Component | Details captured |
|---|---|
| Module calls | Inputs/outputs per dspy.Module.forward() |
| LM calls | Model name, messages, response, token counts |
| Retrievals | Queries, retrieved passages |
| Nested spans | Full call tree with parent-child relationships |
import langwatch
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

@langwatch.trace()
def answer_question(question):
    langwatch.get_current_trace().autotrack_dspy()
    program = dspy.ChainOfThought("question -> answer")
    return program(question=question)

result = answer_question("What is DSPy?")
# View traces at app.langwatch.ai (or your self-hosted URL)
import langwatch
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # dspy.Retrieve needs a retrieval model configured separately (not shown here)
        self.retrieve = dspy.Retrieve(k=3)
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.answer(context=context, question=question)

pipeline = RAGPipeline()

@langwatch.trace()
def handle_query(question):
    langwatch.get_current_trace().autotrack_dspy()
    return pipeline(question=question)

result = handle_query("How do refunds work?")
# LangWatch captures:
# - The RAGPipeline call
# - The Retrieve call (query, passages)
# - The ChainOfThought LM call (prompt, response, tokens)
@langwatch.trace()
def handle_query(user_id, question):
    trace = langwatch.get_current_trace()
    trace.autotrack_dspy()
    trace.update(metadata={"user_id": user_id, "environment": "production"})
    return pipeline(question=question)
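The examples above rely on two things happening in order: the decorator opens a trace context, and the tracking call inside the function body opts that trace in to capturing calls. The toy model below illustrates that pattern with the standard library only. It is not LangWatch's implementation; `ToyTrace`, `trace`, `get_current_trace`, and `record_span` are hypothetical stand-ins written for this sketch.

```python
import contextvars
import functools

# Toy stand-in for a tracing SDK. This is NOT how LangWatch is implemented.
_current_trace = contextvars.ContextVar("current_trace", default=None)


class ToyTrace:
    def __init__(self, name):
        self.name = name
        self.autotracking = False
        self.spans = []

    def autotrack(self):
        # Plays the role of autotrack_dspy(): opt this trace in to capturing calls.
        self.autotracking = True


def trace(func):
    """Open a trace for the duration of one call, then close it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        token = _current_trace.set(ToyTrace(func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            _current_trace.reset(token)
    return wrapper


def get_current_trace():
    current = _current_trace.get()
    if current is None:
        raise RuntimeError("no active trace; call this inside a traced function")
    return current


def record_span(label):
    """Stand-in for library code: records only if a trace exists and opted in."""
    current = _current_trace.get()
    if current is not None and current.autotracking:
        current.spans.append(label)


@trace
def with_autotrack():
    current = get_current_trace()
    current.autotrack()
    record_span("lm_call")
    return list(current.spans)


@trace
def without_autotrack():
    record_span("lm_call")  # silently dropped: the trace never opted in
    return list(get_current_trace().spans)
```

In this toy, `with_autotrack()` returns one span, `without_autotrack()` returns an empty list, and calling `get_current_trace()` at module level raises because no trace is open.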
LangWatch patches DSPy optimizer classes to stream live step-by-step progress. Among the tools compared in this guide, it is the only one that does this.
| Optimizer | Supported |
|---|---|
| dspy.BootstrapFewShot | Yes |
| dspy.BootstrapFewShotWithRandomSearch | Yes |
| dspy.COPRO | Yes |
| dspy.MIPROv2 | Yes |
| Others | Raises ValueError |
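Since unsupported optimizers raise a ValueError, a quick pre-flight check can surface the problem before a long run is set up. The sketch below compares the optimizer's class name against the names in the table. `is_supported_optimizer` is a hypothetical helper, and matching by class name only is an assumption of this sketch; the actual check inside `langwatch.dspy` may work differently.

```python
# Class names copied from the table above. The real check lives inside
# langwatch.dspy and may differ (for example, in how subclasses are handled).
SUPPORTED_OPTIMIZERS = frozenset({
    "BootstrapFewShot",
    "BootstrapFewShotWithRandomSearch",
    "COPRO",
    "MIPROv2",
})


def is_supported_optimizer(optimizer) -> bool:
    """Hypothetical pre-flight check: match on the exact class name only."""
    return type(optimizer).__name__ in SUPPORTED_OPTIMIZERS


if __name__ == "__main__":
    class MIPROv2:  # dummy stand-in so the sketch runs without dspy installed
        pass

    class SomeOtherOptimizer:  # hypothetical unsupported optimizer
        pass

    print(is_supported_optimizer(MIPROv2()))             # True
    print(is_supported_optimizer(SomeOtherOptimizer()))  # False
```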
import langwatch.dspy
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

trainset = [...]  # your training examples

def metric(example, prediction, trace=None):
    return prediction.answer.strip().lower() == example.answer.strip().lower()

program = dspy.ChainOfThought("question -> answer")
optimizer = dspy.MIPROv2(metric=metric, auto="medium")

# Initialize LangWatch optimizer tracking
langwatch.dspy.init(
    experiment="mipro-medium-run1",
    optimizer=optimizer,
)

# Run optimization; progress streams to the LangWatch dashboard
optimized = optimizer.compile(program, trainset=trainset)
# Watch live progress at app.langwatch.ai
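The scores that stream to the dashboard come from the metric, so it is worth checking its behavior on a couple of hand-made cases first. The sketch below reuses the exact-match metric from the example above and feeds it `SimpleNamespace` objects as stand-ins for DSPy examples and predictions; the sample question and answers are made up for illustration.

```python
from types import SimpleNamespace


def metric(example, prediction, trace=None):
    # Case-insensitive exact match after trimming surrounding whitespace.
    return prediction.answer.strip().lower() == example.answer.strip().lower()


# Stand-ins for a DSPy example and predictions (illustrative data only).
example = SimpleNamespace(question="What is the capital of France?", answer="Paris")
matching = SimpleNamespace(answer="  paris ")
extra_text = SimpleNamespace(answer="Paris, France")

scores = [metric(example, matching), metric(example, extra_text)]
print(scores)  # [True, False]
```

The second case shows the main limitation of exact match: a correct answer with extra wording scores as a miss, which will show up as a lower score during optimization.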
import langwatch.dspy
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

# metric and trainset are defined as in the MIPROv2 example
program = dspy.ChainOfThought("question -> answer")
optimizer = dspy.BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)

langwatch.dspy.init(
    experiment="bootstrap-4demos",
    optimizer=optimizer,
)

optimized = optimizer.compile(program, trainset=trainset)
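Both optimizer examples follow the same order: create the optimizer, register tracking, then compile. One way to make that order hard to get wrong is to wrap it in a small function. In the sketch below, `compile_with_tracking` is a hypothetical helper and the tracking function is injected as a parameter, so the block runs without LangWatch or DSPy installed; in real use you would pass `langwatch.dspy.init`, which this guide calls with `experiment=` and `optimizer=`.

```python
from typing import Any, Callable


def compile_with_tracking(
    optimizer: Any,
    program: Any,
    trainset: list,
    experiment: str,
    init_tracking: Callable[..., Any],
) -> Any:
    """Hypothetical helper: always register tracking before compiling."""
    init_tracking(experiment=experiment, optimizer=optimizer)
    return optimizer.compile(program, trainset=trainset)


if __name__ == "__main__":
    calls = []

    class FakeOptimizer:  # dummy stand-in for a DSPy optimizer
        def compile(self, program, trainset):
            calls.append("compile")
            return program

    def fake_init(experiment, optimizer):  # dummy stand-in for langwatch.dspy.init
        calls.append(f"init:{experiment}")

    compile_with_tracking(FakeOptimizer(), "program", [], "bootstrap-4demos", fake_init)
    print(calls)  # ['init:bootstrap-4demos', 'compile']
```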
Run multiple experiments with different names; they appear side-by-side in the LangWatch dashboard:
import langwatch.dspy
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/claude-sonnet-4-5-20250929", etc.

# metric and trainset are defined as in the MIPROv2 example
experiments = [
    ("bootstrap-4", dspy.BootstrapFewShot, {"metric": metric, "max_bootstrapped_demos": 4}),
    ("bootstrap-8", dspy.BootstrapFewShot, {"metric": metric, "max_bootstrapped_demos": 8}),
    ("mipro-light", dspy.MIPROv2, {"metric": metric, "auto": "light"}),
    ("mipro-medium", dspy.MIPROv2, {"metric": metric, "auto": "medium"}),
]

for name, opt_class, kwargs in experiments:
    program = dspy.ChainOfThought("question -> answer")
    optimizer = opt_class(**kwargs)
    langwatch.dspy.init(experiment=name, optimizer=optimizer)
    optimized = optimizer.compile(program, trainset=trainset)
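If the loop above is rerun with the same names, the runs can collide in the dashboard. One simple way to keep names distinct across reruns is to append a UTC timestamp. `unique_experiment_name` below is a hypothetical helper, and the name format is an arbitrary choice for this sketch, not a LangWatch convention.

```python
from datetime import datetime, timezone
from typing import Optional


def unique_experiment_name(base: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: suffix a base name with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y%m%d-%H%M%S')}"


if __name__ == "__main__":
    fixed = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(unique_experiment_name("mipro-light", fixed))  # mipro-light-20250102-030405
```

Two runs started within the same second would still share a name; add a short random suffix if that matters for your workflow.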
| Feature | LangWatch | Langtrace | Phoenix | Weave | MLflow |
|---|---|---|---|---|---|
| DSPy auto-tracing | Yes | Yes (built-in) | Yes (plugin) | No (manual) | Yes (autolog) |
| Optimizer progress | Yes (unique) | No | No | No | No |
| Live scores dashboard | Yes | No | No | No | No |
| Setup effort | 2-3 lines | One line | Two lines + launch | Manual decorators | One line |
| Self-hosted | Yes (Docker, Helm) | Yes (Docker) | Yes | No (cloud only) | Yes |
| Cloud option | Yes (app.langwatch.ai) | Yes (app.langtrace.ai) | Yes (Arize) | Yes (wandb.ai) | Yes (Databricks) |
| Model registry | No | No | No | No | Yes |
| Built-in evals | Basic | Basic | Yes | Basic | Basic |
What do you need?
|
+- Watch optimizer progress live? -> LangWatch (this skill)
+- Easiest auto-tracing setup? -> Langtrace (/dspy-langtrace)
+- Tracing + evals (local)? -> Phoenix (/dspy-phoenix)
+- Tracing + experiment tracking (cloud)? -> Weave (/dspy-weave)
+- Full ML lifecycle + model registry? -> MLflow (/dspy-mlflow)
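The decision tree above is a plain lookup from need to tool, which can be written down as a small table if you want to script the choice. The sketch below copies the five branches; the dictionary keys and the `recommend` function are hypothetical names chosen for this example.

```python
# Branches copied from the decision tree above; the keys are hypothetical labels.
RECOMMENDATIONS = {
    "optimizer_progress": "LangWatch (this skill)",
    "easiest_tracing": "Langtrace (/dspy-langtrace)",
    "tracing_local_evals": "Phoenix (/dspy-phoenix)",
    "tracing_cloud_experiments": "Weave (/dspy-weave)",
    "ml_lifecycle_registry": "MLflow (/dspy-mlflow)",
}


def recommend(need: str) -> str:
    """Return the tool the decision tree suggests for a given need."""
    try:
        return RECOMMENDATIONS[need]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown need {need!r}; expected one of: {options}") from None


if __name__ == "__main__":
    print(recommend("optimizer_progress"))  # LangWatch (this skill)
```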
- Forgetting autotrack_dspy() inside the traced function. The @langwatch.trace() decorator creates the trace context, but DSPy auto-tracking only activates when you call langwatch.get_current_trace().autotrack_dspy() inside the function body. Without it, you get an empty trace with no DSPy spans.
- Calling autotrack_dspy() outside the @langwatch.trace() function. The autotrack_dspy() call must be inside the decorated function, where a trace context exists. Calling it at module level or before the trace starts raises an error because there is no current trace.
- Calling langwatch.dspy.init() after optimizer.compile(). The init() call must come before compile() because it patches the optimizer to stream progress. If it is called afterwards, no progress data is captured. Always: create the optimizer, call langwatch.dspy.init(experiment=..., optimizer=...), then call optimizer.compile().
- Reusing experiment names. Each langwatch.dspy.init(experiment=...) call should use a unique experiment name so runs appear as separate entries in the dashboard. Reusing names overwrites or merges data, making comparison impossible.
Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
Related skills: /ai-watching-optimization, /dspy-langtrace, /dspy-phoenix, /dspy-weave, /dspy-mlflow, /ai-tracking-experiments, /ai-monitoring
Install /ai-do if you do not have it; it routes any AI problem to the right skill and is the fastest way to work:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do