From dspy-agent-skills
Orchestrates full DSPy 3.2.x project workflow: spec task, write program, build data/metric, baseline, GEPA optimize, export, deploy. For non-trivial DSPy builds from scratch.
```shell
npx claudepluginhub intertwine/dspy-agent-skills --plugin dspy-agent-skills
```

This skill uses the workspace's default tool permissions.
This skill runs the seven-step loop that turns a natural-language task description into an optimized, saved, deployable DSPy program. Every step delegates to a specific skill — invoke them in order.
Rephrase the user's task in one sentence. Identify inputs, outputs, the quality axis that matters, and any constraints (latency, cost, tool access, context size). Pick predictor shape:
| Task shape | Predictor |
|---|---|
| Single-step structured I/O | dspy.Predict / dspy.ChainOfThought |
| Tool use / multi-step | dspy.ReAct |
| Code execution | dspy.ProgramOfThought |
| Long context / codebase | dspy.RLM → dspy-rlm-module |
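The table above can be read as a small decision function — a purely illustrative sketch (the `choose_predictor` name and its flags are hypothetical, not part of DSPy):

```python
def choose_predictor(needs_tools: bool = False,
                     runs_code: bool = False,
                     long_context: bool = False) -> str:
    """Mirror the task-shape table: check the most specific shape first."""
    if long_context:   # long context / codebase
        return "dspy.RLM"
    if runs_code:      # code execution
        return "dspy.ProgramOfThought"
    if needs_tools:    # tool use / multi-step
        return "dspy.ReAct"
    return "dspy.Predict / dspy.ChainOfThought"  # single-step structured I/O
```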
Write the typed dspy.Signature + dspy.Module subclass per dspy-fundamentals. No hard-coded prompts. Keep predictors named so GEPA can target them.
Build a trainset (15–50 examples) and a separate valset (15–50) as dspy.Example(...).with_inputs(...). The held-out testset is reported on only at the end. See dspy-evaluation-harness.
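One way to carve the three splits, assuming raw examples arrive as one labeled list (the `split_examples` helper and its sizes are illustrative, not part of DSPy); only trainset and valset ever reach the optimizer:

```python
import random

def split_examples(examples: list, n_train: int = 30, n_val: int = 30, seed: int = 0):
    """Shuffle once with a fixed seed, then carve disjoint train/val/test slices."""
    pool = list(examples)
    random.Random(seed).shuffle(pool)
    trainset = pool[:n_train]
    valset = pool[n_train:n_train + n_val]
    testset = pool[n_train + n_val:]  # held out: report on it once, at the end
    return trainset, valset, testset
```

Each raw row then becomes a dspy.Example(...).with_inputs(...) as described above.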
Write rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None) returning dspy.Prediction(score=0..1, feedback="natural-language critique"). The feedback is load-bearing — it's what GEPA's reflection LM learns from. A dict with the same fields crashes dspy.Evaluate; only dspy.Prediction aggregates correctly. See dspy-evaluation-harness.
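A sketch of what the score-and-feedback computation could look like for a short-answer task — token-overlap F1 plus a critique string (`f1_with_feedback` is a hypothetical helper; the real rich_metric would wrap its two outputs in dspy.Prediction(score=..., feedback=...)):

```python
def f1_with_feedback(gold: str, pred: str) -> tuple[float, str]:
    """Token-level F1 in [0, 1] plus a natural-language critique for the reflection LM."""
    gold_toks = set(gold.lower().split())
    pred_toks = set(pred.lower().split())
    if not gold_toks or not pred_toks:
        return 0.0, "Empty gold or predicted answer."
    common = gold_toks & pred_toks
    if not common:
        return 0.0, f"No token overlap with the gold answer; expected tokens like {sorted(gold_toks)[:5]}."
    precision = len(common) / len(pred_toks)
    recall = len(common) / len(gold_toks)
    f1 = 2 * precision * recall / (precision + recall)
    missing = sorted(gold_toks - common)
    feedback = "All gold tokens present." if not missing else f"Missing gold tokens: {missing[:5]}."
    return f1, feedback
```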
```python
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric,
                          num_threads=8, display_progress=True,
                          provide_traceback=True,
                          save_as_json="runs/baseline.json")
baseline = evaluator(program)
print("Baseline:", baseline.score)
```
```python
reflection_lm = dspy.LM("openai/gpt-4o", temperature=1.0, max_tokens=8000)
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=reflection_lm,
    candidate_selection_strategy="pareto",
    track_stats=True,
    track_best_outputs=True,
    log_dir="./gepa_logs",
    num_threads=8,
    seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
print("Optimized:", evaluator(optimized).score)
```
Run auto="light" first as a sanity check; move to auto="medium"/"heavy" for the final run. See dspy-gepa-optimizer.
If you need a deliberate multi-stage compile loop, DSPy 3.2.x also exposes dspy.BetterTogether(metric=..., bootstrap=..., gepa=...) for chaining named optimizers after you have a clean baseline GEPA setup.
```python
optimized.save("artifacts/program.json", save_program=False)  # state only, portable
# or, for a full deployment artifact:
optimized.save("artifacts/program_dir/", save_program=True)
```
Deploy:
- Load with dspy.load("artifacts/program_dir/"), or reconstruct the module and call .load("artifacts/program.json").
- Set track_usage=True for cost/latency observability.
- Wire tracing (mlflow.dspy.autolog()) or W&B in CI.
- Run the evaluator against the saved program and fail CI below a score threshold.

Full reference script:

```python
"""DSPy end-to-end pipeline — spec → optimize → deploy."""
import dspy
from pathlib import Path

# ----- 1–2. Spec & program (dspy-fundamentals) -----
class MyTask(dspy.Signature):
    """<one-line instruction from the spec>."""
    input_field: str = dspy.InputField()
    output_field: str = dspy.OutputField()

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.step = dspy.ChainOfThought(MyTask)

    def forward(self, **kw):
        return self.step(**kw)

# ----- 3. Data (dspy-evaluation-harness) -----
trainset = [...]  # list[dspy.Example(...).with_inputs(...)]
valset = [...]

# ----- 4. Rich metric (dspy-evaluation-harness) -----
def rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = ...     # compute 0..1
    feedback = ...  # detailed critique
    return dspy.Prediction(score=score, feedback=feedback)  # NOT a dict

# ----- 5. Baseline -----
dspy.configure(lm=dspy.LM("openai/gpt-4o"), track_usage=True)
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8,
                          display_progress=True, provide_traceback=True,
                          save_as_json="runs/baseline.json")
program = MyProgram()
print("Baseline:", evaluator(program).score)

# ----- 6. GEPA optimize (dspy-gepa-optimizer) -----
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=dspy.LM("openai/gpt-4o", temperature=1.0, max_tokens=8000),
    candidate_selection_strategy="pareto",
    track_stats=True, track_best_outputs=True,
    log_dir="./gepa_logs", num_threads=8, seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
print("Optimized:", evaluator(optimized).score)

# ----- 7. Export (dspy-fundamentals) -----
Path("artifacts").mkdir(exist_ok=True)
optimized.save("artifacts/program.json", save_program=False)
```
Set module._compiled = True before multi-stage re-compilation.
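The CI gate mentioned under Deploy can be as simple as reading the evaluator's saved JSON and failing below a floor — a sketch assuming the file exposes a top-level "score" field (the path, key, and threshold are all assumptions, not a documented DSPy format):

```python
import json

def ci_gate(path: str = "runs/optimized.json", threshold: float = 0.7) -> bool:
    """Return True if the saved evaluation score clears the quality floor."""
    with open(path) as f:
        score = json.load(f)["score"]  # assumed top-level field
    print(f"score={score:.3f} threshold={threshold:.3f}")
    return score >= threshold
```

In CI, exit nonzero when ci_gate(...) returns False so the pipeline blocks a regressed program from shipping.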