From dspy-agent-skills
Orchestrates full DSPy 3.2.x project workflow: spec task, write program, build data/metric, baseline, GEPA optimize, export, deploy. For non-trivial DSPy builds from scratch.
```shell
npx claudepluginhub intertwine/dspy-agent-skills --plugin dspy-agent-skills
```

This skill uses the workspace's default tool permissions.
This skill runs the seven-step loop that turns a natural-language task description into an optimized, saved, deployable DSPy program. Every step delegates to a specific skill — invoke them in order.
Rephrase the user's task in one sentence. Identify inputs, outputs, the quality axis that matters, and any constraints (latency, cost, tool access, context size). Pick predictor shape:
| Task shape | Predictor |
|---|---|
| Single-step structured I/O | dspy.Predict / dspy.ChainOfThought |
| Tool use / multi-step | dspy.ReAct |
| Code execution | dspy.ProgramOfThought |
| Long context / codebase | dspy.RLM → dspy-rlm-module |
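The table above can be read as a small decision function — a purely illustrative sketch (the `choose_predictor` name and its flags are hypothetical, not part of DSPy):

```python
def choose_predictor(needs_tools: bool = False,
                     runs_code: bool = False,
                     long_context: bool = False) -> str:
    """Mirror the task-shape table: check the most specific shape first."""
    if long_context:   # long context / codebase
        return "dspy.RLM"
    if runs_code:      # code execution
        return "dspy.ProgramOfThought"
    if needs_tools:    # tool use / multi-step
        return "dspy.ReAct"
    return "dspy.Predict / dspy.ChainOfThought"  # single-step structured I/O
```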
Write the typed dspy.Signature + dspy.Module subclass per dspy-fundamentals. No hard-coded prompts. Keep predictors named so GEPA can target them.
Build a trainset (15–50 examples) and a separate valset (15–50) as dspy.Example(...).with_inputs(...). The held-out testset is reported on only at the end. See dspy-evaluation-harness.
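One way to carve the three splits, assuming raw examples arrive as one labeled list (the `split_examples` helper and its sizes are illustrative, not part of DSPy); only trainset and valset ever reach the optimizer:

```python
import random

def split_examples(examples: list, n_train: int = 30, n_val: int = 30, seed: int = 0):
    """Shuffle once with a fixed seed, then carve disjoint train/val/test slices."""
    pool = list(examples)
    random.Random(seed).shuffle(pool)
    trainset = pool[:n_train]
    valset = pool[n_train:n_train + n_val]
    testset = pool[n_train + n_val:]  # held out: report on it once, at the end
    return trainset, valset, testset
```

Each raw row then becomes a dspy.Example(...).with_inputs(...) as described above.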
Write rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None) returning dspy.Prediction(score=0..1, feedback="natural-language critique"). The feedback is load-bearing — it's what GEPA's reflection LM learns from. A dict with the same fields crashes dspy.Evaluate; only dspy.Prediction aggregates correctly. See dspy-evaluation-harness.
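A sketch of what the score-and-feedback computation could look like for a short-answer task — token-overlap F1 plus a critique string (`f1_with_feedback` is a hypothetical helper; the real rich_metric would wrap its two outputs in dspy.Prediction(score=..., feedback=...)):

```python
def f1_with_feedback(gold: str, pred: str) -> tuple[float, str]:
    """Token-level F1 in [0, 1] plus a natural-language critique for the reflection LM."""
    gold_toks = set(gold.lower().split())
    pred_toks = set(pred.lower().split())
    if not gold_toks or not pred_toks:
        return 0.0, "Empty gold or predicted answer."
    common = gold_toks & pred_toks
    if not common:
        return 0.0, f"No token overlap with the gold answer; expected tokens like {sorted(gold_toks)[:5]}."
    precision = len(common) / len(pred_toks)
    recall = len(common) / len(gold_toks)
    f1 = 2 * precision * recall / (precision + recall)
    missing = sorted(gold_toks - common)
    feedback = "All gold tokens present." if not missing else f"Missing gold tokens: {missing[:5]}."
    return f1, feedback
```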
```python
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric,
                          num_threads=8, display_progress=True,
                          provide_traceback=True,
                          save_as_json="runs/baseline.json")
baseline = evaluator(program)
print("Baseline:", baseline.score)
```
```python
reflection_lm = dspy.LM("openai/gpt-4o", temperature=1.0, max_tokens=8000)
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=reflection_lm,
    candidate_selection_strategy="pareto",
    track_stats=True,
    track_best_outputs=True,
    log_dir="./gepa_logs",
    num_threads=8,
    seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
print("Optimized:", evaluator(optimized).score)
```
Run auto="light" first as a sanity check; move to auto="medium"/"heavy" for the final run. See dspy-gepa-optimizer.
If you need a deliberate multi-stage compile loop, DSPy 3.2.x also exposes dspy.BetterTogether(metric=..., bootstrap=..., gepa=...) for chaining named optimizers after you have a clean baseline GEPA setup.
```python
optimized.save("artifacts/program.json", save_program=False)  # state only, portable
# or, for a full deployment artifact:
optimized.save("artifacts/program_dir/", save_program=True)
```
Deploy:
- Load with dspy.load("artifacts/program_dir/"), or reconstruct the module and call .load("artifacts/program.json").
- Set track_usage=True for cost/latency observability.
- Wire tracing (mlflow.dspy.autolog()) or W&B in CI.
- Run the evaluator against the saved program and fail CI below a score threshold.

Full reference script:

```python
"""DSPy end-to-end pipeline — spec → optimize → deploy."""
import dspy
from pathlib import Path

# ----- 1–2. Spec & program (dspy-fundamentals) -----
class MyTask(dspy.Signature):
    """<one-line instruction from the spec>."""
    input_field: str = dspy.InputField()
    output_field: str = dspy.OutputField()

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.step = dspy.ChainOfThought(MyTask)

    def forward(self, **kw):
        return self.step(**kw)

# ----- 3. Data (dspy-evaluation-harness) -----
trainset = [...]  # list[dspy.Example(...).with_inputs(...)]
valset = [...]

# ----- 4. Rich metric (dspy-evaluation-harness) -----
def rich_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = ...     # compute 0..1
    feedback = ...  # detailed critique
    return dspy.Prediction(score=score, feedback=feedback)  # NOT a dict

# ----- 5. Baseline -----
dspy.configure(lm=dspy.LM("openai/gpt-4o"), track_usage=True)
evaluator = dspy.Evaluate(devset=valset, metric=rich_metric, num_threads=8,
                          display_progress=True, provide_traceback=True,
                          save_as_json="runs/baseline.json")
program = MyProgram()
print("Baseline:", evaluator(program).score)

# ----- 6. GEPA optimize (dspy-gepa-optimizer) -----
optimizer = dspy.GEPA(
    metric=rich_metric,
    auto="medium",
    reflection_lm=dspy.LM("openai/gpt-4o", temperature=1.0, max_tokens=8000),
    candidate_selection_strategy="pareto",
    track_stats=True, track_best_outputs=True,
    log_dir="./gepa_logs", num_threads=8, seed=0,
)
optimized = optimizer.compile(student=program, trainset=trainset, valset=valset)
print("Optimized:", evaluator(optimized).score)

# ----- 7. Export (dspy-fundamentals) -----
Path("artifacts").mkdir(exist_ok=True)
optimized.save("artifacts/program.json", save_program=False)
```
Set module._compiled = True before multi-stage re-compilation.
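The CI gate mentioned under Deploy can be as simple as reading the evaluator's saved JSON and failing below a floor — a sketch assuming the file exposes a top-level "score" field (the path, key, and threshold are all assumptions, not a documented DSPy format):

```python
import json

def ci_gate(path: str = "runs/optimized.json", threshold: float = 0.7) -> bool:
    """Return True if the saved evaluation score clears the quality floor."""
    with open(path) as f:
        score = json.load(f)["score"]  # assumed top-level field
    print(f"score={score:.3f} threshold={threshold:.3f}")
    return score >= threshold
```

In CI, exit nonzero when ci_gate(...) returns False so the pipeline blocks a regressed program from shipping.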