Skill

align-and-difftest

Guides behavior model ↔ RTL equivalence checking: sampling point selection (commit/retire vs transaction-edge vs cycle), drive direction (DUT pushes ref), DPI-C probe interface, async event alignment, snapshot-based debugging, and DSL-driven probe generation. Activate when the user explicitly invokes /chipdev-method:align-and-difftest, asks "how do I diff the cmodel against RTL", "behavior doesn't match RTL", "how do I set up difftest", "what should I sample", "RVVI vs DPI", or "snapshot for difftest", or sets up a difftest infrastructure.

npx claudepluginhub curryfromuestc/dev-guide --plugin chipdev-method

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

references/case-ibex-cosim.md
references/case-libsystemctlm-soc.md
references/case-rvvi.md
references/case-xiangshan-difftest.md

Stats

Parent Repo Stars: 0

Parent Repo Forks: 0

Last Commit: May 5, 2026

Align and Difftest

Use this skill when the user is establishing — or debugging — equivalence checking between a behavior model and an RTL DUT. Do not autoload.

The goal is to give the user a small set of strong defaults that match how the most successful open-source difftest projects work, and to help them avoid the failure modes that absorb teams' Phase 4–5 calendars.

If the user has not yet built a behavior model, redirect to build-behavior-model. If they don't yet have an alignment contract, redirect to define-contracts.

How to use this skill in a response

When triggered:

1. Identify the alignment scenario: ISA-style commit alignment, accelerator-style transaction alignment, or something narrower (single-module unit-level diff).
2. Recommend the sampling point, drive direction, interface mechanism, and debug approach.
3. Surface the failure mode their setup is most likely to hit.
4. If they're already debugging a mismatch, walk them through the triage approach in "Debugging a mismatch" below.

Difftest is about closing degrees of freedom. Every "we'll figure that out later" leaves a degree of freedom that becomes a Phase 4 outage. Be specific.

The four-decision frame `[abstract]`

Every difftest setup answers four questions. Get them all decided up front.

| Decision | Default for ISA-style | Default for accelerator-style |
| --- | --- | --- |
| Sampling point | Commit / retire boundary | Transaction edge (input + output) |
| Drive direction | DUT pushes ref | DUT pushes ref |
| Cross-language | DPI-C with packed structs | DPI-C, or shared-memory ring if ref is a separate process |
| State alignment | DUT-spec (ref accepts the DUT's view of memory) | DUT-spec |

The rest of this skill explains why these defaults dominate, and what "DUT-pushes-ref" actually entails operationally.

Sampling point selection `[industry-pattern]`

Three options. They are not interchangeable.

Commit / retire-driven

Compare ref and DUT at the architectural commit boundary — when the DUT's RTL retires an instruction (or transaction-equivalent). Skip transient pipeline state.

Default for ISA designs.
Used by major open-source RISC-V difftest setups [case: OpenXiangShan/difftest, lowRISC Ibex cosim, ImperasDV / RVVI].
Matches the architectural state contract (build-behavior-model).
Requires the RTL to expose retirement events.

Transaction-edge-driven

Compare ref and DUT at input transaction edges and output transaction edges. The ref is invoked once per input; output is compared on the next output edge.

Default for accelerators (GPUs, NPUs, DSAs) where there's no conventional retirement boundary.
The "transaction" is whatever the alignment contract names — a command packet, a tile, a tensor block, a memory request.
The behavior model can run "until output" without simulating cycles.
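The input-edge / output-edge split can be sketched as a small checker. This is a minimal illustration under stated assumptions: outputs leave the DUT in the same order as their inputs, at most `TXN_QUEUE_DEPTH` transactions are in flight, and `ref_process` is a made-up stand-in for the behavior model. None of these names come from an existing difftest library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXN_QUEUE_DEPTH 16   /* assumed bound on in-flight transactions */

typedef struct {
    uint32_t expected[TXN_QUEUE_DEPTH];  /* ref results awaiting a DUT output */
    size_t   head, tail, count;
    uint64_t mismatches;
} txn_checker_t;

/* Hypothetical behavior model: one call per input, no cycle simulation. */
uint32_t ref_process(uint32_t input) { return input * 3u + 1u; }

/* Input edge: run the ref once and queue what the DUT should produce. */
bool on_input_edge(txn_checker_t *c, uint32_t input) {
    if (c->count == TXN_QUEUE_DEPTH) return false;   /* too many in flight */
    c->expected[c->tail] = ref_process(input);
    c->tail = (c->tail + 1) % TXN_QUEUE_DEPTH;
    c->count++;
    return true;
}

/* Output edge: compare the DUT output with the oldest queued expectation. */
bool on_output_edge(txn_checker_t *c, uint32_t dut_output) {
    if (c->count == 0) { c->mismatches++; return false; }  /* unexpected output */
    uint32_t want = c->expected[c->head];
    c->head = (c->head + 1) % TXN_QUEUE_DEPTH;
    c->count--;
    if (want != dut_output) { c->mismatches++; return false; }
    return true;
}
```

If the DUT may reorder outputs, the queue would need to be keyed by a transaction ID named in the alignment contract instead of being a plain FIFO.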

Cycle-driven

Compare every cycle.

Only appropriate when the DUT has no out-of-order issue, no speculative execution, no pipeline collapse — meaning, simple in-order datapath blocks.
Requires the ref to be cycle-accurate, which violates the build-behavior-model contract. Usually a sign you're using the wrong artifact as ref.

Recommendation: don't pick cycle-driven unless you've already exhausted commit/retire and transaction approaches.

Drive direction — DUT pushes ref `[industry-pattern]`

Three patterns. The first is the strong default.

DUT-pushes-ref (recommended)

The DUT retires an instruction (or completes a transaction). At that moment, it calls a DPI-C function that:

1. Forwards the architectural state to the ref.
2. Asks the ref to step one instruction (or process one transaction).
3. Compares ref state to DUT state.
4. On mismatch, raises an alert and (if configured) snapshots.

Used by [case: OpenXiangShan/difftest, lowRISC Ibex cosim, ImperasDV/RVVI].

Pros: the DUT controls timing; the ref is purely passive; performance is bound by the DUT, not by ref↔DUT IPC.

Cons: the ref must support being driven externally — most modern behavior models do.
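The callback described above might look like the following on the C side. It is a sketch, not the code of any of the cited projects: `arch_state_t`, `ref_step`, `diff_on_commit`, and the toy ref (each "instruction" increments x1 and advances the PC by 4) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPR 32

typedef struct {
    uint64_t pc;
    uint64_t gpr[NUM_GPR];
} arch_state_t;

/* Toy stand-in for the ref model's state. */
static arch_state_t ref_state;
static unsigned     diff_alerts;   /* mismatches raised so far */

void ref_reset(uint64_t pc) {
    memset(&ref_state, 0, sizeof ref_state);
    ref_state.pc = pc;
}

/* Toy ref: every instruction does x1 += 1 and pc += 4. */
void ref_step(void) {
    ref_state.gpr[1] += 1;
    ref_state.pc += 4;
}

/* Called by the DUT at each retirement, e.g. through a DPI-C import.
 * The DUT forwards its post-commit state; the ref steps once and the
 * two are compared. Returns 0 on match, -1 on mismatch. */
int diff_on_commit(const arch_state_t *dut) {
    ref_step();
    if (ref_state.pc != dut->pc ||
        memcmp(ref_state.gpr, dut->gpr, sizeof ref_state.gpr) != 0) {
        diff_alerts++;             /* a real flow would also trigger a snapshot */
        return -1;
    }
    return 0;
}
```

The ref never advances on its own here; it only moves when the DUT calls in, which is what keeps it passive.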

Ref-pushes-DUT (trace-replay)

The ref runs first and dumps a trace; the DUT runs second, and its output is diffed against that trace offline.

Used historically [case: Chipyard / Rocket Chip Spike commit-log diff].
Pros: simplest possible setup; no co-process work.
Cons: cannot inject async events (every interrupt has to be in the trace, deterministically); replay is brittle; debugging is offline.
Use only for regression baselining, not for development flow.
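The offline diff step can be as small as a scan for the first diverging record. The record layout below (`pc`, `rd`, `value`) is an assumed simplification, not the format of any particular commit log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-commit record: retiring PC, destination register, written value. */
typedef struct {
    uint64_t pc;
    uint32_t rd;
    uint64_t value;
} commit_rec_t;

/* Returns the index of the first diverging record, or -1 if the logs agree.
 * If one log is a strict prefix of the other, the divergence is reported
 * at the length of the shorter log. */
long first_divergence(const commit_rec_t *ref, size_t n_ref,
                      const commit_rec_t *dut, size_t n_dut) {
    size_t n = n_ref < n_dut ? n_ref : n_dut;
    for (size_t i = 0; i < n; i++) {
        if (ref[i].pc != dut[i].pc || ref[i].rd != dut[i].rd ||
            ref[i].value != dut[i].value)
            return (long)i;
    }
    return n_ref == n_dut ? -1 : (long)n;
}
```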

Lockstep

Both sides advance together; each side confirms the other after every step.

Standardized by [industry-pattern: RVVI].
Used as a configuration of DPI-C-based setups.
Pros: clear semantics; tooling-vendor-friendly.
Cons: tightest coupling, slowest if implemented naively.

Cross-language interface `[industry-pattern]`

| Mechanism | Use when | Notes |
| --- | --- | --- |
| DPI-C with packed struct | Default. Same process. | Zero-copy; native debug; SVA-compatible. |
| Shared memory ring buffer | Ref must be a separate process (e.g., it's QEMU, or another tool). | Higher latency; needs careful synchronization. |
| Unix / TCP socket | Distributed setup or debug instrumentation. | Slowest; only for tooling. |
| VPI / PLI | DPI-C unavailable. | Slow; legacy. |

Default: DPI-C, with the probe header generated from the same DSL that generates the interface (see define-contracts).

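```c
// Generated by the DSL toolchain; both sides include this header.
#include <stdint.h>

// Assumed opaque here; the full layout comes from the generated definitions.
typedef struct diff_state diff_state_t;

extern void diff_commit_register(uint32_t reg_id, uint64_t value);
extern void diff_commit_pc(uint64_t pc);
extern void diff_step_until_commit(void);
extern int  diff_compare(diff_state_t* dut_state);
```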

The signatures are not invented at integration time — they are derived from the architectural state checklist established by build-behavior-model.
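For the shared-memory ring row of the table above, the "careful synchronization" usually comes down to a single-producer / single-consumer queue with acquire/release ordering. The sketch below uses C11 atomics; `commit_msg_t`, `commit_ring_t`, and the function names are hypothetical. In a real setup the ring would sit in memory mapped by both processes, and whether the atomics are safe across processes depends on the platform, so treat this as an outline only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* must be a power of two */

typedef struct {
    uint64_t pc;
    uint64_t value;
    uint32_t reg_id;
} commit_msg_t;

typedef struct {
    _Atomic uint32_t head;   /* next slot to write; advanced by the DUT side only */
    _Atomic uint32_t tail;   /* next slot to read; advanced by the ref side only */
    commit_msg_t     slot[RING_SLOTS];
} commit_ring_t;

void ring_init(commit_ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer (DUT side). Returns false when the ring is full. */
bool ring_push(commit_ring_t *r, const commit_msg_t *m) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) return false;
    r->slot[head & (RING_SLOTS - 1u)] = *m;
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer (ref side). Returns false when the ring is empty. */
bool ring_pop(commit_ring_t *r, commit_msg_t *out) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return false;
    *out = r->slot[tail & (RING_SLOTS - 1u)];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```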

State alignment strategy `[abstract]`

Two questions of policy.

State mask configuration

Don't compare everything from day one. Stage in:

1. PC only: catches control-flow divergence; usually enough to bring up the loop.
2. PC + GPR: catches data-path bugs.
3. PC + GPR + CSR: catches privilege / interrupt bugs.
4. PC + GPR + CSR + memory: catches load/store divergence.
5. Floating-point, vector, debug-mode, etc.: last.

Tooling note: [industry-pattern: RVVI] standardizes this masking. Roll your own with a config flag if you don't use RVVI.
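A roll-your-own mask can be a bit set that the comparison function consults. The sketch below is an assumption-laden illustration: the flag names, the `diff_state_t` layout, and the single placeholder CSR are invented here, and memory and floating-point comparison are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state classes, enabled stage by stage during bring-up. */
enum {
    DIFF_PC  = 1u << 0,
    DIFF_GPR = 1u << 1,
    DIFF_CSR = 1u << 2
};

typedef struct {
    uint64_t pc;
    uint64_t gpr[32];
    uint64_t mstatus;   /* one CSR as a placeholder for the full set */
} diff_state_t;

/* Returns the enabled state classes that differ; 0 means "match". */
uint32_t diff_compare_masked(const diff_state_t *ref, const diff_state_t *dut,
                             uint32_t mask) {
    uint32_t bad = 0;
    if ((mask & DIFF_PC) && ref->pc != dut->pc) bad |= DIFF_PC;
    if (mask & DIFF_GPR) {
        for (int i = 0; i < 32; i++) {
            if (ref->gpr[i] != dut->gpr[i]) { bad |= DIFF_GPR; break; }
        }
    }
    if ((mask & DIFF_CSR) && ref->mstatus != dut->mstatus) bad |= DIFF_CSR;
    return bad;
}
```

Returning the set of differing classes, not just a pass/fail flag, also serves the triage step of narrowing the masked state.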

Memory: feed the ref from the DUT

Don't let the ref have an independent memory model. Either:

The DUT's load result is forwarded to the ref before the ref commits the load (push). [case: NEMU DiffMem]
The ref consults the DUT's memory view via a mem_query() callback (pull).

Either approach avoids the class of false positives where ref and DUT disagree on implementation-defined regions (UART, timers, weakly-ordered shared memory).
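The push variant can be sketched with a one-entry mailbox that the ref's load path checks first. This is not NEMU's implementation; `diff_push_load`, `ref_load`, the address map, and the tiny RAM are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_BASE  0x80000000ull
#define RAM_WORDS 256u            /* tiny RAM, in 64-bit words */

static uint64_t ref_ram[RAM_WORDS];

/* One-entry mailbox: the load result the DUT observed. */
static struct { bool valid; uint64_t addr, data; } dut_load;

static bool in_ram(uint64_t addr) {
    return addr >= RAM_BASE && (addr - RAM_BASE) / 8 < RAM_WORDS;
}

void ref_store(uint64_t addr, uint64_t data) {
    if (in_ram(addr)) ref_ram[(addr - RAM_BASE) / 8] = data;
}

/* DUT side: call before the ref commits the matching load. */
void diff_push_load(uint64_t addr, uint64_t data) {
    dut_load.valid = true;
    dut_load.addr  = addr;
    dut_load.data  = data;
}

/* Ref side: a pushed value wins over the ref's own memory. */
uint64_t ref_load(uint64_t addr) {
    if (dut_load.valid && dut_load.addr == addr) {
        dut_load.valid = false;
        ref_store(addr, dut_load.data);   /* keep the ref's RAM in step with the DUT */
        return dut_load.data;
    }
    return in_ram(addr) ? ref_ram[(addr - RAM_BASE) / 8] : 0;
}
```

A read from a device register (the address `0x10000000` is used as a made-up UART in the test) returns whatever the DUT saw, so the ref never has to model the device.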

Async event alignment `[industry-pattern]`

Pattern: RTL detects, latches at architectural boundary, forwards to ref.

async IRQ ──→ DUT detects ──→ DUT latches at next commit boundary
                                          │
                                          ▼
                              DUT calls diff_inject_irq(irq_id)
                                          │
                                          ▼
                              ref accepts at the same architectural
                              boundary; both step together.

[case: lowRISC Ibex cosim, ImperasDV / RVVI]

Never let the ref generate its own async events. The ref's clock and the DUT's clock have no shared definition.
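On the ref side, this means the interrupt-pending state is only ever set by the injection call. The sketch below assumes a toy RV64-flavored core (4-byte instructions, no branches, a single trap vector); `ref_core_t` and its fields are hypothetical, and only `diff_inject_irq` is a name taken from the diagram above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t pc;
    uint64_t mepc;          /* PC saved when the trap is taken */
    uint64_t mcause;
    uint64_t trap_vector;
    bool     irq_pending;   /* set only by diff_inject_irq, never by the ref */
    uint32_t irq_id;
} ref_core_t;

/* DUT side: called at the commit boundary where the DUT latched the IRQ. */
void diff_inject_irq(ref_core_t *ref, uint32_t irq_id) {
    ref->irq_pending = true;
    ref->irq_id = irq_id;
}

/* One architectural step. The ref has no timer or interrupt controller of
 * its own, so it takes an interrupt only when the DUT injected one. */
void ref_step(ref_core_t *ref) {
    if (ref->irq_pending) {
        ref->irq_pending = false;
        ref->mepc   = ref->pc;
        ref->mcause = (1ull << 63) | ref->irq_id;   /* interrupt bit + cause */
        ref->pc     = ref->trap_vector;
        return;
    }
    ref->pc += 4;   /* toy model: straight-line code */
}
```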

Snapshot-based debugging `[industry-pattern]`

Long simulation runs (multi-day) cannot afford to "catch the mismatch with the wave open from the start". The pattern that solves this:

1. Periodically fork() the simulator process, every N cycles or every M instructions.
2. The forked process is suspended; it holds the snapshot.
3. When difftest detects a mismatch, the most recent snapshot is resumed with verbose tracing and waveform dumping enabled.
4. The mismatch reproduces under instrumentation.

Public reference: LightSSS in OpenXiangShan/difftest is the canonical implementation [case: OpenXiangShan/difftest LightSSS].

This is non-optional for projects with regression cycles measured in hours.
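The fork-and-suspend steps above can be sketched with POSIX `fork()` and pipes. This is not LightSSS's code; it is a toy in which the simulator state is just a cycle counter, only the most recent snapshot is kept, and the woken child reports the cycle it resumed from instead of actually re-running with waveforms. `run_with_snapshots` and its parameters are hypothetical, and `interval` is assumed to be greater than zero.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the cycle the woken snapshot resumed from, -1 if no mismatch
 * occurred, or -2 on a system-call failure. */
long run_with_snapshots(long total, long interval, long bad_cycle) {
    pid_t snap = -1;              /* most recent suspended snapshot */
    int wake[2] = { -1, -1 };     /* parent -> snapshot: "replay now" */
    int report[2];                /* snapshot -> parent: where it resumed */
    if (pipe(report) != 0) return -2;

    for (long cycle = 0; cycle < total; cycle++) {
        if (cycle % interval == 0) {
            if (snap > 0) {       /* discard the older snapshot */
                kill(snap, SIGKILL);
                waitpid(snap, NULL, 0);
                close(wake[1]);
            }
            if (pipe(wake) != 0) return -2;
            snap = fork();
            if (snap < 0) return -2;
            if (snap == 0) {      /* child: frozen copy of the state at `cycle` */
                char go;
                close(wake[1]);
                if (read(wake[0], &go, 1) != 1) _exit(0);  /* never woken */
                /* A real flow would enable tracing and waveform dumping
                 * here and re-run the simulation forward from `cycle`. */
                if (write(report[1], &cycle, sizeof cycle) < 0) _exit(1);
                _exit(0);
            }
            close(wake[0]);
        }
        if (cycle == bad_cycle) { /* difftest reported a mismatch */
            char go = 1;
            long resumed = -2;
            if (write(wake[1], &go, 1) == 1 &&
                read(report[0], &resumed, sizeof resumed) != (ssize_t)sizeof resumed)
                resumed = -2;
            close(wake[1]);
            waitpid(snap, NULL, 0);
            close(report[0]);
            close(report[1]);
            return resumed;
        }
    }
    if (snap > 0) {               /* clean exit: drop the last snapshot */
        kill(snap, SIGKILL);
        waitpid(snap, NULL, 0);
        close(wake[1]);
    }
    close(report[0]);
    close(report[1]);
    return -1;
}
```

With a snapshot every 10 cycles and a mismatch at cycle 37, the replay starts from cycle 30, so only the last few cycles run under instrumentation.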

Probe generation from the DSL `[abstract]`

A pattern only available if the project uses an interface DSL (see define-contracts):

The DSL emits, alongside the C++ struct and SystemVerilog interface, a third artifact: the difftest probe header. Both ref and DUT pull from the same generated definitions, so signal sets cannot drift.

This is the operational form of "Generation over hand-writing" (invariant 1) applied to difftest. The cost of doing this from day one is ~one parser back-end. The cost of not doing this is reconciling probe mismatches across the project's lifetime.
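As a rough picture of what such a back-end does, the sketch below walks a field table and emits one C prototype per field. The table stands in for the parsed DSL, and `probe_field_t` and `emit_probe_header` are hypothetical names. A SystemVerilog back-end would walk the same table to emit the matching DPI imports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One architectural-state field as the (assumed) DSL front-end would report it. */
typedef struct {
    const char *name;     /* e.g. "pc" */
    const char *c_type;   /* e.g. "uint64_t" */
} probe_field_t;

/* Writes one prototype per field into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
int emit_probe_header(const probe_field_t *f, size_t n, char *buf, size_t cap) {
    size_t used = 0;
    if (cap > 0) buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, cap - used,
                         "extern void diff_commit_%s(%s value);\n",
                         f[i].name, f[i].c_type);
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Adding a field to the table changes both generated sides at once, which is the property that prevents probe-set drift.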

Debugging a mismatch `[abstract]`

Triage in this order:

1. Confirm reproduction. Same seed, same test, mismatch is deterministic. If not deterministic, the bug is more likely in the testbench than in either ref or DUT.
2. Narrow the masked state. Drop everything except PC. Does the mismatch still happen? Then it's control flow. Add GPR. Still happens? Data path.
3. Bisect by snapshot. Roll back to the most recent snapshot; binary search by snapshotting more frequently.
4. Inspect the contributing transaction. What was the last input? What architectural state changed?
5. Check the four-decision frame. Is this a memory-feed issue? An async-event timing issue? A probe-set drift issue?
6. If still stuck, route to diagnose. Codex is well-suited to a structured triage of a difftest mismatch.
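The bisection step is an ordinary binary search over commit indices. The sketch assumes a monotone predicate: once ref and DUT have diverged they stay diverged, so "state matches at index i" is true up to some point and false afterwards. `bisect_first_mismatch` and the toy predicate are illustrative names; in practice the predicate would replay from a snapshot up to the given index and compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if ref and DUT state agree after `commit_index` commits. */
typedef bool (*state_matches_fn)(uint64_t commit_index, void *ctx);

/* Precondition: matches(lo) is true and matches(hi) is false.
 * Returns the first commit index at which the states disagree. */
uint64_t bisect_first_mismatch(uint64_t lo, uint64_t hi,
                               state_matches_fn matches, void *ctx) {
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (matches(mid, ctx)) lo = mid;   /* divergence is later */
        else                   hi = mid;   /* divergence is at or before mid */
    }
    return hi;
}

/* Toy predicate for illustration: ctx points at the first bad index. */
bool toy_matches(uint64_t commit_index, void *ctx) {
    return commit_index < *(const uint64_t *)ctx;
}
```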

Common failure modes `[abstract]`

Independent memory model in the ref. Already covered. Always feed from DUT.
Cycle-driven sampling on a speculative DUT. Compares transient pipeline state; everything looks like a mismatch. Use commit boundaries.
Ref injects its own interrupts. Never agrees with DUT.
Probe set drift between ref and DUT. Add a field to the ref's state, forget to add it to the DUT's probe, mismatch on noise. Generate from the DSL.
No state mask configuration. Bring-up has to compare everything, every divergence is fatal. Stage the comparison.
Snapshots are too rare. Mismatch happens at hour 4; rolling back to hour 0 means re-running 4 hours under tracing. Snapshot every ~30 minutes.
Difftest run as a Phase 4 task. By then the ref and DUT have drifted, observability isn't there, and probes have to be retrofitted. Set up difftest in Phase 1.
Behavior model is too slow. Difftest becomes the bottleneck. Profile the ref; build it as a shared library (.so) loaded into the simulator process; batch commits.
