From chipdev-method
Guides the construction of a functional behavior model that decouples timing but preserves architectural state for RTL difftest — what to keep, what to drop, where to embed probes, how to package the cross-language interface, and how to coordinate memory with the DUT. Activate when the user explicitly invokes /chipdev-method:build-behavior-model, or asks "什么是行为模型", "behavior model 边界", "difftest 用什么 ref 模型", "怎么做 functional simulator", or designs the reference half of a difftest setup.
npx claudepluginhub curryfromuestc/dev-guide --plugin chipdev-method

This skill uses the workspace's default tool permissions.
Use this skill when the user is building the functional reference model that a difftest setup will compare against the RTL DUT. Do not autoload.
A behavior model is not a slow performance model with cycles dropped. It is a different artifact with a different contract: it captures architectural state evolution, drops everything timing-coupled, and exposes a curated set of probes for difftest.
If the user wants a cycle-accurate model for microarchitecture exploration,
redirect to build-perf-model.
When triggered:
align-and-difftest once the behavior model is implementable.

If the user has not yet defined the alignment contract in
define-contracts, push them back. Behavior model implementation without
agreed sampling points means the model will be unusable for difftest.
[abstract]

| Dimension | Behavior model | Performance model |
|---|---|---|
| Captures | Architectural state evolution | Cycle-by-cycle microarchitecture |
| Time model | Event-ordered or sequential, no clock | Cycle-accurate clock |
| Speed target | 10×–100× hardware | 0.001× hardware |
| Used by | Software, driver, difftest reference | Microarch exploration, perf reports |
| Memory model | Often shares memory with DUT | Owns its own memory |
| Async events | RTL forwards them in | Drives them on its own |
Do not blur these roles. A "fast" performance model and a "slow" behavior model are not on the same axis — they have different state shapes.
[abstract]

mcycle, perf counters that depend on micro-arch).

[abstract]

For an ISA-style design, the typical list is:

| Element | Required? | Notes |
|---|---|---|
| General-purpose registers | Yes | Width, count from spec. |
| Floating-point / vector registers | If applicable | ULP precision is a separate tier. |
| Program counter | Yes | One per hart / lane. |
| Control / status registers | Yes | Enumerate the architectural ones; skip implementation-defined mcycle-class. |
| Memory | Yes | Share or duplicate — see "Memory coordination". |
| TLB / page tables | If applicable | Model the architectural translation, not the implementation. |
| Interrupt pending state | Yes | One bit per architectural interrupt source. |
| Debug-mode state | If applicable | dpc, dcsr, etc. |
| Privilege level | If applicable | Current and stack. |
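
As an illustration of the table above, here is a minimal sketch of an architectural state record for a single 64-bit hart. The type `ArchState`, its field names, and the helpers `write_gpr`, `first_gpr_mismatch`, and `arch_equal` are hypothetical, chosen for this example only. The actual register count, widths, and CSR list must come from the spec.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical architectural state for one 64-bit hart.
// Field names and sizes are illustrative, not taken from any spec.
struct ArchState {
    std::array<std::uint64_t, 32> gpr{};  // general-purpose registers
    std::uint64_t pc = 0;                 // program counter
    std::uint64_t mstatus = 0;            // one example architectural CSR
    std::uint64_t mip = 0;                // interrupt-pending bits, one per source
    std::uint8_t priv = 3;                // current privilege level
    // Deliberately absent: mcycle-class counters, pipeline and cache state.
};

// Plain data, no virtual functions, so copies and compares stay cheap.
static_assert(std::is_trivially_copyable<ArchState>::value,
              "ArchState should remain a plain data record");

// Writes one GPR; the index must be in range.
inline void write_gpr(ArchState& s, std::size_t idx, std::uint64_t value) {
    assert(idx < s.gpr.size());
    s.gpr[idx] = value;
}

// Returns the index of the first mismatching GPR, or -1 if all match.
inline int first_gpr_mismatch(const ArchState& ref, const ArchState& dut) {
    for (std::size_t i = 0; i < ref.gpr.size(); ++i) {
        if (ref.gpr[i] != dut.gpr[i]) return static_cast<int>(i);
    }
    return -1;
}

// True when every compared architectural field agrees.
inline bool arch_equal(const ArchState& ref, const ArchState& dut) {
    return first_gpr_mismatch(ref, dut) == -1 && ref.pc == dut.pc &&
           ref.mstatus == dut.mstatus && ref.mip == dut.mip &&
           ref.priv == dut.priv;
}
```

Keeping the record flat makes it straightforward to snapshot, diff, and hand across a language boundary. Floating-point, vector, and debug-mode state would be added the same way when the design has them.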
For an accelerator-style design (GPU / NPU / DSA), the list shifts:
| Element | Notes |
|---|---|
| Command queue state | Head, tail, sequence numbers. |
| Per-task / per-kernel state | Whatever the spec defines as architecturally observable. |
| Memory views | Address ranges visible to architectural code. |
| Output buffers | The "answer" the DUT must produce. |
For accelerators, the alignment contract is usually transaction-level
(see align-and-difftest), not retire-level.
[abstract]

The behavior model exists to be observed by difftest. Probes are not optional. They must be:

define-contracts). The probe header is a fourth
emission target alongside C++ struct, SV interface, and difftest
bridge.

Concretely, every architectural state element should have:

- A `probe_<element>(value)` function that the DUT calls (via DPI-C or
  shared memory) at the architectural commit moment.
- A `query_<element>()` function the ref can call when needed.

Pre-embed probes in the behavior model from day one. Bolting them on later forces re-architecting.
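
The `probe_<element>` / `query_<element>` pairing can be sketched for two elements, `pc` and `gpr`. This is a minimal illustration, assuming a single process-wide store. `ProbeStore`, `probe_store`, and the `commits` counter are hypothetical names, and the `extern "C"` linkage is used only so the symbols could be imported through DPI-C.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical store holding the last value the DUT reported per element.
struct ProbeStore {
    std::uint64_t pc = 0;
    std::uint64_t gpr[32] = {};
    std::uint64_t commits = 0;  // number of pc probes received so far
};

// One store per process (an assumption of this sketch).
inline ProbeStore& probe_store() {
    static ProbeStore store;
    return store;
}

// Called by the DUT side at the architectural commit moment.
extern "C" void probe_pc(std::uint64_t value) {
    probe_store().pc = value;
    ++probe_store().commits;
}

extern "C" void probe_gpr(std::uint32_t index, std::uint64_t value) {
    assert(index < 32);
    probe_store().gpr[index] = value;
}

// Called by the ref (or the comparator) whenever it needs the DUT's view.
extern "C" std::uint64_t query_pc() { return probe_store().pc; }

extern "C" std::uint64_t query_gpr(std::uint32_t index) {
    assert(index < 32);
    return probe_store().gpr[index];
}
```

In a real setup these functions would more likely be generated from the state contract than written by hand, so that the probe header stays in sync with the other emission targets.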
[industry-pattern]

| Mechanism | When to use | Notes |
|---|---|---|
| DPI-C with packed structs | Default. Both sides in same process. | Zero-copy across SystemVerilog ↔ C; no IPC; debuggable. |
| VPI / PLI | DPI-C unavailable, legacy tooling only. | Slower; harder to tune. |
| Shared memory ring buffer | Ref must run as a separate process (e.g., the ref is QEMU or a third-party binary). | Higher overhead but flexible. |
| TCP/Unix socket | Distributed setup, mostly debugging. | Slowest, easiest to instrument. |
Default recommendation: DPI-C + packed struct. Package the behavior
model as a shared library (.so) the testbench links to. This matches
what the major open-source RISC-V difftest setups do
[case: OpenXiangShan/difftest, lowRISC Ibex cosim].
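
A minimal sketch of what the C-linkage surface of such a shared library might look like. The names `CommitRecord`, `ref_init`, `ref_step`, and `ref_get_commit` are hypothetical and not taken from any of the cited projects. The toy `ref_step` stands in for a real instruction interpreter. The SystemVerilog side is assumed to declare a struct with a matching layout, and the exact type mapping depends on the simulator and is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical commit record exchanged across the language boundary.
// Fixed-width fields only, ordered so no implicit padding is inserted.
struct CommitRecord {
    std::uint64_t pc;
    std::uint64_t rd_value;
    std::uint32_t rd_index;
    std::uint32_t valid;
};
static_assert(sizeof(CommitRecord) == 24, "layout must match the SV side");

namespace {
CommitRecord g_last{};   // last record produced by the ref
std::uint64_t g_pc = 0;  // ref's program counter
}  // namespace

// Resets the ref to a known program counter.
extern "C" void ref_init(std::uint64_t reset_pc) {
    g_pc = reset_pc;
    g_last = CommitRecord{};
}

// Toy step: "executes" one 4-byte instruction that writes its own pc to x1.
extern "C" void ref_step() {
    g_last.pc = g_pc;
    g_last.rd_index = 1;
    g_last.rd_value = g_pc;
    g_last.valid = 1;
    g_pc += 4;
}

// Copies the last commit record into caller-owned storage.
extern "C" void ref_get_commit(CommitRecord* out) {
    assert(out != nullptr);
    std::memcpy(out, &g_last, sizeof(CommitRecord));
}
```

With GCC or Clang, a file like this can be built into a shared library with `-shared -fPIC`. The testbench then calls the three functions in-process, which is what keeps IPC out of the comparison loop.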
[industry-pattern]

A common false-positive source: ref and DUT each maintain their own memory copy, drift, then disagree on a load value that came from an implementation-defined region (UART, MMIO, weakly-ordered shared region).
The fix: DUT loads, then forwards the loaded value to the ref. The ref does not independently model that load. Two operational forms:
- `ref_set_load_result(addr, value)` before instructing the
  ref to commit the load instruction. Used by NEMU + OpenXiangShan/difftest.
- `mem_query(addr)`
  function backed by the DUT's view of memory. Used when the ref runs ahead
  of the DUT.

Either way, the principle is: the ref should not have an independent memory
model that competes with the DUT's. [case: NEMU DiffMem, ImperasDV with RVVI mem-feed]
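
The first form, where the DUT forwards the loaded value before the ref commits the load, can be sketched as follows. This is an illustration under stated assumptions, not the NEMU or OpenXiangShan implementation. `RefModel`, `ForwardedLoad`, and `ref_commit_load` are hypothetical, and `ref_set_load_result` takes an explicit model argument here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// A load result pushed by the DUT ahead of the ref's commit.
struct ForwardedLoad {
    std::uint64_t addr = 0;
    std::uint64_t value = 0;
};

// Hypothetical ref model: registers, ordinary RAM, and one pending forward.
struct RefModel {
    std::uint64_t gpr[32] = {};
    std::unordered_map<std::uint64_t, std::uint64_t> ram;
    ForwardedLoad pending;
    bool has_pending = false;
};

// DUT side: report the value it actually loaded from `addr`.
inline void ref_set_load_result(RefModel& ref, std::uint64_t addr,
                                std::uint64_t value) {
    ref.pending.addr = addr;
    ref.pending.value = value;
    ref.has_pending = true;
}

// Ref side: commit `load rd, [addr]`.
// Uses the forwarded value when one is pending, otherwise the ref's own RAM.
// Returns false if the forwarded address disagrees with the ref's address,
// which indicates a genuine mismatch rather than memory drift.
inline bool ref_commit_load(RefModel& ref, unsigned rd, std::uint64_t addr) {
    assert(rd < 32);
    if (ref.has_pending) {
        ref.has_pending = false;
        if (ref.pending.addr != addr) return false;
        ref.gpr[rd] = ref.pending.value;
        return true;
    }
    auto it = ref.ram.find(addr);
    ref.gpr[rd] = (it != ref.ram.end()) ? it->second : 0;
    return true;
}
```

Checking the forwarded address against the ref's own effective address keeps the comparison meaningful: the value is trusted from the DUT, but the address computation is still verified.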
[industry-pattern]

Interrupts, debug breakpoints, and external events arrive asynchronously to the DUT. If the ref also generates them on its own internal clock, alignment is impossible.
Pattern: RTL detects the event, then forwards a sync signal to the ref. The ref injects the event at the next architectural commit boundary, so both sides see the same architectural-time injection point.

    external IRQ
        ↓
    RTL DUT ──────[detect, latch at commit]──────→ ref: commit_with_irq()

[case: lowRISC Ibex cosim, ImperasDV / RVVI]
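
On the ref side, the pattern amounts to having no interrupt timing of its own: the ref takes an interrupt only when told to, at a commit boundary. A minimal sketch follows, assuming fixed 4-byte instructions and a single trap vector. `RefCore`, its fields, and `commit` are hypothetical, and `commit_with_irq` is given an explicit core and cause argument for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ref core with just enough state to show the injection point.
struct RefCore {
    std::uint64_t pc = 0;
    std::uint64_t trap_vector = 0;  // handler entry address
    std::uint64_t saved_pc = 0;     // resume address recorded on trap entry
    std::uint64_t cause = 0;        // 0 means "no interrupt taken" in this sketch
    std::uint64_t retired = 0;      // instructions committed so far
};

// Normal commit: retire one fixed-size instruction.
inline void commit(RefCore& c) {
    c.pc += 4;
    ++c.retired;
}

// Called only when the DUT reports that it took an interrupt at this
// commit boundary. The instruction at `pc` has not retired, so it becomes
// the resume address and control moves to the trap vector.
inline void commit_with_irq(RefCore& c, std::uint64_t cause) {
    assert(cause != 0);
    c.saved_pc = c.pc;
    c.cause = cause;
    c.pc = c.trap_vector;
}
```

Because the ref never decides on its own when the interrupt arrives, both sides agree on which instruction was the last to retire before the handler started.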
[abstract]

A behavior model that is too slow becomes the bottleneck of difftest. Targets are typically 100×–1000× faster than the cycle-accurate model.

- .so, not a separate process. Avoid IPC.
- struct with PC, registers, CSR: no virtual functions on the hot path.
- align-and-difftest).

[abstract]

- mcycle-style counter that reflects "ticks of the simulator" leaks into difftest as architectural state and causes mismatches that aren't real. Either omit, or stub to match the DUT's reported value.

- choose-artifact: when to build a behavior model at all.
- define-contracts: the state contract and observability hooks the behavior model embodies.
- align-and-difftest: how the behavior model gets compared against RTL in practice.
- build-perf-model: the sister artifact for cycle-accurate work.
- references/case-spike.md: Spike as a RISC-V reference model.
- references/case-nemu.md: NEMU + DiffMem design.
- references/case-imperas-rvvi.md: ImperasDV + RVVI configurable state-mask approach.