Help us improve
Share bugs, ideas, or general feedback.
From rtl-agent-team
Enforces lowRISC + project-specific SystemVerilog coding conventions for .sv/.v files. Covers naming, module structure, timing, and synthesis rules.
npx claudepluginhub babyworm/rtl-agent-team --plugin rtl-agent-teamHow this skill is triggered — by the user, by Claude, or both
Slash command
/rtl-agent-team:systemverilogThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
<Purpose>
Enforces SystemC/TLM-2.0 coding standards for BFM and Reference Model development, covering naming conventions, AT/LT patterns, and testbench integration.
Guides technical evaluation of code review feedback: read fully, restate for understanding, verify against codebase, respond with reasoning or pushback before implementing.
Share bugs, ideas, or general feedback.
Target standard: IEEE 1800-2009 for synthesizable RTL.
interface/modport is 2009 standard but iverilog unsupported — do NOT generate in RTL (see §4.3)-g2012 for parser compatibility (2012 parser handles 2009 code)Baseline: lowRISC SystemVerilog Coding Style Guide (https://github.com/lowRISC/style-guides/blob/master/VerilogCodingStyle.md) Project overrides take precedence over default lowRISC rules.
<Use_When>
<Do_Not_Use_When>
systemc skillrtl-p5s-func-verify skill<Why_This_Exists> Consistent coding standards ensure lint pass rate, synthesis quality, and team readability all at once. Although based on lowRISC, this project has its own overrides for port naming, clock/reset rules, etc., so they are managed as a separate skill to ensure all SV-generating agents reference the same rules. </Why_This_Exists>
<Execution_Policy>
templates/module-template.sv as the starting point for new modulesexamples/good-vs-bad.sv for correct/incorrect pattern examplesreferences/coding-style-guide.md for detailed project overrides vs. original lowRISC rules
</Execution_Policy>IMPORTANT — The following 3 rules differ from the lowRISC guide and must always be applied.
i_, Output: o_, Bidirectional: io_ — mandatoryi_data, o_valid, io_sda (NOT data_i, valid_o)i_ prefix (clk, sys_clk, rst_n, sys_rst_n)_i, _o, _io) is forbiddenclk (default) or {domain}_clk (multi-clock)sys_clk, pixel_clk, axi_clkclk_i, NOT suffixrst_n (default) or {domain}_rst_n (multi-domain)rst_n, sys_rst_n, pixel_rst_n (NOT rst_ni)UpperCamelCase for Parameters and Enum valuesALL_CAPS (DATA_WIDTH, NOT DataWidth)L_ prefix + ALL_CAPS (L_ADDR_BITS, NOT AddrBits)ALL_CAPS (ST_IDLE, NOT StIdle)IMPORTANT — CamelCase is completely prohibited. All identifiers must use only
snake_caseorALL_CAPS. Detailed rules: seereferences/coding-style-guide.md.
| Target | Rule | Example |
|---|---|---|
| Module | snake_case | axi_lite_slave |
| Parameter (externally configurable) | ALL_CAPS | DATA_WIDTH, DEPTH |
| Local parameter (internal only) | L_ prefix + ALL_CAPS | L_ADDR_BITS, L_CNT_MAX |
| Type (typedef) | snake_case_t suffix | state_t, bus_req_t |
| Enum type (typedef enum) | snake_case_e suffix | state_e, cmd_type_e |
| Enum values | ALL_CAPS | ST_IDLE, WAIT_RESP |
| `define macros | ALL_CAPS | MAX_DEPTH, ASSERT_ON |
| Instances | u_ prefix | u_fifo, u_arbiter |
| Generate blocks | gen_ prefix | gen_pipeline_stage |
| Signals (internal) | snake_case | write_enable, addr_valid |
UVM Exception: UVM class member handles use
m_prefix per industry convention (m_driver,m_monitor).u_prefix applies to RTL module instances only.
| Forbidden (CamelCase) | Correct Form |
|---|---|
parameter int Width = 8 | parameter int unsigned WIDTH = 8 |
localparam AddrBits = $clog2(Depth) | localparam L_ADDR_BITS = $clog2(DEPTH) |
StIdle, StProcess | ST_IDLE, ST_PROCESS |
Any UpperCamelCase | ALL_CAPS or snake_case |
| Type | Pattern | Example |
|---|---|---|
| Module | module_name.sv | axi_lite_slave.sv |
| Package | module_name_pkg.sv | cabac_pkg.sv |
| Interface | module_name_if.sv | axi_if.sv (iverilog unsupported — do NOT generate, §4.3) |
| Testbench | tb_module_name.sv | tb_axi_lite_slave.sv |
| SVA bind | sva_module_name.sv | sva_axi_lite_slave.sv |
One module per file, filename matches module name.
logic (using reg/wire is forbidden)always_ff for sequential (non-blocking <=)always_comb for combinational (blocking =)typedef enum / typedef struct packed_pkg.sv)input logic / output logic (ANSI style)parameter or localparamreg, wire keywords is forbiddenalways_latch is forbidden (except for explicit latches; generally prohibited)initial blocks in synthesizable code are forbiddendefault is mandatory in all case statements#delay in synthesizable code is forbiddenalways_ff Rules (Verification TB Caveat)VCS enforces IEEE 1800 always_ff semantics strictly — a variable driven by always_ff must NOT also be driven by initial, always_comb, or task blocks (ICPD error). Verilator and iverilog are lenient on this.
RTL code: No issue — RTL should never mix always_ff with initial for the same signal.
Verification TB code (coverage counters, debug registers): If a testbench variable needs both sequential update (posedge clk) and procedural initialization (initial or task), use always @(posedge clk) instead of always_ff:
// BAD — VCS ICPD error: cov_cnt driven by always_ff AND initial
always_ff @(posedge clk) cov_cnt <= cov_cnt + 1;
initial cov_cnt = 0;
// GOOD — always @(posedge) allows multiple drivers in TB
always @(posedge clk) cov_cnt <= cov_cnt + 1;
initial cov_cnt = 0;
This applies to testbench only — synthesizable RTL must always use always_ff.
slang detection: slang -Weverything catches this same always_ff multi-driver violation
at lint time (RTL mode). For TB files, use slang --allow-dup-initial-drivers to permit the
initial + always_ff pattern. The run_lint.sh --tool slang wrapper auto-detects RTL vs TB
based on file paths.
This restriction applies to synthesizable RTL code only. Verification TBs may use
interfaceif the target simulator supports it.
interface / modport — unsupported by iverilog. Use port lists insteadstruct / union — unsupported by iverilog. Use individual signals or packed versionstypedef struct packed / typedef union packed are supported (OK to use)IMPORTANT — IEEE 1800 §12.5 requires identifiers to be declared before first use. Xcelium (xmvlog) strictly enforces sequential declaration visibility. Reordering concurrent statements (assign, always_ff, always_comb) has zero synthesis/simulation impact, but declarations MUST always precede their first reference.
module_name_pkg.sv <- Shared type/constant definitions
module_name.sv <- Module implementation (order is MANDATORY):
1. parameter declarations
2. port declarations (ANSI style)
3. import statements
4. typedef / localparam / enum <- all types and constants
5. internal signal declarations <- all signals before any logic
-- declaration boundary ------------ (no declarations below this line)
6. assign statements <- continuous assignments
7. submodule instances (u_ prefix)
8. always_comb blocks <- combinational logic
9. always_ff blocks <- sequential logic
10. assertions (SVA)
See templates/module-template.sv for complete scaffold.
Prefer driving a module's outputs directly from a flip-flop. When a register/pipeline stage is needed, place the register at the output (compute → register → output port), not at the input (register input → combinational logic → output port).
// PREFERRED — registered output: the consumer gets a full clock period; the critical path stays
// inside this module; hierarchical STA and reuse are clean.
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) o_data <= '0;
else o_data <= func(a_q, b_q); // combinational result captured into the output register
end
// DISCOURAGED — input registered, output driven through combinational logic: o_data is
// combinational AT THE PORT, so the consumer loses part of its cycle and STA must trace through
// this module's logic across the boundary.
always_ff @(posedge clk or negedge rst_n) if (!rst_n) a_q <= '0; else a_q <= i_a;
assign o_data = func(a_q);
Combinational (unregistered) outputs are acceptable for thin glue/passthrough logic, but should be a deliberate choice — not the default place to put a needed flop.
function/task should operate only on their arguments (pure). Avoid reading module-level
signals/variables not passed in — hidden inputs cause simulation-sensitivity surprises, obscure
synthesis intent, and hurt reuse/readability. Prefer passing the needed signals as arguments.
If an external (non-argument) dependency is genuinely unavoidable, document it in a header comment at the top of the function/task:
// External deps (read): cfg_mode, base_addr <- MANDATORY when not argument-only
function automatic logic [W-1:0] map_addr(input logic [W-1:0] offset);
map_addr = base_addr + (cfg_mode ? (offset << 1) : offset); // reads module signals — discouraged
endfunction
The items below apply only when specific optimizations are needed. See the
<Advanced>section for detailed patterns and code examples.
<Advanced> section A.1<Advanced> section A.2<Advanced> section A.3<Advanced> section A.4// Clock gating pattern
logic clk_enable;
logic gated_clk;
// Use dedicated ICG cell (synthesis tool maps to library cell)
assign gated_clk = sys_clk & clk_enable; // For simulation only
// Synthesis: replace with ICG instantiation or let tool infer
// Operand isolation — prevent unnecessary switching in multiplier
logic [15:0] mul_a_gated, mul_b_gated;
assign mul_a_gated = i_mul_valid ? i_mul_a : '0;
assign mul_b_gated = i_mul_valid ? i_mul_b : '0;
assign mul_result = mul_a_gated * mul_b_gated;
| Total Bits | Access Pattern | Implementation | Rationale |
|---|---|---|---|
| ≤256 | any | Flip-flop array (logic [W-1:0] name [0:D-1]) | SRAM overhead exceeds benefit |
| 257–4096 | 1 R/W | sram_sp wrapper | Area-efficient; register acceptable with rationale |
| 257–4096 | R+W simultaneous | sram_tp (single-clock) or sram_dp (dual-clock) | Separate read/write ports |
| >4096 | any | SRAM wrapper (mandatory) | Register file wastes area and power |
| any | >2 ports | Flip-flop array (register file) | Multi-port SRAM macros are rare in modern processes |
Place in rtl/common/sram_sp.sv. One R/W port, single clock. Instance with u_mem_ prefix.
module sram_sp #(
parameter int DEPTH = 256,
parameter int WIDTH = 32
) (
input logic clk,
input logic i_ce,
input logic i_we,
input logic [$clog2(DEPTH)-1:0] i_addr,
input logic [WIDTH-1:0] i_wdata,
output logic [WIDTH-1:0] o_rdata
);
`ifdef RAT_MEM_TSMC_N22
// ── Compiled SRAM macro (TSMC N22) — replace with the real instance + pin map ──
// TS1N22ULLSBLVTC256X32M4SWBASO u_macro (
// .CLK(clk), .CEB(~i_ce), .WEB(~i_we), .A(i_addr), .D(i_wdata), .Q(o_rdata));
`elsif RAT_MEM_SKY130
// ── Compiled SRAM macro (SkyWater 130) ──
// sky130_sram_1rw1r_... u_macro ( ... );
`else
// ── Behavioral model — SIMULATION ONLY (skipped at synthesis) ──
// synopsys translate_off
logic [WIDTH-1:0] mem [0:DEPTH-1];
always_ff @(posedge clk) begin
if (i_ce) begin
if (i_we) mem[i_addr] <= i_wdata;
o_rdata <= mem[i_addr]; // 1-cycle read latency
end
end
// synopsys translate_on
`endif
endmodule
The behavioral array is wrapped in // synopsys translate_off … translate_on, so DC/Genus
skip it during synthesis (simulators still run it) — the 2-D array is never elaborated into
flip-flops. Selecting a process define (e.g. +define+RAT_MEM_TSMC_N22, passed by
run_syn.sh --mem-process) activates a compiled-macro branch instead.
run_syn.sh handles the rest automatically (DC/Genus). A real macro needs BOTH --mem-process
(to activate the `ifdef branch) AND --mem-lib (to resolve its timing):
set_dont_touch + set_disable_timing on
the instantiated memory cells, gated on get_cells) and the tool prints a WARNING. Fast
synthesis, no flop array, no STA through the memory boundary.--mem-process <NAME> + --mem-lib <macro.db|.lib> → the macro is instantiated and the
library linked for real timing/area.Apply the same `ifdef-macro + translate_off structure to sram_tp and sram_dp.
Detail: plugin_docs/specs/2026-05-26-synth-memory-blackbox-design.md.
Place in rtl/common/sram_tp.sv. Separate read and write ports, single clock.
Use when simultaneous read+write is needed within the same clock domain.
module sram_tp #(
parameter int DEPTH = 256,
parameter int WIDTH = 32
) (
input logic clk,
// Write port
input logic i_wen,
input logic [$clog2(DEPTH)-1:0] i_waddr,
input logic [WIDTH-1:0] i_wdata,
// Read port
input logic i_ren,
input logic [$clog2(DEPTH)-1:0] i_raddr,
output logic [WIDTH-1:0] o_rdata
);
// Behavioral model — SIMULATION ONLY. For synthesis, add the sram_sp `ifdef
// compiled-macro branches above this block; the behavioral body stays translate_off-guarded.
// synopsys translate_off
logic [WIDTH-1:0] mem [0:DEPTH-1];
always_ff @(posedge clk) begin
if (i_wen) begin
mem[i_waddr] <= i_wdata;
end
if (i_ren) begin
o_rdata <= mem[i_raddr]; // 1-cycle read latency
end
end
// synopsys translate_on
endmodule
Place in rtl/common/sram_dp.sv. Separate read and write ports, dual clock (wclk/rclk).
Use at clock domain crossings (e.g., async FIFO memory backend, cross-domain shared buffer).
module sram_dp #(
parameter int DEPTH = 256,
parameter int WIDTH = 32
) (
// Write port (write clock domain)
input logic wclk,
input logic i_wen,
input logic [$clog2(DEPTH)-1:0] i_waddr,
input logic [WIDTH-1:0] i_wdata,
// Read port (read clock domain)
input logic rclk,
input logic i_ren,
input logic [$clog2(DEPTH)-1:0] i_raddr,
output logic [WIDTH-1:0] o_rdata
);
// Behavioral model — SIMULATION ONLY. For synthesis, add the sram_sp `ifdef
// compiled-macro branches above this block; the behavioral body stays translate_off-guarded.
// synopsys translate_off
logic [WIDTH-1:0] mem [0:DEPTH-1];
always_ff @(posedge wclk) begin
if (i_wen) begin
mem[i_waddr] <= i_wdata;
end
end
always_ff @(posedge rclk) begin
if (i_ren) begin
o_rdata <= mem[i_raddr]; // 1-cycle read latency (rclk domain)
end
end
// synopsys translate_on
endmodule
Note: sram_dp has two always_ff blocks writing different signals (mem from wclk, o_rdata from rclk),
which is correct — no multi-driver conflict since each signal has a single driver.
// BAD: >4096-bit register array with combinational MUX
// Real case: 4096×13-bit line buffer → 500K+ gate equivalents,
// 5h+ DC compile, 55% area wasted, 4096:1 MUX per bit on critical path
logic signed [12:0] lb_mem [4096];
always_comb begin
rd_data = lb_mem[rd_addr]; // 4096:1 MUX — area/timing/compile disaster
end
// GOOD: Use SRAM wrapper with synchronous read (1-cycle latency)
sram_tp #(.DEPTH(4096), .WIDTH(13)) u_mem_line_buf (
.clk (clk),
.i_wen (wr_en), .i_waddr(wr_addr), .i_wdata(wr_data),
.i_ren (rd_en), .i_raddr(rd_addr), .o_rdata(rd_data) // 1-cycle latency
);
`ifdef SYNTHESIS guard| Resource | Inference Pattern | Notes |
|---|---|---|
| BRAM | logic [W-1:0] mem [0:D-1] + sync read | Async read infers distributed RAM |
| DSP | a * b + c pattern | Better inference with pipeline registers |
| SRL | shift register (always_ff chain) | Auto-inferred, no explicit control needed |
create_clock, set_input_delay, set_output_delay are identicalset_property IOSTANDARD, set_property LOC(* mark_debug = "true" *) attribute to signals targeted for debugu_ prefix conventionrtl-synth-check skill// Before: long combinational path
assign o_result = func_a(func_b(func_c(i_data)));
// After: 2-stage pipeline
logic [W-1:0] stage1_q;
always_ff @(posedge sys_clk or negedge sys_rst_n) begin
if (!sys_rst_n) stage1_q <= '0;
else stage1_q <= func_c(i_data);
end
assign o_result = func_a(func_b(stage1_q));
compile_ultra -retime, Genus: syn_opt -retiming)dont_touch to registers targeted for retimingo_ready propagates backward from the next stage's i_ready<Tool_Usage> This skill is not executed directly. It is referenced by agents that generate SV code (e.g., rtl-coder, sva-extractor). Agents should follow the conventions defined here. </Tool_Usage>
Follows project rules: ALL_CAPS parameter, L_ prefix localparam, snake_case, no CamelCase. ```systemverilog module cabac_encoder #( parameter int unsigned CTX_ADDR_W = 9 ) ( input logic clk, input logic rst_n, input logic [CTX_ADDR_W-1:0] ctx_addr, output logic bin_valid ); localparam int unsigned L_CTX_DEPTH = 2 ** CTX_ADDR_W;typedef enum logic [1:0] { ST_IDLE, ST_ENCODE, ST_DONE } state_e;
</Good>
<Bad>
CamelCase, suffix, reg/wire, magic numbers:
```systemverilog
module cabac_encoder #(
parameter int CtxAddrWidth = 9 // WRONG: CamelCase
) (
input wire clk_i, // WRONG: suffix, wire
input reg rst_ni, // WRONG: reg, suffix
input [8:0] ctx_addr_i, // WRONG: suffix, magic width
output bin_valid_o // WRONG: no type, suffix
);
localparam AddrBits = 4; // WRONG: CamelCase, no L_ prefix
typedef enum logic [1:0] {
StIdle, StEncode, StDone // WRONG: CamelCase enum values
} state_e;
<Escalation_And_Stop_Conditions>
<Final_Checklist>
i_/o_/io_ prefix mandatoryclk (single) or {domain}_clk (multi), Reset: rst_n or {domain}_rst_nALL_CAPS, localparam L_ prefix, enum values ALL_CAPSlogic only (no reg/wire)always_ff (sequential), always_comb (combinational)default present in all case statementsu_ prefix, generate gen_ prefixlogic/typedef/localparam before first assign/always (no forward references)sram_sp/sram_tp/sram_dp wrapper from rtl/common/ (not inline array)u_mem_{purpose}
</Final_Checklist>