From rtl-agent-team
This skill should be used when building C functional reference models (no clock/reset) with external memory access abstraction and bitexact verification. DPI-C integration is a priority.
npx claudepluginhub babyworm/rtl-agent-team --plugin rtl-agent-team
This skill uses the workspace's default tool permissions.
<Purpose>
Outputs: refc/.c, refc/include/.h, conformance_report.json. Must achieve bitexact match against JM (H.264) or HM (H.265) reference software. Runs in parallel with p2-arch-design during Phase 2.
<Use_When>
<Do_Not_Use_When>
<Why_This_Exists>
RTL verification requires a golden reference. Writing the reference model before RTL
forces algorithm understanding and exposes spec ambiguities before silicon commitment.
A bitexact match against JM/HM is the industry-standard acceptance criterion.
Use skills/ref-model/templates/ for scaffolding: Makefile, ref_model_main.c, dpi_wrapper.h.
The model also serves as a bandwidth analysis tool: by tracking external memory access counts and patterns through the abstraction layer, architects can estimate required memory bandwidth before committing to hardware.
C is chosen over C++ for DPI-C compatibility: the model can be directly called from SystemVerilog testbenches without wrapper overhead.
</Why_This_Exists>
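To illustrate the DPI-C point, here is a minimal sketch of a model function that uses only plain C linkage and scalar `int` arguments. The function name `ref_clip_pixel` and the SystemVerilog import line in the comment are hypothetical illustrations, not part of the skill's templates.

```c
#include <assert.h>

/*
 * Hypothetical example function (not taken from the skill's templates).
 * Plain C linkage and scalar int arguments only, so a SystemVerilog
 * testbench could import it along these lines:
 *
 *   import "DPI-C" function int ref_clip_pixel(input int value,
 *                                              input int bit_depth);
 */
int ref_clip_pixel(int value, int bit_depth)
{
    /* Restrict to bit depths where the shift below is well defined. */
    assert(bit_depth >= 1 && bit_depth <= 16);

    const int max_val = (1 << bit_depth) - 1;

    /* Clamp to the legal sample range [0, 2^bit_depth - 1]. */
    if (value < 0) {
        return 0;
    }
    if (value > max_val) {
        return max_val;
    }
    return value;
}
```

Because the signature contains no structs, classes, or templates, the same object file can be linked into a standalone C test and into a simulator that supports DPI-C.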
<Execution_Policy>
ext_mem_read()/ext_mem_write() to track bandwidth
#define DATA_WIDTH, #define PARALLEL_LANES to explore throughput
<Tool_Usage>
Task(subagent_type="rtl-agent-team:vcodec-syntax-entropy-expert",
prompt="Provide algorithm pseudocode and edge case table for CABAC entropy coding per H.264 spec section 9.3.")
Task(subagent_type="rtl-agent-team:ref-model-dev",
prompt="Implement C functional reference model at refc/. Must be bitexact vs JM. "
"No clock/reset — pure functional model. I/O as function arguments. "
"Internal memory as arrays/variables. External memory via ext_mem_read/write functions. "
"Datapath width parameterizable via PARALLEL_LANES define. "
"C11, DPI-C compatible (no C++ features). Follow C coding conventions.")
Task(subagent_type="rtl-agent-team:ref-model-reviewer",
prompt="Review refc/ model quality before oracle signoff. "
"Check algorithm fidelity to requirements/spec, fixed-point/bit-width correctness, "
"build warnings/undefined behavior risks, and I/O format compatibility. "
"Save report to reviews/phase-2-architecture/ref-model-review.md with PASS/FAIL.")
# Build and test via Bash CLI (NOT MCP)
Bash: cd refc && make build # gcc -std=c11 -Wall -Wextra -Werror
Bash: cd refc && make test # bitexact comparison
Bash: cd refc && make bandwidth # external memory access analysis
Bash: cd refc && make sanitize # run with -fsanitize=address,undefined
</Tool_Usage>
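The bitexact comparison behind `make test` is not shown in this skill. As a rough sketch, and assuming the model output and the JM/HM golden output are both available as byte buffers, the core check could look like the following. The helper name `first_mismatch` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns -1 when the two buffers are identical
 * (bitexact), otherwise the offset of the first differing byte.
 * A length mismatch is reported at the end of the shorter buffer.
 */
long first_mismatch(const uint8_t *model, size_t model_len,
                    const uint8_t *golden, size_t golden_len)
{
    assert(model != NULL && golden != NULL);

    const size_t common = (model_len < golden_len) ? model_len : golden_len;

    for (size_t i = 0; i < common; i++) {
        if (model[i] != golden[i]) {
            return (long)i;
        }
    }
    return (model_len == golden_len) ? -1L : (long)common;
}
```

Reporting the first differing offset, rather than only pass/fail, makes it easier to map a failure back to a specific syntax element or block in the test vector.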
ref-model-dev implements CABAC coder as pure C function:

```c
void cabac_encode(const cabac_input_t *in, cabac_output_t *out, cabac_ctx_t *ctx) {
    // Internal context table: local array (models SRAM)
    uint8_t context_table[460];

    // External memory read for bitstream buffer
    ext_mem_read(ctx->bitstream_addr, ctx->read_buf, 64);

    // Process with PARALLEL_LANES bins per call
    for (int i = 0; i < PARALLEL_LANES; i++) { ... }

    // External memory write for encoded output
    ext_mem_write(ctx->output_addr, out->encoded, out->num_bytes);
}
```

Bitexact test runs 500 vectors against JM 19.0; all pass. Bandwidth report shows 2.3 MB/frame external memory reads, 0.8 MB/frame writes.

Implementing ref model with clock/reset and cycle-accurate step function: this is Phase 3 BFM territory, not Phase 2 functional model. The purpose here is algorithm correctness and bandwidth estimation, not timing.

Using C++ classes and templates: breaks DPI-C compatibility and adds unnecessary complexity for a functional model.
<Escalation_And_Stop_Conditions>
<Final_Checklist>
gcc -std=c11 -Wall -Wextra -Werror
reviews/phase-2-architecture/ref-model-feature-coverage.md saved
External memory access function signature:

    #include <stdint.h>

    // Track every external memory access for bandwidth analysis
    typedef struct {
        uint64_t total_reads;
        uint64_t total_writes;
        uint64_t total_read_bytes;
        uint64_t total_write_bytes;
        uint64_t estimated_read_cycles;  /* total_reads * MEM_LATENCY_EXTERNAL */
        uint64_t estimated_write_cycles; /* total_writes * MEM_LATENCY_EXTERNAL */
    } ext_mem_stats_t;

    void ext_mem_read(uint32_t addr, void *buf, uint32_t size);
    void ext_mem_write(uint32_t addr, const void *buf, uint32_t size);
    ext_mem_stats_t ext_mem_get_stats(void);
    void ext_mem_reset_stats(void);

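One possible implementation of the declarations above is sketched below. The flat static backing array, its size `EXT_MEM_SIZE`, and the value used for `MEM_LATENCY_EXTERNAL` are assumptions made for illustration. The source names the latency macro but does not give its value or say how the memory is backed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifndef MEM_LATENCY_EXTERNAL
#define MEM_LATENCY_EXTERNAL 100u   /* assumed cycles per access (placeholder) */
#endif
#define EXT_MEM_SIZE (1u << 20)     /* hypothetical 1 MiB backing store */

typedef struct {
    uint64_t total_reads;
    uint64_t total_writes;
    uint64_t total_read_bytes;
    uint64_t total_write_bytes;
    uint64_t estimated_read_cycles;  /* total_reads * MEM_LATENCY_EXTERNAL */
    uint64_t estimated_write_cycles; /* total_writes * MEM_LATENCY_EXTERNAL */
} ext_mem_stats_t;

static uint8_t ext_mem[EXT_MEM_SIZE]; /* models the external memory contents */
static ext_mem_stats_t stats;         /* zero-initialized access counters */

void ext_mem_read(uint32_t addr, void *buf, uint32_t size)
{
    /* Reject accesses that fall outside the modeled memory. */
    assert(buf != NULL && size <= EXT_MEM_SIZE && addr <= EXT_MEM_SIZE - size);
    memcpy(buf, &ext_mem[addr], size);
    stats.total_reads++;
    stats.total_read_bytes += size;
    stats.estimated_read_cycles = stats.total_reads * MEM_LATENCY_EXTERNAL;
}

void ext_mem_write(uint32_t addr, const void *buf, uint32_t size)
{
    assert(buf != NULL && size <= EXT_MEM_SIZE && addr <= EXT_MEM_SIZE - size);
    memcpy(&ext_mem[addr], buf, size);
    stats.total_writes++;
    stats.total_write_bytes += size;
    stats.estimated_write_cycles = stats.total_writes * MEM_LATENCY_EXTERNAL;
}

ext_mem_stats_t ext_mem_get_stats(void)
{
    return stats; /* returned by value so callers cannot alter the counters */
}

void ext_mem_reset_stats(void)
{
    memset(&stats, 0, sizeof stats);
}
```

Routing every access through these two functions keeps the counters complete, so byte totals per frame can be read straight from `ext_mem_get_stats()` after processing one frame.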
Datapath width exploration: run the model with PARALLEL_LANES=1,2,4,8 and compare bandwidth_report.json to find the optimal balance between throughput and memory bandwidth.
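As a small sketch of the throughput side of that sweep, the snippet below gives `PARALLEL_LANES` a compile-time default and counts how many kernel calls are needed to consume a given number of bins. The helper names `calls_for_bins` and `calls_for_bins_default`, and the default of 4 lanes, are hypothetical. The bandwidth side still has to come from the measured `ext_mem` statistics.

```c
#include <assert.h>
#include <stdint.h>

#ifndef PARALLEL_LANES
#define PARALLEL_LANES 4 /* assumed default; override with e.g. -DPARALLEL_LANES=8 */
#endif

/*
 * Number of kernel calls needed to consume num_bins when each call
 * handles up to `lanes` bins (ceiling division).
 */
uint32_t calls_for_bins(uint32_t num_bins, uint32_t lanes)
{
    assert(lanes > 0);
    return num_bins / lanes + ((num_bins % lanes) ? 1u : 0u);
}

/* Same count for the lane width selected at compile time. */
uint32_t calls_for_bins_default(uint32_t num_bins)
{
    return calls_for_bins(num_bins, PARALLEL_LANES);
}
```

For 1000 bins this gives 1000, 500, 250, and 125 calls at 1, 2, 4, and 8 lanes. Set against the per-configuration byte counts, this shows where extra lanes stop paying for the additional memory traffic.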