From the smg plugin: maps SMG Rust codebase crates to roles, subsystems, key types, and dependencies. Use it to understand structure and ownership before making changes.

Install: `npx claudepluginhub lightseekorg/smg-dev-guide --plugin smg`

Slash command: `/smg:map`
High-performance Rust gateway for LLM inference backends. Routes requests to workers running vLLM, SGLang, or TensorRT-LLM, with 8 routing policies, KV cache optimization, K8s service discovery, WASM plugins, MCP tool execution, and mesh HA.
| Crate | Role | Key Types |
|---|---|---|
| model_gateway | Main binary. HTTP/gRPC handlers, routing engine, service discovery, observability, CLI | RouterConfig, ServerConfig, CliArgs |
| protocols | OpenAI-compatible types shared by ALL consumers (config, bindings, API). Sacred: no impl-specific fields. | WorkerSpec, ModelCard, WorkerModels, ChatCompletionRequest/Response |
| kv_index | KV cache-aware routing. Radix trees (String for HTTP, Token for gRPC), positional indexer | StringTree, TokenTree, RadixTree trait, PositionalIndexer |
| auth | API key (SHA-256 hashed), JWT/OIDC, role-based access (Admin/User), audit logging | JwtConfig, ApiKeyEntry, Principal, Role |
| mesh | HA cluster via SWIM gossip. CRDT KV store, partition detection, consistent hashing | ClusterState, WorkerState, NodeStatus |
| wasm | WebAssembly plugin system. WIT interface, middleware hooks (OnRequest/OnResponse), LRU cache | WasmModule, Action (Continue/Reject/Modify) |
| mcp | MCP protocol client. Tool discovery, execution, approval workflows, response format translation | McpConfig, McpOrchestrator, ToolAnnotations |
| grpc_client | gRPC client for backends. Macros for dedup, streaming, trace injection | SglangGrpcClient, VllmGrpcClient |
| data_connector | Pluggable storage: PostgreSQL, Oracle, Redis, in-memory. Hook system for interception | StorageBackend trait, StorageHook |
| tool_parser | 13+ tool call parsers (JSON, Mistral, Qwen, DeepSeek, Pythonic, etc.). Streaming with incremental JSON | ToolParser trait, ParserFactory, StreamingParseResult |
| reasoning_parser | Reasoning extraction from 10+ model families (DeepSeek-R1, Qwen3, Kimi, Cohere). Streaming | ReasoningParser trait, ParserFactory, ParserResult |
| tokenizer | LLM tokenization, chat templates | Tokenizer |
| multimodal | Image/audio processing. Per-model vision specs (LLaVA, Qwen-VL, Llama4, Phi3-V), media fetching | ImageFrame, ChatContentPart, MediaConnector |
| workflow | Step-based async workflow engine (wfaas) | StepExecutor, WorkflowContext |
| bindings/python | PyO3 bindings. Router class with ~80 constructor params, enum mapping | Router, PolicyType |
| bindings/golang | Go SDK via FFI (cgo). OpenAI-style API, streaming, tool calling | Client, ChatCompletionRequest |
| clients/rust | Rust client library | |
| grpc_servicer | Python gRPC servicer wrapping vLLM/SGLang backends | |
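The core idea behind the `kv_index` crate's cache-aware routing can be sketched in a few lines: remember which worker last served each token prefix, then send a new request to the worker holding the longest matching prefix. This is a minimal sketch under stated assumptions, not SMG's `TokenTree` or `RadixTree`: it uses an uncompressed trie (a radix tree merges single-child chains), ignores eviction and load balancing, and `PrefixRouter`, `insert`, and `route` are hypothetical names.

```rust
use std::collections::HashMap;

/// One trie node per token. `worker` records the worker that most recently
/// served a request whose token sequence passed through this node.
#[derive(Default)]
struct Node {
    children: HashMap<u32, Node>,
    worker: Option<String>,
}

/// Hypothetical cache-aware router over an uncompressed token trie.
#[derive(Default)]
struct PrefixRouter {
    root: Node,
}

impl PrefixRouter {
    /// Record that `worker` now holds KV cache for every prefix of `tokens`.
    fn insert(&mut self, tokens: &[u32], worker: &str) {
        let mut node = &mut self.root;
        for &t in tokens {
            node = node.children.entry(t).or_default();
            node.worker = Some(worker.to_string());
        }
    }

    /// Return the worker owning the longest cached prefix of `tokens` and
    /// the number of matched tokens, or (`fallback`, 0) if nothing matches.
    fn route<'a>(&'a self, tokens: &[u32], fallback: &'a str) -> (&'a str, usize) {
        let mut node = &self.root;
        let mut best: (&'a str, usize) = (fallback, 0);
        for (i, t) in tokens.iter().enumerate() {
            match node.children.get(t) {
                Some(child) => {
                    if let Some(w) = &child.worker {
                        best = (w.as_str(), i + 1);
                    }
                    node = child;
                }
                None => break,
            }
        }
        best
    }
}

fn main() {
    let mut router = PrefixRouter::default();
    router.insert(&[1, 2, 3, 4], "worker-a");
    router.insert(&[1, 2, 9], "worker-b");
    // [1, 2, 3] is still attributed to worker-a, so it beats worker-b's [1, 2].
    let (worker, matched) = router.route(&[1, 2, 3, 7], "worker-c");
    println!("{worker} (matched {matched} tokens)");
}
```

Marking every node on the inserted path keeps lookup a single walk down the trie; a production index would also need to age out entries as backends evict their KV cache.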
Layering:

```
crates/protocols   (shared types, used by ALL consumers)
        ↑
model_gateway      (implementation; ONE consumer writes each field)
        ↑
bindings/*         (language SDKs; wrap model_gateway + protocols)
```
Directory layout: Library crates live under crates/ (e.g. crates/mcp/, crates/mesh/). model_gateway/ and bindings/ remain at repo root.
Iron law: If only one crate writes a field, it doesn't belong in crates/protocols/. K8s-specific, runtime-specific, or gateway-specific fields stay in model_gateway.
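One way to honor the iron law is composition: keep the shared struct minimal and wrap it in a gateway-local type that carries the single-writer fields. The sketch below uses hypothetical types, `SharedWorkerSpec` (standing in for a `crates/protocols` type) and `DiscoveredWorker` (a `model_gateway`-only type); neither mirrors SMG's actual field lists.

```rust
// Hypothetical types for illustration only; not SMG's real definitions.

/// Stand-in for a crates/protocols type: only fields that every consumer
/// (config, bindings, API) reads or writes.
#[derive(Debug, Clone, PartialEq)]
pub struct SharedWorkerSpec {
    pub url: String,
    pub model_id: Option<String>,
}

/// Stand-in for a model_gateway-only type: the K8s fields below are written
/// by one crate (service discovery), so they stay out of crates/protocols.
#[derive(Debug, Clone)]
pub struct DiscoveredWorker {
    pub spec: SharedWorkerSpec,
    pub pod_name: String,
    pub namespace: String,
}

impl DiscoveredWorker {
    /// Hand only the shared part to other consumers; K8s details are dropped.
    pub fn into_spec(self) -> SharedWorkerSpec {
        self.spec
    }
}

fn main() {
    let worker = DiscoveredWorker {
        spec: SharedWorkerSpec {
            url: "http://10.0.0.5:8000".to_string(),
            model_id: Some("llama-3".to_string()),
        },
        pod_name: "vllm-0".to_string(),
        namespace: "inference".to_string(),
    };
    println!("discovered {}/{}", worker.namespace, worker.pod_name);
    let spec = worker.into_spec();
    println!("shared view: {spec:?}");
}
```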
Config flow:

```
CLI args (main.rs CliArgs) + YAML file (RouterConfig)
        ↓ merge (CLI overrides file)
DiscoveryConfig / RouterConfig (config/types.rs): serde-friendly, user-facing
        ↓ convert in main.rs (TWO paths: to_router_config + to_server_config)
ServiceDiscoveryConfig / ServerConfig: typed, runtime
```
Both conversion paths in main.rs must stay in sync. If a field is wired into only one of them, the corresponding CLI flag or config-file setting is silently ignored.
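The merge-then-convert flow can be sketched as follows. `PartialConfig`, `Merged`, the two runtime structs, the `port`/`policy` fields, and the default values are all hypothetical simplifications; only the names `to_router_config` and `to_server_config` come from the flow above, and their real signatures likely differ.

```rust
// Hypothetical, trimmed-down shapes; the real config types have many more fields.

/// Hypothetical default used when neither CLI nor file sets a port.
const DEFAULT_PORT: u16 = 8080;

/// Same shape for CLI args and the YAML file: every field is optional.
#[derive(Debug, Default, Clone)]
struct PartialConfig {
    port: Option<u16>,
    policy: Option<String>,
}

/// User-facing config after the merge.
#[derive(Debug)]
struct Merged {
    port: u16,
    policy: String,
}

#[derive(Debug, PartialEq)]
struct RuntimeRouterConfig {
    policy: String,
}

#[derive(Debug, PartialEq)]
struct RuntimeServerConfig {
    port: u16,
}

/// CLI overrides file; defaults fill whatever is still missing.
fn merge(cli: PartialConfig, file: PartialConfig) -> Merged {
    Merged {
        port: cli.port.or(file.port).unwrap_or(DEFAULT_PORT),
        policy: cli
            .policy
            .or(file.policy)
            .unwrap_or_else(|| "round_robin".to_string()),
    }
}

/// Conversion path 1: a new field must be copied here if the router needs it.
fn to_router_config(m: &Merged) -> RuntimeRouterConfig {
    RuntimeRouterConfig { policy: m.policy.clone() }
}

/// Conversion path 2: likewise for the server; an uncopied field is silently dropped.
fn to_server_config(m: &Merged) -> RuntimeServerConfig {
    RuntimeServerConfig { port: m.port }
}

fn main() {
    let cli = PartialConfig { port: Some(9000), policy: None };
    let file = PartialConfig { port: Some(8000), policy: Some("cache_aware".to_string()) };
    let merged = merge(cli, file);
    // port comes from the CLI, policy from the file.
    println!("{:?} {:?}", to_router_config(&merged), to_server_config(&merged));
}
```

Because both conversions read from the single merged value, CLI-over-file precedence is decided once; what remains is making sure each field actually reaches the conversion that consumes it.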
Request flow:

```
Client → HTTP/gRPC handler → Auth middleware → WASM OnRequest
       → Routing policy selects worker → Proxy to backend
       → Stream response → Tool/reasoning parsing → WASM OnResponse → Client

Realtime (WebSocket):
Client → WS upgrade → Realtime session registry → Proxy to backend WS
```
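A hedged sketch of the HTTP/gRPC request path up to worker selection, showing how an `OnRequest` hook's `Action` result short-circuits or rewrites a request. Per the crate table, the `wasm` crate exposes `Action` with `Continue`/`Reject`/`Modify`, but the variant payloads here (a status code, a replacement request), the `Request` type, the `auth` check, and round-robin standing in for the configured routing policy are all assumptions. Proxying, streaming, parsing, and `OnResponse` are omitted.

```rust
// Hypothetical request type and hooks; only the Action variant names come
// from the wasm crate's description. Payloads and signatures are assumed.

#[derive(Debug, Clone, PartialEq)]
struct Request {
    api_key: Option<String>,
    body: String,
}

/// Result of an OnRequest hook.
#[derive(Debug, PartialEq)]
enum Action {
    Continue,
    Reject(u16),     // assumed payload: HTTP status to return
    Modify(Request), // assumed payload: replacement request
}

/// Stand-in auth check: any non-empty key passes.
fn auth(req: &Request) -> Result<(), u16> {
    match &req.api_key {
        Some(key) if !key.is_empty() => Ok(()),
        _ => Err(401),
    }
}

/// Example OnRequest hook: reject large bodies, tag untagged ones.
fn on_request(req: &Request) -> Action {
    if req.body.len() > 1024 {
        Action::Reject(413)
    } else if !req.body.starts_with("[tagged]") {
        Action::Modify(Request { body: format!("[tagged]{}", req.body), ..req.clone() })
    } else {
        Action::Continue
    }
}

/// Auth -> OnRequest -> routing policy. Round-robin stands in for the
/// configured policy; proxying and response handling are omitted.
fn handle(mut req: Request, workers: &[&str], next: &mut usize) -> Result<String, u16> {
    auth(&req)?;
    match on_request(&req) {
        Action::Continue => {}
        Action::Reject(status) => return Err(status),
        Action::Modify(rewritten) => req = rewritten,
    }
    let worker = workers[*next % workers.len()];
    *next += 1;
    Ok(format!("{worker} <- {}", req.body))
}

fn main() {
    let workers = ["worker-0", "worker-1"];
    let mut next = 0;
    let req = Request { api_key: Some("sk-test".to_string()), body: "hello".to_string() };
    println!("{:?}", handle(req.clone(), &workers, &mut next));
    println!("{:?}", handle(req, &workers, &mut next));
}
```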
K8s service discovery:

```
K8s Pod → PodInfo::from_pod() → handle_pod_event() → Job::AddWorker
```

Worker registration steps:

1. Detect runtime (sglang/vllm/trt).
2. Discover connection mode (HTTP/gRPC).
3. Discover DP info (rank/size).
4. Discover metadata → flattened into a labels HashMap.
5. Create worker → merge labels, resolve model_id, build ModelCard.
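The hand-off from a pod event to a worker job can be sketched as below, assuming a simplified `Pod` with an IP and a readiness flag in place of the real Kubernetes object. `PodInfo::from_pod`, `handle_pod_event`, and `Job::AddWorker` borrow their names from the discovery flow, but the fields and signatures here are hypothetical. The job carries a bare `ip:port` because the connection mode (HTTP or gRPC) is only discovered in step 2.

```rust
// Hypothetical simplified types; the real code works on Kubernetes Pod objects.

/// Minimal stand-in for a Kubernetes Pod as seen by a watch event.
#[derive(Debug, Clone)]
struct Pod {
    name: String,
    ip: Option<String>,
    ready: bool,
}

/// What discovery keeps from a pod (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct PodInfo {
    name: String,
    ip: String,
}

impl PodInfo {
    /// Pods without an assigned IP cannot become workers yet.
    fn from_pod(pod: &Pod) -> Option<PodInfo> {
        Some(PodInfo { name: pod.name.clone(), ip: pod.ip.clone()? })
    }
}

#[derive(Debug, PartialEq)]
enum Job {
    /// Bare ip:port; HTTP vs gRPC is decided later (step 2).
    AddWorker { pod: String, addr: String },
}

/// Turn a pod event into a job, or None if the pod is not usable yet.
fn handle_pod_event(pod: &Pod, port: u16) -> Option<Job> {
    if !pod.ready {
        return None; // wait for a later event that reports readiness
    }
    let info = PodInfo::from_pod(pod)?;
    Some(Job::AddWorker { pod: info.name, addr: format!("{}:{}", info.ip, port) })
}

fn main() {
    let pod = Pod { name: "vllm-0".to_string(), ip: Some("10.0.0.7".to_string()), ready: true };
    println!("{:?}", handle_pod_event(&pod, 8000));
}
```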
Labels are the central integration pattern: all worker metadata flows through them as key-value pairs.
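Steps 4 and 5 of worker registration can be sketched as: flatten nested metadata into dotted label keys, merge user-supplied labels on top, and resolve the model id from the merged map. Everything here is hypothetical: the `Meta` type, the key names (`runtime`, `model.served_name`, `model_id`), the rule that user labels override discovered ones, and `ModelCardLike`, which stands in for the real `ModelCard`.

```rust
use std::collections::HashMap;

// Hypothetical sketch; key names and precedence rules are assumptions.

/// Nested metadata as a backend might report it.
enum Meta {
    Value(String),
    Map(Vec<(String, Meta)>),
}

/// Step 4 (sketch): flatten nested metadata into dotted label keys.
fn flatten(prefix: &str, meta: &Meta, out: &mut HashMap<String, String>) {
    match meta {
        Meta::Value(v) => {
            out.insert(prefix.to_string(), v.clone());
        }
        Meta::Map(entries) => {
            for (key, child) in entries {
                let full = if prefix.is_empty() { key.clone() } else { format!("{prefix}.{key}") };
                flatten(&full, child, out);
            }
        }
    }
}

/// Step 5a (sketch): merge user labels over discovered ones (assumed: user wins).
fn merge_labels(
    mut discovered: HashMap<String, String>,
    user: &HashMap<String, String>,
) -> HashMap<String, String> {
    for (k, v) in user {
        discovered.insert(k.clone(), v.clone());
    }
    discovered
}

/// Hypothetical stand-in for ModelCard.
#[derive(Debug, PartialEq)]
struct ModelCardLike {
    id: String,
    runtime: String,
}

/// Step 5b (sketch): resolve the id (explicit label first, then the served
/// name) and build the card; None if no id can be resolved.
fn build_card(labels: &HashMap<String, String>) -> Option<ModelCardLike> {
    let id = labels.get("model_id").or_else(|| labels.get("model.served_name"))?.clone();
    let runtime = labels.get("runtime").cloned().unwrap_or_else(|| "unknown".to_string());
    Some(ModelCardLike { id, runtime })
}

fn main() {
    let meta = Meta::Map(vec![
        ("runtime".to_string(), Meta::Value("vllm".to_string())),
        (
            "model".to_string(),
            Meta::Map(vec![("served_name".to_string(), Meta::Value("llama-3".to_string()))]),
        ),
    ]);
    let mut discovered = HashMap::new();
    flatten("", &meta, &mut discovered);
    let mut user = HashMap::new();
    user.insert("model_id".to_string(), "my-llama".to_string());
    let labels = merge_labels(discovered, &user);
    println!("{:?}", build_card(&labels));
}
```

Keeping everything as flat string pairs means later consumers (routing, observability) can read worker metadata without knowing which runtime produced it.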
Development commands:

```bash
cargo +nightly fmt --all                                   # Format
cargo clippy --all-targets --all-features -- -D warnings   # Lint
cargo test                                                 # Test
make python-dev                                            # Python bindings
make pre-commit                                            # All checks
```
Related skills:

- `smg:implement`: detects the subsystem and loads step-by-step recipes with verification.
- `smg:contribute`: enforces quality gates before a PR.
- `smg:review-pr`: a systematic checklist mapped to the changed subsystems.