From observability-assets
Author and review Grafana dashboards as version-controlled JSON assets with stable uid, deliberate datasource handling, and operator-centric panel layout. Use this skill when creating or reviewing Grafana dashboards, editing dashboard JSON, authoring panel queries and visualization config, using Grafana variables (all 9 types), configuring transformations, field config, thresholds, overrides, and value mappings, stabilizing dashboard uid, applying USE/RED/Golden Signals frameworks, setting up repeat behavior and links, using a Grafana mixin or Jsonnet generation workflow, or needing guidance on Grafana dashboard asset structure and panel JSON schema.
npx claudepluginhub ririnto/sinon --plugin observability-assetsThis skill uses the workspace's default tool permissions.
Author and review Grafana dashboards as version-controlled assets while keeping dashboard identity stable across environments.
Prevents silent decimal mismatch bugs in EVM ERC-20 tokens via runtime decimals lookup, chain-aware caching, bridged-token handling, and normalization. For DeFi bots, dashboards using Python/Web3, TypeScript/ethers, Solidity.
Share bugs, ideas, or general feedback.
Author and review Grafana dashboards as version-controlled assets while keeping dashboard identity stable across environments.
The common case: one dashboard with a stable uid, a deliberate title, explicit datasource handling, a default time range no broader than the last 30 minutes, and a panel layout that answers a real operator question instead of becoming a generic metric scrapbook.
This skill owns the ordinary path for dashboard structure, queries, variables, transformations, field configuration, thresholds, legends, units, layout, repeat behavior, links, annotations, panel types, overrides, value mappings, data links, query options, and the dashboard JSON model boundary. For complete panel-type JSON schemas, see ./references/panel-types.md. For variable types, syntax, and global variables, see ./references/variables.md. For field config, overrides, value mappings, and data links, see ./references/field-config.md. For Grafana mixin or Jsonnet-oriented generation workflows, see ./references/grafana-mixin.md. For export cleanup decisions, normalization targets, and ownership boundaries after UI edits or rendering, see ./references/dashboard-structure.md.
uid and title.Default to classic dashboard JSON (schemaVersion 39) for all examples and templates. When working with resource-style schemas or v2 layout models, document the version boundary explicitly in the dashboard asset or review notes.
Minimal dashboard JSON -- smallest valid shape for syntax testing:
{
"uid": "api-overview",
"title": "API Overview",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"panels": []
}
Use when: you need a minimal valid dashboard shell to validate JSON syntax before adding content.
Start by validating the JSON structure before the file is treated as a Git-owned dashboard asset:
python3 -m json.tool grafana/dashboards/api-overview.json
Use when: the dashboard JSON was edited manually and you need the first fast syntax check. Run this from the repository root shown in the path examples, or replace the path with the dashboard file location in your tree.
Stable uid, clear title, one time-picker baseline, and one working panel:
{
"uid": "api-overview",
"title": "API Overview",
"schemaVersion": 39,
"version": 1,
"refresh": "30s",
"time": { "from": "now-30m", "to": "now" },
"timezone": "browser",
"timepicker": {
"refresh_intervals": ["5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h"],
"time_options": ["5m", "15m", "1h", "6h", "12h", "24h", "2d", "7d", "30d"]
},
"templating": { "list": [] },
"annotations": { "list": [] },
"panels": [
{
"id": 1,
"title": "Request Rate",
"type": "timeseries",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [
{ "expr": "round(sum(rate(http_requests_total{job=\"api\"}[5m])), 0.001)", "legendFormat": "req/s", "refId": "A" }
],
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
}
]
}
Use when: you need a small but production-shaped dashboard asset to expand from Git.
Merge this fragment into the full dashboard shell to drive repeated panels from one bounded query variable:
{
"templating": {
"list": [
{
"name": "instance",
"type": "query",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"query": "label_values(up{job=\"api\"}, instance)",
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"hide": 0
}
]
},
"panels": [
{
"title": "Request Rate - ${instance}",
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"repeat": "instance",
"repeatDirection": "h",
"maxPerRow": 3
}
]
}
Use when: one panel shape should repeat across a bounded variable set without copying panel JSON by hand. This is a fragment to merge into the full dashboard shell, not a standalone importable dashboard.
For the full variable type reference (all 9 types, syntax formats, global variables), see ./references/variables.md.
Merge this fragment into the target panel when the query already returns the right signal:
{
"fieldConfig": {
"defaults": {
"unit": "reqps",
"min": 0,
"decimals": 2,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 300 },
{ "color": "red", "value": 500 }
]
}
}
},
"options": {
"legend": {
"displayMode": "list"
}
},
"transformations": [
{ "id": "organize", "options": {} }
]
}
Use when: the query already returns the right signal and the panel only needs clearer units, thresholds, legend behavior, or column organization. This is a fragment to merge into an existing panel object, not a standalone dashboard asset or dashboard-root object.
Merge this fragment into the dashboard shell to attach one drilldown path and one event context source directly to the dashboard:
{
"annotations": {
"list": [
{
"name": "Deployments",
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
}
}
]
},
"links": [
{
"title": "API Runbook",
"url": "https://runbooks.example.com/api"
}
]
}
Use when: operators need one direct drilldown and one timeline context source while reading the dashboard. This is a fragment to merge into the full dashboard shell, not a complete dashboard by itself.
Query options control how much data is fetched and how queries are evaluated:
{
"targets": [
{
"expr": "rate(http_requests_total[5m])",
"refId": "A",
"maxDataPoints": 500,
"interval": "",
"intervalMs": 15000,
"queryType": "range",
"datasource": { "type": "prometheus", "uid": "prometheus" }
}
]
}
Key fields:
| Field | Type | Purpose |
|---|---|---|
maxDataPoints | integer | Maximum number of data points returned (default varies by datasource) |
interval / intervalMs | string / integer | Override $__interval (e.g., 15s, 1m) |
queryType | string | "range" (time series) or "instant" (single value at to time) |
timeFrom | string | Relative offset from dashboard start (e.g., "now-6h") |
timeShift | string | Shift query time window (e.g., "1d" for day-over-day comparison) |
cacheTimeout | string | Cache duration override (e.g., "5m") |
Time shift example -- compare today vs yesterday: add "timeShift": "1d" to a second target with the same expression and a different legendFormat.
Choose the panel type based on what the operator needs to see. The most common types are listed first.
| Operator Question | Panel Type | Key Distinguishing Feature |
|---|---|---|
| How does this metric change over time? | timeseries | Line/area chart with X = time axis |
| What is the single current value? | stat | Large number with optional sparkline |
| What is the value relative to min/max? | gauge | Arc gauge with color zones |
| How do categories compare? | barchart | Vertical/horizontal bars, grouped/stacked |
| What is the distribution of values? | histogram | Bucketed frequency distribution |
| Where is density in two dimensions? | heatmap | Color-coded grid (X and Y axes) |
| How is a whole divided? | piechart | Proportional slices |
| What are the raw rows? | table | Tabular data with sorting/filtering |
| What state was active when? | statetimeline | Colored horizontal bands per state |
| How often did states change? | statushistory | Timeline of discrete status events |
| What log lines match? | logs | Log viewer with highlighting |
| What is the trace detail? | traces | Trace waterfall/duration view |
| What is the flame graph? | flamegraph | Hierarchical call-stack profiling |
The default panel for any time-varying metric. Supports multiple series, legends, annotations, and threshold lines.
{
"id": 1,
"title": "Request Rate",
"type": "timeseries",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [
{ "expr": "sum(rate(http_requests_total[5m])) by (method)", "refId": "A" }
],
"fieldConfig": {
"defaults": {
"unit": "reqps",
"thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] }
}
},
"options": {
"legend": { "displayMode": "table", "placement": "right", "calcs": ["mean", "max"] },
"drawStyle": "line",
"lineWidth": 1,
"fillOpacity": 10,
"stacking": { "mode": "none", "group": "A" }
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
}
Key options.drawStyle: "line", "bars", "points". Key options.stacking.mode: "none", "normal", "percent".
Single large number with optional sparkline, progress bar, and text mode.
{
"id": 2,
"title": "Error Rate",
"type": "stat",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [
{ "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100", "refId": "A" }
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"decimals": 2,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 5 }
]
},
"color": { "mode": "thresholds" }
}
},
"options": {
"textMode": "auto",
"colorMode": "value",
"graphMode": "area",
"reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false }
},
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 }
}
Key options.textMode: "auto", "name", "none", "value". Key options.graphMode: "none", "area", "linear". Key options.colorMode: "none", "value", "background".
Display raw tabular data with column customization, sorting, and cell coloring.
{
"id": 5,
"title": "Top Endpoints",
"type": "table",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"targets": [
{ "expr": "topk(10, sum by (path) (increase(http_requests_total[1h])))", "refId": "A", "format": "table", "instant": true }
],
"fieldConfig": {
"defaults": { "unit": "short", "custom": { "align": "left", "filterable": true } }
},
"options": { "showHeader": true },
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 24 }
}
For complete JSON schemas covering all remaining panel types (gauge, barchart, heatmap, statetimeline, logs, piechart, histogram, bar gauge, candlestick, trend, XY chart, node graph, traces, flame graph, canvas, geomap, dashboard list, alert list, annotations list, and text/news panels), see ./references/panel-types.md.
The three most common variable types. For all 9 types with complete JSON schemas, global variables, format modifier catalog, cascading patterns, and transformation types, see ./references/variables.md.
Queries a datasource to populate options dynamically.
{
"name": "namespace",
"type": "query",
"label": "Namespace",
"datasource": { "type": "prometheus", "uid": "prometheus" },
"query": "label_values(kube_pod_info, namespace)",
"sort": 1,
"multi": true,
"includeAll": true,
"allValue": ".+",
"current": { "selected": true, "text": ["All"], "value": ["$__all"] },
"hide": 0
}
Key fields: query (datasource-specific), regex (filter results), sort (0-6), multi, includeAll, allValue.
Hardcoded list of options defined inline. Use when the set of values is small, stable, and known at authoring time.
{
"name": "env",
"type": "custom",
"label": "Environment",
"query": "prod,staging,dev",
"current": { "selected": true, "text": "prod", "value": "prod" },
"options": [
{ "selected": true, "text": "Production", "value": "prod" },
{ "selected": false, "text": "Staging", "value": "staging" },
{ "selected": false, "text": "Development", "value": "dev" }
],
"hide": 0
}
Free-text input for ad-hoc values. Always sanitize textbox values in queries to prevent injection.
{
"name": "search",
"type": "textbox",
"label": "Search",
"query": "",
"current": { "selected": false, "text": "", "value": "" },
"hide": 0
}
Transformations reshape query results before rendering. Apply them in order; each transformation receives the output of the previous one. For the full catalog of 20+ transformation types, see ./references/variables.md.
Rename, hide, and reorder columns returned by queries.
{
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": { "__name__": true, "job": true },
"renameByName": { "Value #A": "requests_per_sec", "Time": "timestamp" },
"indexByName": {}
}
}
]
}
Combine results from multiple queries into one table by joining on shared fields.
{
"transformations": [
{ "id": "merge", "options": {} }
]
}
Remove rows that do not match a condition.
{
"transformations": [
{
"id": "filterDataByValues",
"options": {
"filters": [
{ "fieldName": "status", "type": "include", "match": { "value": "200" } }
],
"match": "any"
}
}
]
}
Filter types: "include", "exclude". Match modes: "value", "regex", "is", "isNot".
These systems control how data appears visually after queries return results. For the complete unit catalog (~60 specifiers), all 5 matcher types, all override property IDs, all 4 mapping types (range, regex, special), and full data link variable catalog, see ./references/field-config.md.
Applied via fieldConfig.defaults at the panel level.
{
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 2,
"min": 0,
"max": null,
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 50 },
{ "color": "red", "value": 100 }
]
},
"noValue": "--",
"displayName": "",
"color": { "mode": "palette-classic" },
"mappings": []
}
}
}
Common unit specifiers: "short", "percent", "percentunit", "bytes", "bps", "s", "ms", "reqps", "ops". Custom unit prefix/suffix: "prefix:suffix" (e.g., "$:USD" displays as $123 USD). For the complete unit catalog, see ./references/field-config.md.
Steps trigger at exact numeric values. The first step always has value: null (base color).
{
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 50 },
{ "color": "red", "value": 100 }
]
}
For percentage mode (requires explicit min/max on the field), see ./references/field-config.md.
Overrides let you apply different visual settings to specific fields or series within the same panel. An override consists of matchers (which fields to target) and properties (what to change).
{
"fieldConfig": {
"overrides": [
{
"matcher": { "id": "byName", "options": "errors" },
"properties": [
{ "id": "color", "value": { "mode": "fixed", "fixedColor": "red" } },
{ "id": "custom.lineWidth", "value": 3 }
]
}
]
}
}
For all 5 matcher types (byName, byRegexp, byType, byFrameRefID, byValue) and common override property IDs, see ./references/field-config.md.
Map raw values to displayed text/colors without changing the underlying data.
Value mapping (most common): exact match on specific values.
{
"mappings": [
{
"type": "value",
"options": {
"0": { "text": "OK", "color": "green" },
"1": { "text": "WARNING", "color": "yellow" },
"2": { "text": "CRITICAL", "color": "red" }
}
}
]
}
For range, regex, and special mapping types (null/NaN/boolean handling), see ./references/field-config.md. Multiple mapping types can coexist in the same mappings array; they are evaluated in order and the first match wins.
Attach clickable URLs to data points that open external resources with interpolated context.
{
"fieldConfig": {
"defaults": {
"links": [
{
"title": "View Traces",
"url": "https://jaeger.example.com/search?service=${__data.fields.service}&start=${__url_time_range}",
"targetBlank": true,
"tooltip": "Open in Jaeger"
}
]
}
}
}
For the full data link variable catalog (${__series.name}, ${__field.labels.*}, ${__value.*}, etc.), see ./references/field-config.md.
Structure dashboards around proven observability frameworks so every panel answers a meaningful operational question.
Focus on Utilization, Saturation, and Errors for resource-centric views.
| Dimension | Question | Example Metric |
|---|---|---|
| Utilization | How busy is the resource? | node_cpu_seconds_total (non-idle %) |
| Saturation | How much demand is queued? | node_load_avg, queue depth, conn count |
| Errors | How many operations failed? | Error rate, 5xx count, error ratio |
Dashboard layout following USE:
{
"title": "Node Resource Health - ${instance}",
"panels": [
{ "title": "CPU Utilization", "type": "timeseries" },
{ "title": "CPU Saturation (Load)", "type": "timeseries" },
{ "title": "Memory Utilization", "type": "timeseries" },
{ "title": "Memory Saturation (OOM Kills)", "type": "timeseries" },
{ "title": "Disk I/O Utilization", "type": "timeseries" },
{ "title": "Disk Saturation (I/O Wait)", "type": "timeseries" },
{ "title": "Network Errors", "type": "timeseries" }
]
}
Focus on Rate, Errors, and Duration for request-driven services.
| Dimension | Question | Example Metric |
|---|---|---|
| Rate | How many requests per second? | rate(http_requests_total[5m]) |
| Errors | How many are failing? | rate(http_requests_total{status=~"5.."}[5m]) |
| Duration | How long do requests take? | histogram_quantile(0.99, ...) |
Dashboard layout following RED:
{
"title": "Service RED Metrics - ${service}",
"panels": [
{ "title": "Request Rate", "type": "timeseries" },
{ "title": "Error Rate (%)", "type": "stat" },
{ "title": "Error Count", "type": "timeseries" },
{ "title": "Latency p50", "type": "timeseries" },
{ "title": "Latency p95", "type": "timeseries" },
{ "title": "Latency p99", "type": "timeseries" }
]
}
Google's SRE framework: Latency, Traffic, Errors, and Saturation.
| Signal | Question | Example Metric |
|---|---|---|
| Latency | How long do requests take? | Service response time percentiles |
| Traffic | How much demand is there? | Requests/sec, connections/sec |
| Errors | How many are failing? | Error rate, failure percentage |
| Saturation | How close to capacity? | CPU, memory, disk, connection pool usage |
Dashboard layout following Four Golden Signals:
{
"title": "SRE Golden Signals - ${service}",
"panels": [
{ "title": "Request Latency (p50/p95/p99)", "type": "timeseries" },
{ "title": "Traffic (RPS)", "type": "stat" },
{ "title": "Error Rate", "type": "stat" },
{ "title": "Errors Over Time", "type": "timeseries" },
{ "title": "CPU Saturation", "type": "gauge" },
{ "title": "Memory Saturation", "type": "gauge" },
{ "title": "Connection Pool Usage", "type": "gauge" }
]
}
Progressive levels of dashboard quality. Aim for Level 3 minimum for production dashboards.
| Level | Characteristics |
|---|---|
| 1 | Basic metrics visible, no structure, no thresholds, copied from export |
| 2 | Panels answer questions, stable UID, explicit datasource, basic thresholds |
| 3 | Follows USE/RED/Golden Signals, consistent units, meaningful titles, variables used deliberately |
| 4 | Includes runbook links, alert annotations, drill-down paths, self-documenting layout |
| 5 | Automated testing, versioned alongside code, reviewed on every change, part of on-call rotation |
Control how the dashboard time range behaves.
{
"time": {
"from": "now-30m",
"to": "now"
},
"timezone": "browser",
"graphTooltip": 0,
"timepicker": {
"refresh_intervals": ["5s", "10s", "30s", "1m", "5m", "15m", "30m", "1h"],
"hidden": false,
"collapse": false,
"time_options": ["5m", "15m", "1h", "6h", "12h", "24h", "2d", "7d", "30d"]
}
}
Key fields:
| Field | Values | Purpose |
|---|---|---|
timezone | "browser", "utc", "America/New_York" etc. | Which timezone to use for display |
graphTooltip | 0 (single), 1 (per series), 2 (all series) | Tooltip behavior on hover |
timepicker.hidden | true, false | Hide the time picker UI entirely |
timepicker.collapse | true, false | Collapse time picker by default |
refresh | "5s", "10s", "30s", "1m", "5m" etc. | Auto-refresh interval |
Panel-level time override -- individual panels can shift or constrain their own time range independently of the dashboard:
{
"timeFrom": "now-6h",
"timeShift": "1d",
"hideTimeOverride": false
}
Add these fields directly to a panel object (not inside options). timeFrom sets the earliest data point relative to now. timeShift shifts the entire query window (useful for day-over-day comparisons). hideTimeOverride hides the panel's custom time indicator.
Control how repeated panels lay out across the dashboard.
{
"repeat": "instance",
"repeatDirection": "h",
"maxPerRow": 3,
"repeatOptions": {
"repeatingPattern": {
"variables": ["instance"]
}
}
}
| Field | Values | Effect |
|---|---|---|
repeat | variable name | Which variable drives repetition |
repeatDirection | "h", "v" | Horizontal or vertical layout |
maxPerRow | integer | Max panels per row (horizontal mode only) |
repeatOptions.repeatingPattern.variables | array | For multi-variable repeats (tabs/rows) |
When repeatDirection is "h", panels fill left-to-right, wrapping to the next row after maxPerRow. When "v", panels stack top-to-bottom. A repeating parent panel can also create tabs instead of grids by setting repeatDirection to "h" and using a row-level repeat container.
Keep dashboard authoring and dashboard delivery separated even when both are reviewed together:
reviewed source of truth: grafana/dashboards/api-overview.json
adjacent but separate concern: provisioning file that points at this dashboard
Dashboard authoring belongs here; provisioning configuration and rollout concerns stay in the adjacent delivery domain.
Direct dashboard asset layout -- keep the repository path explicit:
grafana/
dashboards/
api-overview.json
Use when: the team keeps reviewed dashboard JSON directly in the repository rather than generating it from Jsonnet.
For Grafana mixin or Jsonnet-oriented dashboard generation -- including mixin layouts, Jsonnet source patterns, render commands, and source-vs-rendered handoff -- see ./references/grafana-mixin.md.
For export cleanup decisions, normalization targets, and ownership boundaries after UI edits or rendering, see ./references/dashboard-structure.md.
Validate the common case with these checks:
uid and explicit titleReturn:
| If the blocker is... | Read... |
|---|---|
| Complete JSON schema for ALL panel types (bar gauge, candlestick, trend, XY chart, node graph, traces, flame graph, canvas, geomap, dashboard list, alert list, annotations list, text/news) | ./references/panel-types.md |
| Complete variable reference: all 9 types, syntax formats, global vars, format options, advanced patterns | ./references/variables.md |
| Complete field config, all override property IDs, value mappings, data link variables, data link builder patterns | ./references/field-config.md |
| Grafana mixin or Jsonnet-oriented dashboard generation, source-vs-rendered handoff, render commands | ./references/grafana-mixin.md |
| Export cleanup decisions, normalization targets, ownership boundaries after UI edits or rendering | ./references/dashboard-structure.md |
uid.| Anti-pattern | Why it fails | Correct move |
|---|---|---|
| copying a UI export with unstable metadata and no cleanup | reviews become noisy and identity drifts | normalize the JSON and keep a stable uid, title, and panel structure |
| setting the default dashboard range broader than 30 minutes with no operator reason | live queries scan far more data than the common path needs | start with now-30m and widen only when the investigation needs more history |
| treating generated or exported dashboard JSON as authoritative without checking the source workflow behind it | reviewers lose track of where the asset really comes from | keep direct JSON and mixin-generated workflows explicit and review the right source of truth |
| mixing unrelated panels into one dashboard | operators cannot read the story quickly | group panels by one question such as traffic, latency, and errors |
| leaving datasource references implicit or environment-specific without review | dashboards break when moved between environments | make datasource references explicit in the dashboard asset itself |
| using timeseries panels for everything regardless of the question | single values look wrong on line charts, categorical data looks wrong on time axes | pick the panel type that matches the data shape and operator question |
using $var syntax for multi-value variables in PromQL label matchers | unquoted values break queries when values contain special characters | use ${var:csv} for PromQL label matchers with multi-select variables |
| forgetting to handle null/NaN in value mappings | missing data shows as ugly raw values or breaks visualizations | always include a special mapping for null and NaN |
| putting thresholds on a field without setting min/max when using percentage mode | percentage thresholds have no basis and behave unpredictably | either use absolute mode, or set explicit min and max for percentage mode |
| overriding by name on auto-generated field names like "Value #A" | field names change when queries are edited, breaking overrides | use byRegexp or byFrameRefID matchers for query-derived fields |