From devops-skills
Generates production-ready Grafana Loki configs via Python script for ingester, querier, compactor, ruler with S3/GCS/Azure/filesystem backends. Supports monolithic, simple-scalable, microservices modes and Kubernetes Helm values.
npx claudepluginhub akin-ozer/cc-devops-skills --plugin devops-skills

This skill uses the workspace's default tool permissions.
Bundled files:
- examples/grafana-alloy-daemonset.yaml
- examples/grafana-alloy.alloy
- examples/kubernetes-helm-values.yaml
- examples/microservices-s3.yaml
- examples/monolithic-filesystem.yaml
- examples/multi-tenant.yaml
- examples/production-tls.yaml
- examples/simple-scalable-gcs.yaml
- examples/simple-scalable-s3.yaml
- examples/thanos-storage-azure.yaml
- examples/thanos-storage-gcs.yaml
- examples/thanos-storage-s3.yaml
- examples/with-ruler.yaml
- references/best_practices.md
- references/loki_config_reference.md
- scripts/generate_config.py
- scripts/test_generate_config.py
Generate production-ready Grafana Loki server configurations with best practices. Supports monolithic, simple scalable, and microservices deployment modes with S3, GCS, Azure, or filesystem storage.
Current Stable: Loki 3.6.2 (November 2025). Important: Promtail was deprecated in 3.4 - use Grafana Alloy instead. See examples/grafana-alloy.alloy for the Alloy pipeline and examples/grafana-alloy-daemonset.yaml for the Kubernetes deployment.
Invoke when: deploying Loki, creating configs from scratch, migrating to Loki, implementing multi-tenant logging, configuring storage backends, or optimizing existing deployments.
Use scripts/generate_config.py for consistent, validated configurations:
# Simple Scalable with S3 (production)
python3 scripts/generate_config.py \
--mode simple-scalable \
--storage s3 \
--bucket my-loki-bucket \
--region us-east-1 \
--retention-days 30 \
--otlp-enabled \
--output loki-config.yaml
# Monolithic with filesystem (development)
python3 scripts/generate_config.py \
--mode monolithic \
--storage filesystem \
--no-auth-enabled \
--output loki-dev.yaml
# Production with Thanos storage (Loki 3.4+)
python3 scripts/generate_config.py \
--mode simple-scalable \
--storage s3 \
--thanos-storage \
--otlp-enabled \
--time-sharding \
--output loki-thanos.yaml
Script Options:
| Option | Description |
|---|---|
| --mode | monolithic, simple-scalable, microservices |
| --storage | filesystem, s3, gcs, azure |
| --auth-enabled / --no-auth-enabled | Explicitly enable/disable auth |
| --otlp-enabled | Enable OTLP ingestion configuration |
| --thanos-storage | Use Thanos object storage client (3.4+, cloud backends) |
| --time-sharding | Enable out-of-order ingestion (simple-scalable) |
| --ruler | Enable alerting/recording rules (not monolithic) |
| --horizontal-compactor | main/worker mode (simple-scalable, 3.6+) |
| --zone-awareness | Enable multi-AZ placement safeguards |
| --limits-dry-run | Log limit rejections without enforcing |
Follow the staged workflow below when script generation doesn't meet specific requirements or when learning the configuration structure.
For Kubernetes deployments, generate BOTH formats:
- loki-config.yaml - For ConfigMap or direct use (minimal ConfigMap sketch below)
- values.yaml - For Helm chart deployments

See examples/kubernetes-helm-values.yaml for Helm format.
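A minimal sketch of the ConfigMap route, assuming hypothetical name and namespace values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config       # hypothetical name
  namespace: loki         # hypothetical namespace
data:
  config.yaml: |
    # paste the generated loki-config.yaml contents here

Mount it at /etc/loki/config.yaml and start Loki with -config.file=/etc/loki/config.yaml.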
REQUIRED - Use Context7 MCP for:
- resolve-library-id: "grafana loki"
- get-library-docs: /websites/grafana_loki, topic: [component]

OPTIONAL - Skip documentation lookup for content already covered in the references/ directory.

Example topics: storage_config, limits_config, otlp, compactor, ruler, bloom
Web search fallback (use when Context7 unavailable): "Grafana Loki 3.6 [component] configuration documentation site:grafana.com"
Deployment Mode:
| Mode | Scale | Use Case |
|---|---|---|
| Monolithic | <100GB/day | Testing, development |
| Simple Scalable | 100GB-1TB/day | Production |
| Microservices | >1TB/day | Large-scale, multi-tenant |
Storage Backend: S3, GCS, Azure Blob, Filesystem, MinIO
Key Questions: Expected log volume? Retention period? Multi-tenancy needed? High availability requirements? Kubernetes deployment?
Ask the user directly if required information is missing.
For all new deployments (Loki 2.9+), use TSDB with v13 schema:
schema_config:
configs:
- from: "2025-01-01" # Use deployment date
store: tsdb
object_store: s3 # s3, gcs, azure, filesystem
schema: v13
index:
prefix: loki_index_
period: 24h
Key: Schema cannot change after deployment without migration.
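If a change is needed later (e.g., a newer schema version), the supported path is to append a new entry with a future from date and leave existing entries untouched; a sketch, assuming a hypothetical cutover date:

schema_config:
  configs:
    - from: "2025-01-01"      # existing entry, never modified
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
    - from: "2026-06-01"      # hypothetical future date; must not be in the past when applied
      store: tsdb
      object_store: s3
      schema: v13             # bump here when migrating to a newer schema
      index:
        prefix: loki_index_
        period: 24h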
S3:
common:
storage:
s3:
s3: s3://us-east-1/loki-bucket
s3forcepathstyle: false
GCS: gcs: { bucket_name: loki-bucket }
Azure: azure: { container_name: loki-container, account_name: ${AZURE_ACCOUNT_NAME} }
Filesystem: filesystem: { chunks_directory: /loki/chunks, rules_directory: /loki/rules }
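Expanded under the common storage block, the GCS and Azure shorthands above look like this (bucket, container, and account names are placeholders):

common:
  storage:
    gcs:
      bucket_name: loki-bucket
    # or, for Azure:
    azure:
      container_name: loki-container
      account_name: ${AZURE_ACCOUNT_NAME}
      account_key: ${AZURE_ACCOUNT_KEY}   # prefer managed identity over static keys where possible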
Ingester:
ingester:
chunk_encoding: snappy
chunk_idle_period: 30m
max_chunk_age: 2h
chunk_target_size: 1572864 # 1.5MB
lifecycler:
ring:
replication_factor: 3 # 3 for production
Querier:
querier:
max_concurrent: 4
query_timeout: 1m
Compactor:
compactor:
working_directory: /loki/compactor
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
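Note: in Loki 3.x, enabling retention also requires a delete_request_store so the compactor can persist retention and delete requests; with S3, for example:

compactor:
  retention_enabled: true
  delete_request_store: s3   # required alongside retention_enabled in Loki 3.x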
limits_config:
ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
max_streams_per_user: 10000
max_entries_limit_per_query: 5000
max_query_length: 721h
retention_period: 30d
allow_structured_metadata: true
volume_enabled: true
server:
http_listen_port: 3100
grpc_listen_port: 9096
log_level: info
auth_enabled: true # false for single-tenant
Native OpenTelemetry ingestion - use otlphttp exporter (NOT deprecated lokiexporter):
limits_config:
allow_structured_metadata: true
otlp_config:
resource_attributes:
attributes_config:
- action: index_label # Low-cardinality only!
attributes: [service.name, service.namespace, deployment.environment]
- action: structured_metadata # High-cardinality
attributes: [k8s.pod.name, service.instance.id]
Actions: index_label (searchable, low-cardinality), structured_metadata (queryable), drop
⚠️ NEVER use k8s.pod.name as index_label - use structured_metadata instead.
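At query time, structured metadata is filtered after the stream selector much like a label, e.g. `{service_name="api"} | k8s_pod_name="api-7d9f"` (hypothetical names; OTLP attribute dots are normalized to underscores), so high-cardinality values stay queryable without bloating the index.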
OTel Collector:
exporters:
otlphttp:
endpoint: http://loki:3100/otlp
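For multi-tenant Loki (auth_enabled: true), the tenant is conveyed via the X-Scope-OrgID header, which the otlphttp exporter can set; a sketch with a hypothetical tenant ID:

exporters:
  otlphttp:
    endpoint: http://loki:3100/otlp
    headers:
      X-Scope-OrgID: tenant-a   # hypothetical tenant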
chunk_store_config:
chunk_cache_config:
memcached_client:
host: memcached-chunks
timeout: 500ms
query_range:
cache_results: true
results_cache:
cache:
memcached_client:
host: memcached-results
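For smaller or single-binary deployments without Memcached, the embedded results cache is a simpler alternative; a minimal sketch:

query_range:
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100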
Pattern Ingester (3.0+):
pattern_ingester:
enabled: true
Bloom Filters (Experimental, 3.3+): Only for >75TB/month deployments. Works on structured metadata only. See examples/ for config.
Time Sharding (3.4+): For out-of-order ingestion:
limits_config:
shard_streams:
time_sharding_enabled: true
Thanos Storage (3.4+): New storage client, opt-in now, default later:
storage_config:
use_thanos_objstore: true
object_store:
s3:
bucket_name: my-bucket
endpoint: s3.us-west-2.amazonaws.com
ruler:
storage:
type: s3
s3: { bucket_name: loki-ruler }
alertmanager_url: http://alertmanager:9093
enable_api: true
enable_sharding: true
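The rules themselves are Prometheus-style files with LogQL expressions, stored under the ruler storage (per-tenant when multi-tenant); a minimal sketch with hypothetical names and thresholds:

# e.g. rules/<tenant>/alerts.yaml (layout depends on the ruler storage backend)
groups:
  - name: app-errors
    rules:
      - alert: HighErrorRate
        expr: sum(rate({app="my-app"} |= "error" [5m])) > 10   # hypothetical selector and threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Error log rate above threshold for my-app"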
Recent additions (3.6):
- horizontal_scaling_mode: main|worker
- enforced_labels: [service.name]
- structured_metadata parameter support

Always validate before deployment:
# Syntax and parameter validation
loki -config.file=loki-config.yaml -verify-config
# Print resolved configuration (shows defaults)
loki -config.file=loki-config.yaml -print-config-stderr 2>&1 | head -100
# Dry-run with Docker (if Loki not installed locally)
docker run --rm -v $(pwd)/loki-config.yaml:/etc/loki/config.yaml \
grafana/loki:3.6.2 -config.file=/etc/loki/config.yaml -verify-config
Validation Checklist:
- Passes -verify-config
- Schema uses tsdb and v13
- replication_factor: 3 for production
- auth_enabled: true if multi-tenant

Zone-Aware Replication (CRITICAL for production multi-AZ deployments):
When using replication_factor: 3, ALWAYS enable zone-awareness for multi-AZ deployments:
ingester:
lifecycler:
ring:
replication_factor: 3
zone_awareness_enabled: true # CRITICAL for multi-AZ
# Set zone via environment variable or config
# Each pod should set its zone based on node topology
common:
instance_availability_zone: ${AVAILABILITY_ZONE}
Why: Without zone-awareness, all 3 replicas may land in the same AZ. If that AZ fails, you lose data.
Kubernetes Implementation:
# In Helm values or pod spec
env:
- name: AVAILABILITY_ZONE
valueFrom:
fieldRef:
fieldPath: metadata.labels['topology.kubernetes.io/zone']
Enable TLS for all inter-component and client communication:
server:
http_tls_config:
cert_file: /etc/loki/tls/tls.crt
key_file: /etc/loki/tls/tls.key
client_ca_file: /etc/loki/tls/ca.crt # For mTLS
grpc_tls_config:
cert_file: /etc/loki/tls/tls.crt
key_file: /etc/loki/tls/tls.key
client_ca_file: /etc/loki/tls/ca.crt
See examples/production-tls.yaml for complete TLS configuration.
| Requirement | Setting | Required For |
|---|---|---|
| replication_factor: 3 | common block | All production |
| zone_awareness_enabled: true | ingester.lifecycler.ring | Multi-AZ |
| auth_enabled: true | root level | Multi-tenant |
| TLS enabled | server block | All production |
| IAM roles (not keys) | storage config | Cloud storage |
| Caching enabled | chunk_store_config, query_range | Performance |
| Pattern ingester | pattern_ingester.enabled | Observability |
| Retention configured | compactor + limits_config | Cost control |
Configure Prometheus to scrape Loki metrics and alert on these critical indicators:
# Prometheus scrape config
- job_name: 'loki'
static_configs:
- targets: ['loki:3100']
groups:
- name: loki-critical
rules:
# Ingestion failures
- alert: LokiIngestionFailures
expr: sum(rate(loki_distributor_ingester_append_failures_total[5m])) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "Loki ingestion failures detected"
# High stream cardinality (performance killer)
- alert: LokiHighStreamCardinality
expr: loki_ingester_memory_streams > 100000
for: 10m
labels:
severity: warning
annotations:
summary: "High stream cardinality - review labels"
# Compaction not running (retention broken)
- alert: LokiCompactionStalled
expr: time() - loki_compactor_last_successful_run_timestamp_seconds > 7200
for: 5m
labels:
severity: critical
annotations:
summary: "Loki compaction stalled - retention not enforced"
# Query latency
- alert: LokiSlowQueries
expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
for: 10m
labels:
severity: warning
annotations:
summary: "Loki query P99 latency > 30s"
# Ingester memory pressure
- alert: LokiIngesterMemoryHigh
expr: container_memory_usage_bytes{container="ingester"} / container_spec_memory_limit_bytes{container="ingester"} > 0.8
for: 10m
labels:
severity: warning
annotations:
summary: "Loki ingester memory usage > 80%"
| Metric | Description | Action Threshold |
|---|---|---|
| loki_ingester_memory_streams | Active streams in memory | >100k: review cardinality |
| loki_distributor_ingester_append_failures_total | Ingestion failures | >0: investigate immediately |
| loki_request_duration_seconds | Query latency | P99 >30s: add caching/queriers |
| loki_ingester_chunks_flushed_total | Chunk flush rate | Low rate: check ingester health |
| loki_compactor_last_successful_run_timestamp_seconds | Last compaction | >2h ago: compaction broken |
Import official Loki dashboards:
- 13407 - Loki Logs
- 14055 - Loki Operational

Promtail is deprecated (support ends Feb 2026). Use Grafana Alloy for new deployments.
See examples/grafana-alloy.alloy for the Alloy pipeline and examples/grafana-alloy-daemonset.yaml for the Kubernetes deployment.
// Kubernetes log discovery
discovery.kubernetes "pods" {
role = "pod"
}
// Relabeling for Kubernetes metadata
discovery.relabel "pods" {
targets = discovery.kubernetes.pods.targets
rule {
source_labels = ["__meta_kubernetes_namespace"]
target_label = "namespace"
}
rule {
source_labels = ["__meta_kubernetes_pod_name"]
target_label = "pod"
}
rule {
source_labels = ["__meta_kubernetes_pod_container_name"]
target_label = "container"
}
}
// Log collection
loki.source.kubernetes "pods" {
targets = discovery.relabel.pods.output
forward_to = [loki.write.default.receiver]
}
// Send to Loki
loki.write "default" {
endpoint {
url = "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"
// For multi-tenant
tenant_id = "default"
}
}
# Convert Promtail config to Alloy
alloy convert --source-format=promtail --output=alloy-config.alloy promtail.yaml
See examples/ directory for full configurations:
- monolithic-filesystem.yaml - Development/testing
- simple-scalable-s3.yaml - Production with S3
- microservices-s3.yaml - Large-scale distributed
- multi-tenant.yaml - Multi-tenant with per-tenant limits
- production-tls.yaml - TLS-enabled production config
- grafana-alloy.alloy - Log collection pipeline with Alloy
- grafana-alloy-daemonset.yaml - Kubernetes DaemonSet for Alloy
- kubernetes-helm-values.yaml - Helm chart values

Minimal Monolithic:
auth_enabled: false
server:
http_listen_port: 3100
common:
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: 2025-01-01
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: loki_index_
period: 24h
limits_config:
retention_period: 30d
allow_structured_metadata: true
compactor:
working_directory: /loki/compactor
retention_enabled: true
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki -f values.yaml
Generate both native config and Helm values for Kubernetes deployments.
# values.yaml
deploymentMode: SimpleScalable
loki:
schemaConfig:
configs:
- from: "2025-01-01"
store: tsdb
object_store: s3
schema: v13
index:
prefix: loki_index_
period: 24h
limits_config:
retention_period: 30d
allow_structured_metadata: true
# Zone awareness for HA
ingester:
lifecycler:
ring:
zone_awareness_enabled: true
backend:
replicas: 3
# Spread across zones
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
read:
replicas: 3
write:
replicas: 3
Performance:
- chunk_encoding: snappy, chunk_target_size: 1572864
- parallelise_shardable_queries: true

Security:
- auth_enabled: true with reverse proxy auth

Reliability:
- replication_factor: 3 for production
- zone_awareness_enabled: true for multi-AZ (see Production Checklist)

Limits: Set ingestion_rate_mb, max_streams_per_user to prevent overload (per-tenant overrides sketched below).
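Per-tenant limits layer on top of the global limits_config via a runtime overrides file; a minimal sketch with hypothetical tenant IDs (see also examples/multi-tenant.yaml):

# loki-config.yaml
runtime_config:
  file: /etc/loki/overrides.yaml

# /etc/loki/overrides.yaml
overrides:
  tenant-a:
    ingestion_rate_mb: 20          # raise limits for a heavy tenant
    max_streams_per_user: 20000
  tenant-b:
    ingestion_rate_mb: 5           # tighten limits for a noisy tenant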
| Issue | Solution |
|---|---|
| High ingester memory | Reduce max_streams_per_user, lower chunk_idle_period |
| Slow queries | Increase max_concurrent, enable parallelization, add caching |
| Ingestion failures | Check ingestion_rate_mb, verify storage connectivity |
| Storage growing fast | Enable retention, check compression, review cardinality |
| Data loss in AZ failure | Enable zone_awareness_enabled: true |
| Config validation fails | Run loki -verify-config, check YAML syntax |
Migration notes:
- boltdb-shipper → tsdb
- lokiexporter → otlphttp

Resources:
- scripts/generate_config.py - Generate configs programmatically (RECOMMENDED)
- examples/ - Complete configuration examples for all modes
- references/ - Full parameter reference and best practices