From devops-skills
Generates production-ready Grafana Loki configs via Python script for ingester, querier, compactor, ruler with S3/GCS/Azure/filesystem backends. Supports monolithic, simple-scalable, microservices modes and Kubernetes Helm values.
npx claudepluginhub akin-ozer/cc-devops-skills --plugin devops-skills

This skill uses the workspace's default tool permissions.
Bundled files:
- examples/grafana-alloy-daemonset.yaml
- examples/grafana-alloy.alloy
- examples/kubernetes-helm-values.yaml
- examples/microservices-s3.yaml
- examples/monolithic-filesystem.yaml
- examples/multi-tenant.yaml
- examples/production-tls.yaml
- examples/simple-scalable-gcs.yaml
- examples/simple-scalable-s3.yaml
- examples/thanos-storage-azure.yaml
- examples/thanos-storage-gcs.yaml
- examples/thanos-storage-s3.yaml
- examples/with-ruler.yaml
- references/best_practices.md
- references/loki_config_reference.md
- scripts/generate_config.py
- scripts/test_generate_config.py
Generate production-ready Grafana Loki server configurations with best practices. Supports monolithic, simple scalable, and microservices deployment modes with S3, GCS, Azure, or filesystem storage.
Current Stable: Loki 3.6.2 (November 2025). Important: Promtail was deprecated in 3.4 - use Grafana Alloy instead. See examples/grafana-alloy.alloy for the Alloy pipeline and examples/grafana-alloy-daemonset.yaml for the Kubernetes deployment.
Invoke when: deploying Loki, creating configs from scratch, migrating to Loki, implementing multi-tenant logging, configuring storage backends, or optimizing existing deployments.
Use scripts/generate_config.py for consistent, validated configurations:
# Simple Scalable with S3 (production)
python3 scripts/generate_config.py \
--mode simple-scalable \
--storage s3 \
--bucket my-loki-bucket \
--region us-east-1 \
--retention-days 30 \
--otlp-enabled \
--output loki-config.yaml
# Monolithic with filesystem (development)
python3 scripts/generate_config.py \
--mode monolithic \
--storage filesystem \
--no-auth-enabled \
--output loki-dev.yaml
# Production with Thanos storage (Loki 3.4+)
python3 scripts/generate_config.py \
--mode simple-scalable \
--storage s3 \
--thanos-storage \
--otlp-enabled \
--time-sharding \
--output loki-thanos.yaml
Script Options:
| Option | Description |
|---|---|
| --mode | monolithic, simple-scalable, microservices |
| --storage | filesystem, s3, gcs, azure |
| --auth-enabled / --no-auth-enabled | Explicitly enable/disable auth |
| --otlp-enabled | Enable OTLP ingestion configuration |
| --thanos-storage | Use Thanos object storage client (3.4+, cloud backends) |
| --time-sharding | Enable out-of-order ingestion (simple-scalable) |
| --ruler | Enable alerting/recording rules (not monolithic) |
| --horizontal-compactor | main/worker mode (simple-scalable, 3.6+) |
| --zone-awareness | Enable multi-AZ placement safeguards |
| --limits-dry-run | Log limit rejections without enforcing |
Follow the staged workflow below when script generation doesn't meet specific requirements or when learning the configuration structure.
For Kubernetes deployments, generate BOTH formats:
- loki-config.yaml - For ConfigMap or direct use (minimal ConfigMap sketch below)
- values.yaml - For Helm chart deployments

See examples/kubernetes-helm-values.yaml for Helm format.
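A minimal sketch of the ConfigMap route, assuming hypothetical name and namespace values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config       # hypothetical name
  namespace: loki         # hypothetical namespace
data:
  config.yaml: |
    # paste the generated loki-config.yaml contents here

Mount it at /etc/loki/config.yaml and start Loki with -config.file=/etc/loki/config.yaml.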
REQUIRED - Use Context7 MCP for:
- resolve-library-id: "grafana loki"
- get-library-docs: /websites/grafana_loki, topic: [component]

OPTIONAL - Skip documentation lookup for content already covered in the references/ directory.

Example topics: storage_config, limits_config, otlp, compactor, ruler, bloom
Web search fallback (use when Context7 unavailable): "Grafana Loki 3.6 [component] configuration documentation site:grafana.com"
Deployment Mode:
| Mode | Scale | Use Case |
|---|---|---|
| Monolithic | <100GB/day | Testing, development |
| Simple Scalable | 100GB-1TB/day | Production |
| Microservices | >1TB/day | Large-scale, multi-tenant |
Storage Backend: S3, GCS, Azure Blob, Filesystem, MinIO
Key Questions: Expected log volume? Retention period? Multi-tenancy needed? High availability requirements? Kubernetes deployment?
Ask the user directly if required information is missing.
For all new deployments (Loki 2.9+), use TSDB with v13 schema:
schema_config:
configs:
- from: "2025-01-01" # Use deployment date
store: tsdb
object_store: s3 # s3, gcs, azure, filesystem
schema: v13
index:
prefix: loki_index_
period: 24h
Key: Schema cannot change after deployment without migration.
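If a change is needed later (e.g., a newer schema version), the supported path is to append a new entry with a future from date and leave existing entries untouched; a sketch, assuming a hypothetical cutover date:

schema_config:
  configs:
    - from: "2025-01-01"      # existing entry, never modified
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
    - from: "2026-06-01"      # hypothetical future date; must not be in the past when applied
      store: tsdb
      object_store: s3
      schema: v13             # bump here when migrating to a newer schema
      index:
        prefix: loki_index_
        period: 24h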
S3:
common:
storage:
s3:
s3: s3://us-east-1/loki-bucket
s3forcepathstyle: false
GCS: gcs: { bucket_name: loki-bucket }
Azure: azure: { container_name: loki-container, account_name: ${AZURE_ACCOUNT_NAME} }
Filesystem: filesystem: { chunks_directory: /loki/chunks, rules_directory: /loki/rules }
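Expanded under the common storage block, the GCS and Azure shorthands above look like this (bucket, container, and account names are placeholders):

common:
  storage:
    gcs:
      bucket_name: loki-bucket
    # or, for Azure:
    azure:
      container_name: loki-container
      account_name: ${AZURE_ACCOUNT_NAME}
      account_key: ${AZURE_ACCOUNT_KEY}   # prefer managed identity over static keys where possible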
Ingester:
ingester:
chunk_encoding: snappy
chunk_idle_period: 30m
max_chunk_age: 2h
chunk_target_size: 1572864 # 1.5MB
lifecycler:
ring:
replication_factor: 3 # 3 for production
Querier:
querier:
max_concurrent: 4
query_timeout: 1m
Compactor:
compactor:
working_directory: /loki/compactor
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
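Note: in Loki 3.x, enabling retention also requires a delete_request_store so the compactor can persist retention and delete requests; with S3, for example:

compactor:
  retention_enabled: true
  delete_request_store: s3   # required alongside retention_enabled in Loki 3.x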
limits_config:
ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
max_streams_per_user: 10000
max_entries_limit_per_query: 5000
max_query_length: 721h
retention_period: 30d
allow_structured_metadata: true
volume_enabled: true
server:
http_listen_port: 3100
grpc_listen_port: 9096
log_level: info
auth_enabled: true # false for single-tenant
Native OpenTelemetry ingestion - use otlphttp exporter (NOT deprecated lokiexporter):
limits_config:
allow_structured_metadata: true
otlp_config:
resource_attributes:
attributes_config:
- action: index_label # Low-cardinality only!
attributes: [service.name, service.namespace, deployment.environment]
- action: structured_metadata # High-cardinality
attributes: [k8s.pod.name, service.instance.id]
Actions: index_label (searchable, low-cardinality), structured_metadata (queryable), drop
⚠️ NEVER use k8s.pod.name as index_label - use structured_metadata instead.
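At query time, structured metadata is filtered after the stream selector much like a label, e.g. `{service_name="api"} | k8s_pod_name="api-7d9f"` (hypothetical names; OTLP attribute dots are normalized to underscores), so high-cardinality values stay queryable without bloating the index.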
OTel Collector:
exporters:
otlphttp:
endpoint: http://loki:3100/otlp
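For multi-tenant Loki (auth_enabled: true), the tenant is conveyed via the X-Scope-OrgID header, which the otlphttp exporter can set; a sketch with a hypothetical tenant ID:

exporters:
  otlphttp:
    endpoint: http://loki:3100/otlp
    headers:
      X-Scope-OrgID: tenant-a   # hypothetical tenant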
chunk_store_config:
chunk_cache_config:
memcached_client:
host: memcached-chunks
timeout: 500ms
query_range:
cache_results: true
results_cache:
cache:
memcached_client:
host: memcached-results
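For smaller or single-binary deployments without Memcached, the embedded results cache is a simpler alternative; a minimal sketch:

query_range:
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100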
Pattern Ingester (3.0+):
pattern_ingester:
enabled: true
Bloom Filters (Experimental, 3.3+): Only for >75TB/month deployments. Works on structured metadata only. See examples/ for config.
Time Sharding (3.4+): For out-of-order ingestion:
limits_config:
shard_streams:
time_sharding_enabled: true
Thanos Storage (3.4+): New storage client, opt-in now, default later:
storage_config:
use_thanos_objstore: true
object_store:
s3:
bucket_name: my-bucket
endpoint: s3.us-west-2.amazonaws.com
ruler:
storage:
type: s3
s3: { bucket_name: loki-ruler }
alertmanager_url: http://alertmanager:9093
enable_api: true
enable_sharding: true
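The rules themselves are Prometheus-style files with LogQL expressions, stored under the ruler storage (per-tenant when multi-tenant); a minimal sketch with hypothetical names and thresholds:

# e.g. rules/<tenant>/alerts.yaml (layout depends on the ruler storage backend)
groups:
  - name: app-errors
    rules:
      - alert: HighErrorRate
        expr: sum(rate({app="my-app"} |= "error" [5m])) > 10   # hypothetical selector and threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Error log rate above threshold for my-app"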
Recent additions (3.6):
- horizontal_scaling_mode: main|worker
- enforced_labels: [service.name]
- structured_metadata parameter support

Always validate before deployment:
# Syntax and parameter validation
loki -config.file=loki-config.yaml -verify-config
# Print resolved configuration (shows defaults)
loki -config.file=loki-config.yaml -print-config-stderr 2>&1 | head -100
# Dry-run with Docker (if Loki not installed locally)
docker run --rm -v $(pwd)/loki-config.yaml:/etc/loki/config.yaml \
grafana/loki:3.6.2 -config.file=/etc/loki/config.yaml -verify-config
Validation Checklist:
- Passes -verify-config
- Schema uses tsdb and v13
- replication_factor: 3 for production
- auth_enabled: true if multi-tenant

Zone-Aware Replication (CRITICAL for production multi-AZ deployments):
When using replication_factor: 3, ALWAYS enable zone-awareness for multi-AZ deployments:
ingester:
lifecycler:
ring:
replication_factor: 3
zone_awareness_enabled: true # CRITICAL for multi-AZ
# Set zone via environment variable or config
# Each pod should set its zone based on node topology
common:
instance_availability_zone: ${AVAILABILITY_ZONE}
Why: Without zone-awareness, all 3 replicas may land in the same AZ. If that AZ fails, you lose data.
Kubernetes Implementation:
# In Helm values or pod spec
env:
- name: AVAILABILITY_ZONE
valueFrom:
fieldRef:
fieldPath: metadata.labels['topology.kubernetes.io/zone']
Enable TLS for all inter-component and client communication:
server:
http_tls_config:
cert_file: /etc/loki/tls/tls.crt
key_file: /etc/loki/tls/tls.key
client_ca_file: /etc/loki/tls/ca.crt # For mTLS
grpc_tls_config:
cert_file: /etc/loki/tls/tls.crt
key_file: /etc/loki/tls/tls.key
client_ca_file: /etc/loki/tls/ca.crt
See examples/production-tls.yaml for complete TLS configuration.
| Requirement | Setting | Required For |
|---|---|---|
| replication_factor: 3 | common block | All production |
| zone_awareness_enabled: true | ingester.lifecycler.ring | Multi-AZ |
| auth_enabled: true | root level | Multi-tenant |
| TLS enabled | server block | All production |
| IAM roles (not keys) | storage config | Cloud storage |
| Caching enabled | chunk_store_config, query_range | Performance |
| Pattern ingester | pattern_ingester.enabled | Observability |
| Retention configured | compactor + limits_config | Cost control |
Configure Prometheus to scrape Loki metrics and alert on these critical indicators:
# Prometheus scrape config
- job_name: 'loki'
static_configs:
- targets: ['loki:3100']
groups:
- name: loki-critical
rules:
# Ingestion failures
- alert: LokiIngestionFailures
expr: sum(rate(loki_distributor_ingester_append_failures_total[5m])) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "Loki ingestion failures detected"
# High stream cardinality (performance killer)
- alert: LokiHighStreamCardinality
expr: loki_ingester_memory_streams > 100000
for: 10m
labels:
severity: warning
annotations:
summary: "High stream cardinality - review labels"
# Compaction not running (retention broken)
- alert: LokiCompactionStalled
expr: time() - loki_compactor_last_successful_run_timestamp_seconds > 7200
for: 5m
labels:
severity: critical
annotations:
summary: "Loki compaction stalled - retention not enforced"
# Query latency
- alert: LokiSlowQueries
expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
for: 10m
labels:
severity: warning
annotations:
summary: "Loki query P99 latency > 30s"
# Ingester memory pressure
- alert: LokiIngesterMemoryHigh
expr: container_memory_usage_bytes{container="ingester"} / container_spec_memory_limit_bytes{container="ingester"} > 0.8
for: 10m
labels:
severity: warning
annotations:
summary: "Loki ingester memory usage > 80%"
| Metric | Description | Action Threshold |
|---|---|---|
| loki_ingester_memory_streams | Active streams in memory | >100k: review cardinality |
| loki_distributor_ingester_append_failures_total | Ingestion failures | >0: investigate immediately |
| loki_request_duration_seconds | Query latency | P99 >30s: add caching/queriers |
| loki_ingester_chunks_flushed_total | Chunk flush rate | Low rate: check ingester health |
| loki_compactor_last_successful_run_timestamp_seconds | Last compaction | >2h ago: compaction broken |
Import official Loki dashboards:
- 13407 - Loki Logs
- 14055 - Loki Operational

Promtail is deprecated (support ends Feb 2026). Use Grafana Alloy for new deployments.
See examples/grafana-alloy.alloy for the Alloy pipeline and examples/grafana-alloy-daemonset.yaml for the Kubernetes deployment.
// Kubernetes log discovery
discovery.kubernetes "pods" {
role = "pod"
}
// Relabeling for Kubernetes metadata
discovery.relabel "pods" {
targets = discovery.kubernetes.pods.targets
rule {
source_labels = ["__meta_kubernetes_namespace"]
target_label = "namespace"
}
rule {
source_labels = ["__meta_kubernetes_pod_name"]
target_label = "pod"
}
rule {
source_labels = ["__meta_kubernetes_pod_container_name"]
target_label = "container"
}
}
// Log collection
loki.source.kubernetes "pods" {
targets = discovery.relabel.pods.output
forward_to = [loki.write.default.receiver]
}
// Send to Loki
loki.write "default" {
endpoint {
url = "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"
// For multi-tenant
tenant_id = "default"
}
}
# Convert Promtail config to Alloy
alloy convert --source-format=promtail --output=alloy-config.alloy promtail.yaml
See examples/ directory for full configurations:
- monolithic-filesystem.yaml - Development/testing
- simple-scalable-s3.yaml - Production with S3
- microservices-s3.yaml - Large-scale distributed
- multi-tenant.yaml - Multi-tenant with per-tenant limits
- production-tls.yaml - TLS-enabled production config
- grafana-alloy.alloy - Log collection pipeline with Alloy
- grafana-alloy-daemonset.yaml - Kubernetes DaemonSet for Alloy
- kubernetes-helm-values.yaml - Helm chart values

Minimal Monolithic:
auth_enabled: false
server:
http_listen_port: 3100
common:
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
schema_config:
configs:
- from: 2025-01-01
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: loki_index_
period: 24h
limits_config:
retention_period: 30d
allow_structured_metadata: true
compactor:
working_directory: /loki/compactor
retention_enabled: true
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki -f values.yaml
Generate both native config and Helm values for Kubernetes deployments.
# values.yaml
deploymentMode: SimpleScalable
loki:
schemaConfig:
configs:
- from: "2025-01-01"
store: tsdb
object_store: s3
schema: v13
index:
prefix: loki_index_
period: 24h
limits_config:
retention_period: 30d
allow_structured_metadata: true
# Zone awareness for HA
ingester:
lifecycler:
ring:
zone_awareness_enabled: true
backend:
replicas: 3
# Spread across zones
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
read:
replicas: 3
write:
replicas: 3
Performance:
- chunk_encoding: snappy, chunk_target_size: 1572864
- parallelise_shardable_queries: true

Security:
- auth_enabled: true with reverse proxy auth

Reliability:
- replication_factor: 3 for production
- zone_awareness_enabled: true for multi-AZ (see Production Checklist)

Limits: Set ingestion_rate_mb, max_streams_per_user to prevent overload (per-tenant overrides sketched below).
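Per-tenant limits layer on top of the global limits_config via a runtime overrides file; a minimal sketch with hypothetical tenant IDs (see also examples/multi-tenant.yaml):

# loki-config.yaml
runtime_config:
  file: /etc/loki/overrides.yaml

# /etc/loki/overrides.yaml
overrides:
  tenant-a:
    ingestion_rate_mb: 20          # raise limits for a heavy tenant
    max_streams_per_user: 20000
  tenant-b:
    ingestion_rate_mb: 5           # tighten limits for a noisy tenant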
| Issue | Solution |
|---|---|
| High ingester memory | Reduce max_streams_per_user, lower chunk_idle_period |
| Slow queries | Increase max_concurrent, enable parallelization, add caching |
| Ingestion failures | Check ingestion_rate_mb, verify storage connectivity |
| Storage growing fast | Enable retention, check compression, review cardinality |
| Data loss in AZ failure | Enable zone_awareness_enabled: true |
| Config validation fails | Run loki -verify-config, check YAML syntax |
Migration notes:
- boltdb-shipper → tsdb
- lokiexporter → otlphttp

Resources:
- scripts/generate_config.py - Generate configs programmatically (RECOMMENDED)
- examples/ - Complete configuration examples for all modes
- references/ - Full parameter reference and best practices