Checks health of observability stack components including OpenSearch, Prometheus, and OTel Collector; verifies trace and log data ingestion; troubleshoots common issues with curl and bash commands.
Install: npx claudepluginhub opensearch-project/observability-stack --plugin observability
Slash command: /opensearch@observability:stack-health
This skill provides health check commands, data verification queries, and troubleshooting guidance for the observability stack. Use it to verify that OpenSearch, Prometheus, the OTel Collector, and Data Prepper are running correctly, and to diagnose data flow problems.
Credentials are read from the .env file (default: admin / My_password_123!@#). All OpenSearch curl commands use HTTPS with -k to skip TLS certificate verification for local development.
| Variable | Default | Description |
|---|---|---|
| OPENSEARCH_ENDPOINT | https://localhost:9200 | OpenSearch base URL |
| OPENSEARCH_USER | admin | OpenSearch username |
| OPENSEARCH_PASSWORD | My_password_123!@# | OpenSearch password |
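The variables in the table can be given their defaults in one place before running any of the commands in this document. The following is a minimal sketch, not part of the stack itself: it assumes the .env file holds plain KEY=value lines that a shell can source, that `PROMETHEUS_ENDPOINT` (used by the Prometheus check but not listed in the table) defaults to http://localhost:9090, and that `os_curl` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Load overrides from .env when present (assumes plain KEY=value lines).
if [ -f .env ]; then
  set -a          # export every variable assigned while sourcing
  . ./.env
  set +a
fi

# Fall back to the documented defaults for anything still unset.
: "${OPENSEARCH_ENDPOINT:=https://localhost:9200}"
: "${OPENSEARCH_USER:=admin}"
: "${OPENSEARCH_PASSWORD:=My_password_123!@#}"
# Assumption: Prometheus is published on localhost:9090 over plain HTTP.
: "${PROMETHEUS_ENDPOINT:=http://localhost:9090}"

# Hypothetical wrapper: os_curl PATH [extra curl args...]
# Example: os_curl "/_cluster/health?pretty"
os_curl() {
  local path=$1
  shift
  curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT$path" "$@"
}
```

Values already present in the environment or in .env take precedence, because `:=` only assigns when a variable is unset or empty.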
Check the overall cluster status (green, yellow, or red):
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cluster/health?pretty"
A healthy cluster returns "status": "green" or "status": "yellow" (yellow is normal for single-node development clusters).
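For scripted checks, the status field can be reduced to an exit code. This is a minimal sketch that assumes the health response is piped in on stdin; `cluster_status_ok` is a hypothetical helper name, and the pattern match is a stand-in for a real JSON parser such as jq.

```shell
# Reads a _cluster/health JSON response on stdin.
# Returns 0 for green or yellow, 1 for anything else (including no status).
# Example:
#   curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#     "$OPENSEARCH_ENDPOINT/_cluster/health" | cluster_status_ok
cluster_status_ok() {
  local status
  # Pull out the first "status": "<value>" pair, tolerating pretty-printed spacing.
  status=$(grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | head -n1 \
    | sed 's/.*"\([a-z]*\)"$/\1/')
  echo "cluster status: ${status:-unknown}"
  case "$status" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}
```

Treating yellow as acceptable matches the single-node development setup described above; a multi-node deployment would probably want to accept green only.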
Verify Prometheus is running and healthy:
curl -s "$PROMETHEUS_ENDPOINT/-/healthy"
The endpoint returns "Prometheus Server is Healthy." when Prometheus is operational.
Check the OpenTelemetry Collector's internal metrics to verify it is receiving and exporting telemetry:
curl -s http://localhost:8888/metrics
Look for otelcol_receiver_accepted_spans_total, otelcol_exporter_sent_spans_total, and otelcol_exporter_send_failed_spans_total in the output to confirm data flow. (OTel Collector metrics use the _total suffix for counters.)
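Each of these counters is usually reported once per label combination (receiver, exporter, transport), so a single total per counter is easier to read. The sketch below sums one counter from the metrics text on stdin; `metric_sum` is a hypothetical helper, and it assumes the sample value is the last field on each line (no trailing timestamps) and that label values contain no spaces.

```shell
# metric_sum NAME
# Sums every sample of counter NAME from Prometheus text-format metrics on stdin.
# Example:
#   curl -s http://localhost:8888/metrics | metric_sum otelcol_receiver_accepted_spans_total
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/\{.*/, "", metric)          # drop the {label="..."} part
      if (metric == name) total += $NF # value is assumed to be the last field
    }
    END { printf "%d\n", total }
  '
}
```

A total of 0 for the receiver counter suggests nothing has reached the collector yet, while a non-zero total for the send-failed counter points at the export side.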
List all indices to verify data ingestion has created the expected trace, log, and service map indices:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cat/indices?v"
You should see indices matching otel-v1-apm-span-*, logs-otel-v1-*, and otel-v2-apm-service-map if data is flowing.
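An index can exist and still be empty, so it helps to flag indices whose document count is zero. The following sketch assumes the listing is requested with only the index and docs.count columns (the `h=` parameter of the _cat API) and piped in as plain text; `empty_indices` is a hypothetical helper name.

```shell
# Reads "index docs.count" lines on stdin and prints the names of empty indices.
# Example:
#   curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#     "$OPENSEARCH_ENDPOINT/_cat/indices/otel-*,logs-otel-*?h=index,docs.count" | empty_indices
empty_indices() {
  # NF >= 2 skips rows with no count column (for example, closed indices).
  awk 'NF >= 2 && $2 == 0 { print $1 }'
}
```

No output means every listed index holds at least one document.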
Verify trace data exists by counting documents in the trace index:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | stats count()"}'
Verify log data exists by counting documents in the log index:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | stats count()"}'
A count of 0 in either query indicates no data has been ingested for that signal. See the Troubleshooting section below.
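To use the count in a script, the number has to be pulled out of the PPL response, which carries result rows in a `datarows` array. This is a rough sketch that assumes a single-row, single-column result such as the `stats count()` queries above; `ppl_count` is a hypothetical helper, and a JSON parser such as jq would be more robust where it is available.

```shell
# Prints the single numeric value from a PPL "stats count()" response on stdin.
# Example:
#   curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#     -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
#     -H 'Content-Type: application/json' \
#     -d '{"query": "source=otel-v1-apm-span-* | stats count()"}' | ppl_count
ppl_count() {
  # Strip whitespace so pretty-printed and compact responses look the same,
  # then capture the number inside "datarows":[[N]].
  tr -d ' \n\t\r' | sed -n 's/.*"datarows":\[\[\([0-9][0-9]*\)\]\].*/\1/p'
}
```

The function prints nothing when the response has no matching `datarows` entry, for example when the query returned an error body.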
View the status of all stack containers:
docker compose ps
All services should show Up or Up (healthy). If a service is restarting or exited, check its logs.
View logs for a specific service:
docker compose logs <service-name>
Check Data Prepper for pipeline errors or OpenSearch connection issues:
docker compose logs data-prepper
Check the OTel Collector for receiver, processor, or exporter errors:
docker compose logs otel-collector
Symptoms: Connection refused on port 9200; curl commands time out or fail.
Diagnostic steps:
docker compose ps opensearch
docker compose ps | grep 9200
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cluster/health?pretty"
docker compose logs opensearch
If OpenSearch fails to start or exits, review the JVM options set through OPENSEARCH_JAVA_OPTS in docker-compose.yml.
Symptoms: Index listing shows no otel-v1-apm-* indices, or document counts are 0.
Diagnostic steps:
curl -s http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans_total
docker compose logs data-prepper | grep -i error
Confirm the application sends OTLP data to the collector at localhost:4317 (gRPC) or localhost:4318 (HTTP).
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$OPENSEARCH_ENDPOINT/_cat/indices?v"
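The diagnostic steps above amount to working out where spans stop moving. The sketch below encodes one possible reading of the collector's three span counters (accepted, sent, send-failed) as a hint about where to look next. The function name `diagnose_trace_flow` and the decision order are assumptions for illustration, not behaviour defined by the stack.

```shell
# diagnose_trace_flow ACCEPTED SENT FAILED
# Takes the totals of the collector's accepted, sent, and send-failed span
# counters and prints a hint about which stage to inspect.
diagnose_trace_flow() {
  local accepted=$1 sent=$2 failed=$3
  if [ "$accepted" -eq 0 ]; then
    echo "collector has accepted no spans: check the application's OTLP endpoint"
  elif [ "$failed" -gt 0 ]; then
    echo "collector failed to export $failed spans: check otel-collector and data-prepper logs"
  elif [ "$sent" -eq 0 ]; then
    echo "spans accepted but none exported yet: check the collector pipeline configuration"
  else
    echo "collector is forwarding spans: check data-prepper logs and OpenSearch indices"
  fi
}
```

For example, `diagnose_trace_flow 120 0 120` reports an export failure, which matches the case where the collector receives data but cannot deliver it downstream.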
Symptoms: Data reaches the OTel Collector but does not appear in OpenSearch indices.
Diagnostic steps:
docker compose logs data-prepper
docker compose restart data-prepper
Symptoms: Applications send telemetry but data does not reach Data Prepper or Prometheus.
Diagnostic steps:
curl -s http://localhost:8888/metrics | grep otelcol_exporter_send_failed
docker compose logs otel-collector
Verify that the OTel Collector can reach Data Prepper (data-prepper:21890) and Prometheus (prometheus:9090) on the Docker network.
| Component | Port | Protocol |
|---|---|---|
| OpenSearch | 9200 | HTTPS |
| OTel Collector (gRPC) | 4317 | gRPC |
| OTel Collector (HTTP) | 4318 | HTTP |
| Data Prepper | 21890 | HTTP |
| Prometheus | 9090 | HTTP |
| OpenSearch Dashboards | 5601 | HTTP |
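The ports in the table can be probed in one pass before looking at individual services. This is a sketch under several assumptions: it uses bash's `/dev/tcp` redirection and the coreutils `timeout` command, it only tests whether a TCP connection opens (not whether the service behind it is healthy), and the helper names `port_state` and `check_stack_ports` are hypothetical. A port that is reachable only inside the Docker network, which may be the case for Data Prepper, will show as closed from the host even when the service is running.

```shell
# port_state HOST PORT -> prints "open" or "closed"
port_state() {
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT.
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" </dev/null 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# check_stack_ports [HOST] -> one line per component from the port table
check_stack_ports() {
  local host=${1:-localhost} name port
  while read -r name port; do
    printf '%-22s %-6s %s\n' "$name" "$port" "$(port_state "$host" "$port")"
  done <<'EOF'
opensearch 9200
otel-collector-grpc 4317
otel-collector-http 4318
data-prepper 21890
prometheus 9090
opensearch-dashboards 5601
EOF
}
```

Running `check_stack_ports` with no argument probes localhost; pass a hostname to probe another machine.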
Use the PPL describe command to inspect the field mappings and types of an index. This is useful for verifying which fields are available for querying:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "describe otel-v1-apm-span-*"}'
Use the PPL _explain endpoint to debug query execution plans. This shows how OpenSearch will execute a PPL query without actually running it:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl/_explain" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | head 10"}'
This is useful for diagnosing slow queries, understanding how filters are applied, and verifying that field names resolve correctly.
Discover which observability indices exist and their sizes:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
"$OPENSEARCH_ENDPOINT/_cat/indices/otel-*,logs-otel-*?format=json&h=index,health,docs.count,store.size&s=index"
Discover available fields in each index dynamically instead of relying on hardcoded field names:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
"$OPENSEARCH_ENDPOINT/otel-v1-apm-span-*/_mapping?pretty"
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
"$OPENSEARCH_ENDPOINT/logs-otel-v1-*/_mapping?pretty"
Use PPL describe to list all fields and types in an index:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "describe otel-v1-apm-span-000001"}'
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "describe logs-otel-v1-000001"}'
Replace the local endpoint and authentication with AWS SigV4:
curl -s --aws-sigv4 "aws:amz:REGION:es" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
"https://DOMAIN-ID.REGION.es.amazonaws.com/_cluster/health?pretty"
Index listing on AWS managed OpenSearch:
curl -s --aws-sigv4 "aws:amz:REGION:es" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
"https://DOMAIN-ID.REGION.es.amazonaws.com/_cat/indices?v"
The endpoint has the form https://DOMAIN-ID.REGION.es.amazonaws.com.
Authentication uses --aws-sigv4 "aws:amz:REGION:es" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY".
No -k flag is needed; AWS managed endpoints use valid TLS certificates.
Check Prometheus health on Amazon Managed Service for Prometheus (AMP):
curl -s --aws-sigv4 "aws:amz:REGION:aps" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query \
--data-urlencode 'query=up'
The endpoint has the form https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query.
Authentication uses --aws-sigv4 "aws:amz:REGION:aps" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY".