Correlates traces, logs, and metrics across OpenSearch and Prometheus using shared OTel semantic convention fields such as traceId/spanId, enabling end-to-end observability investigations from metric spikes to error logs.
This skill teaches how to correlate traces, logs, and metrics across all three telemetry signals using shared OTel semantic convention fields. Correlation enables end-to-end investigations: start from a metric spike, trace it to a specific request, and find the associated logs — or start from an error log and reconstruct the full trace that produced it.
All OpenSearch queries use the PPL API at /_plugins/_ppl with HTTPS and basic authentication. Prometheus queries use the HTTP API at localhost:9090. Credentials are read from the .env file (default: admin / My_password_123!@#).
| Variable | Default | Description |
|---|---|---|
| OPENSEARCH_ENDPOINT | https://localhost:9200 | OpenSearch base URL |
| OPENSEARCH_USER | admin | OpenSearch username |
| OPENSEARCH_PASSWORD | My_password_123!@# | OpenSearch password |
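A minimal .env sketch matching these defaults (the exact file layout is an assumption; the PROMETHEUS_ENDPOINT variable is inferred from the Prometheus examples below and the localhost:9090 default mentioned above):

# Minimal .env sketch (assumed layout; adjust quoting to whatever loads this file)
OPENSEARCH_ENDPOINT=https://localhost:9200
OPENSEARCH_USER=admin
OPENSEARCH_PASSWORD='My_password_123!@#'
# PROMETHEUS_ENDPOINT is not listed in the table above; it is assumed by the
# Prometheus examples below, defaulting to the localhost:9090 address mentioned earlier
PROMETHEUS_ENDPOINT=http://localhost:9090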
Traces and logs share traceId and spanId fields. When an application emits a log within an active span, the OTel SDK automatically injects the current trace context into the log record. This creates a direct link between log entries and the spans that produced them.
| Field | Signal | Type | Description |
|---|---|---|---|
| traceId | Traces, Logs | keyword | Hex-encoded 128-bit trace identifier shared between spans and log records |
| spanId | Traces, Logs | keyword | Hex-encoded 64-bit span identifier shared between spans and log records |
| traceFlags | Logs | integer | W3C trace flags (e.g., 01 = sampled) carried on log records |
- Trace index (otel-v1-apm-span-*): traceId and spanId identify each span
- Log index (logs-otel-v1-*): traceId and spanId link the log to the span that was active when the log was emitted

Prometheus exemplars attach trace context to individual metric samples. When the OTel SDK records a metric observation inside an active span, it can attach the trace_id and span_id as exemplar labels. This links a specific metric data point back to the trace that produced it.
Exemplar data model:
| Field | Description |
|---|---|
| trace_id | Hex-encoded trace identifier from the span active during metric recording |
| span_id | Hex-encoded span identifier from the span active during metric recording |
| filtered_attributes | Additional key-value pairs attached to the exemplar |
| timestamp | Time when the exemplar was recorded |
| value | The metric sample value associated with this exemplar |
All three signals (traces, logs, metrics) share resource attributes that identify the originating service. These attributes are set by the OTel SDK and propagated through the pipeline:
| Resource Attribute | Traces Field | Logs Field | Prometheus Label | Description |
|---|---|---|---|---|
| service.name | serviceName | resource.attributes.service.name | service_name | Service that produced the telemetry |
| service.namespace | resource.service.namespace | resource.attributes.service.namespace | service_namespace | Namespace grouping related services |
| service.version | resource.service.version | resource.attributes.service.version | service_version | Service version string |
| service.instance.id | resource.service.instance.id | resource.attributes.service.instance.id | service_instance_id | Unique instance identifier |
| deployment.environment.name | resource.deployment.environment.name | resource.attributes.deployment.environment.name | deployment_environment_name | Deployment environment (e.g., production, staging) |
The OTel Collector's resourcedetection processor enriches telemetry with environment context, and the Prometheus promote_resource_attributes configuration (in docker-compose/prometheus/prometheus.yml) promotes these resource attributes to metric labels so they are queryable in PromQL.
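To confirm the promotion worked, the promoted attributes should show up as metric label names in Prometheus. A quick check via the labels API (a sketch; the grep pattern is illustrative):

curl -s "$PROMETHEUS_ENDPOINT/api/v1/labels" | jq -r '.data[]' | grep -E 'service_name|service_version|deployment_environment'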
The following GenAI resource attributes are promoted to Prometheus metric labels via the promote_resource_attributes configuration, enabling metric queries filtered by agent or model:
| Resource Attribute | Prometheus Label | Description |
|---|---|---|
| gen_ai.agent.id | gen_ai_agent_id | Agent identifier |
| gen_ai.agent.name | gen_ai_agent_name | Human-readable agent name (only available if the SDK sets this as a resource attribute; most SDKs set it as a span attribute instead) |
| gen_ai.provider.name | gen_ai_provider_name | LLM provider (e.g., bedrock, openai) |
| gen_ai.request.model | gen_ai_request_model | Model requested for the operation |
| gen_ai.response.model | gen_ai_response_model | Model that actually served the response |
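To see which values a promoted GenAI label currently holds (useful before filtering by agent or model), the Prometheus label-values endpoint can be queried directly; a sketch:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/label/gen_ai_request_model/values"
curl -s "$PROMETHEUS_ENDPOINT/api/v1/label/gen_ai_agent_name/values"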
Given a trace ID, find all log entries emitted during that trace:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID>'\'' | fields traceId, spanId, severityText, body, `resource.attributes.service.name`, `@timestamp` | sort `@timestamp`"}'
Given a span ID, find all log entries emitted during that specific span:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where spanId = '\''<SPAN_ID>'\'' | fields traceId, spanId, severityText, body, `resource.attributes.service.name`, `@timestamp` | sort `@timestamp`"}'
Use PPL join to combine trace spans with their correlated logs in a single query:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID>'\'' | join left=s right=l ON s.traceId = l.traceId logs-otel-v1-* | fields s.spanId, s.name, s.serviceName, s.durationInNanos, l.severityText, l.body, l.`@timestamp`"}'
Reconstruct the complete request timeline by interleaving spans and logs sorted by timestamp. Run both queries and merge results by time:
Step 1 — Get all spans for the trace:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID>'\'' | eval signal = '\''span'\'' | fields traceId, spanId, serviceName, name, startTime, endTime, durationInNanos, `status.code`, signal | sort startTime"}'
Step 2 — Get all logs for the trace:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID>'\'' | eval signal = '\''log'\'' | fields traceId, spanId, `resource.attributes.service.name`, severityText, body, `@timestamp`, signal | sort `@timestamp`"}'
Merge both result sets by timestamp to see the full chronological sequence of spans and log entries for the request.
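One way to do the merge from the command line, as a minimal sketch: it assumes the two responses above were saved to spans.json and logs.json, and that the PPL API returns the default schema/datarows shape, so the timestamp column positions follow from the field lists in the two queries.

# Sketch: interleave spans and logs chronologically (column indices assume the field order above)
jq -s '
  (.[0].datarows | map({time: .[4], signal: "span", row: .}))   # startTime is the 5th projected field in step 1
  + (.[1].datarows | map({time: .[5], signal: "log", row: .}))  # @timestamp is the 6th projected field in step 2
  | sort_by(.time)
' spans.json logs.json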
When you find an error log, extract its traceId and query the Trace_Index to reconstruct the full trace:
Step 1 — Find error logs and get their traceId:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where severityText = '\''ERROR'\'' | fields traceId, spanId, severityText, body, `resource.attributes.service.name`, `@timestamp` | sort - `@timestamp` | head 10"}'
Step 2 — Query the Trace_Index with the extracted traceId to get all spans:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID_FROM_LOG>'\'' | fields traceId, spanId, parentSpanId, serviceName, name, startTime, endTime, durationInNanos, `status.code` | sort startTime"}'
When a log entry has a spanId, query the Trace_Index to find the exact span that was active when the log was emitted:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where spanId = '\''<SPAN_ID_FROM_LOG>'\'' | fields traceId, spanId, parentSpanId, serviceName, name, startTime, endTime, durationInNanos, `status.code`, `attributes.gen_ai.operation.name`"}'
Use the Prometheus exemplars API to retrieve trace context attached to metric samples. This links a metric observation back to the specific trace that produced it:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query_exemplars" \
--data-urlencode 'query=http_server_duration_seconds_bucket' \
--data-urlencode 'start=2024-01-01T00:00:00Z' \
--data-urlencode 'end=2024-01-02T00:00:00Z'
The response contains exemplar objects with trace_id and span_id in the labels field:
{
"status": "success",
"data": [
{
"seriesLabels": { "service_name": "my-agent", "__name__": "http_server_duration_seconds_bucket" },
"exemplars": [
{
"labels": { "trace_id": "abc123...", "span_id": "def456..." },
"value": "0.25",
"timestamp": 1704067200.000
}
]
}
]
}
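Given that shape, trace IDs can be extracted directly with jq; a minimal sketch:

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query_exemplars" \
--data-urlencode 'query=http_server_duration_seconds_bucket' \
--data-urlencode 'start=2024-01-01T00:00:00Z' \
--data-urlencode 'end=2024-01-02T00:00:00Z' \
| jq -r '.data[].exemplars[].labels.trace_id' | sort -u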
Query exemplars for GenAI operation duration, filtered to agent invocations (invoke_agent operations):
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query_exemplars" \
--data-urlencode 'query=gen_ai_client_operation_duration_seconds_bucket{gen_ai_operation_name="invoke_agent"}' \
--data-urlencode 'start=2024-01-01T00:00:00Z' \
--data-urlencode 'end=2024-01-02T00:00:00Z'
After extracting a trace_id from an exemplar response, query the Trace_Index for the full trace:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID_FROM_EXEMPLAR>'\'' | fields traceId, spanId, parentSpanId, serviceName, name, startTime, endTime, durationInNanos, `status.code` | sort startTime"}'
Filter metrics by GenAI resource labels before correlating to traces via exemplars:
By agent invocation (invoke_agent operations):
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=rate(gen_ai_client_operation_duration_seconds_count{gen_ai_operation_name="invoke_agent"}[5m])'
By model:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(gen_ai_client_token_usage_count[5m])) by (gen_ai_request_model)'
Then query exemplars for the filtered metric to get trace IDs for correlation.
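For example, combining the filtered GenAI histogram with the same jq extraction (a sketch; substitute a start/end window around the period of interest):

curl -s "$PROMETHEUS_ENDPOINT/api/v1/query_exemplars" \
--data-urlencode 'query=gen_ai_client_operation_duration_seconds_bucket{gen_ai_operation_name="invoke_agent"}' \
--data-urlencode 'start=<START>' \
--data-urlencode 'end=<END>' \
| jq -r '.data[].exemplars[].labels.trace_id' | sort -u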
The service.name resource attribute is the primary key for correlating telemetry across all three signals at the service level.
Find all traces from a specific service:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where serviceName = '\''my-service'\'' | stats count() as span_count, avg(durationInNanos) as avg_duration by serviceName"}'
Find all logs from the same service:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where `resource.attributes.service.name` = '\''my-service'\'' | stats count() by severityText"}'
Find all metrics from the same service in Prometheus:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=rate(http_server_duration_seconds_count{service_name="my-service"}[5m])'
Query metrics filtered by GenAI resource attributes that are promoted to Prometheus labels:
By agent:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(gen_ai_client_operation_duration_seconds_count{gen_ai_operation_name="invoke_agent"}[5m])) by (gen_ai_operation_name)'
By provider and model:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(gen_ai_client_token_usage_count[5m])) by (gen_ai_request_model)'
This correlation works because resource attributes are propagated consistently through the pipeline:
- The OTel SDK sets resource attributes (service.name, service.version, etc.) on all telemetry
- The resourcedetection processor enriches telemetry with environment context (Docker, system info)
- The promote_resource_attributes configuration (in docker-compose/prometheus/prometheus.yml) promotes resource attributes to metric labels, making them queryable in PromQL

This ensures the same service.name value appears in traces (serviceName field), logs (resource.attributes.service.name field), and metrics (service_name label) — enabling service-level correlation across all backends.
Investigate a metric anomaly by correlating from metrics → traces → logs.
Step 1 — Detect the spike via PromQL:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=rate(http_server_duration_seconds_count[5m])'
Look for services with unusually high request rates or latency.
Step 2 — Query exemplars to get trace IDs from the spike window:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query_exemplars" \
--data-urlencode 'query=http_server_duration_seconds_bucket' \
--data-urlencode 'start=<SPIKE_START>' \
--data-urlencode 'end=<SPIKE_END>'
Extract trace_id values from the exemplar response.
Step 3 — Query the Trace_Index for those traces:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID_FROM_EXEMPLAR>'\'' | fields traceId, spanId, parentSpanId, serviceName, name, startTime, endTime, durationInNanos, `status.code` | sort startTime"}'
Step 4 — Query the Log_Index for correlated logs:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID_FROM_EXEMPLAR>'\'' | fields traceId, spanId, severityText, body, `resource.attributes.service.name`, `@timestamp` | sort `@timestamp`"}'
Start from an error log and trace back to the root cause.
Step 1 — Find error logs:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where severityText = '\''ERROR'\'' | fields traceId, spanId, severityText, body, `resource.attributes.service.name`, `@timestamp` | sort - `@timestamp` | head 10"}'
Step 2 — Extract the traceId from the error log and reconstruct the full trace tree:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID_FROM_ERROR_LOG>'\'' | fields traceId, spanId, parentSpanId, serviceName, name, startTime, endTime, durationInNanos, `status.code` | sort startTime"}'
Step 3 — Identify the root cause span (look for error status or exceptions):
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID_FROM_ERROR_LOG>'\'' AND `status.code` = 2 | fields traceId, spanId, serviceName, name, `events.attributes.exception.type`, `events.attributes.exception.message` | sort startTime"}'
Step 4 — Get all logs for the error span to see the full context:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID_FROM_ERROR_LOG>'\'' | fields traceId, spanId, severityText, body, `@timestamp` | sort `@timestamp`"}'
Investigate a slow agent invocation by correlating spans, child operations, logs, and metrics.
Step 1 — Find slow invoke_agent spans:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where `attributes.gen_ai.operation.name` = '\''invoke_agent'\'' AND durationInNanos > 5000000000 | fields traceId, spanId, `attributes.gen_ai.agent.name`, durationInNanos, startTime | sort - durationInNanos | head 10"}'
Step 2 — Get all child spans to identify the bottleneck:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID>'\'' | fields traceId, spanId, parentSpanId, name, `attributes.gen_ai.operation.name`, durationInNanos, startTime | sort startTime"}'
Look for child spans with high durationInNanos — these are the bottleneck operations (e.g., slow tool calls, slow LLM responses).
Step 3 — Check tool calls within the slow trace:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID>'\'' AND `attributes.gen_ai.operation.name` = '\''execute_tool'\'' | fields spanId, `attributes.gen_ai.tool.name`, `attributes.gen_ai.tool.call.arguments`, durationInNanos | sort - durationInNanos"}'
Step 4 — Get correlated logs for the slow spans:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID>'\'' | fields spanId, severityText, body, `@timestamp` | sort `@timestamp`"}'
Step 5 — Check GenAI token usage metrics for the agent via PromQL:
curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" \
--data-urlencode 'query=sum(rate(gen_ai_client_token_usage_count[5m])) by (gen_ai_operation_name, gen_ai_request_model)'
Check if the agent is consuming unusually high token counts, which may explain slow response times.
When correlating across signals, use describe or _mapping to discover available fields dynamically. This is especially useful when index schemas differ from the defaults.
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "describe otel-v1-apm-span-000001"}'
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "describe logs-otel-v1-000001"}'
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "describe otel-v2-apm-service-map-000001"}'
Use the field names from describe output to construct correlation queries when the default field names don't match your index schema.
When you have a set of traceIds from span queries, use IN to fetch all correlated logs in one query:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where traceId IN ('\''<TRACE_ID_1>'\'', '\''<TRACE_ID_2>'\'', '\''<TRACE_ID_3>'\'') | fields traceId, spanId, severityText, body, `resource.attributes.service.name`, `@timestamp` | sort `@timestamp`"}'
This is more efficient than querying logs one traceId at a time.
Some logs may have empty or null traceId. Include those alongside correlated logs:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=logs-otel-v1-* | where `resource.attributes.service.name` = '\''frontend'\'' | where (traceId IN ('\''<TRACE_ID_1>'\'', '\''<TRACE_ID_2>'\'') OR traceId = '\'''\'' OR isnull(traceId)) | sort - `@timestamp` | head 50"}'
Identify which remote services a given service calls using coalesce() across OTel attribute variants. Different instrumentation libraries (Node.js, Go, Python, .NET) use different attributes:
curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
-X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where serviceName = '\''checkout'\'' | where kind = '\''SPAN_KIND_CLIENT'\'' | eval _remoteService = coalesce(`attributes.net.peer.name`, `attributes.server.address`, `attributes.rpc.service`, `attributes.db.system`, `attributes.gen_ai.system`, '\''unknown'\'') | stats count() as calls, avg(durationInNanos) as avg_latency by _remoteService | sort - calls"}'
When correlating across different service types, these are the key fields by protocol:
| Protocol | Remote Service Field | Operation Field |
|---|---|---|
| gRPC | attributes.net.peer.name or attributes.server.address | attributes.rpc.method |
| HTTP | attributes.http.host or attributes.server.address | attributes.http.route |
| Database | attributes.db.system + attributes.server.address | attributes.db.statement |
| Envoy/Istio | attributes.upstream_cluster | span name |
| LLM/GenAI | attributes.gen_ai.system + attributes.server.address | attributes.gen_ai.request.model |
| Message Queue | attributes.messaging.destination.name | span name |
Replace the local OpenSearch endpoint and authentication with AWS SigV4 for all PPL queries in this skill:
curl -s --aws-sigv4 "aws:amz:REGION:es" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
-X POST https://DOMAIN-ID.REGION.es.amazonaws.com/_plugins/_ppl \
-H 'Content-Type: application/json' \
-d '{"query": "source=otel-v1-apm-span-* | where traceId = '\''<TRACE_ID>'\'' | fields traceId, spanId, parentSpanId, serviceName, name, startTime, endTime, durationInNanos, `status.code` | sort startTime"}'
- Endpoint: https://DOMAIN-ID.REGION.es.amazonaws.com
- Authentication: --aws-sigv4 "aws:amz:REGION:es" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"
- The PPL path (/_plugins/_ppl) and query syntax are identical to the local stack
- No -k flag needed — AWS managed endpoints use valid TLS certificates

Replace the local Prometheus endpoint and authentication with AWS SigV4 for all PromQL and exemplar queries:
Query exemplars:
curl -s --aws-sigv4 "aws:amz:REGION:aps" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
'https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query_exemplars' \
--data-urlencode 'query=http_server_duration_seconds_bucket' \
--data-urlencode 'start=2024-01-01T00:00:00Z' \
--data-urlencode 'end=2024-01-02T00:00:00Z'
Query metrics:
curl -s --aws-sigv4 "aws:amz:REGION:aps" \
--user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
'https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query' \
--data-urlencode 'query=rate(http_server_duration_seconds_count{service_name="my-service"}[5m])'
- Endpoint: https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query
- Authentication: --aws-sigv4 "aws:amz:REGION:aps" with --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY"