From oraclecloud-pack
Sets up OCI monitoring, logging, and alarms with Python SDK. Queries metrics, creates alarm rules, publishes custom metrics, searches logs via Logging service.
Install: npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin oraclecloud-pack
This skill is limited to using the following tools:
Set up programmatic monitoring for OCI infrastructure using the Monitoring, Logging, and Notifications services. The OCI Console buries these features behind nested menus, and the status page has historically failed to acknowledge outages (e.g., London region, January 2026). This skill builds monitoring you control through code — metric queries, alarm rules, custom metric publishing, and log searches — so you are never surprised by an outage you should have caught.
Purpose: Create a code-driven observability stack that queries metrics, fires alarms, publishes custom metrics, and searches logs without depending on the OCI Console.
Prerequisites: an OCI config file at ~/.oci/config, the OCI Python SDK (pip install oci), and IAM permission to manage alarms and read metrics in the target compartment.

OCI publishes built-in metrics for compute, networking, block storage, and more. Query them programmatically:
```python
import oci
from datetime import datetime, timedelta, timezone

config = oci.config.from_file("~/.oci/config")
monitoring = oci.monitoring.MonitoringClient(config)

# Mean CPU utilization in 5-minute windows over the last hour, for every
# instance in the compartment that runs in the given availability domain.
# The SDK model expects datetime objects for start_time and end_time.
end_time = datetime.now(timezone.utc)
response = monitoring.summarize_metrics_data(
    compartment_id="ocid1.compartment.oc1..example",
    summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_computeagent",
        query='CpuUtilization[5m]{availabilityDomain = "Uocm:US-ASHBURN-AD-1"}.mean()',
        start_time=end_time - timedelta(hours=1),
        end_time=end_time,
    ),
)

# Each result is one metric stream, typically one per instance
for metric in response.data:
    print(metric.dimensions.get("resourceId"))
    for dp in metric.aggregated_datapoints:
        print(f"  {dp.timestamp}: {dp.value:.1f}% CPU")
```
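The MQL strings in this skill follow a MetricName[interval]{dimension filters}.statistic() pattern. When you generate queries for many resources, a small builder keeps the quoting consistent. This is a sketch: build_mql is a hypothetical helper, and joining several dimension filters with commas is our assumption about MQL syntax, so confirm it against the Monitoring query language reference.

```python
from typing import Mapping, Optional


def build_mql(
    metric: str,
    interval: str = "1m",
    dimensions: Optional[Mapping[str, str]] = None,
    statistic: str = "mean",
) -> str:
    """Build an MQL expression such as CpuUtilization[5m]{resourceId = "..."}.mean().

    Assumption: multiple dimension filters are comma-separated inside the braces.
    """
    query = f"{metric}[{interval}]"
    if dimensions:
        for value in dimensions.values():
            # Values are inserted verbatim, so reject anything that would break quoting
            if '"' in value:
                raise ValueError(f"dimension value contains a double quote: {value!r}")
        filters = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        query += "{" + filters + "}"
    return f"{query}.{statistic}()"
```

For example, build_mql("CpuUtilization", "5m", {"availabilityDomain": "Uocm:US-ASHBURN-AD-1"}) reproduces the query string used above, and build_mql("CpuUtilization", "5m") + " > 80" gives a threshold expression of the kind alarms use.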
Alarms trigger when a metric crosses a threshold. Create them via SDK so they survive Console UI changes:
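```python
# Fire when mean CPU (5-minute windows) stays above 80% for 5 minutes
alarm = monitoring.create_alarm(
    oci.monitoring.models.CreateAlarmDetails(
        display_name="High CPU Alert",
        compartment_id="ocid1.compartment.oc1..example",          # where the alarm lives
        metric_compartment_id="ocid1.compartment.oc1..example",   # where the metrics are emitted
        namespace="oci_computeagent",
        query="CpuUtilization[5m].mean() > 80",
        severity="CRITICAL",
        body="CPU utilization exceeded 80% for 5 minutes.",
        destinations=["ocid1.onstopic.oc1..example"],  # Notifications topic OCID
        is_enabled=True,
        pending_duration="PT5M",               # condition must persist 5 min before FIRING
        repeat_notification_duration="PT15M",  # re-notify every 15 min while firing
    )
).data
print(f"Alarm created: {alarm.display_name} ({alarm.id})")
```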
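pending_duration and repeat_notification_duration take ISO 8601 durations, and the pending duration is what keeps a single CPU spike from paging anyone. The sketch below parses the time-only form (PT5M, PT1H30M) and models pending behaviour in a simplified way: the alarm fires once the breach has persisted for the full pending window, and any sample back under the threshold resets it. Both parse_pt_duration and first_firing_time are hypothetical helpers; OCI's own evaluator works on aggregated query results, so treat this as an illustration, not a replica.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Time-only ISO 8601 durations such as "PT5M", "PT15M", or "PT1H30M".
# Day-based forms like "P1D" are deliberately not handled by this sketch.
_PT_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_pt_duration(value: str) -> timedelta:
    """Convert a time-only ISO 8601 duration into a timedelta."""
    match = _PT_DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def first_firing_time(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float,
    pending: timedelta,
) -> Optional[datetime]:
    """Return the first sample time at which value > threshold has held for `pending`.

    Samples must be sorted by timestamp. Returns None if the alarm never fires.
    """
    breach_start: Optional[datetime] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= pending:
                return timestamp
        else:
            breach_start = None  # back under the threshold: reset
    return None
```

With one-minute samples of 70, 85, 90, 88, 92, 95, 97, the breach starts at the 85 reading and first_firing_time(samples, 80, parse_pt_duration("PT5M")) returns the timestamp five minutes after it.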
Push application-level metrics into OCI Monitoring so they can trigger the same alarm system:
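```python
from datetime import datetime, timezone

# PostMetricData is served by the telemetry-ingestion endpoint, not the
# default telemetry endpoint used by the other Monitoring calls
ingestion = oci.monitoring.MonitoringClient(
    config,
    service_endpoint=f"https://telemetry-ingestion.{config['region']}.oraclecloud.com",
)

ingestion.post_metric_data(
    oci.monitoring.models.PostMetricDataDetails(
        metric_data=[
            oci.monitoring.models.MetricDataDetails(
                namespace="custom_app",  # the "oci_" prefix is reserved for OCI services
                compartment_id="ocid1.compartment.oc1..example",
                name="RequestLatencyMs",
                dimensions={"service": "api-gateway", "endpoint": "/v1/orders"},
                datapoints=[
                    oci.monitoring.models.Datapoint(
                        timestamp=datetime.now(timezone.utc),
                        value=142.5,
                    )
                ],
            )
        ]
    )
)
print("Custom metric published: RequestLatencyMs = 142.5ms")
```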
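Posting custom metrics goes to the telemetry-ingestion host rather than the default telemetry host, and the service validates the namespace you send. The sketch below pre-checks both locally before any network call. ingestion_endpoint and check_custom_namespace are hypothetical helpers, and the namespace rule encoded here (leading letter, then only letters, digits, or underscores, and no reserved oci_ prefix) is our reading of the PostMetricData API reference, so verify it there.

```python
import re

# Assumed rule: start with a letter, then letters, digits, or underscores only
_NAMESPACE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def ingestion_endpoint(region: str) -> str:
    """Return the telemetry-ingestion endpoint that accepts PostMetricData."""
    return f"https://telemetry-ingestion.{region}.oraclecloud.com"


def check_custom_namespace(namespace: str) -> None:
    """Raise ValueError if a custom metric namespace would likely be rejected."""
    if namespace.startswith("oci_"):
        raise ValueError(f"{namespace!r} uses the reserved oci_ prefix")
    if not _NAMESPACE_RE.match(namespace):
        raise ValueError(
            f"{namespace!r} must start with a letter and contain only "
            "letters, digits, or underscores"
        )
```

You could then build the publishing client with oci.monitoring.MonitoringClient(config, service_endpoint=ingestion_endpoint(config["region"])) after calling check_custom_namespace("custom_app").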
Create a notification topic and email subscription to receive alarm alerts:
```python
control_plane = oci.ons.NotificationControlPlaneClient(config)  # manages topics
data_plane = oci.ons.NotificationDataPlaneClient(config)        # manages subscriptions

# Create the topic that alarms will use as their destination
topic = control_plane.create_topic(
    oci.ons.models.CreateTopicDetails(
        name="infra-alerts",
        compartment_id="ocid1.compartment.oc1..example",
        description="Infrastructure alarm notifications",
    )
).data

# Subscribe an email endpoint; delivery starts only after the recipient
# confirms the subscription from the email OCI sends
data_plane.create_subscription(
    oci.ons.models.CreateSubscriptionDetails(
        topic_id=topic.topic_id,
        compartment_id="ocid1.compartment.oc1..example",
        protocol="EMAIL",
        endpoint="oncall@example.com",
    )
)
print(f"Topic created: {topic.topic_id}")
```
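Running the topic-creation script twice is a common way to hit a conflict, because, as far as we know, topic names must be unique within a tenancy. A get-or-create wrapper makes the setup safe to rerun. In this sketch, get_or_create_topic and the Topic dataclass are hypothetical; the two callables stand in for whatever lists and creates topics, so the logic can be exercised without an OCI account.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Topic:
    """Minimal stand-in for the topic fields this sketch needs."""
    name: str
    topic_id: str


def get_or_create_topic(
    list_topics: Callable[[str], Iterable[Topic]],
    create_topic: Callable[[str], Topic],
    name: str,
) -> Topic:
    """Return the existing topic called `name`, creating it only if none exists."""
    for topic in list_topics(name):
        if topic.name == name:  # exact match, in case the lister filters loosely
            return topic
    return create_topic(name)
```

With the SDK, list_topics might be wrapped as lambda n: control_plane.list_topics(compartment_id, name=n).data, assuming list_topics accepts a name filter; check the NotificationControlPlaneClient reference before relying on that.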
Query the OCI Logging service to find specific events across your infrastructure:
```python
from datetime import datetime, timedelta, timezone

logging_search = oci.loggingsearch.LogSearchClient(config)

# Search the last hour of logs in the compartment for HTTP 500 responses.
# The field name (data.statusCode) depends on the schema of the log source.
end_time = datetime.now(timezone.utc)
results = logging_search.search_logs(
    oci.loggingsearch.models.SearchLogsDetails(
        time_start=end_time - timedelta(hours=1),
        time_end=end_time,
        search_query=(
            'search "ocid1.compartment.oc1..example" '
            "| where data.statusCode = 500"
        ),
        is_return_field_info=False,
    )
)
for log_entry in results.data.results:
    print(log_entry.data)
```
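The search_query string scopes the search with a quoted OCID path and then pipes the results through filters. To narrow the scope to one log group or a single log, a builder can assemble that path. This is a sketch: build_log_search_query is a hypothetical helper, and the compartment/logGroup/log path format and the sort by datetime desc stage reflect our understanding of the Logging query language rather than anything this document states.

```python
from typing import Optional


def build_log_search_query(
    compartment_id: str,
    log_group_id: Optional[str] = None,
    log_id: Optional[str] = None,
    where: Optional[str] = None,
    newest_first: bool = True,
) -> str:
    """Assemble a Logging search query scoped to a compartment, log group, or log."""
    if log_id and not log_group_id:
        raise ValueError("log_id requires log_group_id")
    # Scope path: compartment[/logGroup[/log]]
    scope = "/".join(part for part in (compartment_id, log_group_id, log_id) if part)
    stages = [f'search "{scope}"']
    if where:
        stages.append(f"where {where}")
    if newest_first:
        stages.append("sort by datetime desc")
    return " | ".join(stages)
```

With newest_first=False and only where="data.statusCode = 500", it reproduces the search_query used above.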
Monitor endpoint availability with OCI Health Checks:
```python
health = oci.healthchecks.HealthChecksClient(config)

# Probe https://api.example.com:443/health every 30 seconds, failing a probe
# that takes longer than 10 seconds
health.create_http_monitor(
    oci.healthchecks.models.CreateHttpMonitorDetails(
        compartment_id="ocid1.compartment.oc1..example",
        display_name="API Health Check",
        targets=["api.example.com"],
        protocol="HTTPS",
        port=443,
        path="/health",
        interval_in_seconds=30,
        timeout_in_seconds=10,
        is_enabled=True,
    )
)
print("Health check probe created: api.example.com/health every 30s")
```
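The interval and timeout above are constrained by the service, and every vantage point probes every target, so request volume adds up quickly. This sketch checks a settings pair against the values we believe the Health Checks API accepts for HTTP monitors (intervals of 10, 30, or 60 seconds; timeouts of 10, 20, 30, or 60 seconds, never longer than the interval) and estimates daily probe count. The allowed values are assumptions to verify against the API reference; check_http_monitor_settings and probes_per_day are hypothetical helpers.

```python
from typing import List

# Assumed from the Health Checks API reference; verify before relying on them
ALLOWED_INTERVALS_S = (10, 30, 60)
ALLOWED_TIMEOUTS_S = (10, 20, 30, 60)
SECONDS_PER_DAY = 24 * 60 * 60


def check_http_monitor_settings(interval_s: int, timeout_s: int) -> List[str]:
    """Return the problems with an interval/timeout pair (empty if it looks valid)."""
    problems = []
    if interval_s not in ALLOWED_INTERVALS_S:
        problems.append(f"interval {interval_s}s not in {ALLOWED_INTERVALS_S}")
    if timeout_s not in ALLOWED_TIMEOUTS_S:
        problems.append(f"timeout {timeout_s}s not in {ALLOWED_TIMEOUTS_S}")
    if timeout_s > interval_s:
        problems.append("timeout must not exceed the interval")
    return problems


def probes_per_day(interval_s: int, targets: int, vantage_points: int) -> int:
    """Assumes each vantage point probes each target once per interval."""
    return (SECONDS_PER_DAY // interval_s) * targets * vantage_points
```

The example above uses a 30-second interval and one target; it does not set vantage points, so with an assumed three vantage points, probes_per_day(30, targets=1, vantage_points=3) comes to 8,640 probes a day.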
Successful completion produces:
| Error | Code | Cause | Solution |
|---|---|---|---|
| NotAuthenticated | 401 | Bad API key or expired config | Verify ~/.oci/config fingerprint matches your API key |
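| NotAuthorizedOrNotFound | 404 | Missing IAM policy for monitoring, or a wrong OCID | Add policies such as: Allow group X to manage alarms in compartment Y, and Allow group X to read metrics in compartment Y; double-check the OCIDs |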
| TooManyRequests | 429 | Rate limited on metric queries | Reduce query frequency; cache results for dashboards |
| InternalError | 500 | OCI Monitoring service issue | Check OCI Status and retry |
| InvalidParameter | 400 | Wrong MQL query syntax | Verify namespace and metric name; use list_metrics to discover available metrics |
| ServiceError status -1 | N/A | Request timeout on large queries | Narrow the time window or add dimension filters |
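Of the errors in the table, 429 and 500 are the ones worth retrying; the others need a configuration fix. The SDK also ships retry strategies (for example, passing retry_strategy=oci.retry.DEFAULT_RETRY_STRATEGY to a call), which may be the simpler choice. The sketch below shows the underlying idea with capped exponential backoff and full jitter. call_with_backoff is a hypothetical helper, and it assumes the raised exception exposes an integer status attribute, as oci.exceptions.ServiceError does.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_retryable(status) -> bool:
    """Retry rate limiting (429) and server-side failures (5xx) only."""
    return isinstance(status, int) and (status == 429 or status >= 500)


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Assumes the exception carries its HTTP status in `.status`
            if not is_retryable(getattr(exc, "status", None)) or attempt == max_attempts:
                raise
            # Delay cap doubles per attempt; the actual sleep is random below the cap
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")
```

A dashboard poller might wrap its query as call_with_backoff(lambda: monitoring.summarize_metrics_data(...)), which also pairs well with the table's advice to cache results.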
Quick metric check with OCI CLI:
```bash
# List the metrics available in the oci_computeagent namespace
oci monitoring metric list \
    --compartment-id ocid1.compartment.oc1..example \
    --namespace oci_computeagent

# List all alarms in the compartment
oci monitoring alarm list \
    --compartment-id ocid1.compartment.oc1..example
```
List all metrics in a namespace to discover what's available:
```python
import oci

config = oci.config.from_file("~/.oci/config")
monitoring = oci.monitoring.MonitoringClient(config)

# Each entry is a metric definition (name plus a dimension set), so the same
# metric name typically appears once per resource that emits it
metrics = monitoring.list_metrics(
    compartment_id="ocid1.compartment.oc1..example",
    list_metrics_details=oci.monitoring.models.ListMetricsDetails(
        namespace="oci_computeagent"
    ),
).data

for m in metrics:
    print(f"{m.name}: dimensions {m.dimensions}")
```
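Because list_metrics returns one entry per metric and dimension combination, the raw output repeats names. Collapsing it to "metric name and the dimension keys you can filter on" is usually what you want when writing MQL. summarize_metric_streams is a hypothetical helper that accepts any objects with name and dimensions attributes; ListMetricsDetails also appears to offer a group_by field for server-side grouping, which we have not used here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def summarize_metric_streams(metrics: Iterable) -> Dict[str, Set[str]]:
    """Map each metric name to the union of dimension keys seen across its streams.

    `metrics` is any iterable of objects with `.name` and `.dimensions` (a dict
    or None), such as the items returned by list_metrics(...).data.
    """
    summary: Dict[str, Set[str]] = defaultdict(set)
    for metric in metrics:
        summary[metric.name].update((metric.dimensions or {}).keys())
    return dict(summary)
```

For example: for name, keys in sorted(summarize_metric_streams(metrics).items()): print(name, sorted(keys)).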
After monitoring is in place, proceed to oraclecloud-performance-tuning to optimize shape and storage performance, or see oraclecloud-cost-tuning to set up budget alerts that use the same notification topics.