Executes Notion incident runbook: triages API outages with bash status/auth checks, applies mitigations via code snippets, and structures postmortems for integration failures.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin notion-pack
Rapid incident response procedures for Notion API failures. This runbook covers a structured triage flow (under 5 minutes), automated health checks against both status.notion.so and your own integration, a decision tree for classifying failures (Notion-side vs. integration-side), per-error-type mitigation with real Client code, cached fallback patterns, communication templates, and postmortem structure.
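Prerequisites:
- NOTION_TOKEN environment variable set for diagnostic API calls
- curl and jq installed for quick CLI triage
- @notionhq/client (npm install @notionhq/client) for the TypeScript snippets
- notion-client (pip install notion-client) for the Python check
Run this diagnostic script at the first alert to determine whether the issue is Notion-side or integration-side: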
#!/bin/bash
# notion-triage.sh — run at first alert
set -euo pipefail
echo "=== Notion Incident Triage ==="
echo "Time: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
# 1. Check Notion's public status page
echo -e "\n--- Notion Platform Status ---"
STATUS=$(curl -sf https://status.notion.so/api/v2/status.json \
| jq -r '.status.description' 2>/dev/null || echo "UNREACHABLE")
echo "Notion Status: $STATUS"
INCIDENTS=$(curl -sf https://status.notion.so/api/v2/incidents/unresolved.json \
| jq '.incidents | length' 2>/dev/null || echo "UNKNOWN")
echo "Active Incidents: $INCIDENTS"
if [ "$INCIDENTS" != "0" ] && [ "$INCIDENTS" != "UNKNOWN" ]; then
echo "INCIDENT DETAILS:"
# "|| true" keeps set -e from aborting the whole triage if this second fetch fails
curl -sf https://status.notion.so/api/v2/incidents/unresolved.json \
| jq -r '.incidents[] | " - \(.name) (\(.status)): \(.incident_updates[0].body)"' || true
fi
# 2. Test our integration authentication
echo -e "\n--- Integration Auth Check ---"
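# No -f: with -f, curl exits non-zero on 4xx/5xx after -w has already printed the
# code, so a fallback echo would be appended (e.g. "401000") and the 401/429 checks
# below would never match. curl itself prints "000" when there is no HTTP response.
AUTH_HTTP=$(curl -s -o /dev/null -w "%{http_code}" \
https://api.notion.com/v1/users/me \
-H "Authorization: Bearer ${NOTION_TOKEN}" \
-H "Notion-Version: 2022-06-28" || true)
AUTH_HTTP=${AUTH_HTTP:-000}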
echo "Auth HTTP Status: $AUTH_HTTP"
if [ "$AUTH_HTTP" = "200" ]; then
BOT_NAME=$(curl -sf https://api.notion.com/v1/users/me \
-H "Authorization: Bearer ${NOTION_TOKEN}" \
-H "Notion-Version: 2022-06-28" | jq -r '.name' || echo "unknown")
echo "Bot Name: $BOT_NAME"
fi
# 3. Test database query (if test DB configured)
echo -e "\n--- API Responsiveness ---"
if [ -n "${NOTION_TEST_DATABASE_ID:-}" ]; then
# No -f here: -w must report the real status on 4xx/5xx; curl prints "000" when there is no response
QUERY_RESULT=$(curl -s -o /dev/null -w "%{http_code} %{time_total}s" \
-X POST "https://api.notion.com/v1/databases/${NOTION_TEST_DATABASE_ID}/query" \
-H "Authorization: Bearer ${NOTION_TOKEN}" \
-H "Notion-Version: 2022-06-28" \
-H "Content-Type: application/json" \
-d '{"page_size": 1}' || true)
echo "Database Query: $QUERY_RESULT"
else
echo "NOTION_TEST_DATABASE_ID not set — skipping query test"
fi
# 4. Classification
echo -e "\n--- Triage Result ---"
if [ "$STATUS" != "All Systems Operational" ] && [ "$STATUS" != "UNREACHABLE" ]; then
echo "CLASSIFICATION: Notion-side issue. Enable fallback mode."
elif [ "$AUTH_HTTP" = "401" ]; then
echo "CLASSIFICATION: Token expired or revoked. Rotate immediately."
elif [ "$AUTH_HTTP" = "429" ]; then
echo "CLASSIFICATION: Rate limited. Reduce concurrency."
elif [ "$AUTH_HTTP" = "000" ]; then
echo "CLASSIFICATION: Network/DNS issue. Check firewall and DNS."
else
echo "CLASSIFICATION: Integration-side issue. Check application logs."
fi
TypeScript — programmatic triage:
import { Client, isNotionClientError, APIErrorCode } from '@notionhq/client';
async function triageNotionHealth(token: string): Promise<{
classification: string;
notionStatus: string;
authStatus: string;
latencyMs: number;
}> {
// Check Notion status page
let notionStatus = 'unknown';
try {
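// Time out after 5 s so a slow status page cannot stall triage (AbortSignal.timeout needs Node 17.3+)
const res = await fetch('https://status.notion.so/api/v2/status.json', { signal: AbortSignal.timeout(5_000) });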
const data = await res.json();
notionStatus = data.status.description;
} catch { notionStatus = 'unreachable'; }
// Test our authentication
const client = new Client({ auth: token, timeoutMs: 10_000 });
const start = Date.now();
let authStatus = 'unknown';
let classification = 'unknown';
try {
await client.users.me({});
authStatus = 'authenticated';
classification = 'integration-side';
} catch (error) {
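if (isNotionClientError(error)) {
// Timeout errors carry no HTTP status, so only print one when it is present
authStatus = 'status' in error
? `${error.code} (HTTP ${error.status})`
: error.code;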
switch (error.code) {
case APIErrorCode.Unauthorized:
classification = 'token-expired';
break;
case APIErrorCode.RateLimited:
classification = 'rate-limited';
break;
case APIErrorCode.ServiceUnavailable:
classification = 'notion-down';
break;
default:
classification = 'api-error';
}
} else {
authStatus = 'network-error';
classification = 'network-issue';
}
}
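// Only let the status page override when it positively reports a problem;
// 'unknown' or 'unreachable' is inconclusive (same rule as the bash script)
if (!['All Systems Operational', 'unknown', 'unreachable'].includes(notionStatus)) {
classification = 'notion-side';
}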
return {
classification,
notionStatus,
authStatus,
latencyMs: Date.now() - start,
};
}
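The classification strings returned by `triageNotionHealth` can be turned directly into the runbook's first actions. This is a minimal sketch that assumes you want the mapping in code. `firstActions` and the action wording are hypothetical; they paraphrase the decision tree and scenario table in this runbook.

```typescript
// Hypothetical mapping from triageNotionHealth() classifications to first actions.
// The wording paraphrases this runbook; adjust it to your own procedures.
const FIRST_ACTIONS = new Map<string, string[]>([
  ['notion-side', ['Enable cached/fallback mode', 'Notify users of degraded service', 'Do not restart or rotate tokens']],
  ['notion-down', ['Enable cached/fallback mode', 'Notify users of degraded service', 'Do not restart or rotate tokens']],
  ['token-expired', ['Regenerate the token at notion.so/my-integrations', 'Update the secret manager', 'Restart the application']],
  ['rate-limited', ['Check for runaway loops or webhook storms', 'Reduce concurrency to 1', 'Add exponential backoff']],
  ['network-issue', ['Check firewall rules', 'Check DNS resolution', 'Check proxy configuration']],
  ['api-error', ['Check for 404s on unshared pages', 'Check for 400s after schema changes']],
  ['integration-side', ['Notion API and auth look healthy: investigate application logs']],
  ['unknown', ['Re-run triage and escalate if still inconclusive']],
]);

function firstActions(classification: string): string[] {
  // Unrecognized classifications fall back to the 'unknown' entry.
  return FIRST_ACTIONS.get(classification) ?? FIRST_ACTIONS.get('unknown')!;
}
```

Typical use is `firstActions((await triageNotionHealth(token)).classification)`. The result can be printed to the on-call channel alongside the raw triage object.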
Is status.notion.so showing an incident?
|
+-- YES --> Notion-side outage
| +-- Enable cached/fallback mode
| +-- Notify users of degraded service
| +-- Monitor status page for resolution
| +-- DO NOT restart or rotate tokens
|
+-- NO --> Our integration issue
|
+-- Auth returning 401?
| +-- YES --> Token expired or revoked
| | +-- Regenerate at notion.so/my-integrations
| | +-- Update secret manager (see below)
| | +-- Restart application
| +-- NO --> Continue
|
+-- Getting 429 rate limits?
| +-- YES --> Exceeding 3 req/s average
| | +-- Check for runaway loops or webhook storms
| | +-- Reduce concurrency to 1
| | +-- Add exponential backoff
| +-- NO --> Continue
|
+-- Getting 404 on specific resources?
| +-- YES --> Pages unshared or deleted
| | +-- Re-share pages with integration via Connections menu
| | +-- Check if pages were moved to trash
| +-- NO --> Continue
|
+-- Getting 400 validation errors?
| +-- YES --> Database schema changed in Notion UI
| | +-- Re-fetch schema (databases.retrieve)
| | +-- Compare with expected properties
| | +-- Update property mappings in code
| +-- NO --> Investigate application logs
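The "Enable cached/fallback mode" branch can be automated so the application stops hammering Notion during an outage. This is a minimal circuit-breaker sketch and is not part of @notionhq/client. `NotionCircuitBreaker`, its failure threshold (5) and its cooldown (30 s) are illustrative assumptions. The injectable clock exists only to make the sketch testable.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical circuit breaker: after `failureThreshold` consecutive failures it
// opens (callers serve cached data); after `cooldownMs` it lets trial calls through.
class NotionCircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = 'closed';
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 5, cooldownMs = 30_000, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  /** True while Notion calls should be skipped in favor of fallback data. */
  shouldUseFallback(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open'; // cooldown elapsed: allow trial requests
    }
    return this.state === 'open';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial request re-opens immediately; otherwise wait for the threshold.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  get currentState(): BreakerState {
    return this.state;
  }
}
```

Check `shouldUseFallback()` before a live call, then call `recordSuccess()` or `recordFailure()` afterwards. Which errors should count as failures is a judgment call. The tree above treats 401/400/404 as integration-side, so counting only 429, 5xx and timeouts keeps the breaker focused on Notion-side trouble.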
Token rotation:
# AWS Secrets Manager
aws secretsmanager update-secret \
--secret-id notion/production \
--secret-string '{"token":"ntn_NEW_TOKEN_HERE"}'
# GCP Secret Manager
echo -n "ntn_NEW_TOKEN_HERE" | \
gcloud secrets versions add notion-token-prod --data-file=-
# Restart to pick up new token
kubectl rollout restart deployment/my-app # Kubernetes
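# or (Cloud Run): deploy a new revision that reads the latest secret version
# (--no-traffic would leave the new revision serving no requests)
# gcloud run services update my-service --update-secrets=NOTION_TOKEN=notion-token-prod:latest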
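Before writing a rotated token to the secret manager, a quick local sanity check can catch copy/paste mistakes such as a trailing newline or the leftover placeholder. This is a sketch under stated assumptions:

- `validateNotionToken` is a hypothetical helper.
- The accepted prefixes (`ntn_` for newer tokens, `secret_` for older ones) reflect Notion's token formats as we understand them. Confirm them against your own tokens before enforcing the prefix check.

```typescript
// Assumption: newer Notion tokens start with "ntn_", older ones with "secret_".
const KNOWN_TOKEN_PREFIXES = ['ntn_', 'secret_'];

// Hypothetical pre-flight check; returns a list of problems (empty means it looks OK).
function validateNotionToken(token: string): string[] {
  if (token.length === 0) return ['token is empty'];
  const problems: string[] = [];
  if (token !== token.trim()) problems.push('leading/trailing whitespace or newline');
  if (/["']/.test(token)) problems.push('contains quote characters');
  if (token.includes('NEW_TOKEN_HERE')) problems.push('still the placeholder value');
  const trimmed = token.trim();
  if (!KNOWN_TOKEN_PREFIXES.some((prefix) => trimmed.startsWith(prefix))) {
    problems.push(`unexpected prefix (expected one of ${KNOWN_TOKEN_PREFIXES.join(', ')})`);
  }
  return problems;
}
```

Run it on the new value before `aws secretsmanager update-secret` or `gcloud secrets versions add`, and stop the rotation if the returned list is non-empty. The whitespace check mirrors why the gcloud example uses `echo -n`.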
Cached fallback for Notion outages:
import { Client, isNotionClientError } from '@notionhq/client';
const notion = new Client({ auth: process.env.NOTION_TOKEN! });
const cache = new Map<string, { data: any; timestamp: number }>();
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
async function queryWithFallback(dbId: string, filter?: any) {
const cacheKey = `query:${dbId}:${JSON.stringify(filter)}`;
try {
const result = await notion.databases.query({
database_id: dbId,
filter,
page_size: 100,
});
// Update cache on success
cache.set(cacheKey, { data: result, timestamp: Date.now() });
return { data: result, source: 'live' as const };
} catch (error) {
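// Fall back to cache on any failure. This also masks 401/400 errors, so log the
// reason and keep alerting on it rather than relying on users to notice stale data.
const reason = isNotionClientError(error) ? error.code : 'non_notion_error';
const cached = cache.get(cacheKey);
if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
console.warn(`Notion unavailable (${reason}), serving cached data (age: ${
Math.round((Date.now() - cached.timestamp) / 1000)
}s)`);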
return { data: cached.data, source: 'cache' as const };
}
// No cache available — re-throw
throw error;
}
}
// Schema change detection
async function detectSchemaChanges(dbId: string, expectedProps: string[]) {
const db = await notion.databases.retrieve({ database_id: dbId });
const actualProps = Object.keys(db.properties);
const missing = expectedProps.filter(p => !actualProps.includes(p));
const unexpected = actualProps.filter(p => !expectedProps.includes(p));
if (missing.length > 0 || unexpected.length > 0) {
console.error(JSON.stringify({
event: 'schema_change_detected',
database_id: dbId,
missing_properties: missing,
new_properties: unexpected,
}));
}
return { missing, unexpected, current: actualProps };
}
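`detectSchemaChanges` compares property names only. A property whose type changed in the Notion UI (for example `select` to `status`) slips through, even though it will produce 400 validation errors. The sketch below also flags type changes. It assumes you keep a hypothetical map of expected property name to Notion property type, and relies on each entry in `db.properties` carrying a `type` field.

```typescript
// Shape of each entry in db.properties that this check relies on.
type PropertySchema = Record<string, { type: string }>;

interface SchemaDiff {
  missing: string[];
  unexpected: string[];
  typeChanged: { name: string; expected: string; actual: string }[];
}

// Own-property check so names like "constructor" are not matched via the prototype.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function diffSchema(expected: Record<string, string>, actual: PropertySchema): SchemaDiff {
  const missing: string[] = [];
  const typeChanged: SchemaDiff['typeChanged'] = [];
  for (const [name, expectedType] of Object.entries(expected)) {
    if (!hasOwn(actual, name)) {
      missing.push(name);
    } else if (actual[name].type !== expectedType) {
      typeChanged.push({ name, expected: expectedType, actual: actual[name].type });
    }
  }
  const unexpected = Object.keys(actual).filter((name) => !hasOwn(expected, name));
  return { missing, unexpected, typeChanged };
}
```

Usage: `diffSchema({ Name: 'title', Status: 'status' }, (await notion.databases.retrieve({ database_id: dbId })).properties)`. Any `typeChanged` entry points straight at the property mapping that needs updating.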
Internal Slack notification template:
:rotating_light: P[1-4] INCIDENT: Notion Integration
Status: [INVESTIGATING | MITIGATING | RESOLVED]
Impact: [specific user-facing impact]
Root Cause: [Notion outage | Token expired | Rate limited | Schema change]
Action: [current remediation step]
ETA: [estimated resolution or "monitoring"]
Dashboard: [link to monitoring dashboard]
Thread: [link to incident channel thread]
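If the internal notification is sent from code rather than typed by hand, the template can be filled from structured fields. This sketch assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. `formatSlackIncident`, `postIncidentUpdate` and the `IncidentUpdate` fields are hypothetical names.

```typescript
type IncidentStatus = 'INVESTIGATING' | 'MITIGATING' | 'RESOLVED';

interface IncidentUpdate {
  severity: 1 | 2 | 3 | 4;
  status: IncidentStatus;
  impact: string;
  rootCause: string;
  action: string;
  eta: string;
  dashboardUrl: string;
  threadUrl: string;
}

// Renders the internal Slack template line by line.
function formatSlackIncident(u: IncidentUpdate): string {
  return [
    `:rotating_light: P${u.severity} INCIDENT: Notion Integration`,
    `Status: ${u.status}`,
    `Impact: ${u.impact}`,
    `Root Cause: ${u.rootCause}`,
    `Action: ${u.action}`,
    `ETA: ${u.eta}`,
    `Dashboard: ${u.dashboardUrl}`,
    `Thread: ${u.threadUrl}`,
  ].join('\n');
}

// Posts the message to a Slack incoming webhook URL (keep the URL in a secret, not in code).
async function postIncidentUpdate(webhookUrl: string, u: IncidentUpdate): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: formatSlackIncident(u) }),
  });
  if (!res.ok) throw new Error(`Slack webhook returned HTTP ${res.status}`);
}
```

Typing `status` and `severity` as unions keeps updates from drifting away from the template's allowed values.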
External status page update:
Notion Integration Service Disruption
We are experiencing [brief description of impact]. [Specific feature]
may be unavailable or show stale data.
Workaround: [if available, e.g., "Cached data is being served"]
Next update: [time, e.g., "in 30 minutes or sooner if resolved"]
[ISO 8601 timestamp]
Postmortem template:
## Incident: Notion [Error Type] — [Date]
**Duration:** X hours Y minutes
**Severity:** P[1-4]
**Detection:** [Alert name] / [User report]
### Summary
[1-2 sentence description of what happened and the user impact]
### Timeline (all times UTC)
- HH:MM — First alert fired ([alert name])
- HH:MM — On-call acknowledged, began triage
- HH:MM — Root cause identified: [description]
- HH:MM — Mitigation applied: [action taken]
- HH:MM — Service fully restored
### Root Cause
[Technical explanation — e.g., "Integration token was rotated in Notion
dashboard by a team member without updating the secret manager, causing
all API calls to return 401 Unauthorized."]
### Impact
- Users affected: N
- Duration of degraded service: X minutes
- Data loss: [none | description]
### Action Items
| Priority | Action | Owner | Due |
|----------|--------|-------|-----|
| P1 | [Preventive measure] | @name | YYYY-MM-DD |
| P2 | [Detection improvement] | @name | YYYY-MM-DD |
| P3 | [Process improvement] | @name | YYYY-MM-DD |
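The Duration field can be computed from the first and last timeline entries rather than by hand. `formatIncidentDuration` is a hypothetical helper that expects ISO 8601 UTC timestamps and rounds to the nearest minute.

```typescript
// Hypothetical helper: "X hours Y minutes" between two ISO 8601 timestamps.
function formatIncidentDuration(startIso: string, endIso: string): string {
  const start = Date.parse(startIso);
  const end = Date.parse(endIso);
  if (Number.isNaN(start) || Number.isNaN(end)) throw new Error('invalid ISO 8601 timestamp');
  if (end < start) throw new Error('end is before start');
  const totalMinutes = Math.round((end - start) / 60_000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours} hours ${minutes} minutes`;
}
```

For example, `formatIncidentDuration('2024-05-01T14:05:00Z', '2024-05-01T16:47:00Z')` returns `"2 hours 42 minutes"`. Working from full timestamps also handles incidents that cross midnight UTC.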
| Scenario | Triage Signal | Immediate Action |
|---|---|---|
| Notion platform outage | status.notion.so incident | Enable fallback mode, notify users |
| Token expired/revoked | All requests return 401 | Rotate token in secret manager, restart |
| Rate limited | 429 errors spiking | Reduce concurrency to 1, check for loops |
| Schema changed | 400 on specific operations | Run databases.retrieve, update mappings |
| Network/DNS issue | Timeouts, no HTTP response | Check firewall, DNS resolution, proxy config |
| Pages unshared | 404 on previously working pages | Re-share via Connections menu in Notion |
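For the rate-limited scenario, backoff matters as much as lower concurrency. Below is a generic sketch of exponential backoff with full jitter. It prefers a server-supplied delay when one is available; Notion's 429 responses include a Retry-After header in seconds. `withBackoff` and its defaults (5 attempts, 500 ms base, 30 s cap) are illustrative assumptions, not Notion-recommended values. `sleep` and `random` are injectable only for testing.

```typescript
interface BackoffOptions {
  maxAttempts?: number; // total attempts including the first (assumed default 5)
  baseDelayMs?: number; // first backoff window (assumed 500 ms)
  maxDelayMs?: number; // cap on any single wait (assumed 30 s)
  isRetryable?: (err: unknown) => boolean;
  retryAfterMs?: (err: unknown) => number | undefined; // e.g. Retry-After seconds * 1000
  sleep?: (ms: number) => Promise<void>;
  random?: () => number;
}

async function withBackoff<T>(fn: () => Promise<T>, opts: BackoffOptions = {}): Promise<T> {
  const {
    maxAttempts = 5,
    baseDelayMs = 500,
    maxDelayMs = 30_000,
    isRetryable = () => true,
    retryAfterMs = () => undefined,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
    random = Math.random,
  } = opts;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      const window = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      // Prefer the server's hint; otherwise wait a random time within the exponential window.
      const delay = retryAfterMs(err) ?? Math.floor(random() * window);
      await sleep(delay);
    }
  }
}
```

To keep concurrency at 1, await each `withBackoff` call in sequence instead of using `Promise.all`. With @notionhq/client, `isRetryable` could check `isNotionClientError(err) && err.code === APIErrorCode.RateLimited`. Reading Retry-After from the error's response headers depends on your SDK version, so check that before relying on it.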
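Quick CLI health check:
set -o pipefail # otherwise jq exits 0 on empty input, hiding a failed curl, and the fallback never prints
curl -sf https://api.notion.com/v1/users/me \
-H "Authorization: Bearer ${NOTION_TOKEN}" \
-H "Notion-Version: 2022-06-28" \
| jq '{name: .name, type: .type}' \
|| echo "UNHEALTHY: Notion API unreachable or auth failed"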
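Python quick triage:
import os
from notion_client import APIResponseError, Client
def quick_triage():
    # Build the client outside the try: a missing NOTION_TOKEN should raise KeyError,
    # not be reported as a network error.
    client = Client(auth=os.environ["NOTION_TOKEN"], timeout_ms=10_000)
    try:
        me = client.users.me()
        print(f"OK: Connected as {me.get('name')}")
    except APIResponseError as e:
        # The exception exposes .code and .status; its message is str(e), not e.message.
        print(f"ERROR: {e.code} (HTTP {e.status}): {e}")
    except Exception as e:
        # Timeouts and connection failures end up here.
        print(f"NETWORK ERROR: {e}")
if __name__ == "__main__":
    quick_triage()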
For data handling and privacy compliance, see notion-data-handling.