From posthog
Audits all PostHog data warehouse source connections, sync schemas, and webhook channels, producing a prioritized report grouped by severity with recommended next steps.
Slash command: /posthog:auditing-warehouse-source-health
This skill produces a project-wide audit of the **source and sync** side of the data warehouse pipeline — source
connections, sync schemas, and webhook push channels. Use it when the user wants a summary of what's broken with
their imports, not a deep-dive on one sync. The deep-dive on individual failures is
diagnosing-failed-warehouse-syncs; this skill is the scan that tells them where to look first.
The same underlying endpoint (data-warehouse-data-health-issues-retrieve) also reports materialized-view,
batch-export-destination, and transformation issues. Materialized views are covered by
auditing-warehouse-view-health. Destinations (batch exports) and transformations are owned by other products — surface
them if they appear, but route them to the relevant team rather than diagnosing here.
| Tool | Purpose |
|---|---|
| data-warehouse-data-health-issues-retrieve | One-shot: all failed/degraded items across the whole pipeline |
| external-data-sources-list | All sources with status and latest error |
| external-data-schemas-list | All schemas with status, last_synced_at, latest_error |
| external-data-sources-webhook-info-retrieve | Check per-source webhook state (not covered by data-health-issues) |
The data-health-issues endpoint aggregates across the whole pipeline — it's the fastest path to a summary. Filter
its results to the source and external_data_sync types for this audit. Use the list endpoints when you need more
context than the summary provides (row counts, non-failing items, schema-level detail).
From the data-health endpoint, this audit cares about two of the five categories:
| type | Trigger | Typical urgency |
|---|---|---|
| source | ExternalDataSource.status = Error (whole source connection broken) | High |
| external_data_sync | Schema in Failed or BillingLimitReached state (the data-health endpoint returns status: "failed" or status: "billing_limit" respectively) | Medium–High |
Each entry includes id, name, type, status, error, failed_at, url, and source_type.
The other categories the endpoint returns are out of scope for this skill:
- materialized_view → auditing-warehouse-view-health
- destination (batch export) → owned by the batch exports / data pipelines product
- transformation (HogFunction) → owned by the CDP / ingestion side

Note that the data-health endpoint only reports active failures. For source/sync health it doesn't flag:

- Schemas that are switched off (should_sync = false)
- Stale schemas whose status still reads Completed
- Broken webhooks on sync_type: "webhook" schemas. The bulk-sync safety net can succeed while the webhook push channel is silently broken (deregistered, disabled on the remote side, failing signature verification). These don't surface in data-health-issues; check per-source with webhook-info-retrieve.

If the user asks about staleness or unused items, reach beyond this endpoint (see Step 4).
Call data-warehouse-data-health-issues-retrieve and keep the source and external_data_sync entries.
If there are no source/sync issues, tell the user their sources are healthy and stop. Don't invent problems.
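The filtering step can be sketched as below. This is a minimal illustration, assuming the tool result has already been parsed into a list of dicts that each carry a `type` field; the function names and the list-of-dicts shape are hypothetical, not part of the PostHog API.

```python
from typing import Any

# Types this audit keeps; the endpoint's other categories are out of scope.
IN_SCOPE_TYPES = {"source", "external_data_sync"}


def in_scope_issues(issues: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only source and sync entries (assumes each entry has a 'type' key)."""
    return [issue for issue in issues if issue.get("type") in IN_SCOPE_TYPES]


def summarize(issues: list[dict[str, Any]]) -> str:
    """Return a one-line verdict; report 'healthy' when nothing is in scope."""
    kept = in_scope_issues(issues)
    if not kept:
        return "No source or sync issues found: sources are healthy."
    return f"{len(kept)} source/sync issue(s) to review."
```

Out-of-scope entries are dropped from the count here; in a real audit they would still be mentioned and routed to the owning team.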
- status: "billing_limit" entries (billing issue, non-technical; flag and route to billing)
- Failed on heavily-used tables (ask the user, or check row counts via schemas-list if needed)
- Failed on less-used tables

Render a prioritized report. Don't dump the raw JSON; use a human-readable table per category:
## Data warehouse source health — 4 issues
### 🔴 Sources (1)
- Stripe — authentication failed (failed 2h ago). All 8 tables under it are currently dead.
→ `diagnosing-failed-warehouse-syncs` on this source
### 🟠 Sync schemas (3)
- postgres_prod.orders (Failed 6h ago) — column "updated_at" does not exist
- postgres_prod.invoices (Failed 6h ago) — column "updated_at" does not exist
- hubspot.contacts (BillingLimitReached) — team quota exceeded
Recommended order:
1. Stripe auth (everything under it is dead)
2. Schema-drift on postgres_prod.orders / invoices — looks like upstream renamed a column
3. Billing limit on hubspot
The exact format is less important than: prioritized, grouped, actionable, and hinting at the right next skill.
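One way to produce a grouped report like the one above is sketched here. It assumes entries are dicts with the `type`, `name`, `status`, and `error` fields listed earlier; `render_report` is a hypothetical helper, and ordering within each group is left to the caller.

```python
from typing import Any


def render_report(issues: list[dict[str, Any]]) -> str:
    """Group in-scope entries by type and render a short text report."""
    sources = [i for i in issues if i.get("type") == "source"]
    syncs = [i for i in issues if i.get("type") == "external_data_sync"]
    total = len(sources) + len(syncs)
    lines = [f"## Data warehouse source health: {total} issues"]
    # Sources come first: a broken connection takes every table under it down.
    for heading, group in (("Sources", sources), ("Sync schemas", syncs)):
        if not group:
            continue
        lines.append(f"### {heading} ({len(group)})")
        for item in group:
            name = item.get("name", "unknown")
            status = item.get("status", "unknown")
            error = item.get("error", "no error message")
            lines.append(f"- {name} ({status}): {error}")
    return "\n".join(lines)
```

The recommended order and the pointer to the next skill still need human-style judgment, so they are not generated here.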
If the user wants more than just "what's on fire" — e.g. "what else should I look at?" — cross-check:
Stale but "Completed" schemas:
Call external-data-schemas-list and look for schemas with old last_synced_at relative to their sync_frequency.
A schema on 1hour frequency that last synced 3 days ago is effectively broken even if status says Completed.
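The staleness heuristic can be sketched as follows. Only the `1hour` label comes from the text above; the other frequency labels, the 3x tolerance, and the assumption that `last_synced_at` is an ISO 8601 string with an explicit UTC offset are all illustrative guesses.

```python
from datetime import datetime, timedelta, timezone

# Assumed label-to-interval mapping; only "1hour" appears in the skill text.
FREQUENCY_INTERVALS = {
    "1hour": timedelta(hours=1),
    "6hour": timedelta(hours=6),
    "24hour": timedelta(hours=24),
}


def is_stale(last_synced_at: str, sync_frequency: str, now: datetime,
             tolerance: float = 3.0) -> bool:
    """Flag a schema whose last sync is older than tolerance x its interval."""
    interval = FREQUENCY_INTERVALS.get(sync_frequency)
    if interval is None:
        return False  # unknown label: don't guess
    synced = datetime.fromisoformat(last_synced_at)
    return now - synced > interval * tolerance


now = datetime(2024, 1, 4, tzinfo=timezone.utc)
# Hourly schema that last synced 3 days ago: flagged even if status is Completed.
stale = is_stale("2024-01-01T00:00:00+00:00", "1hour", now)
```

A tolerance multiplier avoids flagging schemas that are merely one run behind; the right value depends on how noisy the project's syncs are.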
Sources with zero sync activity:
Sources where every schema has should_sync: false or status = Paused. These were set up and then abandoned —
candidates for cleanup via external-data-sources-destroy.
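A sketch of the zero-activity check, assuming each schema record from external-data-schemas-list exposes `should_sync`, `status`, and a `source_id` linking it to its source. The `source_id` field name is an assumption; adjust it to whatever the list response actually returns.

```python
from collections import defaultdict
from typing import Any


def idle_source_ids(schemas: list[dict[str, Any]]) -> list[str]:
    """Return ids of sources where every schema is disabled or paused."""
    by_source: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for schema in schemas:
        by_source[schema["source_id"]].append(schema)

    def inactive(schema: dict[str, Any]) -> bool:
        return schema.get("should_sync") is False or schema.get("status") == "Paused"

    return sorted(
        source_id
        for source_id, group in by_source.items()
        if all(inactive(schema) for schema in group)
    )
```

The result is only a list of cleanup candidates; deleting a source should still be confirmed with the user first.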
Broken webhooks on webhook-type schemas:
Iterate the sources that have any schema with sync_type: "webhook" (visible via external-data-schemas-list). For
each, call external-data-sources-webhook-info-retrieve({source_id}):
- exists: false while a schema is sync_type: "webhook" → webhook was never registered, or was deleted. Push channel is dead; only the bulk fallback is ingesting.
- external_status.error present → remote service is reporting a problem (permission revoked, endpoint deleted on their dashboard).
- external_status.status not "enabled" → remote has disabled the endpoint (often after repeated delivery failures).

Report these separately from the primary audit: they're a different shape of problem than failed syncs, and the fix is a different skill (diagnosing-failed-warehouse-syncs scenario I, or setting-up-a-data-warehouse-source step 5.5).
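The three webhook checks can be expressed as a small classifier. This sketch assumes the webhook-info result is a dict with an `exists` flag and an optional nested `external_status` dict holding `error` and `status`; treating a missing `status` as "unknown" rather than "disabled" is a choice made here, not documented behavior.

```python
from typing import Any


def webhook_findings(info: dict[str, Any]) -> list[str]:
    """Translate one webhook-info result into human-readable findings."""
    if not info.get("exists"):
        return ["webhook missing (never registered or deleted); "
                "only the bulk fallback is ingesting"]
    findings: list[str] = []
    external = info.get("external_status") or {}
    if external.get("error"):
        findings.append(f"remote reports an error: {external['error']}")
    status = external.get("status")
    # Only flag an explicit non-enabled status; absence is treated as unknown.
    if status is not None and status != "enabled":
        findings.append(f"remote endpoint is not enabled (status: {status})")
    return findings
```

An empty list means none of the three conditions matched for that source, which is not the same as proof that deliveries are arriving.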
Only run these extra checks if the user explicitly asks for a broader audit — they involve more tool calls and heuristics.
End the audit with a clear hand-off:
- diagnosing-failed-warehouse-syncs
- tuning-incremental-sync-config
- external-data-schemas-partial-update

Never start applying fixes autonomously from an audit: the audit's job is to report and recommend, not remediate. Any fix should be confirmed explicitly before executing.
- data-health-issues only surfaces active failures. For staleness or abandoned sources you need to cross-check the list endpoints. Only do this when the user explicitly asks for a deeper audit.
- Check webhook health with webhook-info-retrieve rather than inferring from schema status.

npx claudepluginhub anthropics/claude-plugins-official --plugin posthog