Skill

postgres-column-design

Use when designing or modifying a PostgreSQL column's shape — choosing JSONB vs relational columns vs child tables vs external object store, planning a new table, or reviewing a `schema.prisma` change that introduces or widens a JSONB column. Do NOT use for query shape, indexes on existing columns, transactions, or migration mechanics (use `prisma-data-access-guard`); cross-package data ownership (use `architecture-guard`). Covers JSONB justification gates (shape, access/writes, cost), staging vs system-of-record, GIN / `jsonb_path_ops` indexing strategy chosen at design time, TOAST/WAL update amplification.

npx claudepluginhub lgerard314/global-marketplace --plugin global-plugin

Tool Access

This skill is limited to using the following tools:

ReadGrepGlobBash

Preview

Decide the shape of a Postgres column **before** the schema is committed: when a payload should be relational columns, a child table, a JSONB column, or an external object store with a pointer. Apply this skill when planning a new table, adding a new column to an existing model, reviewing a `schema.prisma` change that introduces or widens a JSONB field, or evaluating an existing JSONB blob for ...

SKILL.md

Similar Skills

cache-components

139.3k

Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.

cache-components

mcp-builder

124.2k

Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).

9 files

anthropics-skills-13

Stats

Parent Repo Stars1

Parent Repo Forks0

Last CommitApr 30, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Postgres column design

Purpose & scope

Decide the shape of a Postgres column before the schema is committed: when a payload should be relational columns, a child table, a JSONB column, or an external object store with a pointer. Apply this skill when planning a new table, adding a new column to an existing model, reviewing a schema.prisma change that introduces or widens a JSONB field, or evaluating an existing JSONB blob for decomposition. The goal is that JSONB earns its place by satisfying explicit gates — shape, access/writes, and operational cost — rather than being chosen by default for "flexibility."

Core rules

JSONB requires Gate 1 (Shape): the schema is externally owned (vendor API, partner contract, user/plugin extensions), structure varies meaningfully across rows, OR the row is a point-in-time opaque snapshot (audit, event envelope, raw webhook). — Why: internal product churn ("we might add fields") is not external instability; nullable columns or versioned rows handle it without giving up integrity, indexing, and contracts.
No FK, CHECK, or invariant the codebase relies on lives on a value INSIDE a JSONB column. If it must be referenced, joined, cascaded, or constrained, it is a column. — Why: Postgres cannot enforce referential integrity or constraints across paths inside JSONB. Integrity smuggled into the document is integrity that doesn't exist; it leaks to the application layer where it will be forgotten on the next code path that bypasses it.
JSONB columns are not updated in-place on hot paths. A row's JSONB is rewritten in full on every UPDATE — TOAST + MVCC produces large WAL volume, table bloat, and index churn proportional to document size × write rate. — Why: a 4 KB JSONB updated 100×/sec generates ~400 KB/s of WAL on a single row plus a TOAST tuple churn the autovacuum will chase forever; the same fields as scalar columns are bytes per write.
Every JSONB query path names the operator (->, ->>, @>, ?, jsonb_path_query, etc.) AND the index that serves it (GIN default ops, GIN(... jsonb_path_ops) for containment-only, expression index on a known path, partial GIN on a stable predicate). — Why: "we might query anything ad-hoc" without an index plan becomes full-heap containment scans at production volume; "we'll add the index later" on a 100M-row table is a coordinated deploy and hours of CREATE INDEX CONCURRENTLY.
Downstream consumers (other services, reports, exports, FK joins, BI, partner APIs) tolerate this row's JSONB as opaque, OR the durable shape lives in relational columns and JSONB is used for landing/staging only. — Why: the moment another team builds on a JSONB key, an undocumented schema contract exists that cannot be migrated without coordinating a release across every consumer.
Key/path cardinality across rows is bounded — JSONB on this column does not accept arbitrary keys from untrusted input. — Why: GIN size and maintenance cost scale with the set of distinct paths; an unbounded key space (user-defined dotted-name attributes from the public internet, vendor payloads with per-tenant fields) makes the index unmaintainable and queries unpredictable.
Storage and operational cost are justified with workload context, not a single byte threshold. The 2 KB rule of thumb is a sanity check; the actual call accounts for row width, update rate, TOAST behavior, WAL volume, and GIN maintenance. — Why: a 1.5 KB JSONB updated 100×/sec is worse than a 5 KB JSONB written once at insert; thresholds without write-pattern context produce both false confidence ("under 2 KB, ship it") and false alarms.
Staging vs system-of-record is explicit in the schema: short-lived or append-only landing tables (event bus persistence, raw webhook ingest, audit trails) may use JSONB freely; the system-of-record for core domain data is normalized, with JSONB allowed only for genuinely document-shaped slots. — Why: "we'll fix it in the app layer later" is how JSONB metastasizes from a landing envelope into the schema-of-record. Normalize at the boundary, not in the application.

Red flags

Thought	Reality
"We might add fields later — JSONB is more flexible."	Internal product churn isn't external instability. Add nullable columns; that's what migrations are for.
"We can index it later if queries get slow."	A GIN added later on a hot 100M-row table is hours of `CREATE INDEX CONCURRENTLY` and a coordinated deploy. The index plan is part of the column-shape decision, not a follow-up.
"It's only 2 KB, we're under the threshold."	Byte thresholds are a sanity check, not a justification. Write rate × document size × TOAST behavior is the cost.
"FK on the document key — we'll enforce it in app code."	App-layer integrity is no integrity. The first concurrent write or skipped code path produces orphans the database will never detect.
"Frontend just renders the blob, no need for columns."	Today. The next consumer (export, BI, partner API, search) needs field-level access and now it requires a backfill migration.
"Webhook payload — save it raw, JSONB is correct."	Correct for the landing table. NOT correct for the durable record other features query. Separate the two.
"Top-level keys are stable, so a single JSONB column is fine."	The fixed-envelope/variable-payload exception is for the values, not a license to put your stable keys in JSONB instead of columns. If the keys are stable, the columns are stable.
"Spirit of JSONB is flexibility."	Flexibility you can't query, can't constrain, can't migrate, and can't index isn't flexibility — it's deferred cost.

Good vs bad

Internal-stable shape: relational vs JSONB-by-default

Bad:

model Order {
  id       String @id
  // status, total, customerId all live inside metadata — no FK, no index, no contract.
  metadata Json
}

Good (relational where shape is internal/stable; JSONB only for the genuinely variable slot):

model Order {
  id          String      @id
  status      OrderStatus
  totalCents  Int
  customerId  String
  customer    Customer    @relation(fields: [customerId], references: [id])
  // Tenant-defined extension attributes; schema varies per tenant.
  // No FK, no app-side join on its values; indexed via partial GIN on a known path.
  attributes  Json?
}

Hot in-place mutation of JSONB vs column for hot field

Bad:

// Rewrites the entire JSONB document on every heartbeat — TOAST + WAL amplification.
await prisma.session.update({
  where: { id },
  data: { state: { ...session.state, lastSeenAt: now } },
});

Good:

// Hot, mutable field is a column. Only the column page is rewritten.
await prisma.session.update({
  where: { id },
  data: { lastSeenAt: now },
});

Ad-hoc JSONB query without index plan vs operator + GIN chosen at design time

Bad:

// No index plan named; runs as a full-heap containment scan at production volume.
const matches = await prisma.$queryRaw`
  SELECT id FROM "Event" WHERE payload @> ${{ type: 'order.created' }}::jsonb
`;

Good (decide the index when deciding the column):

-- Migration: targeted GIN with jsonb_path_ops for containment-only access.
CREATE INDEX CONCURRENTLY event_payload_path_idx
  ON "Event" USING GIN (payload jsonb_path_ops);

-- Or, when only one path is queried, an expression index is cheaper than full GIN:
CREATE INDEX CONCURRENTLY event_payload_type_idx
  ON "Event" ((payload->>'type'))
  WHERE payload ? 'type';

const matches = await prisma.$queryRaw<{ id: string }[]>`
  SELECT id FROM "Event" WHERE payload @> ${{ type: 'order.created' }}::jsonb
`;

Landing JSONB then normalized durable store vs JSONB-as-system-of-record

Bad (the durable record is the raw blob; every consumer reaches into it):

model WebhookEvent {
  id      String @id
  payload Json   // every report, join, and UI reads .payload.<key>
}

Good (raw landing, then normalized durable shape):

// Landing — short-lived, append-only, opaque. Faithful capture of what arrived.
model WebhookIngest {
  id          String   @id
  receivedAt  DateTime @default(now())
  source      String
  payload     Json
  processedAt DateTime?
}

// Durable system-of-record — normalized columns, FKs, indexes, contracts.
model Order {
  id         String @id
  externalId String @unique
  status     OrderStatus
  customerId String
  // ...
}

The three gates (decision framework)

Gate 1 — Shape. JSONB is on the table only when at least one holds:

Schema is externally owned or uncontrolled (vendor API, partner contract, user/plugin extensions you do not type centrally).
Keys or structure vary meaningfully across rows in ways that would force sparse-wide tables or constant migrations if modeled relationally.
Point-in-time snapshot of an opaque payload (audit trail, domain event envelope, raw webhook ingest) where faithful capture matters more than first-class queryability.

Explicit non-qualifier: internal product churn ("we might add fields later") is NOT Gate 1. Use additive nullable columns or versioned rows.

If Gate 1 fails → relational columns or normalized children. Stop.

Gate 2 — Access, writes, and consumers. ALL must hold:

No FK or invariant the codebase relies on depends on values inside the JSONB.
Not updated in-place at high frequency — same row touched often with a growing or wide document.
Query strategy is honest: the operators needed (->, ->>, @>, ?, jsonb_path_query) and the index that serves them are named (default GIN, jsonb_path_ops, expression index on a stable path, partial GIN with a predicate).
Downstream consumers tolerate the opaque blob on this store, or they consume a derived projection maintained elsewhere.

Narrow exception — fixed envelope, variable payload: top-level keys few and stable, values semi-structured (nested objects, variable-shape arrays). JSONB is permissible for those slots only if Gates 2 and 3 still hold. This is not a license for unbounded key sprawl at the root.

If any bullet fails → decompose into columns, child tables, or split staging vs durable. Stop.

Gate 3 — Cost and operational bounds. ALL must hold:

Hot paths stay narrow: JSONB does not dominate heap width, TOAST compression, or update amplification for measured or reasonably expected workloads. The 2 KB threshold is a sanity check — justify with row shape and update rate, not the number alone.
Key/path cardinality across rows is bounded enough that GIN remains maintainable; no indexing of an ever-growing set of distinct paths from untrusted input.

If any fails → dedicated columns, normalized child table(s), external object store + pointer, or split staging vs durable. Stop.

Indexing JSONB honestly

Default GIN on a JSONB column supports @>, ?, ?&, ?|, and path-existence queries. GIN ... jsonb_path_ops is smaller and faster for containment-only (@>) workloads but does not support path-existence queries; pick deliberately. For a small, stable set of paths actually queried, expression indexes — ((payload->>'tenantId')), ((payload->'attrs'->>'sku')) — outperform a full GIN and cost less to maintain. Use partial indexes (WHERE payload ? 'tenantId') when only some rows have the path.

A JSONB column with no named query path and no chosen index is a WHERE payload @> ... time bomb — it works on a fixture and dies under load. The decision to allow JSONB and the decision on which index serves which operator are the same decision, made in the same migration.

Update amplification, TOAST, and WAL

Postgres stores values larger than ~2 KB out-of-line in TOAST; JSONB compresses well but compresses less than narrow scalar fields. Every UPDATE on a row produces a new heap tuple (MVCC), and for a row with a JSONB column it rewrites the entire JSONB value — there is no in-place key patch. For a hot row updated frequently this means: the full JSONB written to WAL on every change, every covering index updated, increased TOAST traffic, and bloat the autovacuum has to chase.

The fix is structural: hot, mutable fields are columns; JSONB holds cold or snapshot data. If the design needs both — a stable JSONB plus a frequently-updated counter — split the table or move the counter to a sibling row. "We'll handle bloat later" is paid in p99 latency under load.

Staging vs system-of-record

JSONB is often correct for landing tables: short-lived, append-only, opaque, where the fidelity of "we received exactly this payload" matters more than queryability. The durable system-of-record is a separate table with normalized columns, FKs, indexes, and a deliberate schema. The transformation from landing to durable is a job (worker, trigger, projection); it is not a SELECT into the same JSONB blob.

Confusing the two is the most common path from "one reasonable JSONB column" to "every query in the codebase reaches into an opaque blob." Name the boundary explicitly in the schema (e.g., *Ingest / *Event for landing; the bare domain model for durable) and enforce that no consumer of the durable model reads from the landing model.

Interactions with other skills

Owns: column-shape decisions (JSONB vs relational vs child table vs external store + pointer), JSONB index strategy chosen at design time, staging-vs-system-of-record boundary.
Hands off to: prisma-data-access-guard for query shape, indexes on existing columns, transactions, and migration mechanics once the column shape is decided; architecture-guard for cross-package data ownership; performance-budget-guard when the column-shape choice changes a hot-path query budget.
Does not duplicate: prisma-data-access-guard's query/migration safety; state-integrity-check's cache invalidation.

Review checklist (invoke when reviewing existing code)

Produce a markdown report with these sections:

Summary — one line: GREEN / YELLOW / RED.
Findings — per issue: File:line, severity (low/med/high), category, what's wrong, recommended fix.
Safer alternative — if a JSONB anti-pattern is widespread, prescribe the column / child-table / staging replacement for the whole codebase.
Checklist coverage — for each rule below, mark: PASS / CONCERN / NOT APPLICABLE.
- Rule 1: Each JSONB column satisfies Gate 1 (external schema, varies-across-rows, OR opaque snapshot) — name the qualifier per column.
- Rule 2: No FK, CHECK, or critical invariant depends on values inside any JSONB column.
- Rule 3: No JSONB column is updated in-place on a hot write path.
- Rule 4: Every JSONB query path has a named operator AND a named index (GIN / jsonb_path_ops / expression / partial).
- Rule 5: Downstream consumers tolerate the blob OR the column is staging-only with a normalized durable counterpart.
- Rule 6: Key/path cardinality across rows is bounded; no unbounded keys from untrusted input.
- Rule 7: Storage/op cost is justified with workload context (write rate, row width, TOAST), not a single byte threshold.
- Rule 8: Staging-vs-system-of-record is explicit in the schema.

Required explicit scans:

List every Json field in schema.prisma, with the Gate 1 qualifier and the index that serves its query path. Mark "no plan" entries as findings.
List every prisma.<model>.update (or raw UPDATE) that writes to a JSONB column on a code path that runs more than once per second per row.
List every JSONB-operator query (->, ->>, @>, ?, jsonb_path_*) and the index that serves it.
List every column that lives in JSONB but is referenced as a foreign key, joined on, or constrained in application code.