From harness-claude
Guides using PostgreSQL JSONB for hybrid relational/document storage: embed vs normalize decisions, indexing strategies (GIN, expression), queries, and e-commerce examples.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Using JSONB columns to store semi-structured data alongside relational tables, with indexing strategies and guidelines for when to embed vs normalize.
Designs PostgreSQL tables and schemas using best practices for data types, primary/foreign keys, constraints, indexes, normalization, JSONB, and partitioning.
Designs PostgreSQL table schemas with best practices for normalization, data types, indexing, constraints, performance patterns, and advanced features.
Designs and reviews PostgreSQL table schemas using best practices for data types, indexing, constraints, normalization, performance patterns, and PostgreSQL gotchas.
Share bugs, ideas, or general feedback.
Using JSONB columns to store semi-structured data alongside relational tables, with indexing strategies and guidelines for when to embed vs normalize.
PostgreSQL JSONB stores binary JSON -- it supports indexing, containment checks, and path queries. The key design decision is when to embed in JSONB vs when to normalize:
Embed in JSONB when:
Normalize into relational columns when:
The hybrid model -- relational columns for queryable/constrained fields, JSONB for flexible/varying fields:
CREATE TABLE products (
id serial PRIMARY KEY,
name varchar NOT NULL,
price numeric(10,2) NOT NULL,
category_id int REFERENCES categories(id),
attrs jsonb DEFAULT '{}'
);
Core fields (name, price, category_id) are relational -- type-safe, constrained, indexable. Varying fields (attrs) are JSONB -- flexible, no schema migration needed.
Indexing strategies:
-- GIN index for containment/existence queries (@>, ?, ?&, ?|)
CREATE INDEX idx_products_attrs ON products USING gin (attrs);
-- Expression index for specific frequently-queried paths
CREATE INDEX idx_products_color ON products ((attrs->>'color'));
E-commerce product catalog with type-varying attributes:
-- Products with different attribute sets
INSERT INTO products (name, price, category_id, attrs) VALUES
('Running Shoe', 129.99, 1, '{"color": "red", "size": "10", "material": "mesh"}'),
('LED Bulb', 8.99, 2, '{"wattage": 60, "color_temp": "warm", "lumens": 800}'),
('Cotton T-Shirt', 24.99, 3, '{"color": "blue", "size": "L", "fabric": "100% cotton"}');
Containment query (uses GIN index):
SELECT name, price FROM products
WHERE attrs @> '{"color": "red"}';
-- Returns: Running Shoe
Path query (uses expression index):
SELECT name, price FROM products
WHERE attrs->>'color' = 'blue';
-- Returns: Cotton T-Shirt
Aggregation on JSON field:
SELECT attrs->>'color' AS color, count(*)
FROM products
WHERE attrs ? 'color'
GROUP BY attrs->>'color';
EXPLAIN ANALYZE demonstrating index usage:
EXPLAIN ANALYZE SELECT * FROM products WHERE attrs @> '{"color": "red"}';
-- Bitmap Index Scan on idx_products_attrs
-- Recheck Cond: (attrs @> '{"color": "red"}'::jsonb)
Without the GIN index, this query degrades to a sequential scan on every row.
attrs->'specs'->'dimensions'->'width') and un-indexable at deep paths.json type preserves whitespace and key ordering but cannot be indexed. Always use jsonb unless you need exact text preservation.JSONB operators:
| Operator | Purpose | Example |
| -------- | ------------------- | ---------------------------------------------- | -------- | ------------------------- |
| -> | Get element as JSON | attrs->'color' returns "red" (with quotes) |
| ->> | Get element as text | attrs->>'color' returns red (no quotes) |
| @> | Contains | attrs @> '{"color":"red"}' |
| ? | Key exists | attrs ? 'color' |
| ?& | All keys exist | attrs ?& array['color','size'] |
| ? | | Any key exists | attrs ? | array['color','wattage'] |
GIN index operator classes:
jsonb_ops -- supports all operators, larger index sizejsonb_path_ops -- supports only @>, but 2-3x smaller index-- Use jsonb_path_ops when only containment queries are needed
CREATE INDEX ON products USING gin (attrs jsonb_path_ops);
SQL/JSON path queries (PostgreSQL 12+):
SELECT * FROM products
WHERE jsonb_path_exists(attrs, '$.color ? (@ == "red")');
Generated columns from JSONB for computed relational columns:
ALTER TABLE products
ADD COLUMN color varchar
GENERATED ALWAYS AS (attrs->>'color') STORED;
-- Now indexable and queryable as a regular column
CREATE INDEX idx_products_color_col ON products (color);
Generated columns are automatically updated when attrs changes. Use this to "promote" frequently queried JSONB fields to relational columns.
Partial indexes on JSONB paths:
CREATE INDEX idx_products_wattage ON products ((attrs->>'wattage'))
WHERE attrs ? 'wattage';
Only indexes rows that have the wattage key -- smaller and faster for sparse attributes.
Partial updates with jsonb_set:
-- Update a single key without rewriting the entire JSONB value
UPDATE products
SET attrs = jsonb_set(attrs, '{color}', '"green"')
WHERE id = 1;
This is more efficient than replacing the entire attrs value, especially for large documents.
TOAST storage: JSONB values exceeding ~2KB are compressed and stored out-of-line. Very large JSONB values (100KB+) incur TOAST overhead on every read. Monitor with SELECT pg_column_size(attrs) FROM products ORDER BY 1 DESC LIMIT 10;.
MySQL 5.7+ supports a JSON column type but lacks JSONB:
ALTER TABLE products
ADD COLUMN color varchar(50)
GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(attrs, '$.color'))) VIRTUAL;
CREATE INDEX idx_color ON products(color);
@>)CREATE INDEX idx_tags ON products ((CAST(attrs->'$.tags' AS UNSIGNED ARRAY)));JSON_TABLE (8.0+) flattens JSON to rows for complex queriesJSON_CONTAINS() is the closest equivalent to PostgreSQL's @> but cannot use specialized indexesMulti-tenant SaaS platform with custom fields per tenant. Originally used EAV (Entity-Attribute-Value) with 100M rows across 500 tenants. Cross-field queries ("all contacts where region = East AND tier = Enterprise") took 30+ seconds due to self-joins. Migrated to a JSONB custom_fields column with a GIN index. Same queries now complete in 15ms. Storage reduced 60% (one row per entity vs N rows per entity in EAV). For the 5 most-queried custom fields per tenant, added expression indexes that dropped those specific queries to under 3ms.