From harness-claude
Explains EAV schema pattern for dynamic user-defined attributes stored as rows, with SQL examples, tradeoffs vs JSONB, and use cases like product catalogs or medical records.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> A schema pattern for storing dynamic, user-defined attributes as rows instead of columns -- usually avoided in favor of JSONB or polymorphic alternatives, but occasionally justified for genuinely unbounded attribute sets.
Guides using PostgreSQL JSONB for hybrid relational/document storage: embed vs normalize decisions, indexing strategies (GIN, expression), queries, and e-commerce examples.
Designs PostgreSQL/MySQL schemas with normalization (1NF-3NF), relationships, constraints, audit columns. For new databases, reviews, migrations, or fixing missing PKs/FKs, wrong types, denormalization, EAV.
Designs scalable SQL and NoSQL database schemas from entity descriptions, with normalization, indexing, constraints, migrations, and performance best practices.
Share bugs, ideas, or general feedback.
A schema pattern for storing dynamic, user-defined attributes as rows instead of columns -- usually avoided in favor of JSONB or polymorphic alternatives, but occasionally justified for genuinely unbounded attribute sets.
EAV stores data as (entity_id, attribute_name, attribute_value) triples instead of columns:
CREATE TABLE product_attributes (
product_id int REFERENCES products(id),
attribute_name varchar NOT NULL,
attribute_value text,
PRIMARY KEY (product_id, attribute_name)
);
Each attribute becomes a row instead of a column. An entity with 20 attributes produces 20 rows.
Core tradeoffs:
An e-commerce product catalog. Products have standard columns in a relational table plus dynamic attributes in EAV:
CREATE TABLE products (
id serial PRIMARY KEY,
name varchar NOT NULL,
price numeric(10,2) NOT NULL,
sku varchar UNIQUE NOT NULL
);
-- EAV for varying attributes
INSERT INTO product_attributes VALUES
(1, 'color', 'red'),
(1, 'size', 'large'),
(1, 'material', 'cotton'),
(2, 'color', 'blue'),
(2, 'wattage', '60');
Querying "all products where color = red AND size = large" requires a self-join:
SELECT p.id, p.name
FROM products p
JOIN product_attributes a1
ON p.id = a1.product_id AND a1.attribute_name = 'color' AND a1.attribute_value = 'red'
JOIN product_attributes a2
ON p.id = a2.product_id AND a2.attribute_name = 'size' AND a2.attribute_value = 'large';
The same query with JSONB (preferred alternative):
ALTER TABLE products ADD COLUMN attrs jsonb DEFAULT '{}';
-- Single-row update instead of multiple EAV inserts
UPDATE products SET attrs = '{"color": "red", "size": "large", "material": "cotton"}' WHERE id = 1;
-- Dramatically simpler query
SELECT id, name FROM products
WHERE attrs->>'color' = 'red' AND attrs->>'size' = 'large';
The JSONB approach is simpler to query, supports GIN indexing, and stores all attributes in one row. Prefer JSONB for most EAV use cases.
value_type discriminator column. When EAV is unavoidable, include a type column (text, integer, date) and separate typed value columns.(attribute_name, attribute_value). Without them, every attribute filter is a sequential scan on the entire EAV table.JSONB is the primary alternative to EAV in PostgreSQL. It supports:
CREATE INDEX ON products USING gin (attrs jsonb_path_ops);
-- Uses the GIN index:
SELECT * FROM products WHERE attrs @> '{"color": "red"}';
CREATE INDEX ON products ((attrs->>'color'));
?, ?&, ?| operatorsCrosstab queries with the tablefunc extension pivot EAV rows into columns:
SELECT * FROM crosstab(
'SELECT product_id, attribute_name, attribute_value
FROM product_attributes ORDER BY 1, 2',
'SELECT DISTINCT attribute_name FROM product_attributes ORDER BY 1'
) AS ct(product_id int, color text, material text, size text, wattage text);
Typed EAV uses separate value columns per type:
CREATE TABLE product_attributes (
product_id int REFERENCES products(id),
attribute_name varchar NOT NULL,
value_type varchar NOT NULL CHECK (value_type IN ('text', 'integer', 'date', 'numeric')),
text_value text,
int_value integer,
date_value date,
numeric_value numeric,
PRIMARY KEY (product_id, attribute_name)
);
Performance at scale: EAV tables with 100M+ rows require careful indexing. Partition by entity type or attribute category to keep individual partitions manageable. Consider composite indexes on (attribute_name, attribute_value, product_id) for attribute-first lookups.
MySQL 5.7+ supports JSON columns but with weaker indexing than PostgreSQL JSONB:
@>)ALTER TABLE products
ADD COLUMN color varchar(50) GENERATED ALWAYS AS (JSON_UNQUOTE(attrs->'$.color')) VIRTUAL;
CREATE INDEX idx_color ON products(color);
Healthcare system with 50K distinct observation types. Original EAV schema held 2B rows. Patient timeline queries took 30+ seconds because each observation required a self-join. Migration strategy: kept EAV for rare attributes (the long tail below 100 queries/day), moved the top-100 most-queried attributes (covering 95% of all queries) to dedicated columns on the patient record. Common lookup query time dropped from 30s to 200ms. Rare-attribute queries remained in EAV at acceptable latency for their low frequency.