From majestic-data
Validates records using Pydantic models with field validators, model validators, batch processing, nested models, and constraints like gt/le/pattern. For ETL pipelines.
npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data
**Audience:** Data engineers validating records in ETL pipelines.
**Goal:** Provide reusable Pydantic patterns for record-level validation.
Execute validation functions from scripts/validators.py:
from scripts.validators import (
    UserRecord,
    Customer,
    Order,
    Address,
    validate_records,
    print_validation_errors,
    PositiveInt,
    Email,
)
from scripts.validators import UserRecord
# Validate a single record
user = UserRecord(
    id=1,
    email="USER@example.com",
    status="active",
    created_at="2024-01-15",
    age=25,
)
print(user.email)  # user@example.com (lowercased)
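The contents of `scripts/validators.py` are not shown here, so the following is only a sketch of how a model like `UserRecord` could lowercase the email with a field validator. The class name `UserRecordSketch`, the allowed status values, the age bounds, and the `@` check are all assumptions made for illustration.

```python
from datetime import date
from typing import Literal

from pydantic import BaseModel, Field, field_validator


class UserRecordSketch(BaseModel):
    """Hypothetical stand-in for scripts.validators.UserRecord."""

    id: int = Field(gt=0)
    email: str
    status: Literal["active", "inactive"]  # assumed set of statuses
    created_at: date  # ISO strings such as "2024-01-15" are parsed into a date
    age: int = Field(ge=0, le=150)  # assumed bounds

    @field_validator("email")
    @classmethod
    def normalize_email(cls, value: str) -> str:
        # Trim and lowercase so later joins on email are case-insensitive.
        value = value.strip().lower()
        if "@" not in value:  # deliberately crude check, for illustration only
            raise ValueError("email must contain '@'")
        return value


user = UserRecordSketch(
    id=1,
    email="USER@example.com",
    status="active",
    created_at="2024-01-15",
    age=25,
)
print(user.email)
```

Normalizing inside the validator means every code path that builds the model gets the same cleaned value, rather than relying on callers to lowercase beforehand.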
from scripts.validators import validate_records, print_validation_errors
# Validate a batch: the second record fails on id, email, status, and age
raw_data = [
    {"id": 1, "email": "a@b.com", "status": "active", "created_at": "2024-01-01", "age": 25},
    {"id": -1, "email": "invalid", "status": "bad", "created_at": "2024-01-01", "age": 200},
]
valid, invalid = validate_records(raw_data)
if invalid:
    print_validation_errors(invalid)
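For readers without access to `scripts/validators.py`, a batch helper in the spirit of `validate_records` might look roughly like this. The `Row` model, the `validate_batch` name, and the shape of the entries in the `invalid` list are assumptions; the real helper may return something different.

```python
from typing import Any

from pydantic import BaseModel, Field, ValidationError


class Row(BaseModel):
    """Hypothetical minimal model; the real UserRecord has more fields."""

    id: int = Field(gt=0)
    age: int = Field(ge=0, le=150)  # assumed bounds


def validate_batch(records: list[dict[str, Any]]):
    """Split raw dicts into parsed models and per-record error reports."""
    valid: list[Row] = []
    invalid: list[dict[str, Any]] = []
    for index, record in enumerate(records):
        try:
            valid.append(Row.model_validate(record))
        except ValidationError as exc:
            # exc.errors() returns one dict per failing field (loc, msg, type, ...).
            invalid.append({"index": index, "record": record, "errors": exc.errors()})
    return valid, invalid


valid, invalid = validate_batch([
    {"id": 1, "age": 25},
    {"id": -1, "age": 200},
])
for failure in invalid:
    for err in failure["errors"]:
        print(failure["index"], err["loc"], err["msg"])
```

Collecting failures instead of raising on the first bad record lets an ETL job load the good rows and route the rest to a reject table or log.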
from scripts.validators import Customer, Address
# Validate a nested model
customer = Customer(
    id=1,
    name="John Doe",
    billing_address=Address(
        street="123 Main St",
        city="NYC",
        postal_code="10001",
    ),
)
# shipping_address defaults to billing_address
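One way the "shipping defaults to billing" behavior could be implemented is a model validator that runs after field validation. This is a sketch under assumptions: `AddressSketch`, `CustomerSketch`, and the five-digit postal code pattern are illustrative and may not match the actual `Address` and `Customer` models.

```python
from typing import Optional

from pydantic import BaseModel, Field, model_validator


class AddressSketch(BaseModel):
    street: str = Field(min_length=1)
    city: str = Field(min_length=1)
    postal_code: str = Field(pattern=r"^\d{5}$")  # assumed 5-digit format


class CustomerSketch(BaseModel):
    id: int = Field(gt=0)
    name: str = Field(min_length=1)
    billing_address: AddressSketch
    shipping_address: Optional[AddressSketch] = None

    @model_validator(mode="after")
    def default_shipping_to_billing(self) -> "CustomerSketch":
        # Runs once all fields are validated, so billing_address is already parsed.
        if self.shipping_address is None:
            self.shipping_address = self.billing_address
        return self


customer = CustomerSketch(
    id=1,
    name="John Doe",
    # Nested dicts are validated into AddressSketch automatically.
    billing_address={"street": "123 Main St", "city": "NYC", "postal_code": "10001"},
)
print(customer.shipping_address.city)
```

A model validator fits here because the default depends on another field, which a plain field default cannot express.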
| Constraint | Example | Description |
|---|---|---|
| `gt`, `ge` | `Field(gt=0)` | Greater than / greater than or equal |
| `lt`, `le` | `Field(le=100)` | Less than / less than or equal |
| `pattern` | `Field(pattern=r'^\d+$')` | Regex match |
| `min_length`, `max_length` | `Field(min_length=1)` | String length bounds |
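The constraints in the table can be attached directly to fields or bundled into reusable `Annotated` types, which is presumably how helpers such as `PositiveInt` are built. In the sketch below, `PositiveIntSketch`, `OrderLine`, and its field names are hypothetical.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

# Reusable constrained type (hypothetical name).
PositiveIntSketch = Annotated[int, Field(gt=0)]


class OrderLine(BaseModel):
    quantity: PositiveIntSketch                 # gt: must be > 0
    discount_pct: float = Field(ge=0, le=100)   # ge / le: inclusive range
    sku: str = Field(pattern=r"^\d+$", min_length=1, max_length=12)


ok = OrderLine(quantity=3, discount_pct=12.5, sku="004217")

try:
    OrderLine(quantity=0, discount_pct=150, sku="AB-1")
except ValidationError as exc:
    # Names of the fields that violated a constraint.
    failed_fields = {err["loc"][0] for err in exc.errors()}
    print(sorted(failed_fields))
```

Pydantic reports all failing fields in a single `ValidationError`, so one pass over a record surfaces every constraint violation.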
# Parse from dict
customer = Customer(**data_dict)
# Parse from JSON
customer = Customer.model_validate_json(json_string)
# Export to dict/JSON
data = customer.model_dump()
json_str = customer.model_dump_json()
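As a quick check of the parse and export calls above, a round trip through JSON should reproduce the same model. The `Item` model is a hypothetical stand-in so the sketch runs without `scripts/validators.py`.

```python
from pydantic import BaseModel


class Item(BaseModel):
    """Hypothetical model used only to demonstrate the round trip."""

    id: int
    name: str


# Parse from a dict; model_validate(data) is equivalent to Item(**data) here.
from_dict = Item.model_validate({"id": 7, "name": "widget"})

# Parse from a JSON string.
from_json = Item.model_validate_json('{"id": 7, "name": "widget"}')

# Export to a dict and to JSON, then parse the JSON back.
as_dict = from_json.model_dump()
round_tripped = Item.model_validate_json(from_json.model_dump_json())
print(as_dict)
```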
| Use Case | Pydantic | Alternative |
|---|---|---|
| API request/response | ✓ | FastAPI integration |
| Record-by-record ETL | ✓ | - |
| Full DataFrame validation | - | pandera |
| Pipeline expectations | - | Great Expectations |
pydantic>=2.0
pydantic-settings # For config validation