Validates pandas DataFrames using pandera with schema definitions, column checks, decorators, error collection, and schema inference. Ideal for ETL pipelines and data engineering.
Install:

```shell
npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data
```
**Audience:** Data engineers validating pandas DataFrames.

**Goal:** Provide pandera patterns for schema validation and type checking.
Execute schema functions from `scripts/schemas.py`:

```python
from scripts.schemas import (
    create_user_schema,
    create_nullable_schema,
    create_date_range_schema,
    UserSchema,
    validate_with_errors,
    infer_and_export_schema,
)
```
Basic validation:

```python
from scripts.schemas import create_user_schema

schema = create_user_schema()
validated_df = schema.validate(df)  # raises SchemaError on failure
```
Collect all validation errors instead of failing on the first one:

```python
from scripts.schemas import create_user_schema, validate_with_errors

schema = create_user_schema()
validated_df, errors = validate_with_errors(df, schema)
if errors:
    for err in errors:
        print(f"{err['column']}: {err['check']} - {err['failure_case']}")
```
Class-based schemas double as type annotations:

```python
import pandas as pd
import pandera as pa

from scripts.schemas import UserSchema

# Validate with type hints
UserSchema.validate(df)

# Use as a function type hint
def process_users(df: pa.typing.DataFrame[UserSchema]) -> pd.DataFrame:
    return df.query("status == 'active'")
```
Infer a schema from an existing DataFrame and export it:

```python
from scripts.schemas import infer_and_export_schema

schema_export = infer_and_export_schema(df)
print(schema_export['python_code'])  # Python schema definition
print(schema_export['yaml'])         # YAML schema
```
| Check Type | Example | Description |
|---|---|---|
| Numeric | `Check.gt(0)`, `Check.in_range(0, 100)` | Comparisons |
| String | `Check.str_matches(r'pattern')` | Regex match |
| Set membership | `Check.isin(['A', 'B'])` | Allowed values |
| Uniqueness | `unique=True` on `Column` | No duplicates |
| Nullable | `nullable=True` on `Column` | Allow nulls |
Validate function inputs and outputs with decorators:

```python
import pandas as pd
import pandera as pa

@pa.check_output(schema)
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

@pa.check_input(schema, "df")
def process_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(processed=True)

@pa.check_io(df=input_schema, out=output_schema)
def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.transform(...)  # placeholder transformation
```
| Use Case | Pandera | Alternative |
|---|---|---|
| DataFrame validation | ✓ | - |
| Type hints for DataFrames | ✓ | - |
| ETL pipeline checks | ✓ | Great Expectations |
| Record-level validation | - | Pydantic |
Requirements:

```
pandera>=0.18
pandas
```