From majestic-data
Generates data profiles for pandas DataFrames with column stats, correlations, and missing patterns. Use for EDA and data discovery on new datasets.
npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data
**Audience:** Data engineers and analysts exploring new datasets.
Goal: Generate comprehensive profiles including statistics, correlations, and missing patterns.
Execute profiling functions from scripts/profiling.py:
from scripts.profiling import (
    profile_dataframe,
    print_profile_summary,
    profile_correlations,
    profile_missing_patterns,
)
import pandas as pd
from scripts.profiling import profile_dataframe, print_profile_summary
df = pd.read_csv('data.csv')
profile = profile_dataframe(df)
print_profile_summary(profile)
Output:
Shape: 10,000 rows x 15 columns
Memory: 1.23 MB
Column Summary:
  id (int64): 10,000 unique, no nulls
  email (object): 9,847 unique, 1.53% null
  revenue (float64): 3,421 unique, no nulls
  created_at (datetime64[ns]): 365 unique, no nulls
from scripts.profiling import profile_correlations

corr = profile_correlations(df, threshold=0.7)
if corr['high_correlations']:
    print("Highly correlated columns:")
    for c in corr['high_correlations']:
        print(f" {c['col1']} <-> {c['col2']}: {c['correlation']}")
from scripts.profiling import profile_missing_patterns

missing = profile_missing_patterns(df)
for col, stats in missing.items():
    if col != 'co_missing_columns':
        print(f"{col}: {stats['percent']}% missing, max {stats['consecutive_max']} consecutive")

# Check for columns missing together
if 'co_missing_columns' in missing:
    for col1, col2, pct in missing['co_missing_columns']:
        print(f"{col1} and {col2} both missing {pct}% of time")
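The contents of scripts/profiling.py are not shown here, so the following is only a minimal sketch of how the values consumed above (`percent`, `consecutive_max`, and the `co_missing_columns` triples) could be computed with plain pandas. The function name `sketch_missing_patterns`, the `count` key, and the `co_missing_min_pct` cutoff are assumptions made for illustration, not part of the skill.

```python
import pandas as pd


def sketch_missing_patterns(df: pd.DataFrame, co_missing_min_pct: float = 1.0) -> dict:
    """Hypothetical stand-in for profile_missing_patterns (assumed behavior)."""
    result = {}
    null_mask = df.isna()

    for col in df.columns:
        is_null = null_mask[col]
        if not is_null.any():
            continue  # only report columns that actually have missing values
        # Give every run of identical values its own id, then count the
        # nulls inside each run; the largest count is the longest null streak.
        run_id = (is_null != is_null.shift()).cumsum()
        longest_run = int(is_null.groupby(run_id).sum().max())
        result[col] = {
            "count": int(is_null.sum()),
            "percent": round(float(is_null.mean() * 100), 2),
            "consecutive_max": longest_run,
        }

    # Column pairs that are null in the same rows (assumed cutoff in percent).
    co_missing = []
    cols = list(result)
    for i, col1 in enumerate(cols):
        for col2 in cols[i + 1:]:
            both = null_mask[col1] & null_mask[col2]
            both_pct = round(float(both.mean() * 100), 2)
            if both_pct >= co_missing_min_pct:
                co_missing.append((col1, col2, both_pct))
    if co_missing:
        result["co_missing_columns"] = co_missing
    return result
```

The return value has the same shape the loop above iterates over: one entry per column with nulls, plus an optional `co_missing_columns` list of `(col1, col2, pct)` tuples.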
shape: [rows, columns]
memory_mb: float
columns:
  column_name:
    dtype: string
    null_count: int
    null_pct: float
    unique_count: int
    unique_pct: float
    # Numeric columns add:
    min: float
    max: float
    mean: float
    std: float
    median: float
    zeros: int
    negatives: int
    # String columns add:
    min_length: int
    max_length: int
    top_values: {value: count}
    # Datetime columns add:
    min_date: string
    max_date: string
    date_range_days: int
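To make the schema above concrete, here is a rough sketch of how each field could be derived with pandas. The helpers `sketch_column_profile` and `sketch_profile` are hypothetical names, and details such as rounding to two decimals and keeping the top five values are assumptions; the actual `profile_dataframe` in scripts/profiling.py may differ.

```python
import pandas as pd
from pandas.api import types as ptypes


def sketch_column_profile(series: pd.Series, top_n: int = 5) -> dict:
    """Hypothetical per-column profile; field names follow the schema above."""
    n = len(series)
    non_null = series.dropna()
    profile = {
        "dtype": str(series.dtype),
        "null_count": int(series.isna().sum()),
        "null_pct": round(float(series.isna().mean() * 100), 2) if n else 0.0,
        "unique_count": int(series.nunique()),
        "unique_pct": round(series.nunique() / n * 100, 2) if n else 0.0,
    }
    # Nothing more to summarise for all-null columns; booleans get no extras here.
    if non_null.empty or ptypes.is_bool_dtype(series):
        return profile

    if ptypes.is_numeric_dtype(series):
        profile.update(
            min=float(non_null.min()),
            max=float(non_null.max()),
            mean=float(non_null.mean()),
            std=float(non_null.std()),
            median=float(non_null.median()),
            zeros=int((non_null == 0).sum()),
            negatives=int((non_null < 0).sum()),
        )
    elif ptypes.is_datetime64_any_dtype(series):
        profile.update(
            min_date=str(non_null.min()),
            max_date=str(non_null.max()),
            date_range_days=int((non_null.max() - non_null.min()).days),
        )
    else:
        # Treat everything else as text.
        lengths = non_null.astype(str).str.len()
        profile.update(
            min_length=int(lengths.min()),
            max_length=int(lengths.max()),
            top_values=non_null.value_counts().head(top_n).to_dict(),
        )
    return profile


def sketch_profile(df: pd.DataFrame) -> dict:
    """Hypothetical top-level profile: shape, memory, and per-column stats."""
    return {
        "shape": list(df.shape),
        "memory_mb": round(float(df.memory_usage(deep=True).sum() / 1024**2), 2),
        "columns": {col: sketch_column_profile(df[col]) for col in df.columns},
    }
```

Calling `sketch_profile(df)` returns a nested dict that can be dumped to YAML or JSON in the layout shown above.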
import numpy as np

# Pairwise Pearson correlations across numeric columns only
correlation_matrix = df.select_dtypes(include=[np.number]).corr()

# Highly correlated pairs (|r| > 0.8); scanning j > i visits each pair once
high_corr = []
for i in range(len(correlation_matrix.columns)):
    for j in range(i + 1, len(correlation_matrix.columns)):
        if abs(correlation_matrix.iloc[i, j]) > 0.8:
            high_corr.append((
                correlation_matrix.columns[i],
                correlation_matrix.columns[j],
                correlation_matrix.iloc[i, j]
            ))
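For wide tables, the nested loop above can be swapped for a masked version of the same scan. This is a sketch of one alternative, not the skill's own implementation; the function name `high_correlation_pairs` and the choice to sort by strength are assumptions.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (col1, col2, r) tuples with |r| above the threshold, strongest first."""
    corr = df.select_dtypes(include=[np.number]).corr()
    # Strict upper triangle: skips the diagonal and keeps each pair only once.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # NaN entries (masked cells or undefined correlations) fail this comparison.
    strong = pairs[pairs.abs() > threshold]
    strong = strong.sort_values(key=lambda s: s.abs(), ascending=False)
    return [(a, b, float(r)) for (a, b), r in strong.items()]
```

Non-numeric columns are dropped before the correlation is computed, so a mixed-type DataFrame can be passed as-is.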
Automatically flag:
pandas
numpy