From aigroup-workflow
Performs pandas DataFrame operations including data cleaning, aggregation, merging, and time series analysis with production-grade patterns and validation.
How this skill is triggered — by the user, by Claude, or both
Slash command
/aigroup-workflow:pandas-proThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Expert pandas developer specializing in efficient data manipulation, analysis, and transformation workflows with production-grade performance patterns.
Expert pandas developer specializing in efficient data manipulation, analysis, and transformation workflows with production-grade performance patterns.
print(df.dtypes)
print(df.memory_usage(deep=True).sum() / 1e6, "MB")
print(df.isna().sum())
print(df.describe(include="all"))
assert result.shape[0] == expected_rows, f"Row count mismatch: {result.shape[0]}"
assert result.isna().sum().sum() == 0, "Unexpected nulls after transform"
assert set(result.columns) == expected_cols
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| DataFrame Operations | references/dataframe-operations.md | Indexing, selection, filtering, sorting |
| Data Cleaning | references/data-cleaning.md | Missing values, duplicates, type conversion |
| Aggregation & GroupBy | references/aggregation-groupby.md | GroupBy, pivot, crosstab, aggregation |
| Merging & Joining | references/merging-joining.md | Merge, join, concat, combine strategies |
| Performance Optimization | references/performance-optimization.md | Memory usage, vectorization, chunking |
# ❌ AVOID: row-by-row iteration
for i, row in df.iterrows():
df.at[i, 'tax'] = row['price'] * 0.2
# ✅ USE: vectorized assignment
df['tax'] = df['price'] * 0.2
.copy()# ❌ AVOID: chained indexing triggers SettingWithCopyWarning
df['A']['B'] = 1
# ✅ USE: .loc[] with explicit copy when mutating a subset
subset = df.loc[df['status'] == 'active', :].copy()
subset['score'] = subset['score'].fillna(0)
summary = (
df.groupby(['region', 'category'], observed=True)
.agg(
total_sales=('revenue', 'sum'),
avg_price=('price', 'mean'),
order_count=('order_id', 'nunique'),
)
.reset_index()
)
merged = pd.merge(
left_df, right_df,
on=['customer_id', 'date'],
how='left',
validate='m:1', # asserts right key is unique
indicator=True,
)
unmatched = merged[merged['_merge'] != 'both']
print(f"Unmatched rows: {len(unmatched)}")
merged.drop(columns=['_merge'], inplace=True)
# Forward-fill then interpolate numeric gaps
df['price'] = df['price'].ffill().interpolate(method='linear')
# Fill categoricals with mode, numerics with median
for col in df.select_dtypes(include='object'):
df[col] = df[col].fillna(df[col].mode()[0])
for col in df.select_dtypes(include='number'):
df[col] = df[col].fillna(df[col].median())
daily = (
df.set_index('timestamp')
.resample('D')
.agg({'revenue': 'sum', 'sessions': 'count'})
.fillna(0)
)
pivot = df.pivot_table(
values='revenue',
index='region',
columns='product_line',
aggfunc='sum',
fill_value=0,
margins=True,
)
# Downcast numerics and convert low-cardinality strings to categorical
df['category'] = df['category'].astype('category')
df['count'] = pd.to_numeric(df['count'], downcast='integer')
df['score'] = pd.to_numeric(df['score'], downcast='float')
print(df.memory_usage(deep=True).sum() / 1e6, "MB after optimization")
.memory_usage(deep=True).copy() when modifying subsets to avoid SettingWithCopyWarning.iterrows() unless absolutely necessarydf['A']['B']) — use .loc[] or .iloc[].ix, .append() — use pd.concat())When implementing pandas solutions, provide:
npx claudepluginhub codeape-7/ai-agent-workflowgroupPerforms pandas DataFrame operations for data analysis, manipulation, and transformation. Handles data cleaning, aggregation, merging, time series, and performance optimization.
Provides pandas API patterns for DataFrame operations, data cleaning, aggregation, merging, and performance optimization. Useful for generating pandas code in data loading, manipulation, or profiling workflows.
Fast in-memory DataFrame library for datasets up to 100GB, using lazy evaluation and parallel execution. Good for ETL pipelines and pandas replacement.