From nutmeg
Transforms, filters, reshapes, joins, and manipulates football data: cleaning, merging datasets, converting formats, handling missing values, and processing large datasets.
npx claudepluginhub withqwerty/plugins --plugin nutmeg
Help the user manipulate football data effectively. This skill is about the mechanics of working with data, adapted to the user's language and tools.
Read and follow docs/accuracy-guardrail.md before answering any question about provider-specific facts (IDs, endpoints, schemas, coordinates, rate limits). Always use search_docs — never guess from training data.
Read .nutmeg.user.md. If it doesn't exist, tell the user to run /nutmeg first. Use their profile for language preference and stack.
Football data coordinates vary by provider. Always verify and convert before combining data.
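As a minimal sketch of what such a conversion looks like, the two helpers below apply a linear rescale (x * 1.2, y * 0.8) and a Y inversion (y = 100 - y). The function names are hypothetical, and the sketch assumes the source coordinates sit on a 0-100 grid. Which provider uses which system is not stated here and should be looked up, not assumed.

```python
def scale_to_120x80(x: float, y: float) -> tuple[float, float]:
    """Rescale a point from an assumed 0-100 x 0-100 grid to a 120 x 80 grid."""
    return x * 1.2, y * 0.8


def invert_y(x: float, y: float) -> tuple[float, float]:
    """Flip the vertical axis on an assumed 0-100 grid; x is unchanged."""
    return x, 100 - y


# Convert one point with each helper.
scaled = scale_to_120x80(50, 50)
flipped = invert_y(30, 20)
print(scaled, flipped)
```

Keeping each conversion as a small pure function makes it easy to unit-test against a few known pitch landmarks (centre spot, penalty spot) before applying it to a whole dataset.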
Use search_docs(query="coordinate system", provider="[provider]") to look up the specific system. Key conversions:
- x * 1.2, y * 0.8
- x stays, y = 100 - y (invert Y)
- .transform() in Python

Common filtering patterns for football event data:
By event type:
By match state:
By zone:
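The three filters named above (event type, match state, zone) might look like this in pandas. This is a sketch on a toy table: every column name, the "level or behind" definition of match state, and the 0-100 x axis with a final third starting at 66.7 are assumptions, not any provider's schema.

```python
import pandas as pd

# Toy event table; column names are assumptions, not a provider schema.
events = pd.DataFrame(
    {
        "event_type": ["pass", "shot", "pass", "shot", "tackle"],
        "minute": [3, 17, 40, 62, 88],
        "home_score": [0, 0, 1, 1, 2],
        "away_score": [0, 0, 0, 1, 1],
        "is_home": [True, False, True, True, False],
        "x": [35.0, 88.0, 52.0, 91.0, 20.0],
        "y": [40.0, 55.0, 10.0, 48.0, 70.0],
    }
)

# By event type: keep only shots.
shots = events[events["event_type"] == "shot"]

# By match state: events by a team that was level or behind at the time.
own = events["home_score"].where(events["is_home"], events["away_score"])
opp = events["away_score"].where(events["is_home"], events["home_score"])
not_winning = events[own <= opp]

# By zone: final third, assuming x runs 0-100 toward the opponent's goal.
final_third = events[events["x"] >= 66.7]
```

Boolean masks compose with `&` and `|`, so a query such as "shots in the final third while not winning" is the intersection of the three masks.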
Common joins in football data:
| Join | Key | Notes |
|---|---|---|
| Events + lineups | player_id + match_id | Get player names/positions for each event |
| Events + xG | match_id + event sequence | Match xG to specific shots |
| Multiple providers | match date + team names | Fuzzy matching often needed |
| Season data + Elo | date | Join Elo rating at time of match |
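Two of the joins in the table can be sketched in pandas as follows. The frames and column names are made up for illustration. `merge` with a composite key covers events + lineups, and `merge_asof` is one way to pick up the most recent Elo rating on or before each match date.

```python
import pandas as pd

events = pd.DataFrame(
    {
        "match_id": [1, 1, 2],
        "player_id": [10, 11, 10],
        "event_type": ["pass", "shot", "pass"],
    }
)
lineups = pd.DataFrame(
    {
        "match_id": [1, 1, 2],
        "player_id": [10, 11, 10],
        "player_name": ["Player A", "Player B", "Player A"],
        "position": ["CM", "ST", "DM"],
    }
)

# Events + lineups: composite key, many events to one lineup row.
# validate= raises if a (match_id, player_id) pair is duplicated in lineups.
enriched = events.merge(
    lineups, on=["match_id", "player_id"], how="left", validate="many_to_one"
)

matches = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-10", "2024-02-01"]), "team": ["A", "A"]}
)
elo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-20"]),
        "team": ["A", "A"],
        "elo": [1500.0, 1512.0],
    }
)

# Season data + Elo: latest rating on or before each match date, per team.
# merge_asof requires both frames to be sorted by the "on" column.
with_elo = pd.merge_asof(
    matches.sort_values("date"), elo.sort_values("date"), on="date", by="team"
)
```

Passing `validate=` turns a silent row explosion from duplicate keys into an immediate error, which is usually the cheaper failure.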
Fuzzy team name matching is a constant pain. Build a mapping table:
```python
TEAM_MAP = {
    'Man City': 'Manchester City',
    'Man United': 'Manchester United',
    'Spurs': 'Tottenham Hotspur',
    'Wolves': 'Wolverhampton Wanderers',
    # ...
}
```
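One way to apply such a mapping before a join is sketched below. `normalize_team` is a hypothetical helper, and the choice to pass unmapped names through unchanged (then list them for review) is an assumption about what is convenient, not a prescribed approach.

```python
TEAM_MAP = {
    "Man City": "Manchester City",
    "Man United": "Manchester United",
    "Spurs": "Tottenham Hotspur",
    "Wolves": "Wolverhampton Wanderers",
}


def normalize_team(name: str) -> str:
    """Return the canonical team name, or the trimmed input if unmapped."""
    cleaned = name.strip()
    return TEAM_MAP.get(cleaned, cleaned)


provider_names = ["Man City", " Spurs ", "Arsenal"]
canonical = [normalize_team(n) for n in provider_names]

# Names that are not a known canonical value deserve a manual look:
# they are either already canonical or missing from the map.
known = set(TEAM_MAP.values())
unmapped = sorted(n for n in canonical if n not in known)
```

Reviewing `unmapped` after every new data source is added keeps the table from silently dropping rows in later joins.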
Common reshaping operations:
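As an illustration of reshaping, the sketch below turns a long event table into a wide per-player count table and back again. The operations chosen here (groupby, unstack, melt) and the column names are assumptions about what is typically needed, not a list taken from this skill.

```python
import pandas as pd

events = pd.DataFrame(
    {
        "player_id": [10, 10, 10, 11, 11],
        "event_type": ["pass", "pass", "shot", "pass", "tackle"],
    }
)

# Long to wide: one row per player, one column per event type, values are counts.
counts = (
    events.groupby(["player_id", "event_type"])
    .size()
    .unstack(fill_value=0)
)

# Wide back to long: one row per (player_id, event_type) pair.
long = counts.reset_index().melt(
    id_vars="player_id", var_name="event_type", value_name="n"
)
```

The wide form suits per-player comparisons and plotting, while the long form is easier to filter and join.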
Full event data for a PL season is ~500MB+ (380 matches x ~1700 events). Strategies:
Python:
JavaScript/TypeScript: readline or JSONStream
R:
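A streaming approach in Python could look like the sketch below, which uses only the standard library. It assumes the events are stored as JSON Lines (one JSON object per line), which may not match a given provider's export. `iter_events` is a hypothetical helper.

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def iter_events(handle: TextIO) -> Iterator[dict]:
    """Yield one event at a time from a JSON Lines stream, skipping blank lines."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for a large file opened with open(path, encoding="utf-8").
sample = io.StringIO(
    '{"match_id": 1, "event_type": "pass"}\n'
    '{"match_id": 1, "event_type": "shot"}\n'
    "\n"
    '{"match_id": 2, "event_type": "pass"}\n'
)

# Aggregate on the fly so only the running counts stay in memory.
type_counts = Counter(event["event_type"] for event in iter_events(sample))
```

Because the generator holds one line at a time, memory use stays flat regardless of file size. The trade-off is that each pass over the data re-reads the file.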
Always validate after wrangling:
| Check | What to look for |
|---|---|
| Event counts | ~1500-2000 events per PL match. Far fewer suggests a data issue |
| Coordinate range | Should be within provider's expected range |
| Missing player IDs | Some events lack player attribution (ball out, etc.) |
| Duplicate events | Same event_id appearing twice |
| Time gaps | Large gaps in event timestamps within a match |
| Team attribution | Verify home/away assignment is consistent |
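Several of these checks can be automated. The sketch below covers duplicate event IDs, coordinate range, missing player IDs, and events per match on a toy list of dicts. The field names and the 0-100 coordinate range are assumptions, so substitute the provider's documented values.

```python
from collections import Counter

events = [
    {"event_id": "a1", "match_id": 1, "player_id": 10, "x": 35.0, "y": 40.0},
    {"event_id": "a2", "match_id": 1, "player_id": None, "x": 101.5, "y": 50.0},
    {"event_id": "a2", "match_id": 1, "player_id": 11, "x": 60.0, "y": 20.0},
]

X_RANGE = (0.0, 100.0)  # assumed; use the provider's documented range
Y_RANGE = (0.0, 100.0)  # assumed

# Duplicate events: the same event_id appearing more than once.
id_counts = Counter(e["event_id"] for e in events)
duplicate_ids = sorted(eid for eid, n in id_counts.items() if n > 1)

# Coordinate range: flag events outside the expected pitch bounds.
out_of_range = [
    e["event_id"]
    for e in events
    if not (X_RANGE[0] <= e["x"] <= X_RANGE[1] and Y_RANGE[0] <= e["y"] <= Y_RANGE[1])
]

# Missing player IDs: count them; some are legitimate (ball out, etc.).
missing_player = sum(e["player_id"] is None for e in events)

# Event counts: compare per-match totals against the expected range.
events_per_match = Counter(e["match_id"] for e in events)
```

Reporting counts instead of raising errors fits checks such as missing player IDs, where a non-zero result is expected and only an unusual proportion signals a problem.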

Common format conversions:

| From | To | Tool/method |
|---|---|---|
| JSON events | DataFrame | pandas/polars read_json or manual parsing |
| CSV | Parquet | df.write_parquet() (polars) or df.to_parquet() (pandas) |
| Provider format | kloppy model | kloppy.load_{provider}() in Python |
| kloppy model | DataFrame | dataset.to_df() |
| Any | SQLite | Load into SQLite for ad-hoc queries |
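The last row can be sketched with Python's built-in `sqlite3` module. The table layout and sample rows below are hypothetical, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

# Hypothetical rows: (match_id, event_type, player_id).
events = [
    (1, "pass", 10),
    (1, "shot", 11),
    (2, "shot", 10),
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist
conn.execute(
    "CREATE TABLE events (match_id INTEGER, event_type TEXT, player_id INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
conn.commit()

# Ad-hoc query: shots per match.
shots_per_match = conn.execute(
    "SELECT match_id, COUNT(*) FROM events "
    "WHERE event_type = 'shot' GROUP BY match_id ORDER BY match_id"
).fetchall()
conn.close()
```

Parameterised inserts with `executemany` avoid building SQL strings by hand, which matters once player or team names containing quotes are loaded.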