Use when code loads or uses collapse (library(collapse), collapse::), when performing fast grouped or weighted statistics in R, or when seeking faster alternatives to dplyr aggregation.
npx claudepluginhub arthurgailes/r-package-skills --plugin r-package-skills

This skill uses the workspace's default tool permissions.
**collapse provides C/C++-based high-performance grouped and weighted statistics.** It is 50-100x faster than dplyr for grouped operations and matches data.table speed, while working with tibbles, data.tables, xts objects, and other common data structures.
Core principle: Fast aggregation, transformation, and panel data operations through vectorized C code.
Read references/API.md before writing code.
- references/API.md - Complete function reference
- references/collapse-for-tidyverse-users.md - Migration guide and patterns
- references/collapse-documentation.md - Core concepts and usage
- references/collapse-and-sf.md - Working with spatial data
- references/collapse-object-handling.md - Data structure handling

Use collapse when:
Don't use:
vs Alternatives:
| Scenario | Use This |
|---|---|
| Large grouped stats | collapse |
| Weighted computations | collapse |
| sf manipulation | dplyr |
| Reference semantics | data.table |
| Complex joins | data.table |
| Arbitrary group functions | dplyr |

| Task | Function/Example |
|---|---|
| Grouped stats | fmean(), fsum(), fsd(), fmedian() |
| Aggregation | collap(df, ~ by, list(fmean, fsd)) |
| Transform | ftransform(), fmutate() |
| Selection | fselect(), fsubset() (~100x faster) |
| Time series | flag(), fdiff(), fgrowth() |
| Panel data | fwithin(), fbetween(), qsu() |
| Grouping | fgroup_by(), GRP() |
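
The time series and panel rows in the table can be illustrated with a small sketch on plain vectors. The vector `x` and the panel identifier `id` are made-up illustration data, not from the skill's references, and the data are assumed to be already ordered within each group.

```r
library(collapse)

# Hypothetical toy panel: two units, three ordered observations each
x  <- c(1, 2, 4, 7, 10, 20)
id <- c(1, 1, 1, 2, 2, 2)

# Lag and first difference computed within each unit (g = id)
lag1 <- flag(x, g = id)    # first observation of each unit becomes NA
d1   <- fdiff(x, g = id)   # x minus its within-unit lag

# Growth rate within each unit (shown without an expected output)
gr1 <- fgrowth(x, g = id)

# Panel transformations: both return one value per input row
x_within  <- fwithin(x, g = id)    # x minus its group mean
x_between <- fbetween(x, g = id)   # group mean repeated for each row
```

Because `fwithin()` and `fbetween()` keep the input length, their results can be assigned straight back as new columns of the original data frame.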
```r
library(collapse)

# Basic: grouped mean (50-100x faster than dplyr)
data |> fgroup_by(category) |> fmean()

# Weighted aggregation
data |> fgroup_by(region) |> fmean(w = weight_col)

# Multiple stats at once
collap(data, ~ category, list(fmean, fsd, fmedian))

# TRA transformations (key differentiator - single C pass)
data |> fgroup_by(id) |> fmean(TRA = "-")     # Demean: subtract group mean
data |> fgroup_by(id) |> fsd(TRA = "/")       # Scale: divide by group SD
data |> fgroup_by(id) |> fmean(TRA = "fill")  # Fill: overwrite every value, NAs included, with the group mean

# See references/API.md for full TRA options ("-", "/", "fill", "-+", "replace")
```
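
The snippets above use placeholder names (`data`, `category`, `weight_col`). As a runnable sketch of the same patterns, the following uses a made-up data frame `df` whose column names are purely illustrative, and cross-checks the results against base R.

```r
library(collapse)

# Hypothetical toy data; column names are illustrative only
df <- data.frame(
  id     = c("a", "a", "b", "b"),
  value  = c(1, 3, 10, 20),
  weight = c(1, 3, 1, 1)
)

# Grouped mean through the data frame pipeline
res <- df |> fgroup_by(id) |> fmean()

# The same statistic through the vector interface: g supplies the groups
grp_mean <- fmean(df$value, g = df$id)

# Weighted grouped mean: w supplies the weights
grp_wmean <- fmean(df$value, g = df$id, w = df$weight)

# Demean within group: TRA = "-" subtracts each group's mean
demeaned <- fmean(df$value, g = df$id, TRA = "-")

# Base R cross-checks
base_mean     <- tapply(df$value, df$id, mean)
base_demeaned <- df$value - ave(df$value, df$id)
```

Note that with `TRA` set the result has one value per input row, while without it there is one value per group.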
| Mistake | Fix |
|---|---|
| Using group_by() with collapse functions | Use fgroup_by() or pass g = GRP(groupvar) |
| collap() applies to ALL numeric columns | Explicitly select columns before calling |
| Expecting na.rm = FALSE default | collapse defaults to na.rm = TRUE |
| Expecting fwithin()/fbetween() to collapse rows | They return the same number of rows (centered values / group means) |
| Global options affect behavior | Set arguments explicitly in package code |
| Ignoring sort = FALSE speedup | Add sort = FALSE when order doesn't matter (3x faster) |
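
Several of the fixes in this table can be sketched on a small made-up data frame. The names `df`, `grp`, `x`, and `y` are hypothetical. Restricting `collap()` through its `cols` argument is shown here as one way to select columns, alongside subsetting beforehand.

```r
library(collapse)

# Hypothetical toy data with one missing value
df <- data.frame(
  grp = c("b", "b", "a", "a"),
  x   = c(1, NA, 3, 5),
  y   = c(10, 20, 30, 40)
)

# na.rm = TRUE is the default, so the NA is skipped silently
m_default <- fmean(df$x)
# Ask for base-R-like behaviour explicitly
m_strict  <- fmean(df$x, na.rm = FALSE)

# Build the grouping once and reuse it through g =
g     <- GRP(df, ~ grp)
x_by  <- fmean(df$x, g = g)
y_sum <- fsum(df$y, g = g)

# sort = FALSE keeps groups in order of first appearance ("b" before "a")
g_unsorted <- GRP(df, ~ grp, sort = FALSE)
y_unsorted <- fmean(df$y, g = g_unsorted)

# Restrict collap() to chosen columns instead of all numeric ones
y_only <- collap(df, ~ grp, fmean, cols = "y")
```

Reusing a `GRP` object avoids recomputing the grouping for every statistic, which matters most when many functions are applied to the same groups.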
See references/ for API reference, vignette content (tidyverse comparison, sf integration, object handling, development guidelines), and panel data patterns.
Validator: lib/r-validators/numerical-validator.R