Modern R operations for data analysis, statistics, and reproducible work. Use for: R, Rstats, tidyverse, dplyr, tidyr, ggplot2, the native pipe |>, tibbles, data wrangling (filter/mutate/summarise/group_by/across/joins/pivot), reading and writing data (readr, readxl, arrow/Parquet, DBI/dbplyr databases, data.table::fread, rvest scraping), strings (stringr) and regex, dates/times (lubridate), factors (forcats), iteration and functional programming (purrr map family, list-columns), statistics and modeling (t.test/lm/glm, formulas, broom, tidymodels), high-performance data.table, time series (tsibble/fable, zoo/xts), and project workflow (renv, Quarto, here, testthat, styler, RStudio/Posit Projects). Covers tidyverse-first idioms with base R and data.table as named alternatives.
Slash command: /claude-mods:r-ops
When to use:
Use for any R / Rstats work — tidyverse data wrangling, statistics, ggplot2 visualization, or reproducible analysis — e.g. 'rewrite this in dplyr/tidyverse', 'how do I pivot/join/group these', 'plot this with ggplot2', 'fit a model and tidy it with broom', 'speed this up with data.table', 'set up an renv + Quarto project'. Leads with modern idioms (native |>, .by=, across, purrr map); names base R / data.table where they win.
A tidyverse-first, current-best-practice reference for working in R (2024+): data analysis, statistics, visualization, and reproducible workflow. Opinionated where the community has converged, with base R and `data.table` flagged as the right tool when they are.
Bundled files:
- assets/r-packages.json
- references/data-table.md
- references/import-io.md
- references/iteration-functional.md
- references/modeling-stats.md
- references/strings-dates-factors.md
- references/tidyverse-core.md
- references/time-series.md
- references/visualization.md
- references/workflow-tooling.md
- scripts/check-r-facts.py
- tests/run.sh

| Job | Reach for | Not (anymore) |
|---|---|---|
| Pipe | native `\|>` (R 4.1+) | `%>%` only when you need its placeholder/`.` features |
| Data frame | tibble | data.frame defaults (but it's fine) |
| Wrangle | dplyr + tidyr | hand-rolled [, subset, aggregate |
| Read CSV | readr::read_csv (prod), data.table::fread (speed) | read.csv |
| Excel / Parquet / DB | readxl / arrow / DBI+dbplyr | — |
| Strings / dates / factors | stringr / lubridate / forcats | base grepl/POSIXlt/factor juggling |
| Plot | ggplot2 | base graphics (fine for throwaway plots) |
| Iterate | purrr::map_* + across() | sapply (type-unstable); lapply ok in package code |
| Big / fast | data.table (or dtplyr, arrow+duckdb) | — |
| Model | base lm/glm + broom; tidymodels for CV/tuning | caret |
| Time series | tsibble + fable | forecast::auto.arima (maintenance-only) |
| Reports | Quarto (.qmd) | R Markdown (still works) |
| Reproducibility | renv + Projects + here() | setwd(), saving .RData |
import → tidy → transform → visualize → model → communicate
Open the reference for the task at hand — they load on demand. For broad orientation, this file is enough.

    library(tidyverse)

    # The native pipe threads a value into the first argument.
    diamonds |>
      filter(carat > 0.5) |>
      mutate(price_per_carat = price / carat) |>
      summarise(
        mean_ppc = mean(price_per_carat),
        n = n(),
        .by = cut # per-operation grouping (dplyr 1.1+)
      ) |>
      arrange(desc(mean_ppc))

    # across() applies one op to many columns
    df |> summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

    # map over a list/vector, type-stable; combine results
    files |> map(read_csv) |> list_rbind(names_to = "source")

    # ggplot: data + aesthetic mapping + layered geoms
    ggplot(df, aes(x = displ, y = hwy, colour = class)) +
      geom_point() +
      geom_smooth(method = "lm")

Grouping: prefer per-operation .by = over group_by() |> ... |> ungroup() — it avoids sticky-group bugs.
Joins: always write join_by(...) explicitly. Natural joins on shared names are almost always wrong on real data.
Which CSV reader? read_csv (readable, good defaults, production) · fread (fastest, big files) · vroom (many files, column subset).
dplyr or data.table? dplyr for readability and teams; data.table (or dtplyr) when profiling says dplyr is the bottleneck or data is large. arrow+duckdb for larger-than-memory.
lm or tidymodels? Base lm/glm is the right default — reach for tidymodels only when you need cross-validation, tuning, or uniform multi-model comparison.
base R or tidyverse? Tidyverse for analysis, readability, teams. Base R (or data.table) for package development, minimal-dependency scripts, and performance-critical inner loops. The |> pipe is base and dependency-free — use it everywhere.
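
As a rough illustration of the minimal-dependency case, the sketch below uses only base R: the native pipe, the `\(x)` lambda, and `vapply` as a typed alternative to `sapply`. The helper name `col_means` is hypothetical and chosen for this example; R 4.1 or later is assumed.

```r
# Base R only; assumes R >= 4.1 for |> and the \(x) lambda.

# Hypothetical helper (not from any package): mean of each numeric column.
col_means <- function(data) {
  data |>
    Filter(f = is.numeric) |>                       # keep numeric columns only
    vapply(\(x) mean(x, na.rm = TRUE), numeric(1))  # always a named double vector
}

means <- col_means(iris)  # iris ships with base R's datasets package

# Why vapply rather than sapply inside functions: on empty input sapply
# returns a list, while vapply still returns the declared type.
empty_sapply <- sapply(list(), length)
empty_vapply <- vapply(list(), length, integer(1))
```

Because the pipe inserts its left-hand side as the first unnamed argument, naming `f =` in `Filter()` lets the data frame fall through to `x`.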
These bite people repeatedly — full detail in the referenced files:
- `stringsAsFactors` is `FALSE` since R 4.0 (2020). Old advice warning about automatic factor conversion on import is stale and sometimes backwards. (import-io)
- Use `predict(glm_model, type = "response")` for probabilities; the default returns the link scale (log-odds for a logistic model). (modeling-stats)
- Use `cor.test()`, not `cor()`, when you care whether a correlation is real. (modeling-stats)
- `sapply` is type-unstable: never use it in function bodies; use a typed `map_*`. (iteration-functional)
- `map_dfr`/`map_dfc` are superseded; use `map() |> list_rbind()` / `list_cbind()`. (iteration-functional)
- `aes(colour = class)` maps a variable; `colour = "blue"` sets a constant. Putting a constant inside `aes()` is the #1 ggplot mistake. (visualization)
- `coord_cartesian(ylim=)` zooms; `scale_y_continuous(limits=)` drops data, and the latter silently corrupts smooths/boxplots. (visualization)
- `fct_reorder` for plots, `fct_relevel` for models. (strings-dates-factors)
- `months(1)` (calendar) vs `dmonths(1)` (fixed seconds); use `%m+%` for safe month-end arithmetic. (strings-dates-factors)
- data.table `:=` mutates in place. `DT2 <- DT` is not a copy; use `copy(DT)`. (data-table)
- `lag(k = +1)` leads (future data) under the `stats::lag`/zoo convention; use `k = -1` for a true lag (`dplyr::lag()` already lags). `rollapply` defaults to center alignment; set `align = "right"` to avoid look-ahead bias. (time-series)
- Avoid `setwd()` with an absolute path; use an RStudio Project + `here::here()`. Don't save/restore `.RData`. (workflow-tooling)

Reflects the R ecosystem as of 2024–2026: R ≥ 4.3, tidyverse 2.0, native `|>`, dplyr `.by=`, the `\(x)` lambda, `list_rbind`/`list_cbind`, the tidyverts (tsibble/fable) time-series stack, and Quarto. Where a once-standard approach has been superseded (base apply → purrr, forecast → fable, R Markdown → Quarto, `map_dfr` → `list_rbind`), the modern form leads and the older one is noted for when you encounter it in the wild.
This currency is verified, not asserted — scripts/check-r-facts.py guards it against silent drift:

    # Structural (PR CI, no network): every CRAN package in the catalog is still
    # named in this skill's prose, and the currency note still carries a year.
    python scripts/check-r-facts.py --offline   # exit 0 = consistent, 10 = drift

    # Live (weekly freshness job, never blocks a PR): every recommended package
    # still resolves on CRAN.
    python scripts/check-r-facts.py --live      # exit 10 = a package is gone, 7 = CRAN unreachable

The canonical package list lives in assets/r-packages.json; when you add or drop a recommendation, update it to match or --offline fails CI.
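
Three of the gotchas listed earlier (link-scale `predict`, `cor` versus `cor.test`, and the sign convention of `lag`) can be reproduced with base R alone. The following is a minimal sketch using the built-in `mtcars` data and a toy series; the model formula and the variable names are illustrative choices, not taken from the reference files.

```r
# Base R only; mtcars ships with the datasets package.

# 1. predict() on a glm defaults to the link scale.
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
link <- predict(fit)                           # log-odds, unbounded
prob <- predict(fit, type = "response")        # probabilities in [0, 1]
same <- isTRUE(all.equal(plogis(link), prob))  # inverse logit recovers prob

# 2. cor() gives only the estimate; cor.test() adds inference.
r  <- cor(mtcars$mpg, mtcars$wt)
ct <- cor.test(mtcars$mpg, mtcars$wt)          # same estimate plus p-value and CI

# 3. stats::lag() with k = +1 shifts the time index back, so at a given
#    time point you see the *next* observation (a lead).
x     <- ts(c(10, 20, 30, 40))                 # observed at times 1..4
lead1 <- stats::lag(x, k = 1)                  # re-indexed to times 0..3
lag1  <- stats::lag(x, k = -1)                 # re-indexed to times 2..5
at_t2 <- c(
  lead = as.numeric(window(lead1, start = 2, end = 2)),  # value of x at time 3
  lag  = as.numeric(window(lag1,  start = 2, end = 2))   # value of x at time 1
)
```

Calling `stats::lag()` with its namespace matters once dplyr is attached, because `dplyr::lag()` masks it and follows the opposite sign convention.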
npx claudepluginhub 0xdarkmatter/claude-mods --plugin claude-mods