Skill

r-ops

Modern R operations for data analysis, statistics, and reproducible work. Use for: R, Rstats, tidyverse, dplyr, tidyr, ggplot2, the native pipe |>, tibbles, data wrangling (filter/mutate/summarise/group_by/across/joins/pivot), reading and writing data (readr, readxl, arrow/Parquet, DBI/dbplyr databases, data.table::fread, rvest scraping), strings (stringr) and regex, dates/times (lubridate), factors (forcats), iteration and functional programming (purrr map family, list-columns), statistics and modeling (t.test/lm/glm, formulas, broom, tidymodels), high-performance data.table, time series (tsibble/fable, zoo/xts), and project workflow (renv, Quarto, here, testthat, styler, RStudio/Posit Projects). Covers tidyverse-first idioms with base R and data.table as named alternatives.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-mods:r-ops

User invocable

Model invocable

Inline context

Default effort

When to use

Use for any R / Rstats work — tidyverse data wrangling, statistics, ggplot2 visualization, or reproducible analysis — e.g. 'rewrite this in dplyr/tidyverse', 'how do I pivot/join/group these', 'plot this with ggplot2', 'fit a model and tidy it with broom', 'speed this up with data.table', 'set up an renv + Quarto project'. Leads with modern idioms (native |>, .by=, across, purrr map); names base R / data.table where they win.

Tool Access

This skill is limited to the following tools:

Read, Write, Bash

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

A tidyverse-first, current-best-practice reference for working in R (2024+): data analysis, statistics, visualization, and reproducible workflow. Opinionated where the community has converged, with base R and `data.table` flagged as the right tool when they are.

Supporting Files

assets/r-packages.json
references/data-table.md
references/import-io.md
references/iteration-functional.md
references/modeling-stats.md
references/strings-dates-factors.md
references/tidyverse-core.md
references/time-series.md
references/visualization.md
references/workflow-tooling.md
scripts/check-r-facts.py
tests/run.sh

SKILL.md

129 lines · ~2.2k tokens

Stats

Language: Shell

Stars: 24

Forks: 5

Maintenance: Excellent

Last Commit: Jun 23, 2026

Modern R Operations

A tidyverse-first, current-best-practice reference for working in R (2024+): data analysis, statistics, visualization, and reproducible workflow. Opinionated where the community has converged, with base R and data.table flagged as the right tool when they are.

The modern R stack at a glance

| Job | Reach for | Not (anymore) |
| --- | --- | --- |
| Pipe | native `\|>` (R 4.1+) | `%>%` only when you need its placeholder/`.` features |
| Data frame | `tibble` | `data.frame` defaults (but it's fine) |
| Wrangle | `dplyr` + `tidyr` | hand-rolled `[`, `subset`, `aggregate` |
| Read CSV | `readr::read_csv` (prod), `data.table::fread` (speed) | `read.csv` |
| Excel / Parquet / DB | `readxl` / `arrow` / `DBI`+`dbplyr` | - |
| Strings / dates / factors | `stringr` / `lubridate` / `forcats` | base `grepl`/`POSIXlt`/`factor` juggling |
| Plot | `ggplot2` | base graphics (fine for throwaway plots) |
| Iterate | `purrr::map_*` + `across()` | `sapply` (type-unstable); `lapply` ok in package code |
| Big / fast | `data.table` (or `dtplyr`, `arrow`+`duckdb`) | - |
| Model | base `lm`/`glm` + `broom`; `tidymodels` for CV/tuning | `caret` |
| Time series | `tsibble` + `fable` | `forecast::auto.arima` (maintenance-only) |
| Reports | Quarto (`.qmd`) | R Markdown (still works) |
| Reproducibility | `renv` + Projects + `here()` | `setwd()`, saving `.RData` |

The analysis workflow (and where each reference lives)

import → tidy → transform → visualize → model → communicate

Import — get data in: import-io.md
Tidy & transform — the dplyr/tidyr core: tidyverse-core.md
Clean types — strings, dates, factors: strings-dates-factors.md
Iterate — map over many things, list-columns: iteration-functional.md
Visualize — ggplot2 + EDA: visualization.md
Model — tests, lm/glm, broom, tidymodels: modeling-stats.md
Scale up — when dplyr is too slow: data-table.md
Time series — tsibble/fable, xts: time-series.md
Ship it — projects, renv, Quarto, testing: workflow-tooling.md

Open the reference for the task at hand — they load on demand. For broad orientation, this file is enough.

Core idioms (internalize these)

library(tidyverse)

# The native pipe threads a value into the first argument.
diamonds |>
  filter(carat > 0.5) |>
  mutate(price_per_carat = price / carat) |>
  summarise(
    mean_ppc = mean(price_per_carat),
    n = n(),
    .by = cut                      # per-operation grouping (dplyr 1.1+)
  ) |>
  arrange(desc(mean_ppc))

# across() applies one op to many columns
df |> summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

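# map over a list/vector, type-stable (map() always returns a list); combine results.
# `files` is a character vector of CSV paths; set_names() names each element by its
# path so that names_to records the source file rather than a bare index.
files |> set_names() |> map(read_csv) |> list_rbind(names_to = "source")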

# ggplot: data + aesthetic mapping + layered geoms (mpg ships with ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  geom_smooth(method = "lm")

Decision shortcuts

Grouping: prefer per-operation .by = over group_by() |> ... |> ungroup() — it avoids sticky-group bugs.
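A minimal sketch of the sticky-group problem, assuming dplyr 1.1 or later. The `sales` tibble and its columns are made up for illustration; the point is that the same follow-up step silently means different things depending on whether a grouping is still attached.

```r
library(dplyr)

# Hypothetical data, invented for this example.
sales <- tibble(
  region = c("north", "north", "south", "south"),
  units  = c(10, 20, 5, 15)
)

# group_by() leaves the grouping attached to the result of mutate().
sticky <- sales |>
  group_by(region) |>
  mutate(share = units / sum(units))

# .by groups for this one call only and returns an ungrouped tibble.
clean <- sales |>
  mutate(share = units / sum(units), .by = region)

is_grouped_df(sticky)  # TRUE
is_grouped_df(clean)   # FALSE

# The same follow-up step now means two different things:
sticky_rank <- sticky |> mutate(rank = min_rank(desc(units)))  # ranks within each region
clean_rank  <- clean  |> mutate(rank = min_rank(desc(units)))  # ranks across all rows
```

Both versions compute the same `share` column; they differ only in what the next verb sees.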

Joins: always write join_by(...) explicitly. Natural joins on shared names are almost always wrong on real data.
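To illustrate why implicit keys are risky, here is a small sketch with two invented tables (`orders` and `customers`, with hypothetical column names) that happen to share the names `id` and `updated`. It assumes dplyr 1.1 or later for `join_by()`.

```r
library(dplyr)

# Hypothetical tables: `id` means "order id" in one and "customer id" in the other.
orders <- tibble(
  id          = c(1, 2, 3),
  customer_id = c(10, 20, 10),
  updated     = c("2024-01", "2024-02", "2024-03")
)
customers <- tibble(
  id      = c(10, 20),
  name    = c("Ada", "Grace"),
  updated = c("2023-12", "2023-12")
)

# With no `by`, dplyr joins on every shared name (here id and updated) and only
# emits a message. Nothing matches, so the result is empty.
natural <- inner_join(orders, customers)
nrow(natural)  # 0

# Stating the key makes the intent explicit; the non-key `updated` columns
# are disambiguated with suffixes.
explicit <- orders |>
  left_join(
    customers,
    join_by(customer_id == id),
    suffix = c("_order", "_customer")
  )
explicit$name  # "Ada" "Grace" "Ada"
```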

Which CSV reader? read_csv (readable, good defaults, production) · fread (fastest, big files) · vroom (many files, column subset).

dplyr or data.table? dplyr for readability and teams; data.table (or dtplyr) when profiling says dplyr is the bottleneck or data is large. arrow+duckdb for larger-than-memory.

lm or tidymodels? Base lm/glm is the right default — reach for tidymodels only when you need cross-validation, tuning, or uniform multi-model comparison.

base R or tidyverse? Tidyverse for analysis, readability, teams. Base R (or data.table) for package development, minimal-dependency scripts, and performance-critical inner loops. The |> pipe is base and dependency-free — use it everywhere.

High-value gotchas

These bite people repeatedly — full detail in the referenced files:

stringsAsFactors is FALSE since R 4.0 (2020). Old advice warning about automatic factor conversion on import is stale and sometimes backwards. (import-io)
predict(glm_model, type = "response") for probabilities; the default returns the link scale (log-odds for a logistic model). (modeling-stats)
cor.test(), not cor(), when you need a p-value and confidence interval rather than just the coefficient. (modeling-stats)
sapply is type-unstable — never in function bodies; use a typed map_*. (iteration-functional)
map_dfr/map_dfc are superseded → map() |> list_rbind() / list_cbind(). (iteration-functional)
ggplot mapping vs setting: aes(colour = class) maps a variable; colour = "blue" sets a constant. Putting a constant inside aes() is the #1 ggplot mistake. (visualization)
coord_cartesian(ylim=) zooms; scale_y_continuous(limits=) drops data — the latter silently corrupts smooths/boxplots. (visualization)
Factor order is not cosmetic — it sets ggplot axis/legend order and regression reference levels. fct_reorder for plots, fct_relevel for models. (strings-dates-factors)
lubridate periods vs durations: months(1) (calendar) vs dmonths(1) (fixed seconds); use %m+% for safe month-end arithmetic. (strings-dates-factors)
data.table := mutates in place — DT2 <- DT is not a copy; use copy(DT). (data-table)
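zoo lag(k = +1) leads (pulls in future data); use k = -1 for a true lag. xts reverses the sign, so lag(x, k = 1) on an xts object is already a true lag. zoo::rollapply defaults to center alignment; set align = "right" to avoid look-ahead bias. (time-series)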
Never setwd() with an absolute path — use an RStudio Project + here::here(). Don't save/restore .RData. (workflow-tooling)
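Three of the gotchas above can be reproduced with base R alone. The sketch below uses the built-in `mtcars` data. It substitutes base `vapply()` for a typed `map_*` and base `relevel()` for `fct_relevel()`, so treat those as stand-ins for the purrr and forcats calls the list recommends.

```r
# 1. predict() on a glm returns the link scale unless told otherwise.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
new_car  <- data.frame(wt = 3)
log_odds <- predict(fit, newdata = new_car)                     # default: type = "link"
prob     <- predict(fit, newdata = new_car, type = "response")  # a probability
all.equal(plogis(log_odds), prob)  # TRUE: the inverse logit maps one to the other

# 2. sapply() changes its return type with the input.
double_it <- function(i) i * 2
full  <- sapply(1:3, double_it)                     # numeric vector
empty <- sapply(integer(0), double_it)              # empty list, not numeric(0)
safe  <- vapply(integer(0), double_it, numeric(1))  # numeric(0), as declared

# 3. Factor level order picks the regression reference level.
cars <- transform(mtcars, cyl_f = factor(cyl))  # levels "4", "6", "8"
fit_default <- lm(mpg ~ cyl_f, data = cars)
names(coef(fit_default))  # "(Intercept)" "cyl_f6" "cyl_f8"

cars$cyl_f <- relevel(cars$cyl_f, ref = "8")    # make "8" the baseline
fit_relevel <- lm(mpg ~ cyl_f, data = cars)
names(coef(fit_relevel))  # "(Intercept)" "cyl_f4" "cyl_f6"
```

In tidyverse code the equivalents would be `purrr::map_dbl()` for step 2 and `forcats::fct_relevel()` for step 3.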

Currency note

Reflects the R ecosystem as of 2024–2026: R ≥ 4.3, tidyverse 2.0, native |>, dplyr .by=, the \(x) lambda, list_rbind/list_cbind, the tidyverts (tsibble/fable) time-series stack, and Quarto. Where a once-standard approach has been superseded (base apply → purrr, forecast → fable, R Markdown → Quarto, map_dfr → list_rbind), the modern form leads and the older one is noted for when you encounter it in the wild.

This currency is verified, not asserted — scripts/check-r-facts.py guards it against silent drift:

# Structural (PR CI, no network): every CRAN package in the catalog is still
# named in this skill's prose, and the currency note still carries a year.
python scripts/check-r-facts.py --offline        # exit 0 consistent, 10 drift

# Live (weekly freshness job, never blocks a PR): every recommended package
# still resolves on CRAN.
python scripts/check-r-facts.py --live            # exit 10 a package is gone, 7 CRAN unreachable

The canonical package list lives in assets/r-packages.json; when you add or drop a recommendation, update it to match or --offline fails CI.
