From r-skills
R performance best practices including profiling, benchmarking, vctrs, and optimization strategies. Use when optimizing R code.
npx claudepluginhub ab604/claude-code-r-skills --plugin r-skillsThis skill uses the workspace's default tool permissions.
*Profiling, benchmarking, and optimization strategies for R code*
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Searches prompts.chat for AI prompt templates by keyword or category, retrieves by ID with variable handling, and improves prompts via AI. Use for discovering or enhancing prompts.
Guides agent creation for Claude Code plugins with file templates, frontmatter specs (name, description, model), triggering examples, system prompts, and best practices.
Profiling, benchmarking, and optimization strategies for R code
| Tool | Use When | Don't Use When | What It Shows |
|---|---|---|---|
profvis | Complex code, unknown bottlenecks | Simple functions, known issues | Time per line, call stack |
bench::mark() | Comparing alternatives | Single approach | Relative performance, memory |
system.time() | Quick checks | Detailed analysis | Total runtime only |
Rprof() | Base R only environments | When profvis available | Raw profiling data |
# 1. Profile first - find the actual bottlenecks
library(profvis)
profvis({
# Your slow code here
})
# 2. Focus on the slowest parts (80/20 rule)
# Don't optimize until you know where time is spent
# 3. Benchmark alternatives for hot spots
library(bench)
bench::mark(
current = current_approach(data),
vectorized = vectorized_approach(data),
parallel = map(data, in_parallel(func))
)
# 4. Consider tool trade-offs based on bottleneck type
in_parallel())# Helps when:
# - CPU-intensive computations
# - Embarassingly parallel problems
# - Large datasets with independent operations
# - I/O bound operations (file reading, API calls)
# Hurts when:
# - Simple, fast operations (overhead > benefit)
# - Memory-intensive operations (may cause thrashing)
# - Operations requiring shared state
# - Small datasets
# Example decision point:
expensive_func <- function(x) Sys.sleep(0.1) # 100ms per call
fast_func <- function(x) x^2 # microseconds per call
# Good for parallel
map(1:100, in_parallel(expensive_func)) # ~10s -> ~2.5s on 4 cores
# Bad for parallel (overhead > benefit)
map(1:100, in_parallel(fast_func)) # 100us -> 50ms (500x slower!)
# Use vctrs when:
# - Type safety matters more than raw speed
# - Building reusable package functions
# - Complex coercion/combination logic
# - Consistent behavior across edge cases
# Avoid vctrs when:
# - One-off scripts where speed matters most
# - Simple operations where base R is sufficient
# - Memory is extremely constrained
# Decision point:
simple_combine <- function(x, y) c(x, y) # Fast, simple
robust_combine <- function(x, y) vec_c(x, y) # Safer, slight overhead
# Use simple for hot loops, robust for package APIs
# Use data.table when:
# - Very large datasets (>1GB)
# - Complex grouping operations
# - Reference semantics desired
# - Maximum performance critical
# Use dplyr when:
# - Readability and maintainability priority
# - Complex joins and window functions
# - Team familiarity with tidyverse
# - Moderate sized data (<100MB)
# Use base R when:
# - No dependencies allowed
# - Simple operations
# - Teaching/learning contexts
# 1. Profile realistic data sizes
profvis({
# Use actual data size, not toy examples
real_data |> your_analysis()
})
# 2. Profile multiple runs for stability
bench::mark(
your_function(data),
min_iterations = 10, # Multiple runs
max_iterations = 100
)
# 3. Check memory usage too
bench::mark(
approach1 = method1(data),
approach2 = method2(data),
check = FALSE, # If outputs differ slightly
filter_gc = FALSE # Include GC time
)
# 4. Profile with realistic usage patterns
# Not just isolated function calls
# Don't optimize without measuring
# BAD: "This looks slow" -> immediately rewrite
# GOOD: Profile first, optimize bottlenecks
# Don't over-engineer for performance
# BAD: Complex optimizations for 1% gains
# GOOD: Focus on algorithmic improvements
# Don't assume - measure
# BAD: "for loops are always slow in R"
# GOOD: Benchmark your specific use case
# Don't ignore readability costs
# BAD: Unreadable code for minor speedups
# GOOD: Readable code with targeted optimizations
# For packages - consider backend tools
# vctrs for type-stable vector operations
# rlang for metaprogramming
# data.table for large data operations
# Good - vctrs-based vector class
new_percent <- function(x = double()) {
vec_assert(x, double())
new_vctr(x, class = "pkg_percent")
}
# Automatic data frame compatibility, subsetting, etc.
# Good - Guaranteed output type
my_function <- function(x, y) {
# Always returns double, regardless of input values
vec_cast(result, double())
}
# Avoid - Type depends on data
sapply(x, function(i) if(condition) 1L else 1.0)
# Good - Explicit casting with clear rules
vec_cast(x, double()) # Clear intent, predictable behavior
# Good - Common type finding
vec_ptype_common(x, y, z) # Finds richest compatible type
# Avoid - Base R inconsistencies
c(factor("a"), "b") # Unpredictable behavior
# Good - Predictable sizing
vec_c(x, y) # size = vec_size(x) + vec_size(y)
vec_rbind(df1, df2) # size = sum of input sizes
# Avoid - Unpredictable sizing
c(env_object, function_object) # Unpredictable length
| Use Case | Base R | vctrs | When to Choose vctrs |
|---|---|---|---|
| Simple combining | c() | vec_c() | Need type stability, consistent rules |
| Custom classes | S3 manually | new_vctr() | Want data frame compatibility, subsetting |
| Type conversion | as.*() | vec_cast() | Need explicit, safe casting |
| Finding common type | Not available | vec_ptype_common() | Combining heterogeneous inputs |
| Size operations | length() | vec_size() | Working with non-vector objects |
# Constructor (low-level)
new_percent <- function(x = double()) {
vec_assert(x, double())
new_vctr(x, class = "pkg_percent")
}
# Helper (user-facing)
percent <- function(x = double()) {
x <- vec_cast(x, double())
new_percent(x)
}
# Format method
format.pkg_percent <- function(x, ...) {
paste0(vec_data(x) * 100, "%")
}
# Self-coercion
vec_ptype2.pkg_percent.pkg_percent <- function(x, y, ...) {
new_percent()
}
# With double
vec_ptype2.pkg_percent.double <- function(x, y, ...) double()
vec_ptype2.double.pkg_percent <- function(x, y, ...) double()
# Casting
vec_cast.pkg_percent.double <- function(x, to, ...) {
new_percent(x)
}
vec_cast.double.pkg_percent <- function(x, to, ...) {
vec_data(x)
}
vec_c(1, 2) vs c(1, 2) for basic atomic vectors# DESCRIPTION - Import specific functions
Imports: vctrs
# NAMESPACE - Import what you need
importFrom(vctrs, vec_assert, new_vctr, vec_cast, vec_ptype_common)
# Or if using extensively
import(vctrs)
# Test type stability
test_that("my_function is type stable", {
expect_equal(vec_ptype(my_function(1:3)), vec_ptype(double()))
expect_equal(vec_ptype(my_function(integer())), vec_ptype(double()))
})
# Test coercion
test_that("coercion works", {
expect_equal(vec_ptype_common(new_percent(), 1.0), double())
expect_error(vec_ptype_common(new_percent(), "a"))
})
The key insight: vctrs is most valuable in package development where type safety, consistency, and extensibility matter more than raw speed for simple operations.
# Old -> New performance patterns
for loops for parallelizable work -> map(data, in_parallel(f))
Manual type checking -> vec_assert() / vec_cast()
Inconsistent coercion -> vec_ptype_common() / vec_c()