Work with labeled multidimensional arrays for scientific data analysis using Xarray. Use when handling climate data, satellite imagery, oceanographic data, or any multidimensional datasets with coordinates and metadata. Ideal for NetCDF/HDF5 files, time series analysis, and large datasets requiring lazy loading with Dask.
Master Xarray, the powerful library for working with labeled multidimensional arrays in scientific Python. Learn how to efficiently handle complex datasets with multiple dimensions, coordinates, and metadata, from climate data and satellite imagery to experimental measurements and simulations.
Official Documentation: https://docs.xarray.dev/
GitHub: https://github.com/pydata/xarray
# Using pixi (recommended for scientific projects)
pixi add xarray netcdf4 dask
# Using pip
pip install "xarray[complete]"  # quoted so shells such as zsh do not expand the brackets
# Optional dependencies for specific formats
pixi add zarr h5netcdf scipy bottleneck
# Geospatial extensions (for raster data, CRS handling, reprojection)
pixi add rioxarray xesmf
# DataTree is built into Xarray (no separate installation needed)
import xarray as xr
import numpy as np
import pandas as pd
# DataArray: Single labeled array
temperature = xr.DataArray(
    data=np.random.randn(3, 4),
    dims=["time", "location"],
    coords={
        # Datetime labels (not plain strings) so that "time.month" grouping works
        "time": pd.date_range("2024-01-01", periods=3),
        "location": ["A", "B", "C", "D"],
    },
    name="temperature",
)
# Dataset: Collection of DataArrays
ds = xr.Dataset({
"temperature": temperature,
"pressure": (["time", "location"], np.random.randn(3, 4))
})
# Selection by label
ds.sel(time="2024-01-01")
ds.sel(location="A")
# Selection by index
ds.isel(time=0)
# Slicing
ds.sel(time=slice("2024-01-01", "2024-01-02"))
# Aggregation
ds.mean(dim="time")
ds.sum(dim="location")
# Computation
ds["temperature"] + 273.15 # Celsius to Kelvin
ds.groupby("time.month").mean()
# I/O operations
ds.to_netcdf("data.nc")
ds = xr.open_dataset("data.nc")
Working with multidimensional scientific data?
├─ YES → Use Xarray for labeled dimensions
└─ NO → NumPy/Pandas sufficient
Need to track coordinates and metadata?
├─ YES → Xarray keeps everything aligned
└─ NO → Plain NumPy arrays work
Working with geospatial raster data?
├─ YES → Use rioxarray for CRS-aware operations
└─ NO → Standard Xarray sufficient
Data has natural hierarchical structure?
├─ YES → Use DataTree for organization
└─ NO → Dataset/DataArray sufficient
Data too large for memory?
├─ YES → Use Xarray with Dask backend
└─ NO → Standard Xarray is fine
Need to save/load scientific data formats?
├─ NetCDF/HDF5 → Xarray native support
├─ Zarr → Use Xarray with zarr backend
└─ CSV/Excel → Pandas then convert to Xarray
Working with time series data?
├─ Multi-dimensional → Xarray
└─ Tabular → Pandas
Need to align data from different sources?
├─ YES → Xarray handles alignment automatically
└─ NO → Plain NumPy operations are enough
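As a minimal sketch of the automatic alignment mentioned above, the example below adds two arrays whose labels only partly overlap. The `day` dimension and the values are illustrative assumptions, not data from this guide.

```python
import numpy as np
import xarray as xr

# Two hypothetical sensors reporting on partially overlapping days
a = xr.DataArray([1.0, 2.0, 3.0], dims="day", coords={"day": [1, 2, 3]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="day", coords={"day": [2, 3, 4]})

# Arithmetic matches values by coordinate label, not by position.
# With default options only labels present in both inputs are kept.
total = a + b
print(total.day.values)  # [2 3]
print(total.values)      # [12. 23.]

# xr.align with join="outer" keeps every label and fills the gaps with NaN
a_outer, b_outer = xr.align(a, b, join="outer")
print(a_outer.values)    # [ 1.  2.  3. nan]
```

With plain NumPy the same addition would silently pair values by position, so day 1 of one sensor would be added to day 2 of the other.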
Use Xarray when working with:
A DataArray is Xarray's fundamental data structure - think of it as a NumPy array with labels and metadata.
Anatomy of a DataArray:
import xarray as xr
import numpy as np
import pandas as pd  # needed for pd.date_range below
# Create a DataArray
temperature = xr.DataArray(
data=np.array([[15.2, 16.1, 14.8],
[16.5, 17.2, 15.9],
[17.1, 18.0, 16.5]]),
dims=["time", "location"],
coords={
"time": pd.date_range("2024-01-01", periods=3),
"location": ["Station_A", "Station_B", "Station_C"],
"lat": ("location", [40.7, 34.0, 41.8]),
"lon": ("location", [-74.0, -118.2, -87.6])
},
attrs={
"units": "Celsius",
"description": "Daily average temperature"
}
)
Key components:
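As a rough sketch, each part of a DataArray can be read back through an attribute of the same name. The array below is a smaller, illustrative variant of the temperature example, not a new dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

temperature = xr.DataArray(
    data=np.array([[15.2, 16.1], [16.5, 17.2]]),
    dims=["time", "location"],
    coords={
        "time": pd.date_range("2024-01-01", periods=2),
        "location": ["Station_A", "Station_B"],
    },
    attrs={"units": "Celsius"},
    name="temperature",
)

print(temperature.values)  # the underlying NumPy array
print(temperature.dims)    # ('time', 'location')
print(temperature.coords)  # labels attached to each dimension
print(temperature.attrs)   # {'units': 'Celsius'}
print(temperature.name)    # temperature
```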
A Dataset is like a dict of DataArrays that share dimensions - similar to a Pandas DataFrame but for N-dimensional data.
Example:
# Create a Dataset
ds = xr.Dataset({
"temperature": (["time", "location"], np.random.randn(3, 4)),
"humidity": (["time", "location"], np.random.rand(3, 4) * 100),
"pressure": (["time", "location"], 1013 + np.random.randn(3, 4) * 10)
},
coords={
"time": pd.date_range("2024-01-01", periods=3),
"location": ["A", "B", "C", "D"]
})
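The dict-like behaviour can be sketched as follows. Constant values are used instead of random ones (an assumption made here so the results are predictable); variable names follow the example above.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {
        "temperature": (["time", "location"], np.full((3, 4), 20.0)),
        "humidity": (["time", "location"], np.full((3, 4), 50.0)),
    },
    coords={
        "time": pd.date_range("2024-01-01", periods=3),
        "location": ["A", "B", "C", "D"],
    },
)

print(list(ds.data_vars))  # ['temperature', 'humidity']
print(dict(ds.sizes))      # {'time': 3, 'location': 4}

# Indexing by name returns a DataArray that carries the shared coordinates
temp = ds["temperature"]

# A reduction on the Dataset is applied to every variable at once
per_day = ds.mean(dim="location")
print(per_day["humidity"].values)  # [50. 50. 50.]
```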
Coordinates provide meaningful labels for array dimensions and enable label-based indexing.
Types of coordinates:
Dimension coordinates (1D, same name as dimension):
time_coord = pd.date_range("2024-01-01", periods=365)
Non-dimension coordinates (auxiliary information):
# Latitude/longitude for each station
coords = {
"time": time_coord,
"station": ["A", "B", "C"],
"lat": ("station", [40.7, 34.0, 41.8]),
"lon": ("station", [-74.0, -118.2, -87.6])
}
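To illustrate the difference between the two kinds of coordinates, the sketch below attaches the dictionary above to an array of zeros (the data values and the five-day time axis are placeholders chosen for brevity).

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.zeros((5, 3)),
    dims=["time", "station"],
    coords={
        "time": pd.date_range("2024-01-01", periods=5),
        "station": ["A", "B", "C"],
        "lat": ("station", [40.7, 34.0, 41.8]),
        "lon": ("station", [-74.0, -118.2, -87.6]),
    },
)

# Only dimension coordinates get an index, which is what .sel relies on
print(sorted(da.indexes.keys()))  # ['station', 'time']

# Non-dimension coordinates travel with the data through a selection
station_b = da.sel(station="B")
print(station_b.lat.item())       # 34.0

# They can also drive a filter via .where
northern = da.where(da.lat > 40, drop=True)
print(northern.station.values)    # ['A' 'C']
```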
Xarray provides powerful label-based and position-based indexing.
Label-based selection (.sel):
# Select by coordinate value
ds.sel(time="2024-01-15")
ds.sel(location="Station_A")
# Nearest neighbor selection
ds.sel(time="2024-01-15", method="nearest")
# Range selection
ds.sel(time=slice("2024-01-01", "2024-01-31"))
Position-based selection (.isel):
# Select by integer position
ds.isel(time=0)
ds.isel(location=[0, 2])
Boolean indexing (.where):
# Keep only values meeting condition
ds.where(ds["temperature"] > 15, drop=True)
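The three selection styles can be compared on one small dataset. This is a sketch with made-up, sequential temperatures (10 through 21) so that each result is easy to verify by eye.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"temperature": (["time", "location"], np.arange(12.0).reshape(3, 4) + 10)},
    coords={
        "time": pd.date_range("2024-01-01", periods=3),
        "location": ["A", "B", "C", "D"],
    },
)

# Label-based and position-based selection reach the same element
by_label = ds.sel(time="2024-01-02", location="B")
by_position = ds.isel(time=1, location=1)
print(by_label["temperature"].item(), by_position["temperature"].item())  # 15.0 15.0

# method="nearest" snaps an off-grid label to the closest coordinate value
nearest = ds.sel(time=pd.Timestamp("2024-01-02 10:00"), method="nearest")

# Label slices include both endpoints
sliced = ds.sel(time=slice("2024-01-01", "2024-01-02"))
print(sliced.sizes["time"])  # 2

# .where masks with NaN; drop=True also trims labels where nothing matches
masked = ds.where(ds["temperature"] > 15)
trimmed = ds.where(ds["temperature"] > 15, drop=True)
print(dict(masked.sizes), dict(trimmed.sizes))
```

Note that `.where` keeps the original shape unless `drop=True` is passed, and even then it only removes labels along which every value fails the condition.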
DataTree is Xarray's class for organizing hierarchical (tree-structured) data. Think of it as a filesystem for datasets, where each node can contain a dataset and child nodes.
When to use DataTree:
Creating a DataTree:
import xarray as xr
# From a dictionary of datasets
# Descriptive strings are stored in attrs so that arithmetic over the tree
# (e.g. ds + 273.15) only touches numeric data variables
dt = xr.DataTree.from_dict({
    "/": xr.Dataset(attrs={"description": "Root metadata"}),
    "/observations": xr.Dataset({"temp": (["time"], [15.2, 16.1, 14.8])}),
    "/observations/station_a": xr.Dataset(attrs={"location": "New York"}),
    "/observations/station_b": xr.Dataset(attrs={"location": "Los Angeles"}),
    "/model_outputs": xr.Dataset({"predicted_temp": (["time"], [15.0, 16.0, 15.0])})
})
# Access nodes using filesystem-like paths
print(dt["/observations/station_a"])
print(dt["observations"]["station_a"]) # Equivalent
Key DataTree operations:
# Navigate the tree
dt.parent # Get parent node
dt.children # Get child nodes dict
dt.subtree # Iterate over this node and all descendant nodes
dt.leaves # Get all leaf nodes
# Apply operations across all datasets
dt.mean(dim="time") # Apply to all nodes
# Map custom functions
dt.map_over_datasets(lambda ds: ds + 273.15)
# Filter nodes
dt.match("*/station_*") # Pattern matching
dt.filter(lambda node: "temp" in node.ds.data_vars) # Content-based filtering
# Coordinate inheritance (child nodes inherit parent coordinates)
# Define coordinates once at parent level, accessible in all children
Combining DataTrees:
# Arithmetic operations on isomorphic trees
dt1 + dt2 # Add corresponding datasets at each node
# Check structure compatibility
dt1.isomorphic(dt2) # Returns True if same structure
Xarray has a rich ecosystem of extensions for domain-specific workflows. For geospatial data analysis, prioritize rioxarray over vanilla Xarray.
Key geospatial extensions:
rioxarray - Geospatial raster operations:
import rioxarray
# Open raster with CRS (Coordinate Reference System) awareness
ds = rioxarray.open_rasterio("satellite_image.tif")
# Reproject to different CRS
ds_reprojected = ds.rio.reproject("EPSG:4326")
# Clip to bounding box
ds_clipped = ds.rio.clip_box(minx=-120, miny=35, maxx=-115, maxy=40)
# Write with CRS metadata
ds.rio.to_raster("output.tif")
Other useful extensions:
When to use which:
See references/PATTERNS.md for detailed patterns including:
See references/EXAMPLES.md for complete examples including:
See references/COMMON_ISSUES.md for solutions to:
Xarray is the go-to library for working with labeled multidimensional arrays in scientific Python. It combines the power of NumPy arrays with the convenience of Pandas labels, making it ideal for climate data, satellite imagery, experimental measurements, and any data with multiple dimensions.
Key takeaways:
Next steps:
Xarray transforms complex multidimensional data analysis into intuitive, readable code while maintaining high performance and scalability.