This skill should be used when the user asks to "read NetCDF files", "work with xarray", "analyze climate data", "process satellite data", "use DataArray", "create Dataset", "work with multidimensional data", "use Dask with xarray", "read Zarr files", "work with labeled arrays", "use DataTree", "process raster data with rioxarray", or needs guidance on Xarray, NetCDF/HDF5/Zarr I/O, labeled multidimensional arrays, climate/satellite/oceanographic data analysis, Dask integration for large datasets, or geospatial raster operations.
Processes and analyzes labeled multidimensional scientific data like climate models, satellite imagery, and NetCDF files.
npx claudepluginhub uw-ssec/rse-plugins
This skill inherits all available tools. When active, it can use any tool Claude has access to.
references/COMMON_ISSUES.md
references/EXAMPLES.md
references/PATTERNS.md
Master Xarray, the powerful library for working with labeled multidimensional arrays in scientific Python. Learn how to efficiently handle complex datasets with multiple dimensions, coordinates, and metadata - from climate data and satellite imagery to experimental measurements and simulations.
Official Documentation: https://docs.xarray.dev/
GitHub: https://github.com/pydata/xarray
# Using pixi (recommended for scientific projects)
pixi add xarray netcdf4 dask
# Using pip
pip install xarray[complete]
# Optional dependencies for specific formats
pixi add zarr h5netcdf scipy bottleneck
# Geospatial extensions (for raster data, CRS handling, reprojection)
pixi add rioxarray xesmf
# DataTree is built into Xarray (no separate installation needed)
import xarray as xr
import numpy as np
# DataArray: Single labeled array
temperature = xr.DataArray(
data=np.random.randn(3, 4),
dims=["time", "location"],
coords={
"time": ["2024-01-01", "2024-01-02", "2024-01-03"],
"location": ["A", "B", "C", "D"]
},
name="temperature"
)
# Dataset: Collection of DataArrays
ds = xr.Dataset({
"temperature": temperature,
"pressure": (["time", "location"], np.random.randn(3, 4))
})
# Selection by label
ds.sel(time="2024-01-01")
ds.sel(location="A")
# Selection by index
ds.isel(time=0)
# Slicing
ds.sel(time=slice("2024-01-01", "2024-01-02"))
# Aggregation
ds.mean(dim="time")
ds.sum(dim="location")
# Computation
ds["temperature"] + 273.15 # Celsius to Kelvin
ds.groupby("time.month").mean()
# I/O operations
ds.to_netcdf("data.nc")
ds = xr.open_dataset("data.nc")
Working with multidimensional scientific data?
├─ YES → Use Xarray for labeled dimensions
└─ NO → NumPy/Pandas sufficient
Need to track coordinates and metadata?
├─ YES → Xarray keeps everything aligned
└─ NO → Plain NumPy arrays work
Working with geospatial raster data?
├─ YES → Use rioxarray for CRS-aware operations
└─ NO → Standard Xarray sufficient
Data has natural hierarchical structure?
├─ YES → Use DataTree for organization
└─ NO → Dataset/DataArray sufficient
Data too large for memory?
├─ YES → Use Xarray with Dask backend
└─ NO → Standard Xarray is fine
Need to save/load scientific data formats?
├─ NetCDF/HDF5 → Xarray native support
├─ Zarr → Use Xarray with zarr backend
└─ CSV/Excel → Pandas then convert to Xarray
Working with time series data?
├─ Multi-dimensional → Xarray
└─ Tabular → Pandas
Need to align data from different sources?
├─ YES → Xarray handles alignment automatically
└─ NO → Manual alignment with NumPy
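For the "too large for memory" branch, the usual pattern is `xr.open_dataset(..., chunks=...)`, which returns lazy Dask-backed arrays. A minimal in-memory sketch, assuming `dask` is installed, using `.chunk()` as a stand-in for opening a large file:

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"temperature": (["time"], np.random.randn(365))},
    coords={"time": pd.date_range("2024-01-01", periods=365)},
)

# With a real file you would write: xr.open_dataset("big.nc", chunks={"time": 90})
lazy = ds.chunk({"time": 90})

# Operations on chunked data build a task graph; nothing is computed yet
monthly = lazy["temperature"].groupby("time.month").mean()

# .compute() triggers the actual work and returns an in-memory result
result = monthly.compute()
```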
Use Xarray when working with:
A DataArray is Xarray's fundamental data structure - think of it as a NumPy array with labels and metadata.
Anatomy of a DataArray:
import xarray as xr
import numpy as np
import pandas as pd
# Create a DataArray
temperature = xr.DataArray(
data=np.array([[15.2, 16.1, 14.8],
[16.5, 17.2, 15.9],
[17.1, 18.0, 16.5]]),
dims=["time", "location"],
coords={
"time": pd.date_range("2024-01-01", periods=3),
"location": ["Station_A", "Station_B", "Station_C"],
"lat": ("location", [40.7, 34.0, 41.8]),
"lon": ("location", [-74.0, -118.2, -87.6])
},
attrs={
"units": "Celsius",
"description": "Daily average temperature"
}
)
Key components:
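The key components (data, dims, coords, attrs) are all accessible as attributes on the object; a quick self-contained sketch:

```python
import numpy as np
import pandas as pd
import xarray as xr

temperature = xr.DataArray(
    data=np.array([[15.2, 16.1, 14.8],
                   [16.5, 17.2, 15.9],
                   [17.1, 18.0, 16.5]]),
    dims=["time", "location"],
    coords={
        "time": pd.date_range("2024-01-01", periods=3),
        "location": ["Station_A", "Station_B", "Station_C"],
    },
    attrs={"units": "Celsius"},
)

temperature.values  # the underlying NumPy array
temperature.dims    # ("time", "location")
temperature.coords  # coordinate labels for each dimension
temperature.attrs   # metadata dict, e.g. {"units": "Celsius"}
```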
A Dataset is like a dict of DataArrays that share dimensions - similar to a Pandas DataFrame but for N-dimensional data.
Example:
# Create a Dataset
ds = xr.Dataset({
"temperature": (["time", "location"], np.random.randn(3, 4)),
"humidity": (["time", "location"], np.random.rand(3, 4) * 100),
"pressure": (["time", "location"], 1013 + np.random.randn(3, 4) * 10)
},
coords={
"time": pd.date_range("2024-01-01", periods=3),
"location": ["A", "B", "C", "D"]
})
Coordinates provide meaningful labels for array dimensions and enable label-based indexing.
Types of coordinates:
Dimension coordinates (1D, same name as dimension):
time_coord = pd.date_range("2024-01-01", periods=365)
Non-dimension coordinates (auxiliary information):
# Latitude/longitude for each station
coords = {
"time": time_coord,
"station": ["A", "B", "C"],
"lat": ("station", [40.7, 34.0, 41.8]),
"lon": ("station", [-74.0, -118.2, -87.6])
}
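Non-dimension coordinates travel along with selections automatically, which is what keeps everything aligned; a minimal sketch:

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.random.randn(3, 3),
    dims=["time", "station"],
    coords={
        "time": pd.date_range("2024-01-01", periods=3),
        "station": ["A", "B", "C"],
        "lat": ("station", [40.7, 34.0, 41.8]),
        "lon": ("station", [-74.0, -118.2, -87.6]),
    },
)

# Selecting a station carries its auxiliary lat/lon along as scalar coords
point = da.sel(station="B")
```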
Xarray provides powerful label-based and position-based indexing.
Label-based selection (.sel):
# Select by coordinate value
ds.sel(time="2024-01-15")
ds.sel(location="Station_A")
# Nearest neighbor selection
ds.sel(time="2024-01-15", method="nearest")
# Range selection
ds.sel(time=slice("2024-01-01", "2024-01-31"))
Position-based selection (.isel):
# Select by integer position
ds.isel(time=0)
ds.isel(location=[0, 2])
Boolean indexing (.where):
# Keep only values meeting condition
ds.where(ds["temperature"] > 15, drop=True)
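Conditions can be combined with `&` and `|` (the parentheses are required because of operator precedence); a minimal sketch:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temperature": (["time"], np.array([12.0, 16.0, 19.0, 22.0]))},
    coords={"time": range(4)},
)

# Keep only "mild" values; drop=True removes the all-NaN time steps
mild = ds.where((ds["temperature"] > 15) & (ds["temperature"] < 20), drop=True)
```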
DataTree is Xarray's class for organizing hierarchical (tree-structured) data. Think of it as a filesystem for datasets, where each node can contain a dataset and child nodes.
When to use DataTree:
Creating a DataTree:
import xarray as xr
# From a dictionary of datasets
dt = xr.DataTree.from_dict({
"/": xr.Dataset(attrs={"description": "Root metadata"}),
"/observations": xr.Dataset({"temp": (["time"], [15.2, 16.1, 14.8])}),
"/observations/station_a": xr.Dataset(attrs={"location": "New York"}),
"/observations/station_b": xr.Dataset(attrs={"location": "Los Angeles"}),
"/model_outputs": xr.Dataset({"predicted_temp": (["time"], [15.0, 16.0, 15.0])})
})
# Access nodes using filesystem-like paths
print(dt["/observations/station_a"])
print(dt["observations"]["station_a"]) # Equivalent
Key DataTree operations:
# Navigate the tree
dt.parent # Get parent node
dt.children # Get child nodes dict
dt.subtree # Iterate over this node and all of its descendants
dt.leaves # Get all leaf nodes
# Apply operations across all datasets
dt.mean(dim="time") # Apply to all nodes
# Map custom functions
dt.map_over_datasets(lambda ds: ds + 273.15)
# Filter nodes
dt.match("*/station_*") # Pattern matching
dt.filter(lambda node: "temp" in node.ds.data_vars) # Content-based filtering
# Coordinate inheritance (child nodes inherit parent coordinates)
# Define coordinates once at parent level, accessible in all children
Combining DataTrees:
# Arithmetic operations on isomorphic trees
dt1 + dt2 # Add corresponding datasets at each node
# Check structure compatibility
dt1.isomorphic(dt2) # Returns True if same structure
Xarray has a rich ecosystem of extensions for domain-specific workflows. For geospatial data analysis, prioritize rioxarray over vanilla Xarray.
Key geospatial extensions:
rioxarray - Geospatial raster operations:
import rioxarray
# Open raster with CRS (Coordinate Reference System) awareness
ds = rioxarray.open_rasterio("satellite_image.tif")
# Reproject to different CRS
ds_reprojected = ds.rio.reproject("EPSG:4326")
# Clip to bounding box
ds_clipped = ds.rio.clip_box(minx=-120, miny=35, maxx=-115, maxy=40)
# Write with CRS metadata
ds.rio.to_raster("output.tif")
Other useful extensions:
When to use which:
See references/PATTERNS.md for detailed patterns including:
See references/EXAMPLES.md for complete examples including:
See references/COMMON_ISSUES.md for solutions to:
Xarray is the go-to library for working with labeled multidimensional arrays in scientific Python. It combines the power of NumPy arrays with the convenience of Pandas labels, making it ideal for climate data, satellite imagery, experimental measurements, and any data with multiple dimensions.
Key takeaways:
Next steps:
Xarray transforms complex multidimensional data analysis into intuitive, readable code while maintaining high performance and scalability.