From zarr-data-format
Configures Zarr stores on S3, GCS, Azure Blob using fsspec (s3fs/gcsfs/adlfs), obstore, Icechunk. Covers authentication, caching, performance tuning for remote I/O.
npx claudepluginhub uw-ssec/rse-plugins --plugin zarr-data-format
Slash command: /zarr-data-format:cloud-storage-backends
Configure Zarr to read and write arrays on cloud object storage — Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage. This skill covers the full range of backend options: fsspec-based stores (s3fs, gcsfs, adlfs), the high-performance Rust-based obstore library, and Icechunk for versioned, transactional cloud storage. You will learn authentication patterns, caching strategies, and performance tuning for remote I/O.
Zarr Documentation: https://zarr.readthedocs.io/
fsspec Documentation: https://filesystem-spec.readthedocs.io/
obstore Documentation: https://developmentseed.org/obstore/
Icechunk Documentation: https://icechunk.io/
# Using pixi (recommended for scientific projects)
pixi add zarr numpy
# fsspec-based cloud backends
pixi add s3fs gcsfs adlfs fsspec
# Using pip
pip install "zarr[remote]" s3fs gcsfs adlfs # the remote extra pulls in fsspec; install the provider packages explicitly
# High-performance Rust-based backend
pip install obstore
# Versioned cloud storage
pip install icechunk
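Before choosing a backend, it can help to confirm which of the optional packages above are importable in the current environment. The following is a minimal sketch using only the standard library; `available_backends` and `OPTIONAL_BACKENDS` are hypothetical names for illustration, not part of Zarr.

```python
from importlib.util import find_spec

# Import names of the optional packages installed above.
OPTIONAL_BACKENDS = ("fsspec", "s3fs", "gcsfs", "adlfs", "obstore", "icechunk")


def available_backends(names=OPTIONAL_BACKENDS):
    """Map each top-level package name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name:10s} {'installed' if present else 'missing'}")
```

A missing entry only means the package is not installed in this interpreter; it says nothing about credentials or network access.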
import zarr
from zarr.storage import FsspecStore
# ── S3 via fsspec ──
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={"anon": True},  # anonymous access for public data
)
root = zarr.open_group(store=store, mode="r")
# ── GCS via fsspec ──
store = FsspecStore.from_url("gs://my-bucket/data.zarr")
root = zarr.open_group(store=store, mode="r")
# ── Azure via fsspec ──
store = FsspecStore.from_url("az://my-container/data.zarr")
root = zarr.open_group(store=store, mode="r")
# ── obstore (Rust-based, high performance) ──
from obstore.store import S3Store
import zarr
obs = S3Store.from_url("s3://my-bucket/data.zarr", config={"AWS_REGION": "us-west-2"})
store = zarr.storage.ObjectStore(obs, read_only=True)
root = zarr.open_group(store=store, mode="r")
# ── Icechunk (versioned storage) ──
import icechunk
storage = icechunk.s3_storage(bucket="my-bucket", prefix="data.zarr", from_env=True)
repo = icechunk.Repository.open_or_create(storage)
session = repo.writable_session("main")  # writes go through a session on a branch
root = zarr.open_group(store=session.store, mode="a")
Need to access cloud Zarr data?
├── Public dataset, read-only?
│   └── FsspecStore.from_url with anon=True
├── Authenticated access?
│   ├── Already using fsspec elsewhere?
│   │   └── FsspecStore.from_url with storage_options
│   ├── Need maximum throughput?
│   │   └── obstore (Rust-based, async I/O)
│   └── Need versioning / transactions?
│       └── Icechunk
├── Which cloud provider?
│   ├── AWS S3 → s3fs or obstore S3Store
│   ├── Google GCS → gcsfs or obstore GCSStore
│   └── Azure Blob → adlfs or obstore AzureStore
└── Need caching for repeated reads?
    └── fsspec with simplecache or filecache
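The decision tree above can be sketched as a small function. The tree does not say which requirement wins when several apply, so the precedence used here (versioning, then caching, then throughput) is an assumption, and `choose_backend` is a hypothetical helper rather than anything Zarr provides.

```python
def choose_backend(public_read_only=False, needs_versioning=False,
                   needs_max_throughput=False, needs_caching=False):
    """Return the backend suggested by the decision tree.

    Precedence between requirements is an assumption of this sketch.
    """
    if needs_versioning:
        return "Icechunk"
    if needs_caching:
        return "fsspec with simplecache or filecache"
    if needs_max_throughput:
        return "obstore"
    if public_read_only:
        return "FsspecStore.from_url with anon=True"
    return "FsspecStore.from_url with storage_options"


if __name__ == "__main__":
    print(choose_backend(public_read_only=True))
    print(choose_backend(needs_max_throughput=True))
```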
Use this skill when:
Zarr decouples array logic from storage through a Store interface. Any object that implements the Zarr store protocol can serve as a backend.
| Backend | Library | Async | Speed | Versioning | Best For |
|---|---|---|---|---|---|
| LocalStore | zarr (built-in) | No | Fastest | No | Local disk, NFS |
| FsspecStore | zarr + fsspec | Yes | Good | No | Broad cloud support, caching |
| ObjectStore | zarr + obstore | Yes | Very Fast | No | High-throughput cloud I/O |
| IcechunkStore | icechunk | Yes | Fast | Yes | Versioned cloud datasets |
# LocalStore (default when passing a path string)
root = zarr.open_group("local_data.zarr", mode="w")
# FsspecStore (any fsspec-supported URL)
from zarr.storage import FsspecStore
store = FsspecStore.from_url("s3://bucket/path.zarr")
# ObjectStore (obstore backend)
from obstore.store import S3Store
obs = S3Store.from_url("s3://bucket/path.zarr")
store = zarr.storage.ObjectStore(obs)
# IcechunkStore (versioned)
import icechunk
storage = icechunk.s3_storage(bucket="bucket", prefix="path.zarr", from_env=True)
repo = icechunk.Repository.open_or_create(storage)
store = repo.writable_session("main").store
Via fsspec (s3fs):
from zarr.storage import FsspecStore
# Anonymous access (public buckets)
store = FsspecStore.from_url(
    "s3://public-bucket/data.zarr",  # placeholder: point this at an actual Zarr store
    storage_options={"anon": True},
)
# Authenticated via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
store = FsspecStore.from_url("s3://my-bucket/data.zarr")
# Explicit credentials (placeholder values; prefer environment variables or a profile over hardcoding secrets)
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={
        "key": "AKIAIOSFODNN7EXAMPLE",
        "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "endpoint_url": "https://s3.us-west-2.amazonaws.com",
    },
)
# Using a named AWS profile
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={"profile": "research-account"},
)
Via obstore (Rust-based, higher throughput):
from obstore.store import S3Store
import zarr
# Credentials from environment or instance metadata (IAM role)
obs = S3Store.from_url(
    "s3://my-bucket/data.zarr",
    config={"AWS_REGION": "us-west-2"},
)
store = zarr.storage.ObjectStore(obs, read_only=True)
root = zarr.open_group(store=store, mode="r")
Google Cloud Storage via fsspec (gcsfs):
from zarr.storage import FsspecStore
# Anonymous access
store = FsspecStore.from_url(
    "gs://public-bucket/data.zarr",
    storage_options={"token": "anon"},
)
# Service account JSON key
store = FsspecStore.from_url(
    "gs://my-bucket/data.zarr",
    storage_options={"token": "/path/to/service-account.json"},
)
# Application default credentials (gcloud auth application-default login)
store = FsspecStore.from_url(
    "gs://my-bucket/data.zarr",
    storage_options={"token": "google_default"},
)
Azure Blob Storage via fsspec (adlfs):
from zarr.storage import FsspecStore
# Connection string
store = FsspecStore.from_url(
    "az://my-container/data.zarr",
    storage_options={"connection_string": "DefaultEndpointsProtocol=https;..."},
)
# Account key
store = FsspecStore.from_url(
    "az://my-container/data.zarr",
    storage_options={
        "account_name": "mystorageaccount",
        "account_key": "base64-encoded-key",
    },
)
# Managed identity (Azure VM, AKS, Functions): with no explicit credential, adlfs falls back to the default Azure credential chain
store = FsspecStore.from_url(
    "az://my-container/data.zarr",
    storage_options={"account_name": "mystorageaccount", "anon": False},
)
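The anonymous-access option differs by provider in the examples above (`anon` for s3fs, `token` for gcsfs), which is easy to mix up. One way to keep the mapping in a single place is sketched below; `PROVIDERS`, `provider_for`, and `anonymous_options` are hypothetical names, and the Azure entry is left without anonymous options because the examples above do not show one.

```python
from urllib.parse import urlsplit

# URL scheme -> (fsspec implementation package, anonymous-access options).
# None means the examples above show no anonymous option for that provider.
PROVIDERS = {
    "s3": ("s3fs", {"anon": True}),
    "gs": ("gcsfs", {"token": "anon"}),
    "az": ("adlfs", None),
}


def provider_for(url):
    """Return (package, anonymous options) for a cloud URL."""
    scheme = urlsplit(url).scheme
    if scheme not in PROVIDERS:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return PROVIDERS[scheme]


def anonymous_options(url):
    """Return a storage_options dict for anonymous access to url."""
    package, options = provider_for(url)
    if options is None:
        raise ValueError(f"no anonymous-access example available for {package}")
    return dict(options)  # copy so callers cannot mutate the table


if __name__ == "__main__":
    print(anonymous_options("s3://public-bucket/data.zarr"))
    print(anonymous_options("gs://public-bucket/data.zarr"))
```

The returned dictionary is meant to be passed as `storage_options` to `FsspecStore.from_url`.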
Icechunk adds Git-like versioning on top of any cloud store, enabling branches, tags, time-travel reads, and atomic commits.
import icechunk
import zarr
# Create (or open) a versioned repository on S3
storage = icechunk.s3_storage(
    bucket="my-bucket",
    prefix="versioned-data.zarr",
    region="us-west-2",
    from_env=True,
)
repo = icechunk.Repository.open_or_create(storage)
# Write data through a writable session on the main branch
session = repo.writable_session("main")
root = zarr.open_group(store=session.store, mode="a")
arr = root.create_array("temperature", shape=(365, 180, 360), chunks=(30, 90, 180), dtype="float32")
arr[:] = 0.0
# Commit changes (like git commit); commit returns the new snapshot ID
snapshot_id = session.commit("Initial temperature array")
# Read at a specific snapshot
snapshot_session = repo.readonly_session(snapshot_id=snapshot_id)
root_at_snapshot = zarr.open_group(store=snapshot_session.store, mode="r")
# Branch workflow: create a branch at that snapshot, then open a session on it
repo.create_branch("experiment-1", snapshot_id=snapshot_id)
experiment_session = repo.writable_session("experiment-1")
fsspec provides built-in caching to avoid repeated downloads of the same chunks.
from zarr.storage import FsspecStore
# Simple cache — downloads to local temp directory
store = FsspecStore.from_url(
    "simplecache::s3://my-bucket/data.zarr",
    storage_options={
        "s3": {"anon": True},
        "simplecache": {"cache_storage": "/tmp/zarr-cache"},
    },
)
# File cache — persistent, checks remote for updates
store = FsspecStore.from_url(
    "filecache::s3://my-bucket/data.zarr",
    storage_options={
        "s3": {"anon": True},
        "filecache": {
            "cache_storage": "/data/zarr-cache",
            "expiry_time": 3600,  # seconds
        },
    },
)
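Both caching examples above follow the same pattern: prefix the URL with the cache protocol and nest the options under one key per protocol in the chain. A small sketch of that pattern follows; `with_cache` is a hypothetical helper, and it only builds the arguments without contacting any remote store.

```python
from urllib.parse import urlsplit


def with_cache(url, cache_dir, remote_options=None,
               cache_protocol="simplecache", **cache_options):
    """Build a chained fsspec URL and nested storage_options for caching."""
    if cache_protocol not in ("simplecache", "filecache"):
        raise ValueError(f"unsupported cache protocol: {cache_protocol!r}")
    scheme = urlsplit(url).scheme
    if not scheme:
        raise ValueError("expected a remote URL such as s3://bucket/path")
    storage_options = {
        # Options for the remote filesystem, keyed by its protocol.
        scheme: dict(remote_options or {}),
        # Options for the cache layer, keyed by the cache protocol.
        cache_protocol: {"cache_storage": cache_dir, **cache_options},
    }
    return f"{cache_protocol}::{url}", storage_options


if __name__ == "__main__":
    chained_url, options = with_cache(
        "s3://my-bucket/data.zarr", "/tmp/zarr-cache", {"anon": True}
    )
    print(chained_url)
    print(options)
```

The two return values correspond to the first argument and the `storage_options` argument of `FsspecStore.from_url` in the examples above.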
Key factors for remote I/O performance:
| Factor | Recommendation |
|---|---|
| Chunk size | 1–10 MB per chunk for cloud (too small = too many requests) |
| Concurrent requests | Raise Zarr's async.concurrency setting (zarr.config.set) and size the backend's connection pool to match |
| Connection pooling | obstore handles this automatically; for s3fs set config_kwargs |
| Consolidated metadata | Use zarr.consolidate_metadata() for v2 stores |
| Read pattern | Sequential access is faster than random access on cloud |
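To check a chunk shape against the 1–10 MB guideline in the table, the uncompressed chunk size and the number of chunks (roughly one request each for a full read) can be computed directly. The sketch below uses the shape and chunks of the temperature array from the Icechunk example; the helper names are hypothetical, and stored chunks are usually smaller than this figure once compressed.

```python
from math import ceil, prod


def chunk_nbytes(chunks, itemsize):
    """Uncompressed size of one chunk in bytes."""
    return prod(chunks) * itemsize


def chunk_count(shape, chunks):
    """Number of chunks needed to cover the whole array."""
    return prod(ceil(s / c) for s, c in zip(shape, chunks))


if __name__ == "__main__":
    shape = (365, 180, 360)
    itemsize = 4  # bytes per float32 element
    for chunks in [(30, 90, 180), (15, 45, 90)]:
        size_mb = chunk_nbytes(chunks, itemsize) / 1e6
        print(f"chunks={chunks}: {size_mb:.3f} MB each, "
              f"{chunk_count(shape, chunks)} chunks")
```

With float32 data, chunks of (30, 90, 180) come to about 1.9 MB each across 52 chunks, while halving every chunk dimension drops to about 0.24 MB each across 400 chunks, which falls below the guideline and multiplies the request count.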
# Increase S3 concurrency with fsspec
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={
        "anon": True,
        "config_kwargs": {"max_pool_connections": 50},
    },
)
# obstore automatically manages connection pooling and async I/O
from obstore.store import S3Store
obs = S3Store.from_url(
    "s3://my-bucket/data.zarr",
    client_options={"timeout": "30s", "connect_timeout": "5s"},
)