From zarr-data-format
Configures Zarr stores on S3, GCS, Azure Blob using fsspec (s3fs/gcsfs/adlfs), obstore, Icechunk. Covers authentication, caching, performance tuning for remote I/O.
npx claudepluginhub uw-ssec/rse-plugins --plugin zarr-data-format
This skill uses the workspace's default tool permissions.
Configure Zarr to read and write arrays on cloud object storage — Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage. This skill covers the full range of backend options: fsspec-based stores (s3fs, gcsfs, adlfs), the high-performance Rust-based obstore library, and Icechunk for versioned, transactional cloud storage. You will learn authentication patterns, caching strategies, and performance tuning for remote I/O.
Zarr Documentation: https://zarr.readthedocs.io/
fsspec Documentation: https://filesystem-spec.readthedocs.io/
obstore Documentation: https://developmentseed.org/obstore/
Icechunk Documentation: https://icechunk.io/
# Using pixi (recommended for scientific projects)
pixi add zarr numpy
# fsspec-based cloud backends
pixi add s3fs gcsfs adlfs fsspec
# Using pip
# The "remote" extra pulls in fsspec; install the provider filesystems (s3fs, gcsfs, adlfs) explicitly
pip install "zarr[remote]" s3fs gcsfs adlfs
# High-performance Rust-based backend
pip install obstore
# Versioned cloud storage
pip install icechunk
import zarr
from zarr.storage import FsspecStore
# ── S3 via fsspec ──
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={"anon": True},  # anonymous access for public data
)
root = zarr.open_group(store=store, mode="r")
# ── GCS via fsspec ──
store = FsspecStore.from_url("gs://my-bucket/data.zarr")
root = zarr.open_group(store=store, mode="r")
# ── Azure via fsspec ──
store = FsspecStore.from_url("az://my-container/data.zarr")
root = zarr.open_group(store=store, mode="r")
# ── obstore (Rust-based, high performance) ──
from obstore.store import S3Store
import zarr
obs = S3Store.from_url("s3://my-bucket/data.zarr", config={"AWS_REGION": "us-west-2"})
store = zarr.storage.ObjectStore(obs, read_only=True)
root = zarr.open_group(store=store, mode="r")
# ── Icechunk (versioned storage; Icechunk 1.x API) ──
# Pre-1.0 alpha releases exposed IcechunkStore/StorageConfig instead of Repository/Session
import icechunk
storage = icechunk.s3_storage(bucket="my-bucket", prefix="data.zarr", from_env=True)
repo = icechunk.Repository.open_or_create(storage)
session = repo.writable_session("main")
root = zarr.open_group(store=session.store, mode="w")
Need to access cloud Zarr data?
├── Public dataset, read-only?
│ └── FsspecStore.from_url with anon=True
├── Authenticated access?
│ ├── Already using fsspec elsewhere?
│ │ └── FsspecStore.from_url with storage_options
│ ├── Need maximum throughput?
│ │ └── obstore (Rust-based, async I/O)
│ └── Need versioning / transactions?
│ └── Icechunk
├── Which cloud provider?
│ ├── AWS S3 → s3fs or obstore S3Store
│ ├── Google GCS → gcsfs or obstore GCSStore
│ └── Azure Blob → adlfs or obstore AzureStore
└── Need caching for repeated reads?
└── fsspec with simplecache or filecache
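The decision tree above can be written down as a small helper. This is only a sketch: the function name, its keyword arguments, and the order in which the conditions are checked are assumptions made for illustration, and the returned strings are plain labels rather than library objects.

```python
def choose_backend(
    *,
    needs_versioning: bool = False,
    needs_local_cache: bool = False,
    needs_max_throughput: bool = False,
) -> str:
    """Return a label for the backend the decision tree points to.

    The priority order (versioning, then caching, then throughput) is an
    assumption of this sketch, not something the tree prescribes.
    """
    if needs_versioning:
        return "icechunk"        # versioned, transactional storage
    if needs_local_cache:
        return "fsspec+cache"    # simplecache:: or filecache:: URL chaining
    if needs_max_throughput:
        return "obstore"         # Rust-based object store client
    return "fsspec"              # default: FsspecStore.from_url


# A public, read-only dataset with no special requirements ends up on fsspec.
print(choose_backend())
print(choose_backend(needs_versioning=True))
```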
Use this skill when:
Zarr decouples array logic from storage through a Store interface. Any object that implements the Zarr store protocol can serve as a backend.
| Backend | Library | Async | Speed | Versioning | Best For |
|---|---|---|---|---|---|
| LocalStore | zarr (built-in) | No | Fastest | No | Local disk, NFS |
| FsspecStore | zarr + fsspec | Yes | Good | No | Broad cloud support, caching |
| ObjectStore | zarr + obstore | Yes | Very Fast | No | High-throughput cloud I/O |
| IcechunkStore | icechunk | Yes | Fast | Yes | Versioned cloud datasets |
# LocalStore (default when passing a path string)
root = zarr.open_group("local_data.zarr", mode="w")
# FsspecStore (any fsspec-supported URL)
from zarr.storage import FsspecStore
store = FsspecStore.from_url("s3://bucket/path.zarr")
# ObjectStore (obstore backend)
from obstore.store import S3Store
obs = S3Store.from_url("s3://bucket/path.zarr")
store = zarr.storage.ObjectStore(obs)
# IcechunkStore (versioned; obtained from a session in the Icechunk 1.x API)
import icechunk
storage = icechunk.s3_storage(bucket="bucket", prefix="path.zarr", from_env=True)
repo = icechunk.Repository.open_or_create(storage)
session = repo.writable_session("main")
store = session.store
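To make the store abstraction more concrete, it helps to see which keys Zarr asks a store for. The sketch below assumes the Zarr v3 default chunk key encoding (a `c` prefix followed by the chunk grid index, separated by `/`) and the v3 `zarr.json` metadata document. The helper names `chunk_key` and `metadata_key` are hypothetical and are not part of the zarr API.

```python
def metadata_key(node_path: str) -> str:
    """Key of the metadata document for a group or array (Zarr v3)."""
    return f"{node_path}/zarr.json" if node_path else "zarr.json"


def chunk_key(array_path: str, chunk_index: tuple[int, ...]) -> str:
    """Key of one chunk under the default v3 chunk key encoding."""
    key = "/".join(["c", *map(str, chunk_index)])
    return f"{array_path}/{key}" if array_path else key


# On an object store, each key becomes one object under the store's prefix,
# so every chunk read is (at least) one GET request.
print(metadata_key(""))                    # root group metadata
print(metadata_key("temperature"))         # array metadata
print(chunk_key("temperature", (0, 1, 2)))
```

With FsspecStore or ObjectStore these keys map directly onto object names below the store URL. Icechunk receives the same keys but manages its own on-disk layout.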
Via fsspec (s3fs):
from zarr.storage import FsspecStore
# Anonymous access (public buckets)
store = FsspecStore.from_url(
    "s3://noaa-goes16/ABI-L2-CMIPF/2024/001/00/",
    storage_options={"anon": True},
)
# Authenticated via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
store = FsspecStore.from_url("s3://my-bucket/data.zarr")
# Explicit credentials
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={
        "key": "AKIAIOSFODNN7EXAMPLE",
        "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "endpoint_url": "https://s3.us-west-2.amazonaws.com",
    },
)
# Using a named AWS profile
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={"profile": "research-account"},
)
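When explicit credentials are needed, one option is to build `storage_options` from the environment rather than writing keys into source files. The following is a minimal sketch: the helper name `s3_storage_options_from_env` is hypothetical, and it assumes the s3fs option names `key`, `secret`, and `token` together with the standard AWS environment variable names.

```python
import os
from typing import Mapping


def s3_storage_options_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build s3fs storage_options from AWS environment variables.

    Returns an empty dict when no key pair is set, so that s3fs can fall
    back to its own credential lookup (profiles, instance roles, ...).
    """
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not (key and secret):
        return {}
    options = {"key": key, "secret": secret}
    token = env.get("AWS_SESSION_TOKEN")  # only present for temporary credentials
    if token:
        options["token"] = token
    return options


# The result would be passed as FsspecStore.from_url(url, storage_options=...).
print(sorted(s3_storage_options_from_env({"AWS_ACCESS_KEY_ID": "k",
                                          "AWS_SECRET_ACCESS_KEY": "s"})))
```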
Via obstore (Rust-based, higher throughput):
from obstore.store import S3Store
import zarr
# Credentials from environment or instance metadata (IAM role)
obs = S3Store.from_url(
    "s3://my-bucket/data.zarr",
    config={"AWS_REGION": "us-west-2"},
)
store = zarr.storage.ObjectStore(obs, read_only=True)
root = zarr.open_group(store=store, mode="r")
from zarr.storage import FsspecStore
# Anonymous access
store = FsspecStore.from_url(
    "gs://public-bucket/data.zarr",
    storage_options={"token": "anon"},
)
# Service account JSON key
store = FsspecStore.from_url(
    "gs://my-bucket/data.zarr",
    storage_options={"token": "/path/to/service-account.json"},
)
# Application default credentials (gcloud auth application-default login)
store = FsspecStore.from_url(
    "gs://my-bucket/data.zarr",
    storage_options={"token": "google_default"},
)
from zarr.storage import FsspecStore
# Connection string
store = FsspecStore.from_url(
    "az://my-container/data.zarr",
    storage_options={"connection_string": "DefaultEndpointsProtocol=https;..."},
)
# Account key
store = FsspecStore.from_url(
    "az://my-container/data.zarr",
    storage_options={
        "account_name": "mystorageaccount",
        "account_key": "base64-encoded-key",
    },
)
# Managed identity (Azure VM, AKS, Functions)
store = FsspecStore.from_url(
    "az://my-container/data.zarr",
    storage_options={"account_name": "mystorageaccount", "anon": False},
)
Icechunk adds Git-like versioning on top of any cloud store, enabling branches, tags, time-travel reads, and atomic commits.
import icechunk
import zarr
# Create (or open) a versioned repository on S3; credentials come from the environment.
# This uses the Icechunk 1.x API; pre-1.0 alpha releases exposed IcechunkStore/StorageConfig instead.
storage = icechunk.s3_storage(
    bucket="my-bucket",
    prefix="versioned-data.zarr",
    region="us-west-2",
    from_env=True,
)
repo = icechunk.Repository.open_or_create(storage)
# Write data inside a session on the main branch
session = repo.writable_session("main")
root = zarr.open_group(store=session.store, mode="w")
arr = root.create_array("temperature", shape=(365, 180, 360), chunks=(30, 90, 180), dtype="float32")
arr[:] = 0.0
# Commit changes (like git commit); commit() returns the new snapshot ID
snapshot_id = session.commit("Initial temperature array")
# Read at a specific snapshot
snapshot_session = repo.readonly_session(snapshot_id=snapshot_id)
root_at_snapshot = zarr.open_group(store=snapshot_session.store, mode="r")
# Branch workflow: create a branch at that snapshot, then write to it in its own session
repo.create_branch("experiment-1", snapshot_id=snapshot_id)
branch_session = repo.writable_session("experiment-1")
fsspec provides built-in caching to avoid repeated downloads of the same chunks.
from zarr.storage import FsspecStore
# Simple cache: downloads to a local temp directory
store = FsspecStore.from_url(
    "simplecache::s3://my-bucket/data.zarr",
    storage_options={
        "s3": {"anon": True},
        "simplecache": {"cache_storage": "/tmp/zarr-cache"},
    },
)
# File cache: persistent, checks remote for updates
store = FsspecStore.from_url(
    "filecache::s3://my-bucket/data.zarr",
    storage_options={
        "s3": {"anon": True},
        "filecache": {
            "cache_storage": "/data/zarr-cache",
            "expiry_time": 3600,  # seconds
        },
    },
)
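The two caching examples share one pattern: the cache protocol is chained in front of the URL with `::`, and `storage_options` is nested by protocol name. A small helper can build both pieces consistently. This is a sketch only; `cached_store_args` is a hypothetical name, and it only assembles the arguments without opening any store.

```python
def cached_store_args(
    url: str,
    protocol_options: dict,
    cache_dir: str,
    cache: str = "simplecache",
) -> tuple[str, dict]:
    """Return (chained_url, storage_options) for an fsspec caching layer."""
    if cache not in ("simplecache", "filecache"):
        raise ValueError(f"unsupported cache protocol: {cache!r}")
    if "://" not in url:
        raise ValueError(f"expected a URL with a protocol, got {url!r}")
    protocol = url.split("://", 1)[0]  # e.g. "s3", "gs", "az"
    storage_options = {
        protocol: dict(protocol_options),       # options for the remote filesystem
        cache: {"cache_storage": cache_dir},    # options for the cache layer
    }
    return f"{cache}::{url}", storage_options


chained_url, options = cached_store_args(
    "s3://my-bucket/data.zarr", {"anon": True}, "/tmp/zarr-cache"
)
print(chained_url)
print(options)
```

The returned pair would be passed on as `FsspecStore.from_url(chained_url, storage_options=options)`.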
Key factors for remote I/O performance:
| Factor | Recommendation |
|---|---|
| Chunk size | 1–10 MB per chunk for cloud (too small = too many requests) |
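| Concurrent requests | Raise Zarr's async.concurrency setting via zarr.config.set(); s3fs's max_concurrency in storage_options governs multipart transfers of large files |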
| Connection pooling | obstore handles this automatically; for s3fs set config_kwargs |
| Consolidated metadata | Use zarr.consolidate_metadata() for v2 stores |
| Read pattern | Sequential access is faster than random access on cloud |
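The chunk-size and read-pattern rows in the table can be checked with simple arithmetic before any data is written. The sketch below estimates the uncompressed size of one chunk and the number of chunks (and therefore, roughly, the number of GET requests) a rectangular read touches. The helper names are hypothetical, and the example reuses the `temperature` array layout from the Icechunk example.

```python
import math


def chunk_nbytes(chunks: tuple[int, ...], itemsize: int) -> int:
    """Uncompressed size of one chunk in bytes."""
    return math.prod(chunks) * itemsize


def chunks_touched(selection: list[tuple[int, int]], chunks: tuple[int, ...]) -> int:
    """Number of chunks overlapped by a selection of (start, stop) per axis."""
    count = 1
    for (start, stop), size in zip(selection, chunks):
        first = start // size
        last = (stop - 1) // size
        count *= last - first + 1
    return count


chunks = (30, 90, 180)   # time, lat, lon
itemsize = 4             # float32

print(chunk_nbytes(chunks, itemsize))                           # 1944000 bytes, about 1.9 MB
print(chunks_touched([(0, 365), (10, 11), (20, 21)], chunks))   # full time series at one point: 13
print(chunks_touched([(0, 1), (0, 180), (0, 360)], chunks))     # one global map: 4
```

With this layout a single global map costs 4 requests while a one-point time series costs 13, each fetching a full chunk of roughly 1.9 MB before compression, so the chunk shape should follow the dominant read pattern.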
# Increase S3 concurrency with fsspec
store = FsspecStore.from_url(
    "s3://my-bucket/data.zarr",
    storage_options={
        "anon": True,
        "config_kwargs": {"max_pool_connections": 50},
    },
)
# obstore automatically manages connection pooling and async I/O
from obstore.store import S3Store
obs = S3Store.from_url(
    "s3://my-bucket/data.zarr",
    client_options={"timeout": "30s", "connect_timeout": "5s"},
)