From zarr-data-format
Work with Zarr Python library for chunked, compressed N-dimensional arrays. Covers creation, groups, metadata/attributes, indexing, data types, sharding, v2/v3 differences.
npx claudepluginhub uw-ssec/rse-plugins --plugin zarr-data-formatThis skill uses the workspace's default tool permissions.
Master **Zarr**, the cloud-native format for chunked, compressed, N-dimensional arrays. This skill covers the full Zarr-Python 3 API for both v2 and v3 format specifications.
Generates design tokens/docs from CSS/Tailwind/styled-components codebases, audits visual consistency across 10 dimensions, detects AI slop in UI.
Records polished WebM UI demo videos of web apps using Playwright with cursor overlay, natural pacing, and three-phase scripting. Activates for demo, walkthrough, screen recording, or tutorial requests.
Delivers idiomatic Kotlin patterns for null safety, immutability, sealed classes, coroutines, Flows, extensions, DSL builders, and Gradle DSL. Use when writing, reviewing, refactoring, or designing Kotlin code.
Master Zarr, the cloud-native format for chunked, compressed, N-dimensional arrays. This skill covers the full Zarr-Python 3 API for both v2 and v3 format specifications.
Official Documentation: https://zarr.readthedocs.io/ Zarr Specification: https://zarr-specs.readthedocs.io/ GitHub: https://github.com/zarr-developers/zarr-python
pixi add zarr numpy numcodecs # pixi (recommended)
pip install zarr[extra] # pip
pip install zarr[remote] # cloud backends
import zarr
import numpy as np
# Create array (v3 format, default)
z = zarr.create_array(store="data.zarr", shape=(10000, 10000), chunks=(1000, 1000), dtype="float32")
z[:] = np.random.randn(10000, 10000).astype("float32")
# Open existing
z = zarr.open_array("data.zarr", mode="r")
subset = z[0:100, 0:100]
# Groups
root = zarr.open_group("experiment.zarr", mode="w")
obs = root.create_group("observations")
arr = obs.create_array("temperature", shape=(365, 180, 360), chunks=(30, 90, 180), dtype="float32")
# Metadata
root.attrs["project"] = "Climate Study 2025"
arr.attrs["units"] = "Celsius"
print(root.tree())
Working with chunked N-dimensional arrays? → Zarr is ideal
Need cloud-native storage (S3/GCS/Azure)? → Zarr + cloud backend (see cloud-storage-backends skill)
Need parallel read/write? → Zarr supports concurrent access natively
Need versioning / ACID transactions? → Icechunk with Zarr
Need to reduce object count on cloud? → Zarr v3 sharding
Working with existing v2 data? → zarr-python 3 supports both v2 and v3
| Feature | v2 | v3 |
|---|---|---|
| Metadata file | .zarray, .zgroup, .zattrs | zarr.json (single) |
| Default compressor | Blosc | Zstd |
| Sharding | No | Yes (ZEP 0002) |
| Async I/O | No | Yes (native asyncio) |
| Python | 3.10+ | 3.11+ |
| Chunk key format | 0.0.0 (dot) | c/0/0/0 (path) |
| Codec system | Single compressor + filters | Composable pipeline |
Default to v3 for new projects. Use zarr_format=2 only for backward compatibility.
z = zarr.create_array(store="data.zarr", shape=(5000, 5000), chunks=(500, 500), dtype="float64",
fill_value=0.0, compressors=zarr.codecs.ZstdCodec(level=3))
# Convenience: zarr.zeros(), zarr.ones(), zarr.full(), zarr.empty()
| Mode | Description |
|---|---|
'r' | Read-only (error if not found) |
'r+' | Read/write (must exist) |
'w' | Write (overwrite if exists) |
'w-' | Create (error if exists) |
'a' | Append (create or open) |
Groups provide HDF5-like hierarchical organization. Create with root.create_group("name"), navigate with root["path/to/child"], list with root.keys(), and inspect with root.tree().
Every array and group has .attrs for key-value metadata. Use CF Conventions attributes (units, long_name, standard_name) for scientific data. Use .info for quick summaries and .info_complete() for detailed stats (slow on large arrays).
| Mode | Syntax | Returns |
|---|---|---|
| Basic slicing | z[0:100, 0:100] | Contiguous subarray |
| Coordinate | z.vindex[rows, cols] | Values at (row, col) pairs |
| Mask | z.vindex[bool_mask] | 1D array of True-masked values |
| Orthogonal | z.oindex[rows, cols] | Cartesian product (rows × cols) |
| Block | z.blocks[0, 0] | Entire chunk blocks by index |
| Field | z["field_name"] | Structured dtype field |
→ See references/patterns.md Pattern 4 for detailed indexing examples.
Zarr supports: all NumPy numeric types (int8–int64, uint8–uint64, float16–float64, complex64/128), bool, fixed-length strings (S10), datetime64, timedelta64, structured/compound dtypes, and variable-length strings (v2 only via VLenUTF8).
ThreadSynchronizer (v2) or ProcessSynchronizer (v2)blosc.use_threads = False to prevent silent corruptionzarr.config.set({"async.concurrency": 64})Groups multiple chunks into larger shard files to reduce object count on cloud stores.
z = zarr.create_array(store="sharded.zarr", shape=(10000, 10000),
chunks=(2500, 2500), shards=(500, 500), dtype="float32")
| Area | Guidance |
|---|---|
| Chunk size | 1–10 MB for cloud, 100 KB–1 MB for local |
| Fill value | Use np.nan for floats, not 0 if 0 is valid |
| Precision | Prefer float32 over float64 when possible (2× savings) |
| Metadata | Always set units, long_name; follow CF Conventions |
| Cloud | Use sharding (v3) + consolidated metadata |
| Concurrency | Write to non-overlapping chunks; set blosc.use_threads = False |
| Safety | Use mode="r" for reads, mode="w-" to prevent overwrites |