# zarr-data-format
Configure and optimize numcodecs compression for Zarr arrays: Blosc, Zstd, LZ4, Gzip, LZMA; pre-filters (Delta, Quantize); pipelines, Blosc thread safety, speed/ratio trade-offs.
Install:

```shell
npx claudepluginhub uw-ssec/rse-plugins --plugin zarr-data-format
```

This skill uses the workspace's default tool permissions.
Configure, select, and optimize compression codecs for Zarr arrays using **numcodecs**. This skill covers every compressor and filter in the Zarr ecosystem, thread safety for multi-process workloads, codec pipelines in Zarr v3, and performance trade-offs.
References:

- Zarr Performance Guide: https://zarr.readthedocs.io/en/latest/user-guide/performance/
- numcodecs Reference: https://numcodecs.readthedocs.io/
- GitHub: https://github.com/zarr-developers/numcodecs
| Codec | Speed (compress) | Speed (decompress) | Ratio | Best For |
|---|---|---|---|---|
| Blosc+LZ4 | Very Fast | Very Fast | Low-Med | Real-time analysis, frequent reads |
| Blosc+Zstd | Medium | Fast | High | General purpose (v2 default) |
| Zstd standalone | Medium | Fast | High | Zarr v3 default |
| Blosc+LZ4HC | Slow | Very Fast | Medium | Write-once, read-many |
| Gzip | Slow | Medium | Med-High | Interop with non-Python tools |
| LZ4 standalone | Very Fast | Very Fast | Low | Maximum throughput |
| LZMA | Very Slow | Very Slow | Very High | Archival only |
Blosc wraps several inner compression algorithms and adds byte-shuffling, the single most impactful setting for compressing numerical data. Shuffle rearranges bytes so that bytes of equal significance sit together, exposing repetition to the compressor; on smooth numerical data this can improve ratios by an order of magnitude or more (10–40× in favorable cases).
| Parameter | Options | Default |
|---|---|---|
| cname | blosclz, lz4, lz4hc, snappy, zlib, zstd | blosclz |
| clevel | 0 (none) – 9 (max) | 5 |
| shuffle | NOSHUFFLE (0), SHUFFLE (1), BITSHUFFLE (2) | SHUFFLE |
```python
from numcodecs import Blosc

Blosc(cname='zstd', clevel=5, shuffle=Blosc.SHUFFLE)     # balanced
Blosc(cname='lz4', clevel=1, shuffle=Blosc.SHUFFLE)      # max speed
Blosc(cname='zstd', clevel=9, shuffle=Blosc.BITSHUFFLE)  # max ratio
```
Blosc's internal threading is not fork-safe. Multi-process use (Dask workers, multiprocessing, joblib) can cause silent data corruption.
```python
from numcodecs import blosc

blosc.use_threads = False  # ALWAYS set this in multi-process environments

# For Dask distributed, disable Blosc threads on every worker:
client.run(lambda: setattr(__import__('numcodecs').blosc, 'use_threads', False))
```
| Codec | Import | Key Config |
|---|---|---|
| Zstd (v3 default) | from numcodecs import Zstd | Zstd(level=3) — levels 1–22 |
| LZ4 | from numcodecs import LZ4 | LZ4(acceleration=1) |
| Gzip | from numcodecs import GZip | GZip(level=5) — levels 1–9 |
| Zlib | from numcodecs import Zlib | Zlib(level=4) — levels 1–9 |
| BZ2 | from numcodecs import BZ2 | BZ2(level=5) — levels 1–9 |
| LZMA | from numcodecs import LZMA | LZMA(preset=6) — presets 0–9 |
Filters transform data before compression to improve ratios; they are applied in the order listed.
| Filter | Use Case | Example |
|---|---|---|
| Delta | Monotonic data (timestamps, indices) | Delta(dtype='int64') |
| Quantize | Reduce float precision | Quantize(digits=3, dtype='float64') |
| FixedScaleOffset | Convert floats to ints | FixedScaleOffset(offset=273.15, scale=100, dtype='float64', astype='int32') |
| PackBits | Boolean arrays (8× reduction) | PackBits() |
| Categorize | String→integer encoding | Categorize(labels=['a','b','c'], dtype='U10', astype='u1') |
```python
import zarr
from numcodecs import Blosc, Delta, Quantize

# v2: filters run first, then the compressor
z = zarr.open_array('data.zarr', mode='w', shape=(10000,), dtype='int64', chunks=(1000,),
                    filters=[Delta(dtype='int64')],
                    compressor=Blosc(cname='zstd', clevel=5))

# Chain for floats: Delta → Quantize → compressor (pass to open_array as filters=)
filters = [Delta(dtype='float64'), Quantize(digits=3, dtype='float64')]
```
v3 replaces compressor + filters with a unified pipeline: array→array → array→bytes → bytes→bytes.
```python
import zarr

# v3 with the default Zstd codec
z = zarr.create_array(store='data.zarr', shape=(1000, 1000), chunks=(100, 100),
                      dtype='float64', zarr_format=3)

# v3 with an explicit compressor
z = zarr.create_array(store='data.zarr', shape=(1000, 1000), chunks=(100, 100),
                      dtype='float64', compressors=zarr.codecs.ZstdCodec(level=5))

# No compression
z = zarr.create_array(store='data.zarr', shape=(1000, 1000), chunks=(100, 100),
                      dtype='float64', compressors=None)
```
```text
Primary constraint?
├── STORAGE SIZE → Zstd level 9 or LZMA (archival only)
├── READ SPEED → Blosc+LZ4 with SHUFFLE (numerical) or LZ4 standalone
├── WRITE SPEED → LZ4(acceleration=10) or Blosc+LZ4 clevel=1
├── BALANCED → Blosc+Zstd clevel=3 (v2) or Zstd level=3 (v3)
├── INTEROP → Gzip (universal) or Zlib (NetCDF compat)
└── DATA TYPE
    ├── Monotonic → Delta filter + any compressor
    ├── Boolean → PackBits + LZ4
    ├── Integer → Blosc BITSHUFFLE
    └── Limited precision float → Quantize filter + Zstd
```
Remember: set `blosc.use_threads = False` in any multi-process environment.