Investigate why ODB bars are oversized, have zero duration, or show anomalous price ranges by forensically analyzing Parquet trade data and ClickHouse cache. Use this skill whenever the user asks about ODB bar anomalies, threshold overshoot, liquidation cascades, flash crashes in range bars, same-timestamp trade bursts, order book sweeps, matching engine batches, or why bars are bigger than expected. Also use when diagnosing data quality issues in the opendeviationbar_cache — the methodology distinguishes algorithm bugs from market microstructure phenomena. TRIGGERS - oversized bars, bar overshoot, zero duration bar, liquidation cascade, flash crash ODB, trade burst, same timestamp trades, matching engine batch, order book sweep, bar too large, threshold violation, microstructure anomaly, bar forensics, ODB data quality.
From the quant-research plugin: `npx claudepluginhub terrylica/cc-skills --plugin quant-research`. This skill runs with a restricted tool allowlist.
References: references/clickhouse-schema.md, references/evolution-log.md
Systematic methodology for investigating Open Deviation Bar anomalies by tracing from ClickHouse cache back to raw Parquet trade data. Distinguishes algorithm correctness issues from market microstructure phenomena (liquidation cascades, order book sweeps, matching engine batch effects).
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.
Typical anomaly: abs_dev_dbps exceeds the threshold (e.g., 22 dbps on a 100 dbps bar).

Data sources:

| Source | Location | Schema | Access |
|---|---|---|---|
| ClickHouse cache | opendeviationbar_cache.open_deviation_bars on bigblack | 76 columns, see schema reference | ssh bigblack 'curl -s http://localhost:8123/ -d "..."' |
| Parquet tick cache | /home/tca/.cache/opendeviationbar/ticks/{SYMBOL}/{YYYY-MM-DD}.parquet on bigblack | agg_trade_id, price, quantity, first_trade_id, last_trade_id, timestamp, is_buyer_maker | ssh bigblack 'cd /home/tca && uv run --python 3.13 python3 -c "..."' with Polars |
Access pattern: Always query bigblack directly via SSH. The SSH tunnel (localhost:18123) is for the Flowsurface app runtime only — forensic queries go direct.
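The access pattern can be wrapped in a small helper. A sketch assuming `ssh bigblack` works non-interactively; the `build_remote` helper is hypothetical, introduced here only for illustration:

```python
import shlex
import subprocess

def build_remote(sql: str) -> str:
    # Quote the SQL so it survives the remote shell; hits ClickHouse's HTTP port directly.
    return f"curl -s http://localhost:8123/ -d {shlex.quote(sql)}"

def clickhouse_query(sql: str) -> str:
    # Direct SSH to bigblack -- the localhost:18123 tunnel stays reserved for the app runtime.
    out = subprocess.run(["ssh", "bigblack", build_remote(sql)],
                         capture_output=True, text=True, check=True)
    return out.stdout
```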
The investigation follows a 3-layer drill-down: ClickHouse overview → per-bar anomaly detection → Parquet trade-level root cause.
Query bars in the suspect time window. Look for anomaly signals in the result set.
```sql
-- Adjust symbol, threshold, and time window to match the investigation
SELECT
    toDateTime64(close_time_us / 1000000, 3) AS close_ts,
    toDateTime64(open_time_us / 1000000, 3) AS open_ts,
    open, high, low, close,
    round((high - low) / open * 10000, 1) AS range_dbps,
    round(abs(close - open) / open * 10000, 1) AS abs_dev_dbps,
    agg_record_count AS n_agg,
    individual_trade_count AS n_trades,
    round(duration_us / 1e6, 1) AS dur_s,
    first_agg_trade_id AS first_id,
    last_agg_trade_id AS last_id,
    is_orphan, is_liquidation_cascade AS is_liq
FROM opendeviationbar_cache.open_deviation_bars
WHERE symbol = '{SYMBOL}'
  AND threshold_decimal_bps = {THRESHOLD}
  AND close_time_us >= toUnixTimestamp('{START_UTC}', 'UTC') * 1000000
  AND close_time_us <= toUnixTimestamp('{END_UTC}', 'UTC') * 1000000
ORDER BY close_time_us
FORMAT PrettyCompact
```
Anomaly signals to flag:
| Signal | Column Pattern | Meaning |
|---|---|---|
| Threshold overshoot | abs_dev_dbps >> threshold / 10 | Single trade crossed beyond threshold |
| Zero duration | dur_s = 0 | Entire bar formed within one matching engine cycle |
| Extreme range | range_dbps > 2 * threshold / 10 | Price swept far beyond threshold |
| Micro trade count | n_agg < 10 with high range | Giant individual fills eating book |
| Burst clustering | Multiple bars at same second | Liquidity sweep fragmented across bars |
Filter to just the anomalous bars to get agg_trade_id ranges for Parquet drill-down:
```sql
-- Bars with threshold overshoot or zero duration
SELECT close_ts, dur_s, open, high, low, close,
       range_dbps, abs_dev_dbps, n_agg,
       first_agg_trade_id, last_agg_trade_id
FROM (... Layer 1 query ...)
WHERE dur_s < 1.0 OR abs_dev_dbps > {THRESHOLD / 10 * 1.5}
```
Record the first_agg_trade_id and last_agg_trade_id ranges — these are the Parquet lookup keys.
Use Polars on bigblack to analyze raw trades. Three analyses in sequence:
Group trades by timestamp to find matching engine batches (hundreds of trades sharing exact microsecond):
```python
import polars as pl

df = pl.read_parquet("/home/tca/.cache/opendeviationbar/ticks/{SYMBOL}/{DATE}.parquet")
burst = df.filter(
    (pl.col("agg_trade_id") >= {FIRST_ID}) &
    (pl.col("agg_trade_id") <= {LAST_ID})
).sort("agg_trade_id")

# Group by timestamp to find single-cycle batches
ts_groups = burst.group_by("timestamp").agg([
    pl.col("price").min().alias("min_price"),
    pl.col("price").max().alias("max_price"),
    pl.col("quantity").sum().alias("total_qty"),
    pl.len().alias("count"),
]).sort("timestamp")
```
Key diagnostic: If a single timestamp has hundreds of trades spanning the full price range, it is a matching engine batch (single large order sweeping the book).
Determine whether the sweep is buy or sell dominated:
```python
buys = burst.filter(~pl.col("is_buyer_maker"))   # taker buy
sells = burst.filter(pl.col("is_buyer_maker"))   # taker sell
print(f"Taker buys: {len(buys)} trades, {buys['quantity'].sum():.4f} BTC")
print(f"Taker sells: {len(sells)} trades, {sells['quantity'].sum():.4f} BTC")
```
Liquidation cascades are typically 95%+ one-sided (all taker sells or all taker buys).
Find individual trades with large price jumps — these are the direct cause of threshold overshoot:
```python
with_gap = burst.sort("agg_trade_id").with_columns([
    (pl.col("price") - pl.col("price").shift(1)).alias("price_diff"),
    (pl.col("timestamp") - pl.col("timestamp").shift(1)).alias("ts_diff_us"),
])

# Trades with gaps exceeding the threshold's dollar equivalent.
# open_price and threshold_dbps come from the Layer 1 row for the bar under investigation.
threshold_dollars = open_price * threshold_dbps / 10000
big_jumps = with_gap.filter(pl.col("price_diff").abs() > threshold_dollars)
```
If individual trade-to-trade price gaps exceed the threshold, the ODB algorithm cannot split within a single agg_trade — overshoot is inherent and correct.
After completing the 3-layer analysis, classify the finding:
| Classification | Evidence Pattern | Action |
|---|---|---|
| Liquidation cascade | 95%+ one-sided, 50-100+ BTC, same-µs timestamp, sweeps $200+ | Oracle bit-exact — no fix needed. Document the event. |
| Thin book sweep | Fewer trades but large price gaps between levels | Oracle bit-exact — book was thin at that moment. |
| Orphan bar | is_orphan = 1 in ClickHouse | Known phenomenon — writer-boundary artifact. Skip in analysis. |
| Algorithm bug | Trades are normally distributed, no burst, but bar still overshoots | File upstream issue on opendeviationbar-py. |
| Data gap | agg_trade_id discontinuity between adjacent bars | Missing Parquet data. Check collection pipeline. |
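A hypothetical helper sketching how the evidence patterns above map to classifications; the predicates and their ordering are illustrative, not canonical:

```python
def classify_bar(one_sided_ratio: float, same_us_batch: bool,
                 has_big_gaps: bool, is_orphan: bool, id_gap: bool) -> str:
    """Map Layer 1-3 evidence to a classification (cutoffs are illustrative)."""
    if is_orphan:
        return "orphan bar"            # writer-boundary artifact, skip in analysis
    if id_gap:
        return "data gap"              # agg_trade_id discontinuity, check pipeline
    if one_sided_ratio >= 0.95 and same_us_batch:
        return "liquidation cascade"   # one-sided same-microsecond sweep
    if has_big_gaps:
        return "thin book sweep"       # large price gaps between levels
    return "algorithm bug candidate"   # normal trades, yet the bar still overshoots
```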
The ODB algorithm processes agg_trades sequentially. A bar closes when the first trade deviates beyond the threshold from the bar's open. The overshoot mechanism:
The bar closes at the first trade Pₖ where |Pₖ - P₀| / P₀ ≥ T, and the overshoot is |Pₖ - P₀| / P₀ - T. Overshoot is relatively larger at lower thresholds, because a given trade-to-trade price gap is a bigger multiple of a smaller T.
This is inherent to discrete trade data — not a bug.
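A worked toy example of the closing rule and its overshoot, with an illustrative price path and threshold:

```python
# The bar closes on the FIRST trade whose deviation from the open reaches the
# threshold, so it can land well past it when trades arrive in discrete jumps.
p0 = 100.0          # bar open
threshold = 0.001   # threshold as a fraction of price (illustrative units)
prices = [100.0, 100.05, 100.02, 99.85]  # illustrative trade sequence

close = next(p for p in prices if abs(p - p0) / p0 >= threshold)
overshoot = abs(close - p0) / p0 - threshold  # inherent, not a bug
```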
| Skill | Relationship |
|---|---|
| opendeviation-eval-metrics | Evaluates ODB signal quality (output metrics). This skill investigates input data quality. |
| exchange-session-detector | Session flags in ClickHouse. Cascades often cluster at session boundaries (NY open/close). |
After this skill completes, check before closing whether this file needs an update. Only update if the issue is real and reproducible, not speculative.