From everything-claude-trading
- Validating trading strategy parameters using robust out-of-sample testing
npx claudepluginhub brainbytes-dev/everything-claude-trading
This skill uses the workspace's default tool permissions.
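In-Sample (IS): the data used to develop the strategy and optimize its parameters.
Out-of-Sample (OOS): data withheld from optimization and used only to evaluate the optimized strategy.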
Common Mistake — Data Snooping:
Danger: Testing multiple strategies on the same OOS period
Each test "uses up" the OOS data. After 20 strategies tested on the same OOS:
- You are effectively optimizing on OOS data
- True OOS is only the FIRST strategy tested on that data
- Solution: reserve a final holdout sample, or use walk-forward methodology
Rule of thumb: once you look at OOS results and go back to modify
the strategy, that OOS period is no longer truly out-of-sample.
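To illustrate why repeated testing "uses up" an OOS period, the sketch below computes the chance that at least one worthless strategy passes an OOS check purely by luck. The independence of the tests and the 5% per-test false-positive rate are simplifying assumptions made for this example, not figures from the text above.

```python
def chance_of_false_pass(n_strategies, alpha=0.05):
    """Probability that at least one worthless strategy passes the OOS test.

    Assumes independent tests, each with false-positive rate `alpha`
    (both are illustrative assumptions).
    """
    return 1.0 - (1.0 - alpha) ** n_strategies


single = chance_of_false_pass(1)
twenty = chance_of_false_pass(20)
print(f"1 strategy: {single:.2f}, 20 strategies: {twenty:.2f}")
```

Under these assumptions the chance of a spurious pass rises from 5% for a single test to roughly 64% after 20 tests on the same OOS data.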
Concept: Divide data into multiple IS/OOS windows, optimize on each IS window, test on the immediately following OOS window, then advance the window and repeat.
Anchored Walk-Forward:
Window 1: IS [Jan 2015 - Dec 2019] -> OOS [Jan 2020 - Jun 2020]
Window 2: IS [Jan 2015 - Jun 2020] -> OOS [Jul 2020 - Dec 2020]
Window 3: IS [Jan 2015 - Dec 2020] -> OOS [Jan 2021 - Jun 2021]
...
Properties:
- IS window grows over time (always starts from beginning)
- More data for optimization in later windows
- Incorporates all historical data
- Best when: strategy parameters are expected to be stable over long periods
- Drawback: old data may be irrelevant if market structure has changed
Rolling Walk-Forward:
Window 1: IS [Jan 2015 - Dec 2019] -> OOS [Jan 2020 - Jun 2020]
Window 2: IS [Jul 2015 - Jun 2020] -> OOS [Jul 2020 - Dec 2020]
Window 3: IS [Jan 2016 - Dec 2020] -> OOS [Jan 2021 - Jun 2021]
...
Properties:
- IS window is fixed length, slides forward
- Adapts to changing market conditions (drops oldest data)
- Best when: market microstructure or regime changes make old data less relevant
- Drawback: fixed IS window may be too short for stable estimation
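The anchored and rolling schemes above differ only in where the IS window starts. A minimal sketch of a window generator follows; the function names, the month-based arguments, and the half-open `[start, end)` date convention are choices made for this example.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def walk_forward_windows(start, end, is_months, oos_months, step_months, anchored):
    """Return a list of (is_start, is_end, oos_start, oos_end) tuples.

    All bounds are half-open: a window covers [start, end).
    """
    windows = []
    k = 0
    while True:
        oos_start = add_months(start, is_months + k * step_months)
        oos_end = add_months(oos_start, oos_months)
        if oos_end > end:
            break  # not enough data left for a full OOS window
        # Anchored: IS always begins at the first date. Rolling: fixed-length IS.
        is_start = start if anchored else add_months(oos_start, -is_months)
        windows.append((is_start, oos_start, oos_start, oos_end))
        k += 1
    return windows


rolling = walk_forward_windows(date(2015, 1, 1), date(2021, 7, 1), 60, 6, 6, anchored=False)
anchored = walk_forward_windows(date(2015, 1, 1), date(2021, 7, 1), 60, 6, 6, anchored=True)
for w in rolling:
    print(w)
```

With a 60-month IS window, a 6-month OOS window, and a 6-month step, this reproduces the three windows listed for each scheme.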
Expanding Walk-Forward with Decay:
Hybrid approach: anchored start but apply exponential decay weights to older data
Recent data gets higher weight in optimization
Captures both long history and regime adaptation
More complex to implement but often superior in practice
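One way to express the decay weighting is with a half-life, sketched below. The half-life parameterization and the function names are assumptions for illustration; the text above does not specify how the decay is defined.

```python
def decay_weights(n_obs, half_life):
    """Normalized weights for observations ordered oldest to newest.

    An observation `half_life` steps older than another gets half its weight.
    """
    raw = [0.5 ** ((n_obs - 1 - i) / half_life) for i in range(n_obs)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Weighted average, e.g. of per-period returns inside the IS window."""
    return sum(v * w for v, w in zip(values, weights))


weights = decay_weights(5, half_life=2)
recent_heavy = weighted_mean([0.0, 0.0, 0.0, 0.0, 1.0], weights)
```

A shorter half-life behaves more like a rolling window, while a very long one approaches the plain anchored scheme.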
Definition:
WFE (walk-forward efficiency) = OOS performance / IS performance
Interpretation:
WFE > 0.80: Excellent — strategy is robust, minimal overfitting
WFE 0.50-0.80: Acceptable — some degradation but strategy has edge
WFE 0.20-0.50: Concerning — significant overfitting, strategy may not survive live trading
WFE < 0.20: Poor — strategy is likely overfit; IS performance is an illusion
WFE can be measured using Sharpe ratio, profit factor, or total return; use the same measure for IS and OOS.
Calculate WFE for each walk-forward window and assess the distribution.
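The ratio and its interpretation bands can be sketched as two small functions. Treating the band edges as inclusive on the lower side and rejecting non-positive IS performance are assumptions of this example; the bands above do not say how exact boundary values or negative IS results should be handled.

```python
def walk_forward_efficiency(oos_perf, is_perf):
    """WFE = OOS performance / IS performance, using one consistent measure."""
    if is_perf <= 0:
        # Assumption: the ratio is not meaningful for non-positive IS performance.
        raise ValueError("WFE requires positive IS performance")
    return oos_perf / is_perf


def classify_wfe(wfe):
    """Map a WFE value onto the interpretation bands listed above."""
    if wfe > 0.80:
        return "excellent"
    if wfe >= 0.50:
        return "acceptable"
    if wfe >= 0.20:
        return "concerning"
    return "poor"


wfe = walk_forward_efficiency(oos_perf=0.72, is_perf=1.35)
label = classify_wfe(wfe)
```

With an IS Sharpe of 1.35 and an OOS Sharpe of 0.72, the ratio is about 0.53, which falls in the acceptable band.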
WFE Across Windows:
Window 1 WFE: 0.72
Window 2 WFE: 0.65
Window 3 WFE: 0.88
Window 4 WFE: 0.41
Window 5 WFE: 0.75
Window 6 WFE: 0.58
Mean WFE: 0.67 (acceptable)
Min WFE: 0.41 (Window 4 is concerning — what was different about that period?)
Std WFE: 0.16 (moderate variability)
Investigation: Window 4 covers 2022 H1 — regime change (rate hikes, inflation)
Strategy may need regime-specific adaptation
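The summary statistics above can be reproduced with the standard library. This sketch assumes the standard deviation is the sample (n-1) form, which is the variant that matches the quoted 0.16.

```python
import statistics

# Per-window WFE values from the example above.
wfes = [0.72, 0.65, 0.88, 0.41, 0.75, 0.58]

mean_wfe = statistics.mean(wfes)
std_wfe = statistics.stdev(wfes)        # sample standard deviation (n - 1)
min_wfe = min(wfes)
worst_window = wfes.index(min_wfe) + 1  # 1-based window number

print(f"mean={mean_wfe:.3f} std={std_wfe:.3f} min={min_wfe} (window {worst_window})")
```

The mean comes out near 0.665 and the sample standard deviation near 0.161, consistent with the rounded 0.67 and 0.16, and the weakest window is identified for follow-up.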
Concept: Parameters that change wildly between IS windows are likely overfit. Robust parameters are stable.
Analysis Method:
For each walk-forward window, record optimal parameters:
Window 1: fast_MA=10, slow_MA=50, stop=2.0%
Window 2: fast_MA=12, slow_MA=48, stop=2.2%
Window 3: fast_MA=11, slow_MA=52, stop=1.8%
Window 4: fast_MA=8, slow_MA=55, stop=2.5%
Window 5: fast_MA=10, slow_MA=50, stop=2.0%
fast_MA: mean=10.2, std=1.5, CV=0.15 -> Stable (good)
slow_MA: mean=51.0, std=2.6, CV=0.05 -> Very stable (good)
stop: mean=2.1%, std=0.26%, CV=0.13 -> Stable (good)
Compare with unstable example:
Window 1: lookback=5, threshold=0.8
Window 2: lookback=20, threshold=1.5
Window 3: lookback=8, threshold=0.3
...
lookback: CV=0.65 -> Unstable (bad, likely overfit)
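The coefficient of variation (CV) used above is the standard deviation divided by the mean. A minimal sketch, assuming the sample standard deviation and using the five stable windows from the example:

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    return statistics.stdev(values) / abs(statistics.mean(values))


# Optimal parameters recorded for each walk-forward window.
windows = [
    {"fast_MA": 10, "slow_MA": 50, "stop": 2.0},
    {"fast_MA": 12, "slow_MA": 48, "stop": 2.2},
    {"fast_MA": 11, "slow_MA": 52, "stop": 1.8},
    {"fast_MA": 8, "slow_MA": 55, "stop": 2.5},
    {"fast_MA": 10, "slow_MA": 50, "stop": 2.0},
]

cv = {name: coefficient_of_variation([w[name] for w in windows]) for name in windows[0]}
for name, value in cv.items():
    print(f"{name}: CV={value:.3f}")
```

Because CV is scale-free, it allows a moving-average length and a percentage stop to be compared on the same footing.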
Sensitivity Surface Analysis:
For each parameter:
1. Fix all other parameters at optimal values
2. Vary the target parameter across its range
3. Plot performance vs parameter value
4. Robust parameters have flat, broad performance plateaus
5. Overfit parameters have sharp spikes (narrow optimal point)
If performance degrades >30% with ±20% parameter change:
the strategy is fragile and likely overfit to that parameter value
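The ±20% / 30% fragility check can be sketched as follows. `performance_fn` is a hypothetical callable standing in for a backtest that returns a performance number for a given parameter value, and the two toy curves are invented purely to show a plateau versus a spike. The check assumes performance at the optimum is positive.

```python
def is_fragile(performance_fn, optimum, rel_change=0.20, max_drop=0.30):
    """True if performance falls by more than `max_drop` at optimum * (1 +/- rel_change)."""
    base = performance_fn(optimum)
    for value in (optimum * (1 - rel_change), optimum * (1 + rel_change)):
        if performance_fn(value) < base * (1 - max_drop):
            return True
    return False


# Toy performance curves (illustrative only, not real backtest results).
def plateau(x):
    return 1.0 - 0.0001 * (x - 50) ** 2  # broad, flat optimum around 50


def spike(x):
    return 1.0 / (1.0 + (x - 50) ** 2)   # sharp, narrow optimum at 50


plateau_fragile = is_fragile(plateau, 50)
spike_fragile = is_fragile(spike, 50)
```

In practice the other parameters would be held at their optimal values while `performance_fn` reruns the backtest for each perturbed value.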
Regime-Based Splitting:
Instead of chronological splits only, validate across market regimes:
- Bull market periods (rising trend, low vol)
- Bear market periods (falling trend, high vol)
- Sideways/choppy periods (no trend, variable vol)
- Crisis periods (high vol, correlations spike)
- Low volatility periods (calm, grinding)
The strategy should show positive (or at least non-negative) performance
in each regime. A strong IS Sharpe driven by only one regime indicates a fragile strategy.
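A per-regime breakdown can be sketched by grouping returns under regime labels and computing a Sharpe ratio for each group. The labels, the toy returns, and the zero risk-free rate are assumptions for illustration; how regimes are labeled in the first place is left open here.

```python
import math
import statistics
from collections import defaultdict


def sharpe_by_regime(returns, regimes, periods_per_year=252):
    """Annualized Sharpe ratio (zero risk-free rate assumed) per regime label."""
    grouped = defaultdict(list)
    for ret, label in zip(returns, regimes):
        grouped[label].append(ret)
    return {
        label: statistics.mean(rs) / statistics.stdev(rs) * math.sqrt(periods_per_year)
        for label, rs in grouped.items()
    }


# Toy daily returns with hypothetical regime labels.
returns = [0.010, 0.020, 0.015, -0.010, 0.005, -0.005]
regimes = ["bull", "bull", "bull", "sideways", "sideways", "sideways"]

by_regime = sharpe_by_regime(returns, regimes)
weak_regimes = [label for label, s in by_regime.items() if s < 0]
```

Each regime needs at least two observations with non-zero variance for the ratio to be defined.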
Cross-Asset Validation:
If strategy concept is general (e.g., momentum, mean reversion):
1. Optimize on Asset A
2. Test on Assets B, C, D with same or similar parameters
3. Cross-asset performance validates the underlying alpha source
4. If it only works on one asset: likely overfit to that asset's idiosyncrasies
Rule of thumb:
- IS window: should contain at least 100 trades (for statistical significance)
- OOS window: should contain at least 30 trades
- IS/OOS ratio: 3:1 to 5:1 is common
Too short IS: insufficient data for optimization, noisy parameter estimates
Too long IS: includes stale data, may not reflect current conditions
Too short OOS: insufficient data to evaluate, high variance in WFE
Too long OOS: fewer walk-forward windows, less validation breadth
For daily strategies:
- IS: 3-5 years, OOS: 6-12 months, step: 6 months
For intraday strategies:
- IS: 6-12 months, OOS: 1-3 months, step: 1 month
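The rules of thumb above can be turned into a simple design check. The function name and message wording are hypothetical, and an IS/OOS ratio outside 3:1 to 5:1 is reported as a note rather than an error, since the text calls that range common rather than required.

```python
def check_window_design(is_trades, oos_trades, is_months, oos_months):
    """Return a list of warnings for a proposed IS/OOS window design."""
    issues = []
    if is_trades < 100:
        issues.append("IS window has fewer than 100 trades")
    if oos_trades < 30:
        issues.append("OOS window has fewer than 30 trades")
    ratio = is_months / oos_months
    if not 3 <= ratio <= 5:
        issues.append(f"IS/OOS ratio {ratio:.1f}:1 is outside the common 3:1 to 5:1 range")
    return issues


ok_design = check_window_design(is_trades=120, oos_trades=35, is_months=48, oos_months=12)
weak_design = check_window_design(is_trades=80, oos_trades=20, is_months=48, oos_months=6)
```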
1. Minimize parameter count: each parameter adds a degree of freedom for overfitting
Rule: at most data_points / 10 parameters, ideally fewer
2. Use robust objectives: prefer Sharpe ratio over total return, because Sharpe penalizes risk
3. Constrain parameter ranges: use economically meaningful bounds
4. Prefer parameter plateaus over peaks: a parameter value on a plateau
is more likely to work out-of-sample
5. Average nearby solutions: instead of picking the single best parameter set,
average the top 10% of parameter combinations (ensemble approach)
6. Validate parameter interactions: check for spurious interactions between parameters
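Item 5 above (averaging the top 10% of parameter combinations) might be sketched like this. The `(params, score)` result format and the toy grid are assumptions for the example; a simple unweighted mean is used, although other ensemble schemes are possible.

```python
def average_top_fraction(results, fraction=0.10):
    """Average each parameter over the best-scoring fraction of results.

    `results` is a list of (params_dict, score) pairs from an optimization run.
    """
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_params = [params for params, _ in ranked[:k]]
    return {name: sum(p[name] for p in top_params) / k for name in top_params[0]}


# Toy grid of 20 results where a higher fast_period happens to score higher.
results = [({"fast_period": i, "slow_period": 50}, float(i)) for i in range(1, 21)]
averaged = average_top_fraction(results)
```

Averaged values are generally non-integer, so period-type parameters would need rounding before use.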
Strategy: Buy when fast MA crosses above slow MA, sell on reverse
Parameters to optimize: fast_period (5-30), slow_period (30-100)
Asset: S&P 500 futures, daily data 2010-2024
Walk-forward design:
- IS window: 4 years (rolling)
- OOS window: 6 months
- Step: 6 months
- Total windows: 20
Results summary:
- IS average Sharpe: 1.35
- OOS average Sharpe: 0.72
- WFE: 0.53 (acceptable but not great)
- OOS Sharpe std: 0.45 (high variability)
- Optimal fast_period: ranges from 8 to 22 (moderate stability)
- Optimal slow_period: ranges from 40 to 65 (moderate stability)
Assessment: Strategy has some edge but degrades significantly OOS.
The wide range of optimal parameters suggests the optimizer is adapting to noise rather than signal.
Consider simpler parameterization or regime filters.
Strategy: Buy when z-score < -2, sell when z-score > +2
Parameters: lookback period, z-score threshold
Asset: Pairs trade (Coca-Cola vs PepsiCo)
Walk-forward design:
- IS window: 3 years (rolling)
- OOS window: 6 months
- 16 windows total
Results:
- IS average Sharpe: 0.95
- OOS average Sharpe: 0.78
- WFE: 0.82 (excellent)
- Optimal lookback: 55-65 days across all windows (very stable, CV=0.08)
- Optimal z-score: 1.8-2.2 across all windows (stable, CV=0.10)
Assessment: High WFE and stable parameters suggest a robust strategy.
Parameter plateau is broad (performance within 10% for lookback 40-80).
Strategy is likely capturing genuine mean reversion in the pair relationship.
Deploy with current optimal parameters, monitor for structural breaks.
Strategy: Momentum on commodity futures basket
Problem: Standard walk-forward analysis (WFA) shows a WFE of 0.45, which is concerning
Investigation: Plot WFE by OOS window
- Bull regime windows: WFE = 0.75 (good)
- Bear regime windows: WFE = 0.55 (acceptable)
- Sideways regime windows: WFE = 0.15 (terrible)
Diagnosis: Strategy retains almost none of its edge in sideways markets (momentum fails in chop)
Solution: Add regime filter
- VIX < 15 and trend strength (ADX) < 20: reduce position size by 50%
- VIX > 30 and ADX > 30: full position size
Revised WFA with regime filter:
- IS average Sharpe: 1.05 (lower than unfiltered, expected)
- OOS average Sharpe: 0.82 (higher than unfiltered)
- WFE: 0.78 (significant improvement)
- Filter reduces drawdowns in sideways periods by 60%
Note: regime filter adds complexity — verify it is not itself overfit
by testing filter parameters for stability across windows
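The regime filter in this example could be sketched as a position-scaling function. The example only specifies two cases (half size in calm, trendless markets and full size in volatile, trending ones), so the scale for every other combination is left to a caller-supplied `default_scale`, a hypothetical parameter introduced here rather than something the example defines.

```python
def position_scale(vix, adx, default_scale):
    """Position size multiplier under the regime filter described above.

    `default_scale` covers conditions the example does not specify.
    """
    if vix < 15 and adx < 20:
        return 0.5  # calm, trendless market: reduce position size by 50%
    if vix > 30 and adx > 30:
        return 1.0  # volatile, trending market: full position size
    return default_scale


calm = position_scale(vix=12, adx=15, default_scale=0.75)
trending = position_scale(vix=35, adx=40, default_scale=0.75)
unspecified = position_scale(vix=20, adx=25, default_scale=0.75)
```

The thresholds 15, 20, and 30 are themselves parameters, so they should go through the same stability checks across windows as the strategy parameters.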
Before accepting walk-forward results, verify: