---
name: ml-for-finance
description: Machine learning for trading — supervised models, feature importance, cross-validation for time series. Use when applying ML to trading problems.
---
Finance presents unique challenges that make naive ML application dangerous: signal-to-noise ratios are extremely low, relationships are non-stationary, and subtle look-ahead leakage can make a worthless model look excellent in backtests. Leakage is the most dangerous of these; a checklist of common sources appears below.
Tree-based models are the workhorse of ML in finance due to their ability to capture non-linear interactions without explicit specification.
Random Forest: bagged, decorrelated trees. A robust baseline that is hard to badly overfit and needs little tuning.
XGBoost / LightGBM: gradient-boosted trees, usually the strongest tabular learners, but they will happily memorize noise in low-signal financial data; regularize aggressively and use early stopping on a time-ordered validation set.
Practical guidance: keep trees shallow (depth 3-5), use large minimum leaf sizes, and treat any large in-sample edge as a red flag rather than a discovery.
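Under those assumptions, a conservative baseline can be sketched with scikit-learn (LightGBM and XGBoost follow the same fit/predict pattern; the synthetic data and hyperparameters here are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with a deliberately weak signal in feature 0,
# mimicking the low signal-to-noise ratio of financial labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=1000) > 0).astype(int)

# Conservative settings: shallow trees and large leaves guard
# against memorizing noise.
clf = RandomForestClassifier(
    n_estimators=300,
    max_depth=3,
    min_samples_leaf=50,
    random_state=0,
).fit(X, y)

print(f"in-sample accuracy: {clf.score(X, y):.3f}")
```

Even in-sample, the accuracy stays modest here; a model that scores far higher on noisy financial labels is usually overfit, not skilled.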
Understanding which features drive predictions is critical for trust and debugging.
Permutation importance: shuffle one feature at a time and measure the drop in out-of-sample score. Model-agnostic and honest when computed on held-out data, but it can understate the importance of correlated features.
SHAP (SHapley Additive exPlanations): per-prediction attributions with a game-theoretic foundation; TreeSHAP makes this fast for tree ensembles, and it captures interaction effects.
MDI (Mean Decrease in Impurity): the built-in importance of tree models. Cheap, but computed in-sample and biased toward high-cardinality and correlated features; treat it as a rough sanity check only.
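A permutation-importance sketch on held-out data, using scikit-learn's `permutation_importance` (the synthetic data and split sizes are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 4))
# Only features 0 and 1 carry signal; 2 and 3 are pure noise.
y = (X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=800) > 0).astype(int)

# Time-ordered split: no shuffling for time-series data.
X_tr, X_te, y_tr, y_te = X[:600], X[600:], y[:600], y[600:]
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Importance measured on the held-out set, not the training data.
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:+.4f}")
```

The informative features should rank clearly above the noise features; if a noise feature ranks highly, suspect leakage or overfitting.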
Standard k-fold CV is invalid for financial time series because:
- Shuffled folds leak future information into the training set
- Labels span multiple bars, so training and test labels overlap in time
- Serial correlation means test samples are not independent of nearby training samples
Purged k-fold CV (Lopez de Prado):
1. Split the data into k time-ordered folds
2. For each fold used as test:
   - Remove (purge) training samples whose labels overlap with test samples
   - Add an embargo period after each test fold before allowing training data
   - Embargo length >= label horizon (e.g., if predicting a 5-day return, embargo 5+ days)
3. Train on the purged training set, evaluate on the test fold
4. Average performance across folds
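The purge-and-embargo steps above can be sketched in a few lines of NumPy; samples are assumed time-ordered with each label spanning `label_horizon` bars forward (the function name and defaults are illustrative, not from any library):

```python
import numpy as np

def purged_kfold(n_samples, n_splits=5, label_horizon=5, embargo=5):
    """Time-ordered k-fold with purging and embargo (illustrative sketch).

    Yields (train_idx, test_idx). Training samples whose forward-looking
    label window overlaps the test fold are purged, and an embargo gap
    is left immediately after each test fold.
    """
    for test_idx in np.array_split(np.arange(n_samples), n_splits):
        t0, t1 = test_idx[0], test_idx[-1]
        keep = np.ones(n_samples, dtype=bool)
        # Purge: a sample at i has a label covering [i, i + label_horizon],
        # so anything from t0 - label_horizon onward leaks into the test fold.
        keep[max(0, t0 - label_horizon) : t1 + 1] = False
        # Embargo: also skip samples immediately after the test fold.
        keep[t1 + 1 : t1 + 1 + embargo] = False
        yield np.flatnonzero(keep), test_idx

for train_idx, test_idx in purged_kfold(100):
    print(len(train_idx), test_idx[0], test_idx[-1])
```

Every training index ends up strictly outside the purge-plus-embargo window around its test fold.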
Combinatorial Purged CV (CPCV): rather than a single test fold per split, evaluate every combination of k test groups out of N, purging and embargoing around each test group. This produces many simulated backtest paths and therefore a distribution of performance metrics instead of a single point estimate.
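A minimal CPCV sketch; for brevity the purge window around each test group is collapsed into a symmetric embargo, and the group counts are illustrative:

```python
from itertools import combinations

import numpy as np

def cpcv_splits(n_samples, n_groups=6, n_test_groups=2, embargo=5):
    """Combinatorial purged CV sketch: each combination of test groups is one split."""
    groups = np.array_split(np.arange(n_samples), n_groups)
    for test_ids in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[i] for i in test_ids])
        keep = np.ones(n_samples, dtype=bool)
        for i in test_ids:
            lo, hi = groups[i][0], groups[i][-1]
            # Symmetric embargo around every test group (simplified purge).
            keep[max(0, lo - embargo) : hi + 1 + embargo] = False
        yield np.flatnonzero(keep), test_idx

print(sum(1 for _ in cpcv_splits(120)))  # C(6, 2) = 15 splits
```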
Walk-forward validation (expanding or rolling window):

```
for t = T_start to T_end:
    train on data [0, t - embargo]
    predict on data [t, t + step]
    advance t by step
```

Pros: most realistic simulation of live trading
Cons: early predictions use less training data
Preferred for final production evaluation
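An expanding-window version of that loop, sketched in NumPy (function name and parameters are illustrative):

```python
import numpy as np

def walk_forward(n_samples, start, step, embargo=0):
    """Expanding-window walk-forward splits (illustrative sketch)."""
    t = start
    while t < n_samples:
        train_idx = np.arange(max(0, t - embargo))       # train on [0, t - embargo)
        test_idx = np.arange(t, min(t + step, n_samples))  # predict on [t, t + step)
        yield train_idx, test_idx
        t += step                                        # advance by step

for train_idx, test_idx in walk_forward(50, start=30, step=10, embargo=5):
    print(len(train_idx), test_idx[0], test_idx[-1])
```

A rolling window is the same loop with the training start moved forward as `t` advances instead of pinned at 0.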
Checklist of common leakage sources:
- [ ] Features computed using only past data (no future prices, volumes, fundamentals)
- [ ] Labels do not overlap between train and test (purged CV)
- [ ] Data preprocessing (scaling, imputation) fit ONLY on training data
- [ ] Point-in-time data used for fundamentals (not restated data)
- [ ] Universe selection does not use future information (no survivorship bias)
- [ ] Feature engineering code verified: no accidental .shift(-1) errors
- [ ] Target variable properly lagged (predict NEXT period return, not current)
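A minimal pandas illustration of correct target lagging: the feature uses only information available at time t, while `shift(-1)` is applied to the target alone so the label is the NEXT period's return (the toy price series is illustrative):

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 103.0, 104.0])

feat = prices.pct_change()              # past 1-bar return, known at time t
target = prices.pct_change().shift(-1)  # NEXT bar's return (label only)

# Align feature and label; edge rows with no past or no future drop out.
df = pd.DataFrame({"feat": feat, "target": target}).dropna()
print(df)
```

Applying `shift(-1)` to a feature instead of the target is the classic bug this checklist item guards against.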
Combining multiple models reduces variance and improves robustness: averaging predictions from diverse learners (for example a linear model, a random forest, and a gradient-boosted ensemble) smooths out each model's idiosyncratic errors while preserving the shared signal.
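One simple sketch of this idea is rank-averaging: convert each model's scores to cross-sectional ranks, then take an equal-weight average (the function name is illustrative):

```python
import numpy as np

def rank_average(predictions):
    """Equal-weight ensemble: average the cross-sectional rank of each model's scores."""
    ranks = [np.argsort(np.argsort(p)) for p in predictions]  # 0 = lowest score
    return np.mean(ranks, axis=0)

combined = rank_average([
    np.array([0.1, 0.9, 0.5]),  # model A scores
    np.array([0.2, 0.4, 0.8]),  # model B scores
])
print(combined)
```

Ranking before averaging makes the combination insensitive to each model's output scale, which matters when mixing classifiers and regressors.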
ML models in production decay as market conditions change:
Monitor weekly/monthly:
- Rolling IC of predictions vs realized returns
- Rolling hit rate (directional accuracy)
- Feature importance stability (top features changing = drift)
- Prediction distribution shift (mean, std of predictions over time)
- Strategy Sharpe on rolling 6-month window
Alert thresholds:
- IC drops below 50% of training IC for 2+ months
- Hit rate drops below 51% for 3+ months
- Feature importance ranking changes by >3 positions for top features
Response:
- Retrain on recent data (expanding or rolling window)
- Re-evaluate feature set (some features may have lost predictive power)
- Check for regime change (model may need regime conditioning)
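The rolling rank IC and the first alert threshold above can be sketched as follows (function names and the 63-day default window are illustrative):

```python
import numpy as np
import pandas as pd

def rolling_rank_ic(preds: pd.Series, realized: pd.Series, window: int = 63) -> pd.Series:
    """Rolling Spearman (rank) IC of predictions vs realized returns."""
    out = {}
    for end in range(window, len(preds) + 1):
        p = preds.iloc[end - window:end]
        r = realized.iloc[end - window:end]
        # Spearman correlation = Pearson correlation on ranks.
        out[preds.index[end - 1]] = p.rank().corr(r.rank())
    return pd.Series(out)

def ic_alert(live_ic: float, training_ic: float) -> bool:
    """Flag decay when live IC falls below half of the training IC."""
    return live_ic < 0.5 * training_ic

# Perfectly monotone predictions give an IC of 1 in every window.
demo = pd.Series(np.arange(30.0))
print(rolling_rank_ic(demo, demo, window=10).iloc[-1])
```

The duration conditions (2+ months, 3+ months) would be layered on top by requiring the alert to fire on consecutive evaluation dates.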
Example model evaluation report:

```
Model: LightGBM Classifier (return sign prediction)
Universe: S&P 500 | Frequency: Daily | Features: 45
Training:   2010-01-01 to 2019-12-31
Validation: 2020-01-01 to 2021-12-31
Test:       2022-01-01 to 2024-12-31

--- Purged CV Results (5-fold, 5-day embargo) ---
AUC:       0.528 +/- 0.008
Accuracy:  52.1% +/- 0.5%
IC (rank): 0.024 +/- 0.012

--- Walk-Forward Test (2022-2024) ---
AUC:       0.521
Accuracy:  51.8%
IC (rank): 0.019
Long-short Sharpe: 0.65 (gross), 0.32 (net of costs)
Turnover:  85% monthly

--- Top Features (SHAP) ---
1. 20-day momentum residual  (importance: 0.15)
2. Earnings revision breadth (importance: 0.12)
3. 5-day realized vol ratio  (importance: 0.09)
4. Sector-relative RSI       (importance: 0.08)
5. Short interest change     (importance: 0.07)

Verdict: Marginal signal. Net Sharpe < 0.5. Explore feature engineering
or combination with existing alpha before production deployment.
```
Before deploying an ML model for live trading:

- [ ] Target variable defined with proper lag (no look-ahead)
- [ ] Features use point-in-time data only
- [ ] Universe is survivorship-bias-free
- [ ] Train/validation/test split respects time ordering
- [ ] Purged CV with embargo applied (embargo >= label horizon)
- [ ] Multiple model types compared (RF, XGBoost, LightGBM, linear)
- [ ] Hyperparameters tuned on validation set only (never on test)
- [ ] Feature importance computed on OOS data (SHAP or permutation)
- [ ] Transaction costs modeled (turnover * cost deducted from returns)
- [ ] Comparison vs simple baselines (linear model, equal-weight signals)
- [ ] Model decay monitoring plan established
- [ ] Retraining schedule defined