From harnessml
Use when generating feature hypotheses from domain knowledge. This is not a one-time pre-work step — return here whenever results surprise you, progress stalls, or a new data source becomes available.
```shell
npx claudepluginhub msilverblatt/harness-ml --plugin harnessml
```

This skill uses the workspace's default tool permissions.
The biggest ML gains come from features that capture real phenomena, not from hyperparameter tuning or model architecture changes. A feature that encodes domain knowledge — even imperfectly — gives the model information it cannot learn from the raw data alone.
Domain research is how you generate those features.
Before looking at correlations, ask: what does domain expertise say should matter?
**Direct predictors** — Features that directly measure the outcome driver.
- `count_of_comorbidities`
- `square_footage`

**Proxy signals** — Indirect indicators when direct measurement is unavailable.
- `days_payable_outstanding` (when cash flow data is missing)
- `pharmacy_visit_frequency` (when medical records are incomplete)

**Interaction effects** — Two features weak alone, strong together.
- `high_leverage * rising_rates` — leverage is fine until rates move
- `is_diabetic * high_bmi` — captures a specific high-risk population

**Conditional effects** — A feature that only matters in certain contexts.
- `marketing_spend` only predicts sales for products with existing brand awareness
- `rainfall` only affects crop yield during the growing season

**Regime indicators** — Signals that relationships change under different conditions.
- `vix_above_30` — volatility regime where correlations break down
- `product_lifecycle_stage` — growth vs maturity dynamics differ

**Contrarian signals** — Counter-intuitive predictive direction. Often the most valuable.
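The interaction, conditional, and regime categories above can be sketched as feature-construction code. This is a minimal illustration with pandas; the dataset, column names, and thresholds are all hypothetical, chosen only to mirror the examples in the list:

```python
import pandas as pd

# Hypothetical dataset; columns and values are illustrative only.
df = pd.DataFrame({
    "leverage_ratio":  [0.8, 2.5, 3.1, 0.5],
    "rate_change_bps": [-10, 40, 55, 5],
    "vix":             [18.0, 33.5, 29.0, 41.2],
    "marketing_spend": [100, 250, 0, 180],
    "brand_awareness": [1, 1, 0, 0],
})

# Interaction effect: each signal is weak alone, strong together.
df["high_leverage_x_rising_rates"] = (
    (df["leverage_ratio"] > 2.0) & (df["rate_change_bps"] > 25)
).astype(int)

# Conditional effect: spend only counts where awareness already exists.
df["spend_given_awareness"] = df["marketing_spend"] * df["brand_awareness"]

# Regime indicator: flag the volatility regime where correlations break down.
df["vix_above_30"] = (df["vix"] > 30).astype(int)
```

Encoding the condition explicitly (rather than hoping the model learns the interaction) is the point: the feature carries the domain mechanism into the model.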
For each hypothesis, first check what already exists:

`features(action="discover")`
If an existing feature correlates >0.8 with your proposed feature, yours is likely redundant. Either skip it or refine the hypothesis to capture what the existing feature misses.
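The >0.8 redundancy check can be sketched with plain pandas. The helper name, feature names, and data below are illustrative, not part of the skill's API:

```python
import pandas as pd

def is_redundant(existing: pd.DataFrame, proposed: pd.Series,
                 threshold: float = 0.8) -> bool:
    """True if any existing feature's absolute correlation with the
    proposed feature exceeds `threshold`."""
    corrs = existing.corrwith(proposed).abs()
    return bool((corrs > threshold).any())

# Illustrative data: the proposed feature nearly duplicates an existing one.
existing = pd.DataFrame({
    "days_payable_outstanding": [30, 45, 60, 90, 120],
    "square_footage":           [800, 1200, 950, 2000, 1500],
})
proposed = pd.Series([31, 44, 62, 88, 121], name="cash_cycle_days")

print(is_redundant(existing, proposed))  # near-duplicate column → True
```

Using absolute correlation matters: a feature that is strongly *negatively* correlated with an existing one is just as redundant.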
Maintain a running log. This is the connective tissue between domain knowledge and experiment results.
### Hypothesis: [Name]
- **Domain reasoning**: Why this should be predictive (the mechanism)
- **Source**: Where you found evidence
- **Feature(s)**: Name and formula
- **Expected signal**: Strong / Medium / Weak
- **Result**: What happened when tested
- **Learning**: What this tells us about the domain
- **Follow-up**: Next hypothesis generated by this result
The Follow-up field is the most important. Every tested hypothesis should generate at least one new question.
`features(action="auto_search")` finds statistical artifacts; domain reasoning finds real signals. Use auto-search as a supplement, not a replacement.