Guided architecture selection based on data modality, task type, and constraints
Guides ML architecture selection by analyzing your data modality, task type, and constraints like dataset size and deployment requirements. Provides specific recommendations with alternatives and capacity matching rules.
/plugin marketplace add tachyon-beep/skillpacks/plugin install yzmir-neural-architectures@foundryside-marketplace[modality] [task] [constraints]You are guiding architecture selection for a machine learning task. Follow the systematic selection framework.
Architecture comes BEFORE training optimization. Wrong architecture = no amount of training will fix it.
Match architecture's inductive biases to the problem's structure.
Ask the user if not clear:
MUST ask these before recommending:
| Constraint | Question | Impact |
|---|---|---|
| Dataset size | How many training samples? | Small (<10k) → Simple models, Large (>100k) → Complex OK |
| Deployment | Where will it run? | Cloud → Any, Edge → Efficient, Mobile → MobileNet |
| Latency | Speed requirement? | Real-time (<10ms) → MobileNet, Batch → Any |
| Compute | GPU available? VRAM? | Limited → Smaller models, Unlimited → Any |
| Accuracy | How critical? | Maximum → Larger models, Production → Balanced |
Data Modality?
│
├─ IMAGES
│ ├─ Dataset size?
│ │ ├─ Small (<10k) → ResNet-18 or EfficientNet-B0
│ │ ├─ Medium (10k-100k) → ResNet-50 or EfficientNet-B2
│ │ └─ Large (>100k) → EfficientNet-B4 or ViT
│ └─ Deployment?
│ ├─ Cloud → Any above
│ ├─ Edge → EfficientNet-Lite or MobileNetV3-Large
│ └─ Mobile → MobileNetV3-Small + INT8 quantization
│
├─ SEQUENCES
│ ├─ Sequence length?
│ │ ├─ Short (<100) → LSTM/GRU
│ │ ├─ Medium (100-1000) → Transformer
│ │ └─ Long (>1000) → Sparse Transformer (Longformer)
│ └─ Latency?
│ ├─ Real-time → LSTM or TCN
│ └─ Batch → Transformer
│
├─ GRAPHS
│ └─ Graph size?
│ ├─ Small (<1000 nodes) → GCN or GAT
│ └─ Large → GraphSAGE (sampling)
│
├─ GENERATION
│ └─ Priority?
│ ├─ Quality → Diffusion
│ ├─ Speed → GAN
│ └─ Latent space → VAE
│
└─ TABULAR
└─ Dataset size?
├─ Tiny (<1000) → Linear/Ridge
├─ Small (1k-100k) → 2-3 layer MLP or XGBoost
└─ Large (>100k) → Deeper MLP or gradient boosting
Resist recommending "trendy" architectures:
| Trendy Choice | When NOT to Use | Better Alternative |
|---|---|---|
| Vision Transformer (ViT) | Small dataset (<10k) | CNN (ResNet, EfficientNet) |
| Vision Transformer (ViT) | Edge/mobile deployment | MobileNet, EfficientNet-Lite |
| Transformers (general) | Very small datasets | LSTM, CNN (less capacity) |
| Diffusion Models | Real-time generation | GAN (1 forward pass) |
| Diffusion Models | Limited training compute | VAE (faster training) |
| Graph Transformers | Small graphs (<100 nodes) | Standard GNN (simpler) |
Counter-narrative: "New ≠ better for your use case. Match architecture to constraints."
Rule of thumb: Parameters ≈ 0.01-0.1× dataset size
| Dataset Size | Max Model Params | Example |
|---|---|---|
| 1,000 | 10-100 params | Linear regression |
| 10,000 | 100K-1M params | Small MLP or ResNet-18 |
| 100,000 | 1M-10M params | ResNet-50, EfficientNet-B2 |
| 1,000,000 | 10M-100M params | ResNet-101, EfficientNet-B4 |
| 10,000,000+ | 100M+ params | Large Transformers, ViT |
After gathering requirements, provide:
## Architecture Recommendation
**Selected Architecture**: [Name]
**Why**: [Justification based on constraints]
### Key Specs
- Parameters: [count]
- Expected latency: [ms] on [device]
- Dataset requirement: [minimum samples]
### Alternatives Considered
1. [Alternative 1]: Not selected because [reason]
2. [Alternative 2]: Not selected because [reason]
### Next Steps
1. Verify memory budget: [calculation]
2. Start with pretrained weights if available
3. For training optimization → yzmir-training-optimization
4. For PyTorch implementation → yzmir-pytorch-engineering
### Red Flags to Watch
- [Potential issue based on constraints]
After architecture selection:
import glob
# For training the architecture
training_pack = glob.glob("plugins/yzmir-training-optimization/plugin.json")
if not training_pack:
print("Recommend: yzmir-training-optimization for optimizer/LR selection")
# For PyTorch implementation
pytorch_pack = glob.glob("plugins/yzmir-pytorch-engineering/plugin.json")
if not pytorch_pack:
print("Recommend: yzmir-pytorch-engineering for implementation")
# For deployment
ml_prod = glob.glob("plugins/yzmir-ml-production/plugin.json")
if not ml_prod:
print("Recommend: yzmir-ml-production for quantization/serving")