Architecture selection router for CNNs, Transformers, RNNs, GANs, and GNNs, organized by data modality and constraints
Routes architecture selection by data modality (images, sequences, graphs, generation) and constraints (dataset size, compute, latency). Use when choosing WHAT architecture for a new problem or comparing families like CNN vs Transformer.
`/plugin marketplace add tachyon-beep/skillpacks`
`/plugin install yzmir-neural-architectures@foundryside-marketplace`

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Reference sheets in this pack:
- architecture-design-principles.md
- attention-mechanisms-catalog.md
- cnn-families-and-selection.md
- generative-model-families.md
- graph-neural-networks-basics.md
- normalization-techniques.md
- sequence-models-comparison.md
- transformer-architecture-deepdive.md

<CRITICAL_CONTEXT> Architecture selection comes BEFORE training optimization. Wrong architecture = no amount of training will fix it.
This meta-skill routes you to the right architecture guidance based on your data modality (images, sequences, graphs, generation) and your constraints (dataset size, compute, latency).
Load this skill when architecture decisions are needed. </CRITICAL_CONTEXT>
Use this skill when choosing WHAT architecture to use for a new problem, or when comparing architecture families (e.g., CNN vs Transformer).
DO NOT use for training, deployment, or implementation questions; those belong to other packs (see the misrouting table below).
When in doubt: if choosing WHAT architecture → this skill. If training or deploying an architecture → a different pack.
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-neural-architectures/SKILL.md
Reference sheets like cnn-families-and-selection.md are at:
skills/using-neural-architectures/cnn-families-and-selection.md
NOT at:
skills/cnn-families-and-selection.md ← WRONG PATH
When you see a link like [cnn-families-and-selection.md](cnn-families-and-selection.md), read the file from the same directory as this SKILL.md.
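A minimal sketch of this rule, assuming a Python-like loader (the function and variable names here are illustrative, not part of this pack):

```python
from pathlib import Path

# Hypothetical loader logic (illustrative): resolve reference sheets
# relative to the directory containing SKILL.md, never the skills/ root.
skill_md = Path("skills/using-neural-architectures/SKILL.md")

def sheet_path(skill_md: Path, sheet_name: str) -> Path:
    """Reference sheets live beside SKILL.md, not at the skills/ root."""
    return skill_md.parent / sheet_name

print(sheet_path(skill_md, "cnn-families-and-selection.md"))
# skills/using-neural-architectures/cnn-families-and-selection.md  <- correct
# (NOT skills/cnn-families-and-selection.md)
```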
Question to ask: "What type of data are you working with?"
| Data Type | Route To | Why |
|---|---|---|
| Images (photos, medical scans, etc.) | cnn-families-and-selection.md | CNNs excel at spatial hierarchies |
| Sequences (time series, text, audio) | sequence-models-comparison.md | Temporal dependencies need sequential models |
| Graphs (social networks, molecules) | graph-neural-networks-basics.md | Graph structure requires GNNs |
| Generation task (create images, text) | generative-model-families.md | Generative models are specialized |
| Multiple modalities (text + images) | architecture-design-principles.md | Need custom design |
| Unclear / Generic | architecture-design-principles.md | Start with fundamentals |
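The table above reads as a lookup with a design-principles fallback. A small illustrative sketch (the dict and function are not an API of this pack):

```python
# Illustrative sketch of the modality routing table; not an API of this pack.
ROUTES = {
    "images":     "cnn-families-and-selection.md",
    "sequences":  "sequence-models-comparison.md",
    "graphs":     "graph-neural-networks-basics.md",
    "generation": "generative-model-families.md",
}

def route(modality, multimodal=False):
    """Multiple modalities or an unclear modality fall back to design principles."""
    if multimodal or modality not in ROUTES:
        return "architecture-design-principles.md"
    return ROUTES[modality]

assert route("images") == "cnn-families-and-selection.md"
assert route(None) == "architecture-design-principles.md"
assert route("images", multimodal=True) == "architecture-design-principles.md"
```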
If any of these apply, address FIRST:
| Requirement | Route To | Priority |
|---|---|---|
| Deep network (> 20 layers) unstable | normalization-techniques.md | CRITICAL - fix before continuing |
| Need attention mechanisms | attention-mechanisms-catalog.md | Specialized component |
| Custom architecture design | architecture-design-principles.md | Foundation before specifics |
| Transformer-specific question | transformer-architecture-deepdive.md | Specialized architecture |
Clarify BEFORE routing. Ask:
- What is your dataset size?
- What compute budget do you have for training?
- What are your latency or deployment constraints?
These answers determine architecture appropriateness.
Symptoms triggering this route: image data (photos, medical scans, etc.) with questions like "which CNN?" or "ViT vs ResNet".
Route to: See cnn-families-and-selection.md for CNN architecture selection and comparison.
Clarifying questions: dataset size, compute budget, and latency constraints.
Symptoms triggering this route: sequential data (time series, text, audio) with questions like "RNN vs LSTM?".
Route to: See sequence-models-comparison.md for sequential model selection (RNN, LSTM, Transformer, TCN).
Clarifying questions: sequence length, dataset size, and latency constraints.
CRITICAL: Challenge the "RNN vs LSTM" premise if they ask. Modern alternatives (Transformers, TCN) are often better.
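One way to act on this challenge, sketched in PyTorch (hyperparameters are illustrative, not a prescribed benchmark): keep the task head fixed and swap the sequence encoder, so candidates can be compared fairly.

```python
import torch
import torch.nn as nn

# Sketch: compare sequence encoders behind one interface.
# (A TCN would need a custom or third-party block; omitted here.)
d_model, seq_len, batch = 64, 100, 8
x = torch.randn(batch, seq_len, d_model)

candidates = {
    # Recurrent baseline
    "lstm": nn.LSTM(d_model, d_model, batch_first=True),
    # Parallel, attention-based alternative
    "transformer": nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
        num_layers=2,
    ),
}

for name, encoder in candidates.items():
    out = encoder(x)
    out = out[0] if isinstance(out, tuple) else out  # nn.LSTM returns (output, state)
    print(name, out.shape)                           # both: torch.Size([8, 100, 64])
```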
Symptoms triggering this route: graph-structured data (social networks, molecules) where relationships between entities carry signal.
Route to: See graph-neural-networks-basics.md for GNN architectures and graph learning.
Red flag: treating a graph as tabular data (extracting node features and ignoring edges) is WRONG. Route to the GNN skill.
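To see exactly what the red flag misses, here is a hand-rolled sketch of a single message-passing step (illustrative; a real project would typically use a library such as PyTorch Geometric):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Sketch of one message-passing step; uses edges, unlike a tabular model."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj @ x aggregates neighbor features along edges -- the step a
        # tabular model (features only, edges ignored) silently skips.
        return torch.relu(self.lin(adj @ x))

n_nodes, feat = 5, 8
x = torch.randn(n_nodes, feat)         # node features (the "tabular" part)
adj = torch.eye(n_nodes)               # a normalized adjacency matrix goes here
print(GCNLayer(feat, 16)(x, adj).shape)  # torch.Size([5, 16])
```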
Symptoms triggering this route: the task is to generate new content (images, text, audio).
Route to: See generative-model-families.md for GANs, VAEs, and Diffusion models.
Clarifying questions: sample quality vs generation speed vs training stability requirements.
CRITICAL: Different generative models have VERY different trade-offs. Must clarify requirements before recommending.
Symptoms triggering this route: questions about choosing or designing an attention mechanism (e.g., self-attention vs cross-attention, attention for long sequences).
Route to: See attention-mechanisms-catalog.md for attention mechanism selection and design.
NOT for: General Transformer questions → transformer-architecture-deepdive.md instead
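For orientation, every entry in the catalog builds on scaled dot-product attention. A minimal sketch (PyTorch 2.0+, shapes illustrative):

```python
import torch
import torch.nn.functional as F

# Minimal sketch (PyTorch 2.0+): the core op the attention catalog builds on.
q = k = v = torch.randn(8, 4, 100, 16)         # (batch, heads, seq, head_dim)
out = F.scaled_dot_product_attention(q, k, v)  # softmax(q k^T / sqrt(d)) v
print(out.shape)                               # torch.Size([8, 4, 100, 16])
```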
Symptoms triggering this route: questions about Transformer internals (attention layers, positional encoding, encoder vs decoder).
Route to: See transformer-architecture-deepdive.md for Transformer internals and implementation.
When to route here: understanding or implementing the Transformer architecture itself, not comparing it against other families.
Cross-reference: yzmir/llm-specialist/transformer-for-llms (LLM-specific transformers).

Symptoms triggering this route: a deep network (> 20 layers) that is unstable, diverges, or won't train.
Route to: See normalization-techniques.md for deep network stability and normalization methods.
CRITICAL: This is often the ROOT CAUSE of "training won't work" - fix the architecture before blaming hyperparameters.
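A minimal PyTorch sketch of the usual fix (the block layout is illustrative; the reference sheet covers choosing between BatchNorm, LayerNorm, and related methods):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Sketch: pre-norm + skip connection, the usual stability fix for deep stacks."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)  # conv nets would use BatchNorm2d; see the sheet
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(self.norm(x))  # skip connection keeps a clean gradient path

# 40 blocks -- well past the "> 20 layers" danger zone -- remain trainable.
deep = nn.Sequential(*[Block(64) for _ in range(40)])
print(deep(torch.randn(8, 64)).std())  # activations stay bounded instead of exploding
```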
Symptoms triggering this route: custom architecture design, multimodal inputs, or no clear match to the modalities above.
Route to: See architecture-design-principles.md for custom architecture design fundamentals.
This is the foundational skill - route here if the other specific skills don't match.
Example: "Text + image classification" (multimodal)
Route to BOTH:
Order matters: Understand individual modalities BEFORE fusion.
Example: "Select architecture AND optimize training"
Route order:
Why: Wrong architecture can't be fixed by better training.
Example: "Select architecture AND deploy efficiently"
Route order:
Deployment constraints might influence architecture choice - if so, note constraints during architecture selection.
| Symptom | Wrong Route | Correct Route | Why |
|---|---|---|---|
| "My transformer won't train" | transformer-architecture-deepdive.md | training-optimization | Training issue, not architecture understanding |
| "Deploy image classifier" | cnn-families-and-selection.md | ml-production | Deployment, not selection |
| "ViT vs ResNet for medical imaging" | transformer-architecture-deepdive.md | cnn-families-and-selection.md | Comparative selection, not single architecture detail |
| "Implement BatchNorm in PyTorch" | normalization-techniques.md | pytorch-engineering | Implementation, not architecture concept |
| "GAN won't converge" | generative-model-families.md | training-optimization | Training stability, not architecture selection |
| "Which optimizer for CNN" | cnn-families-and-selection.md | training-optimization | Optimization, not architecture |
Rule: Architecture pack is for CHOOSING and DESIGNING architectures. Training/deployment/implementation are other packs.
If query contains these patterns, ASK clarifying questions before routing:
| Pattern | Why Clarify | What to Ask |
|---|---|---|
| "Best architecture for X" | "Best" depends on constraints | "What are your data size, compute, and latency constraints?" |
| Generic problem description | Can't route without modality | "What type of data? (images, sequences, graphs, etc.)" |
| Latest trend mentioned (ViT, Diffusion) | Recency bias risk | "Have you considered alternatives? What are your specific requirements?" |
| "Should I use X or Y" | May be wrong question | "What's the underlying problem? There might be option Z." |
| Very deep network (> 50 layers) | Likely needs normalization first | "Are you using normalization layers? Skip connections?" |
Never guess modality or constraints. Always clarify.
| Trendy Architecture | When NOT to Use | Better Alternative |
|---|---|---|
| Vision Transformers (ViT) | Small datasets (< 10k images) | CNNs (ResNet, EfficientNet) |
| Vision Transformers (ViT) | Edge deployment (latency/power) | EfficientNets, MobileNets |
| Transformers (general) | Very small datasets | RNNs, CNNs (lower capacity, less overfitting) |
| Diffusion Models | Real-time generation needed | GANs (1 forward pass vs 50-1000 steps) |
| Diffusion Models | Limited compute for training | VAEs (faster training) |
| Graph Transformers | Small graphs (< 100 nodes) | Standard GNNs (GCN, GAT) simpler and effective |
| LLMs (GPT-style) | < 1M tokens of training data | Simpler language models or fine-tuning |
Counter-narrative: "New architecture ≠ better for your use case. Match architecture to constraints."
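The Diffusion rows above come down to sampling cost, which a schematic sketch makes concrete (`generator` and `denoise_step` are placeholders, not real APIs):

```python
import torch

def gan_sample(generator, z):
    """GANs: one forward pass per sample (generator is a placeholder)."""
    return generator(z)

def diffusion_sample(denoise_step, shape, num_steps=1000):
    """Diffusion: 50-1000 sequential passes per sample (denoise_step is a placeholder)."""
    x = torch.randn(shape)                 # start from pure noise
    for t in reversed(range(num_steps)):   # each step is a full network forward pass
        x = denoise_step(x, t)
    return x
```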
Start here: What's your primary goal?
┌─ SELECT architecture for task
│ ├─ Data modality?
│ │ ├─ Images → [cnn-families-and-selection.md](cnn-families-and-selection.md)
│ │ ├─ Sequences → [sequence-models-comparison.md](sequence-models-comparison.md)
│ │ ├─ Graphs → [graph-neural-networks-basics.md](graph-neural-networks-basics.md)
│ │ ├─ Generation → [generative-model-families.md](generative-model-families.md)
│ │ └─ Unknown/Multiple → [architecture-design-principles.md](architecture-design-principles.md)
│ └─ Special requirements?
│ ├─ Deep network (>20 layers) unstable → [normalization-techniques.md](normalization-techniques.md) (CRITICAL)
│ ├─ Need attention mechanism → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│ └─ None → Proceed with modality-based route
│
├─ UNDERSTAND specific architecture
│ ├─ Transformers → [transformer-architecture-deepdive.md](transformer-architecture-deepdive.md)
│ ├─ Attention → [attention-mechanisms-catalog.md](attention-mechanisms-catalog.md)
│ ├─ Normalization → [normalization-techniques.md](normalization-techniques.md)
│ └─ General principles → [architecture-design-principles.md](architecture-design-principles.md)
│
├─ DESIGN custom architecture
│ └─ [architecture-design-principles.md](architecture-design-principles.md) (start here always)
│
└─ COMPARE architectures
├─ CNNs (ResNet vs EfficientNet) → [cnn-families-and-selection.md](cnn-families-and-selection.md)
├─ Sequence models (RNN vs Transformer) → [sequence-models-comparison.md](sequence-models-comparison.md)
├─ Generative (GAN vs Diffusion) → [generative-model-families.md](generative-model-families.md)
└─ General comparison → [architecture-design-principles.md](architecture-design-principles.md)
| Rationalization | Reality | Counter |
|---|---|---|
| "Transformers are SOTA, recommend them" | SOTA on benchmark ≠ best for user's constraints | "Ask about dataset size and compute first" |
| "User said RNN vs LSTM, answer that" | Question premise might be outdated | "Challenge: Have you considered Transformers or TCN?" |
| "Just recommend latest architecture" | Latest ≠ appropriate | "Match architecture to requirements, not trends" |
| "Architecture doesn't matter, training matters" | Wrong architecture can't be fixed by training | "Architecture is foundation - get it right first" |
| "They seem rushed, skip clarification" | Wrong route wastes more time than clarification | "30 seconds to clarify saves hours of wasted effort" |
| "Generic architecture advice is safe" | Generic = useless for specific domains | "Route to domain-specific skill for actionable guidance" |
Once architecture is chosen, route to:
Training the architecture:
→ yzmir/training-optimization/using-training-optimization
Implementing in PyTorch:
→ yzmir/pytorch-engineering/using-pytorch-engineering
Deploying to production:
→ yzmir/ml-production/using-ml-production
Dynamic/growing architectures:
→ yzmir/dynamic-architectures/using-dynamic-architectures
If problem involves:
Reinforcement learning:
→ yzmir/deep-rl/using-deep-rl FIRST
Large language models:
→ yzmir/llm-specialist/using-llm-specialist FIRST
Architecture is downstream of algorithm choice in RL and LLMs.
Use this meta-skill to identify the data modality and constraints, pick the route, and challenge trend-driven choices.
After routing, load the appropriate specialist skill for detailed guidance.
Critical principle: Architecture comes BEFORE training. Get this right first.