From coreai-skills
Exports PyTorch models with coreai-torch, compiles with coreai-build, and runs on Apple silicon via Core AI runtime (Swift/Python).
npx claudepluginhub apple/coreai-models --plugin coreai-skills
How this skill is triggered (by the user, by Claude, or both):
Slash command
/coreai-skills:working-with-coreai
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
Deploy PyTorch models on Apple silicon: export with coreai-torch, compile with coreai-build, run with the Core AI runtime (Swift or Python).
Related skills: Skill("coreai-skills:model-authoring") (Neural Engine and GPU authoring patterns, use when re-structuring model architecture) | Skill("coreai-skills:model-compression-exploration") (quantization/palettization sweeps — use when exploring compression tradeoffs)
The Core AI toolchain has extensive documentation. Use these as reference — do not read all pages upfront. Instead, consult the relevant docs when you need specifics about a particular step.
| Resource | What it covers | When to consult |
|---|---|---|
| coreai-torch | TorchConverter API, externalization, composite ops, custom lowerings, Metal kernels, debugging | Export questions, API details, custom op registration |
| CoreAI framework | AIModel, InferenceFunction, NDArray, specialization, caching | Swift runtime API, on-device integration |
| coreai-build (AOT compilation) | Ahead-of-time compilation flags and options | Compilation questions |
| coreai Python API | Python runtime: AIModel, InferenceFunction, NDArray, state management | Python runtime questions |
| coreai-models repo | Export recipes, Swift runtime utilities, reusable primitives | Export patterns, running models, reference implementations |
| guidance.md | Platform and general guidance: use cases, model sizing, compression strategy | Resolving decisions around platform targeting, model sizing, and compression strategy |
The coreai-models repo is the canonical source for how to export and run models with Core AI. Before writing export code from scratch, always explore this repo — it has working export recipes for many model families, Swift and Python runtime utilities, and reusable primitives. If the user has a local clone, explore it. If not, suggest cloning it.
Explore these directories to find relevant patterns:
- models/: Per-model export recipes with READMEs and CLI commands for many popular model families (LLMs, vision, audio, diffusion).
- python/src/coreai_models/export/: Export pipeline code covering macOS and iOS export paths, compression presets, and custom MLIR lowerings.
- swift/Sources/: Runtime utilities for LLMs (engines, text generation, KV cache, sampling, decode loops), diffusion pipelines, object detection, image segmentation, and constrained decoding.

The Core AI pipeline transforms a PyTorch model into an optimized on-device asset:
1. AUTHOR Re-structure model for target platform
→ Skill("coreai-skills:model-authoring")
2. COMPRESS Explore quantization/palettization tradeoffs
→ Skill("coreai-skills:model-compression-exploration")
3. EXPORT Convert PyTorch → AIProgram via TorchConverter
→ coreai-torch docs
4. COMPILE Ahead-of-time compilation for target platform
→ coreai-build CLI
5. RUN Load and run on device (Swift or Python)
→ CoreAI framework / coreai Python API
Steps 1 and 2 are optional — many models export directly without re-authoring or compression. Start with export, then add authoring or compression if needed (poor accuracy, poor performance, too large).
For models already in coreai-models, the export recipes handle all steps. Check the models/ directory first — if the user's model family is there, point them to the recipe.
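As a rough illustration of the "check models/ first" step, the sketch below searches a local clone for recipe directories whose name contains a model family. The helper name `find_recipes` is hypothetical, and so is the assumption that each recipe lives in its own subdirectory of models/; the actual repo layout may differ.

```python
from pathlib import Path


def find_recipes(repo_root, family):
    """Return names of subdirectories of <repo_root>/models matching `family`.

    Hypothetical helper: assumes one subdirectory per recipe, which the
    repo may not follow exactly.
    """
    models_dir = Path(repo_root) / "models"
    if not models_dir.is_dir():
        return []  # no local clone here, or an unexpected layout
    needle = family.lower()
    return sorted(
        entry.name
        for entry in models_dir.iterdir()
        if entry.is_dir() and needle in entry.name.lower()
    )
```

An empty result only means no directory name matched; it is still worth browsing models/ by hand before writing export code from scratch.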
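import torch
from coreai_torch import TorchConverter, get_decomp_table

# MyModel stands in for your own torch.nn.Module.
model = MyModel().eval()

# Capture the graph with torch.export, then apply the coreai-torch decomposition table.
ep = torch.export.export(model, args=(torch.randn(1, 3, 224, 224),))
ep = ep.run_decompositions(get_decomp_table())

# Convert to a Core AI program, naming the inputs and outputs used at runtime.
program = (
    TorchConverter()
    .add_exported_program(ep, input_names=["image"], output_names=["logits"])
    .to_coreai()
)
program.optimize()
program.save_asset("model.aimodel")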
This is the simplest export pattern. Real models often need more — consult the coreai-torch docs and explore the export code in the coreai-models repo for patterns around:
- add_pytorch_module() with externalize_modules
- state_names
- TorchMetalKernel and register_torch_lowering()
- set_static_shape_config()

Ahead-of-time (AOT) compilation of models can optionally be performed with:
xcrun coreai-build compile model.aimodel --platform iOS
Docs: Ahead-of-time compilation
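When the compile step is driven from a Python export script, the command above can be wrapped in a small helper. This is a minimal sketch: `build_compile_command` and `compile_asset` are hypothetical names, and only the `--platform iOS` form shown above comes from this guide; other platform values and flags should be checked against the coreai-build docs.

```python
import subprocess


def build_compile_command(asset_path, platform="iOS"):
    """Build the argv for `xcrun coreai-build compile`.

    Only the `--platform iOS` form is taken from this guide; other
    platform values are not verified here.
    """
    return ["xcrun", "coreai-build", "compile", str(asset_path), "--platform", platform]


def compile_asset(asset_path, platform="iOS"):
    """Run the compile step, raising CalledProcessError on a non-zero exit."""
    subprocess.run(build_compile_command(asset_path, platform), check=True)
```

Keeping command construction separate from execution makes the exact invocation easy to log or inspect before running it.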
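import CoreAI

// Load the model asset and look up its entry-point function.
let model = try await AIModel(contentsOf: modelURL)
guard let fn = try model.loadFunction(named: "main") else { return }

// Allocate the input tensor and get a mutable view to fill it.
var input = NDArray(shape: [1, 3, 224, 224], scalarType: .float32)
var view = input.mutableView(as: Float32.self)
// fill view with data...

// Inputs and outputs are keyed by the names chosen at export time.
var outputs = try await fn.run(inputs: ["image": input])
let result = outputs.remove("logits")?.ndArray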
For LLMs, diffusion, and other complex models, explore the Swift runtime utilities in the coreai-models repo — they provide complete inference engines, decode loops, sampling, and KV cache management that handle the complexity beyond basic AIModel usage.
Docs: CoreAI framework
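import asyncio

import numpy as np
from coreai.runtime import AIModel, NDArray


async def main():
    # Bare top-level await only works in an async context (for example a notebook),
    # so the calls are wrapped in a coroutine here.
    model = await AIModel.load("model.aimodel")
    fn = model.load_function("main")
    outputs = await fn(
        {"image": NDArray(np.random.randn(1, 3, 224, 224).astype(np.float32))}
    )
    return outputs["logits"].numpy()


logits = asyncio.run(main())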
Docs: coreai Python API
Compare on-device outputs against PyTorch reference:
| Scenario | Expected PSNR | Investigate if below |
|---|---|---|
| float32 end-to-end | > 70 dB | 60 dB |
| fp16 on-device | > 50 dB | 40 dB |
| 4-bit palettized | ~40 dB | 30 dB |
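The comparison can be sketched with NumPy as below. This guide does not define which signal peak to use, so the sketch assumes the peak absolute value of the PyTorch reference; the names `psnr`, `needs_investigation`, and `PSNR_THRESHOLDS` are illustrative, and the thresholds are copied from the table above.

```python
import numpy as np

# Thresholds in dB copied from the table above: (expected, investigate_below).
PSNR_THRESHOLDS = {
    "float32": (70.0, 60.0),
    "fp16": (50.0, 40.0),
    "4-bit palettized": (40.0, 30.0),
}


def psnr(reference, candidate):
    """PSNR in dB between a reference output and an on-device output.

    Assumption: the signal peak is the reference's largest absolute value.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    mse = np.mean((reference - candidate) ** 2)
    if mse == 0.0:
        return float("inf")  # outputs are identical
    peak = np.max(np.abs(reference))
    if peak == 0.0:
        return float("-inf")  # all-zero reference with a non-zero error
    return float(10.0 * np.log10(peak ** 2 / mse))


def needs_investigation(value_db, scenario):
    """True when the PSNR falls below the 'investigate if below' threshold."""
    _, floor = PSNR_THRESHOLDS[scenario]
    return value_db < floor
```

In practice `reference` would be the PyTorch output converted with `.detach().numpy()` and `candidate` the runtime output, compared per output tensor.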
When helping a user deploy a model they haven't deployed before, follow this protocol. If the user already knows what they want (specific export question, compilation flag, etc.), skip directly to the relevant section above.
Your first response is always a conversation. Even if the deployment path seems obvious, ask first.
Rules:
guidance.md
What to learn:
Example interaction:
User: I want to deploy my image classifier on iPhone.
Assistant: Got it — you're looking to run an image classification model on iPhone. A couple of questions to help me give you the best guidance:
- Roughly how large is the model (parameter count or file size)?
- Does it need to run in real-time (e.g., processing camera frames), or is batch/offline processing fine?
User: It's a ResNet-50, about 25M params. Needs to run at 30fps on camera frames.
Assistant: Perfect — a 25M param ResNet-50 for real-time camera on iPhone is a great fit for efficient on-device inference. Here's the plan...
- models/ directory for a matching recipe
- guidance.md to resolve the deployment path (present the outcome, not the reasoning)
- Skill("coreai-skills:model-authoring")
- Skill("coreai-skills:model-compression-exploration")