Use when implementing speech-to-text, live transcription, or audio transcription. Covers SpeechAnalyzer (iOS 26+), SpeechTranscriber, volatile/finalized results, AssetInventory model management, audio format handling.
/plugin marketplace add CharlesWiltgen/Axiom
/plugin install axiom@axiom-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
SpeechAnalyzer is Apple's new speech-to-text API introduced in iOS 26. It powers Notes, Voice Memos, Journal, and Call Summarization. The on-device model is faster, more accurate, and better for long-form/distant audio than SFSpeechRecognizer.
Key principle: SpeechAnalyzer is modular—add transcription modules to an analysis session. Results stream asynchronously using Swift's AsyncSequence.
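As a rough sketch of that shape (complete, working examples follow below), you attach modules to an analyzer session and iterate each module's results stream:

import Speech

// Minimal sketch of the modular pattern (assumes iOS 26+ and an installed language model).
// Audio is supplied separately, via analyzeSequence(from:) or an AnalyzerInput stream,
// exactly as in the full examples below.
func transcriptionSession() async throws {
    let transcriber = SpeechTranscriber(locale: .current, preset: .offlineTranscription)
    let analyzer = SpeechAnalyzer(modules: [transcriber])   // attach modules to the session
    _ = analyzer
    for try await result in transcriber.results {           // results arrive as an AsyncSequence
        print(result.text)
    }
}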
Need speech-to-text?
├─ iOS 26+ only?
│ └─ Yes → SpeechAnalyzer (preferred)
├─ Need iOS 10-25 support?
│ └─ Yes → SFSpeechRecognizer (or DictationTranscriber)
├─ Long-form audio (meetings, lectures)?
│ └─ Yes → SpeechAnalyzer
├─ Distant audio (across room)?
│ └─ Yes → SpeechAnalyzer
└─ Short dictation commands?
└─ Either works
SpeechAnalyzer advantages:
- Fully on-device processing
- Faster and more accurate than SFSpeechRecognizer
- Better suited to long-form and distant audio

DictationTranscriber (iOS 26+): Supports the same languages as SFSpeechRecognizer, but doesn't require the user to enable Siri or dictation in Settings.
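If you also need to support releases before iOS 26, one common approach is to branch on availability at runtime. A hedged sketch (the legacy branch is only outlined; see Apple's SFSpeechRecognizer documentation for that flow):

import Speech

func startTranscription(locale: Locale) async throws {
    if #available(iOS 26, *) {
        let transcriber = SpeechTranscriber(locale: locale, preset: .offlineTranscription)
        let analyzer = SpeechAnalyzer(modules: [transcriber])
        _ = analyzer  // feed audio and consume transcriber.results as shown below
    } else {
        // Fall back to SFSpeechRecognizer + SFSpeechAudioBufferRecognitionRequest
    }
}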
Use this skill when you see:
- Requests to implement speech-to-text, live transcription, or audio file transcription
- Work involving SpeechAnalyzer, SpeechTranscriber, or DictationTranscriber
- Questions about volatile vs. finalized results, AssetInventory model management, or audio format handling
Transcribe an audio file to text in one function.
import Speech
func transcribe(file: URL, locale: Locale) async throws -> AttributedString {
// Set up transcriber
let transcriber = SpeechTranscriber(
locale: locale,
preset: .offlineTranscription
)
// Collect results asynchronously
async let transcriptionFuture = try transcriber.results
.reduce(AttributedString()) { str, result in
str + result.text
}
// Set up analyzer with transcriber module
let analyzer = SpeechAnalyzer(modules: [transcriber])
// Analyze the file
if let lastSample = try await analyzer.analyzeSequence(from: file) {
try await analyzer.finalizeAndFinish(through: lastSample)
} else {
await analyzer.cancelAndFinishNow()
}
return try await transcriptionFuture
}
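A possible call site, inside an async context (the file path and locale below are placeholders):

// Hypothetical usage: substitute your own recording URL and locale
let fileURL = URL(fileURLWithPath: "/path/to/recording.m4a")
let transcript = try await transcribe(file: fileURL, locale: Locale(identifier: "en-US"))
print(String(transcript.characters))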
Key points:
- analyzeSequence(from:) reads the file and feeds its audio to the analyzer
- finalizeAndFinish(through:) ensures all results are finalized
- Results are AttributedString values with timing metadata

For real-time transcription from the microphone:
import AVFoundation
import Speech
class TranscriptionManager: ObservableObject {
private var transcriber: SpeechTranscriber?
private var analyzer: SpeechAnalyzer?
    private var analyzerFormat: AVAudioFormat?
private var inputBuilder: AsyncStream<AnalyzerInput>.Continuation?
@Published var finalizedTranscript = AttributedString()
@Published var volatileTranscript = AttributedString()
func setUp() async throws {
// Create transcriber with options
transcriber = SpeechTranscriber(
locale: Locale.current,
transcriptionOptions: [],
reportingOptions: [.volatileResults], // Enable real-time updates
attributeOptions: [.audioTimeRange] // Include timing
)
guard let transcriber else { throw TranscriptionError.setupFailed }
// Create analyzer with transcriber module
analyzer = SpeechAnalyzer(modules: [transcriber])
// Get required audio format
analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
// Ensure model is available
try await ensureModel(for: transcriber)
// Create input stream
let (stream, builder) = AsyncStream<AnalyzerInput>.makeStream()
inputBuilder = builder
// Start analyzer
try await analyzer?.start(inputSequence: stream)
}
}
func ensureModel(for transcriber: SpeechTranscriber) async throws {
let locale = Locale.current
// Check if language is supported
let supported = await SpeechTranscriber.supportedLocales
guard supported.contains(where: {
$0.identifier(.bcp47) == locale.identifier(.bcp47)
}) else {
throw TranscriptionError.localeNotSupported
}
// Check if model is installed
let installed = await SpeechTranscriber.installedLocales
if installed.contains(where: {
$0.identifier(.bcp47) == locale.identifier(.bcp47)
}) {
return // Already installed
}
// Download model
if let downloader = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
// Track progress if needed
let progress = downloader.progress
try await downloader.downloadAndInstall()
}
}
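If you want to surface download progress in the UI, the progress value above is a standard Foundation Progress object, so one option (a sketch, not the only approach) is to observe it with KVO or bind it to a ProgressView:

// Hypothetical progress reporting inside ensureModel(for:)
let observation = progress.observe(\.fractionCompleted, options: [.new]) { progress, _ in
    print("Model download: \(Int(progress.fractionCompleted * 100))%")
}
_ = observation  // keep the observation alive for the duration of the download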
Note: Models are stored in system storage, not app storage. Only a limited number of languages can be allocated at once.
func startResultHandling() {
Task {
guard let transcriber else { return }
do {
for try await result in transcriber.results {
let text = result.text
if result.isFinal {
// Finalized result - won't change
finalizedTranscript += text
volatileTranscript = AttributedString()
// Access timing info
for run in text.runs {
if let timeRange = run.audioTimeRange {
print("Time: \(timeRange)")
}
}
} else {
// Volatile result - will be replaced
volatileTranscript = text
}
}
} catch {
print("Transcription failed: \(error)")
}
}
}
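For display, the two published transcripts can simply be concatenated, with the volatile tail updating in place as new audio arrives. A minimal SwiftUI sketch (the view name is illustrative):

import SwiftUI

// Hypothetical live-transcript view: finalized text followed by the volatile tail
struct LiveTranscriptView: View {
    @ObservedObject var manager: TranscriptionManager

    var body: some View {
        Text(manager.finalizedTranscript + manager.volatileTranscript)
            .textSelection(.enabled)
    }
}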
Connect AVAudioEngine to SpeechAnalyzer.
import AVFoundation
class AudioRecorder {
private let audioEngine = AVAudioEngine()
private var outputContinuation: AsyncStream<AVAudioPCMBuffer>.Continuation?
private let transcriptionManager: TranscriptionManager
func startRecording() async throws {
// Request permission
guard await AVAudioApplication.requestRecordPermission() else {
throw RecordingError.permissionDenied
}
// Configure audio session (iOS)
#if os(iOS)
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, mode: .spokenAudio)
try session.setActive(true, options: .notifyOthersOnDeactivation)
#endif
// Set up transcriber
try await transcriptionManager.setUp()
transcriptionManager.startResultHandling()
// Stream audio to transcriber
for await buffer in try audioStream() {
try await transcriptionManager.streamAudio(buffer)
}
}
    private func audioStream() throws -> AsyncStream<AVAudioPCMBuffer> {
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        // Create the stream first so the continuation exists before the tap delivers buffers
        let stream = AsyncStream<AVAudioPCMBuffer> { continuation in
            outputContinuation = continuation
        }
        inputNode.installTap(
            onBus: 0,
            bufferSize: 4096,
            format: format
        ) { [weak self] buffer, time in
            self?.outputContinuation?.yield(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
        return stream
    }
}
extension TranscriptionManager {
    // Note: declare `private var converter: AVAudioConverter?` as a stored property on the
    // TranscriptionManager class itself; Swift extensions can't add stored properties.
func streamAudio(_ buffer: AVAudioPCMBuffer) async throws {
guard let inputBuilder, let analyzerFormat else {
throw TranscriptionError.notSetUp
}
// Convert to analyzer's required format
let converted = try convertBuffer(buffer, to: analyzerFormat)
// Send to analyzer
let input = AnalyzerInput(buffer: converted)
inputBuilder.yield(input)
}
    private func convertBuffer(
        _ buffer: AVAudioPCMBuffer,
        to format: AVAudioFormat
    ) throws -> AVAudioPCMBuffer {
        // Lazily initialize the converter for this source/destination pair
        if converter == nil {
            converter = AVAudioConverter(from: buffer.format, to: format)
        }
        guard let converter else {
            throw TranscriptionError.conversionFailed
        }
        // Size the output buffer to allow for sample-rate changes
        let ratio = format.sampleRate / buffer.format.sampleRate
        let capacity = AVAudioFrameCount((Double(buffer.frameLength) * ratio).rounded(.up))
        guard let outputBuffer = AVAudioPCMBuffer(
            pcmFormat: format,
            frameCapacity: max(capacity, 1)
        ) else {
            throw TranscriptionError.conversionFailed
        }
        // Use the input-block API: unlike convert(to:from:), it supports the
        // sample-rate conversion the analyzer format typically requires.
        var bufferProvided = false
        var conversionError: NSError?
        let status = converter.convert(to: outputBuffer, error: &conversionError) { _, inputStatus in
            defer { bufferProvided = true }
            inputStatus.pointee = bufferProvided ? .noDataNow : .haveData
            return bufferProvided ? nil : buffer
        }
        if let conversionError {
            throw conversionError
        }
        if status == .error {
            throw TranscriptionError.conversionFailed
        }
        return outputBuffer
    }
}
Finalize properly so that any remaining volatile results are delivered as finalized results.
func stopRecording() async {
// Stop audio
audioEngine.stop()
audioEngine.inputNode.removeTap(onBus: 0)
outputContinuation?.finish()
// Finalize transcription (converts remaining volatile to final)
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
    // Cancel the result-handling task, if you keep a reference to it
    recognizerTask?.cancel()
}
Critical: Always call finalizeAndFinishThroughEndOfInput() to ensure volatile results are finalized.
// Languages the API supports
let supported = await SpeechTranscriber.supportedLocales
// Languages currently installed on device
let installed = await SpeechTranscriber.installedLocales
Only a limited number of languages can be allocated at once. Deallocate unused ones.
func deallocateLanguages() async {
let allocated = await AssetInventory.allocatedLocales
for locale in allocated {
await AssetInventory.deallocate(locale: locale)
}
}
Highlight text during audio playback using timing metadata.
import SwiftUI
import CoreMedia
import Speech

struct TranscriptView: View {
let transcript: AttributedString
@Binding var playbackTime: CMTime
var body: some View {
Text(highlightedTranscript)
}
var highlightedTranscript: AttributedString {
var result = transcript
        for run in transcript.runs {
            guard let timeRange = run.audioTimeRange else { continue }
            let isActive = timeRange.containsTime(playbackTime)
            if isActive {
                result[run.range].backgroundColor = .yellow
}
}
return result
}
}
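Something has to drive playbackTime during playback. One hedged option is a periodic time observer on AVPlayer (the class and property names here are illustrative):

import AVFoundation
import Combine

// Hypothetical playback driver: publish the player's current time about 10 times per second
// so TranscriptView can highlight the matching run.
final class PlaybackTimeModel: ObservableObject {
    @Published var playbackTime: CMTime = .zero
    private var timeObserver: Any?

    func observe(_ player: AVPlayer) {
        let interval = CMTime(value: 1, timescale: 10)
        timeObserver = player.addPeriodicTimeObserver(forInterval: interval, queue: .main) { [weak self] time in
            self?.playbackTime = time
        }
    }
}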
// BAD - volatile results lost
func stopRecording() {
audioEngine.stop()
// Missing finalize!
}
// GOOD - volatile results become finalized
func stopRecording() async {
audioEngine.stop()
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
}
// BAD - format mismatch may fail silently
inputBuilder.yield(AnalyzerInput(buffer: rawBuffer))
// GOOD - convert to analyzer's format
let format = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
let converted = try convertBuffer(rawBuffer, to: format)
inputBuilder.yield(AnalyzerInput(buffer: converted))
// BAD - may crash if model not installed
let transcriber = SpeechTranscriber(locale: locale, ...)
// Start using immediately
// GOOD - ensure model is ready
let transcriber = SpeechTranscriber(locale: locale, ...)
try await ensureModel(for: transcriber)
// Now safe to use
| Preset | Use Case |
|---|---|
| .offlineTranscription | File transcription, no real-time feedback needed |
| .progressiveLiveTranscription | Live transcription with volatile updates |
Reporting and attribute options:
- .volatileResults: Enable real-time approximate results
- .audioTimeRange: Include CMTimeRange for each text segment

| Platform | SpeechTranscriber | DictationTranscriber |
|---|---|---|
| iOS 26+ | Yes | Yes |
| macOS Tahoe+ | Yes | Yes |
| watchOS 26+ | No | Yes |
| tvOS 26+ | TBD | TBD |
Hardware requirements vary by device; use supportedLocales to check availability.
Combine with Foundation Models for summarization:
import FoundationModels
func generateTitle(for transcript: String) async throws -> String {
let session = LanguageModelSession()
let prompt = "Generate a short, clever title for this story: \(transcript)"
let response = try await session.respond(to: prompt)
return response.content
}
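A possible call site once transcription has finished, inside an async context (assumes the finalizedTranscript property from the live-transcription manager above):

// Hypothetical glue code: summarize the finished transcript into a title
let plainText = String(finalizedTranscript.characters)
let title = try await generateTitle(for: plainText)
print("Suggested title: \(title)")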
See axiom-ios-ai skill for Foundation Models details.
Before shipping speech-to-text:
- Check language support with supportedLocales
- Download models via AssetInventory.assetInstallationRequest
- Convert audio to bestAvailableAudioFormat
- Enable .volatileResults for live transcription
- Call finalizeAndFinishThroughEndOfInput() on stop
- Include .audioTimeRange if needed

WWDC: 2025-277
Docs: /speech, /speech/speechanalyzer, /speech/speechtranscriber
Skills: coreml (on-device ML), axiom-ios-ai (Foundation Models)