From ios-swift-skills
Guides implementation of Apple's SpeechAnalyzer and SpeechTranscriber for on-device speech-to-text transcription in macOS 26+ and iOS 26+ apps, including best practices and setup sequences.
`npx claudepluginhub fal3/claude-skills-collection --plugin ios-swift-skills`

This skill uses the workspace's default tool permissions.
This skill provides comprehensive guidance on implementing Apple's modern Speech framework introduced in macOS 26 and iOS 26. The framework features `SpeechAnalyzer` and `SpeechTranscriber` for on-device speech-to-text transcription, offering 2.2x faster performance than Whisper Large V3 Turbo, with out-of-process execution and automatic system updates.
Best practices:

- **Always allocate locale after model download:** Call `transcriber.allocate(locale:)` only after downloading assets via `AssetInventory.assetInstallationRequest()` to avoid "Cannot use modules with unallocated locales" errors.
- **Use proper audio format conversion:** Convert audio buffers to the analyzer's required format using `AVAudioConverter` with `primeMethod = .none` to prevent timestamp drift.
- **Follow the complete setup sequence:** Instantiate transcriber → download models → allocate locale → create analyzer → get best format → create AsyncStream → start analyzer → consume results. Skipping steps causes runtime errors.
- **Leverage out-of-process execution:** The framework runs transcription in a separate process, eliminating memory limits in your app and providing automatic system updates without app redeployment.
- **Handle both volatile and final results:** Check `result.isFinal` to distinguish between live transcription updates (volatile) and completed segments (final) for proper UI updates.
- **Check locale support before use:** Use `SpeechTranscriber.supportedLocales` and `installedLocales` to verify language availability before creating transcriber instances.
- **Implement proper AsyncStream lifecycle management:** Create input streams with `AsyncStream<AnalyzerInput>.makeStream()` and properly yield converted buffers to the continuation for reliable transcription.
- **Use structured concurrency patterns:** Leverage async/await and actor isolation for thread-safe transcription state management and result processing (see the sketch after this list).
- **Monitor asset installation progress:** Track download progress for large language models to provide user feedback during initial setup.
- **Handle transcription errors gracefully:** Catch and handle errors from `analyzer.start()` and result iteration to provide fallback behavior when transcription fails.
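As a minimal sketch of the structured-concurrency point above (the `TranscriptionState` type is illustrative, not part of the Speech framework), an actor can serialize access to transcript state that is written from a result-consuming task and read elsewhere:

```swift
// Hypothetical actor for thread-safe transcript accumulation.
// All mutation is serialized by the actor, so no locks are needed.
actor TranscriptionState {
    private(set) var finalizedText = ""
    private(set) var volatileText = ""

    func apply(text: String, isFinal: Bool) {
        if isFinal {
            // Promote the segment to the finalized transcript.
            finalizedText += text + " "
            volatileText = ""
        } else {
            // Volatile results replace the previous live text.
            volatileText = text
        }
    }

    var fullText: String { finalizedText + volatileText }
}
```

A result-consuming task can then call `await state.apply(text:isFinal:)` for each result without risking data races.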
Core API usage:

- `SpeechAnalyzer` as the main coordinator, passing an array of modules like `[SpeechTranscriber]`
- `SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:)` before processing
- `AnalyzerInput` objects containing converted `AVAudioPCMBuffer` instances
- `AsyncSequence` consumption using `for try await result in transcriber.results`
- `reportingOptions: [.volatileResults]` for real-time live captioning
- `attributeOptions: [.audioTimeRange]` for precise word-level timing information

User Prompt: "Show me how to set up SpeechAnalyzer with SpeechTranscriber for English transcription."
Expected Output:
```swift
import Speech

@MainActor
class TranscriptionManager {
    private var transcriber: SpeechTranscriber?
    private var analyzer: SpeechAnalyzer?
    private var inputContinuation: AsyncStream<AnalyzerInput>.Continuation?

    func setupTranscription() async throws {
        let locale = Locale(identifier: "en-US")

        // Step 1: Create transcriber
        let transcriber = SpeechTranscriber(
            locale: locale,
            transcriptionOptions: [],
            reportingOptions: [.volatileResults],
            attributeOptions: [.audioTimeRange]
        )

        // Step 2: Download model if needed
        if let request = try await AssetInventory.assetInstallationRequest(
            supporting: [transcriber]
        ) {
            try await request.downloadAndInstall()
        }

        // Step 3: CRITICAL - Allocate locale AFTER download
        try await transcriber.allocate(locale: locale)

        // Step 4: Create analyzer
        let analyzer = SpeechAnalyzer(modules: [transcriber])

        // Step 5: Get best format
        let format = await SpeechAnalyzer.bestAvailableAudioFormat(
            compatibleWith: [transcriber]
        )
        print("Best format: \(format)")

        // Step 6: Create AsyncStream for input
        let (inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()
        self.inputContinuation = inputBuilder

        // Step 7: Start analyzer
        try await analyzer.start(inputSequence: inputSequence)

        // Store references
        self.transcriber = transcriber
        self.analyzer = analyzer

        // Step 8: Start consuming results
        Task {
            for try await result in transcriber.results {
                if result.isFinal {
                    print("Final: \(result.transcription)")
                } else {
                    print("Live: \(result.transcription)")
                }
            }
        }
    }
}
```
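Because `SpeechAnalyzer` and `SpeechTranscriber` only exist on macOS 26+ and iOS 26+, apps that also target earlier OS versions should gate this setup behind an availability check. A minimal sketch (the fallback branch is illustrative):

```swift
@MainActor
func startIfSupported() async throws {
    if #available(iOS 26.0, macOS 26.0, *) {
        // New on-device transcription path; the manager would normally
        // be retained rather than created ad hoc as in this sketch.
        try await TranscriptionManager().setupTranscription()
    } else {
        // Fall back to SFSpeechRecognizer, or hide the feature on older OSes.
    }
}
```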
User Prompt: "How do I convert AVAudioPCMBuffer to the format required by SpeechAnalyzer?"
Expected Output:
```swift
import AVFoundation
import Speech

class BufferConverter {
    private var converter: AVAudioConverter?

    func convertBuffer(_ buffer: AVAudioPCMBuffer,
                       to format: AVAudioFormat) throws -> AVAudioPCMBuffer {
        // Create converter if needed or if the output format changed
        if converter == nil || converter?.outputFormat != format {
            converter = AVAudioConverter(from: buffer.format, to: format)
            // CRITICAL: Prevents timestamp drift
            converter?.primeMethod = .none
        }
        guard let converter = converter else {
            throw ConversionError.converterCreationFailed
        }

        // Calculate output frame capacity from the sample-rate ratio
        let ratio = format.sampleRate / buffer.format.sampleRate
        let outputFrameCapacity = AVAudioFrameCount(ceil(Double(buffer.frameLength) * ratio))
        guard let outputBuffer = AVAudioPCMBuffer(
            pcmFormat: format,
            frameCapacity: outputFrameCapacity
        ) else {
            throw ConversionError.bufferAllocationFailed
        }

        var error: NSError?
        // Provide the source buffer exactly once, then report .noDataNow so
        // the converter does not consume the same buffer repeatedly.
        var sourceProvided = false
        let status = converter.convert(to: outputBuffer, error: &error) { _, outStatus in
            if sourceProvided {
                outStatus.pointee = .noDataNow
                return nil
            }
            sourceProvided = true
            outStatus.pointee = .haveData
            return buffer
        }
        guard status != .error, error == nil else {
            throw ConversionError.conversionFailed(error)
        }
        return outputBuffer
    }

    enum ConversionError: Error {
        case converterCreationFailed
        case bufferAllocationFailed
        case conversionFailed(Error?)
    }
}
```
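A typical call site, sketched as a standalone helper (the parameter names here are hypothetical; in practice `analyzerFormat` comes from `SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:)` and the continuation from the setup shown earlier):

```swift
import AVFoundation
import Speech

func feed(_ buffer: AVAudioPCMBuffer,
          converter: BufferConverter,
          analyzerFormat: AVAudioFormat,
          continuation: AsyncStream<AnalyzerInput>.Continuation) throws {
    // Convert to the analyzer's preferred format, then wrap and yield.
    let converted = try converter.convertBuffer(buffer, to: analyzerFormat)
    continuation.yield(AnalyzerInput(buffer: converted))
}
```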
User Prompt: "Create a complete example that captures microphone audio and transcribes it in real-time."
Expected Output:
```swift
import AVFoundation
import Speech
import SwiftUI

@MainActor
@Observable
class LiveTranscriptionManager {
    var liveText: String = ""
    var finalText: String = ""
    var isRecording = false

    private var transcriber: SpeechTranscriber?
    private var analyzer: SpeechAnalyzer?
    private var audioEngine: AVAudioEngine?
    private var inputContinuation: AsyncStream<AnalyzerInput>.Continuation?
    private var bufferConverter = BufferConverter()
    private var analyzerFormat: AVAudioFormat?

    func startTranscription() async throws {
        let locale = Locale(identifier: "en-US")

        // Setup transcriber and analyzer
        let transcriber = SpeechTranscriber(
            locale: locale,
            transcriptionOptions: [],
            reportingOptions: [.volatileResults],
            attributeOptions: [.audioTimeRange]
        )

        // Download model if needed
        if let request = try await AssetInventory.assetInstallationRequest(
            supporting: [transcriber]
        ) {
            try await request.downloadAndInstall()
        }

        // Allocate locale
        try await transcriber.allocate(locale: locale)

        // Create analyzer
        let analyzer = SpeechAnalyzer(modules: [transcriber])

        // Get best format
        let format = await SpeechAnalyzer.bestAvailableAudioFormat(
            compatibleWith: [transcriber]
        )
        self.analyzerFormat = format

        // Create input stream
        let (inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()
        self.inputContinuation = inputBuilder

        // Start analyzer
        try await analyzer.start(inputSequence: inputSequence)
        self.transcriber = transcriber
        self.analyzer = analyzer

        // Start consuming results
        Task {
            for try await result in transcriber.results {
                if result.isFinal {
                    self.finalText += result.transcription + " "
                    self.liveText = ""
                } else {
                    self.liveText = result.transcription
                }
            }
        }

        // Setup audio engine
        try setupAudioEngine()
        isRecording = true
    }

    private func setupAudioEngine() throws {
        let audioEngine = AVAudioEngine()
        let inputNode = audioEngine.inputNode
        let inputFormat = inputNode.outputFormat(forBus: 0)

        // Note: the tap closure runs on a real-time audio thread, not the main actor.
        inputNode.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { [weak self] buffer, time in
            guard let self = self,
                  let analyzerFormat = self.analyzerFormat else { return }
            do {
                // Convert buffer to analyzer format
                let convertedBuffer = try self.bufferConverter.convertBuffer(
                    buffer,
                    to: analyzerFormat
                )
                // Feed to analyzer
                let input = AnalyzerInput(buffer: convertedBuffer)
                self.inputContinuation?.yield(input)
            } catch {
                print("Buffer conversion error: \(error)")
            }
        }
        audioEngine.prepare()
        try audioEngine.start()
        self.audioEngine = audioEngine
    }

    func stopTranscription() {
        audioEngine?.stop()
        audioEngine?.inputNode.removeTap(onBus: 0)
        inputContinuation?.finish()
        audioEngine = nil
        analyzer = nil
        transcriber = nil
        isRecording = false
    }
}

// SwiftUI View
struct LiveTranscriptionView: View {
    @State private var manager = LiveTranscriptionManager()

    var body: some View {
        VStack(spacing: 20) {
            Text("Live Transcription")
                .font(.title)
            ScrollView {
                VStack(alignment: .leading, spacing: 10) {
                    if !manager.finalText.isEmpty {
                        Text(manager.finalText)
                            .foregroundColor(.primary)
                    }
                    if !manager.liveText.isEmpty {
                        Text(manager.liveText)
                            .foregroundColor(.secondary)
                            .italic()
                    }
                }
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
            }
            .frame(maxHeight: 300)
            .background(Color(.systemGray6))
            .cornerRadius(10)

            Button(manager.isRecording ? "Stop" : "Start Recording") {
                Task {
                    if manager.isRecording {
                        manager.stopTranscription()
                    } else {
                        try? await manager.startTranscription()
                    }
                }
            }
            .buttonStyle(.borderedProminent)
        }
        .padding()
    }
}
```
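On iOS, the microphone example above also requires recording permission and an active audio session, neither of which is shown in the snippet. A minimal sketch of the extra setup (the `Info.plist` key is required; the session category and mode are reasonable defaults, not values mandated by the framework):

```swift
import AVFoundation

// Info.plist must contain NSMicrophoneUsageDescription,
// or the input tap will fail at runtime.
func prepareAudioSession() async throws -> Bool {
    // Ask for microphone permission (iOS 17+ API).
    guard await AVAudioApplication.requestRecordPermission() else {
        return false
    }
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .spokenAudio, options: .duckOthers)
    try session.setActive(true, options: .notifyOthersOnDeactivation)
    return true
}
```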
User Prompt: "How do I check which languages are supported and download models for a specific locale?"
Expected Output:
```swift
import Speech

@MainActor
class LocaleManager {
    func checkAndDownloadLocale(_ localeIdentifier: String) async throws -> Bool {
        let locale = Locale(identifier: localeIdentifier)

        // Check if locale is supported
        let supported = await SpeechTranscriber.supportedLocales
        guard supported.contains(locale) else {
            print("Locale \(localeIdentifier) is not supported")
            return false
        }

        // Check if already installed
        let installed = await SpeechTranscriber.installedLocales
        if installed.contains(locale) {
            print("Locale \(localeIdentifier) is already installed")
            return true
        }

        // Create temporary transcriber to check requirements
        let transcriber = SpeechTranscriber(
            locale: locale,
            transcriptionOptions: [],
            reportingOptions: [],
            attributeOptions: []
        )

        // Request installation
        if let request = try await AssetInventory.assetInstallationRequest(
            supporting: [transcriber]
        ) {
            print("Downloading models for \(localeIdentifier)...")

            // Monitor progress (optional). `request.progress` is a Foundation
            // `Progress`, not an AsyncSequence, so poll it while downloading.
            let progress = request.progress
            let progressTask = Task {
                while !progress.isFinished {
                    print("Download progress: \(Int(progress.fractionCompleted * 100))%")
                    try await Task.sleep(for: .seconds(1))
                }
            }
            try await request.downloadAndInstall()
            progressTask.cancel()
            print("Installation complete")
            return true
        }
        return true // Assets already available, no download needed
    }

    func listAvailableLocales() async {
        let supported = await SpeechTranscriber.supportedLocales
        let installed = await SpeechTranscriber.installedLocales
        print("Supported locales:")
        for locale in supported {
            let status = installed.contains(locale) ? "✓ Installed" : "⬇ Available"
            print("  \(locale.identifier) - \(status)")
        }
    }
}
```
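Usage might look like the following sketch (the locale identifier and function name are illustrative):

```swift
import Speech

@MainActor
func prepareGermanTranscription() async {
    let manager = LocaleManager()
    do {
        if try await manager.checkAndDownloadLocale("de-DE") {
            print("German transcription assets are ready")
        }
    } catch {
        print("Asset download failed: \(error)")
    }
    // List everything the device currently supports and has installed.
    await manager.listAvailableLocales()
}
```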
User Prompt: "What are common errors when using SpeechAnalyzer and how do I handle them?"
Expected Output:
```swift
import Speech

@MainActor
class RobustTranscriptionManager {
    private var transcriber: SpeechTranscriber?
    private var analyzer: SpeechAnalyzer?

    enum TranscriptionError: Error {
        case localeNotAllocated
        case analyzerStartFailed
        case modelNotDownloaded
        case audioFormatMismatch
        case transcriptionFailed(Error)

        var userMessage: String {
            switch self {
            case .localeNotAllocated:
                return "Language not initialized. Please try again."
            case .analyzerStartFailed:
                return "Failed to start transcription service."
            case .modelNotDownloaded:
                return "Language model needs to be downloaded first."
            case .audioFormatMismatch:
                return "Audio format is incompatible with transcription."
            case .transcriptionFailed(let error):
                return "Transcription error: \(error.localizedDescription)"
            }
        }
    }

    func safeSetupTranscription(locale: Locale) async throws {
        do {
            // Step 1: Verify locale support
            let supported = await SpeechTranscriber.supportedLocales
            guard supported.contains(locale) else {
                throw TranscriptionError.modelNotDownloaded
            }

            // Step 2: Create transcriber
            let transcriber = SpeechTranscriber(
                locale: locale,
                transcriptionOptions: [],
                reportingOptions: [.volatileResults],
                attributeOptions: [.audioTimeRange]
            )

            // Step 3: Download model with error handling
            if let request = try await AssetInventory.assetInstallationRequest(
                supporting: [transcriber]
            ) {
                do {
                    try await request.downloadAndInstall()
                } catch {
                    throw TranscriptionError.modelNotDownloaded
                }
            }

            // Step 4: Allocate locale with error handling
            do {
                try await transcriber.allocate(locale: locale)
            } catch {
                // Common error: "Cannot use modules with unallocated locales"
                throw TranscriptionError.localeNotAllocated
            }

            // Step 5: Create analyzer
            let analyzer = SpeechAnalyzer(modules: [transcriber])

            // Step 6: Get format and validate
            let format = await SpeechAnalyzer.bestAvailableAudioFormat(
                compatibleWith: [transcriber]
            )
            guard format.sampleRate > 0 else {
                throw TranscriptionError.audioFormatMismatch
            }

            // Step 7: Create input stream
            let (inputSequence, _) = AsyncStream<AnalyzerInput>.makeStream()

            // Step 8: Start analyzer with error handling
            do {
                try await analyzer.start(inputSequence: inputSequence)
            } catch {
                throw TranscriptionError.analyzerStartFailed
            }
            self.transcriber = transcriber
            self.analyzer = analyzer

            // Step 9: Consume results with error handling
            Task {
                do {
                    for try await result in transcriber.results {
                        processResult(result)
                    }
                } catch {
                    print("Result iteration error: \(error)")
                    throw TranscriptionError.transcriptionFailed(error)
                }
            }
        } catch let error as TranscriptionError {
            print("Transcription setup failed: \(error.userMessage)")
            throw error
        } catch {
            print("Unexpected error: \(error)")
            throw TranscriptionError.transcriptionFailed(error)
        }
    }

    private func processResult(_ result: SpeechTranscriptionResult) {
        // Process transcription result
        if result.isFinal {
            print("Final: \(result.transcription)")
        } else {
            print("Volatile: \(result.transcription)")
        }
    }
}
```