Hello,
I'm trying to determine the best/recommended AVAudioSession configuration (i.e., category, mode, and options) for the following use case.
Essentially, I'd like to switch between periods of playing an audio file and then recognizing speech. The audio file is typically speech, and I don't intend for playback and speech recognition to occur simultaneously. I'd like the user to still be able to interact with Siri, and I'd like it to work with CarPlay, where navigation prompts can occur.
I would assume the category to use is 'playAndRecord', but I'm not sure whether it's better to set that once for the entire lifecycle, or to set 'playback' for audio file playback and then switch to 'playAndRecord' for speech recognition. I'm also not sure about the best 'mode' and 'options' to set. Any suggestions would be appreciated.
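To make the two options concrete, here is a minimal sketch of both. The function names are hypothetical, and the modes and options shown (`.default`, `.spokenAudio`, `.allowBluetoothA2DP`, `.defaultToSpeaker`) are placeholder assumptions for illustration only, not a recommendation for this use case.

```swift
import AVFoundation

// Option A: set .playAndRecord once and keep it for the whole lifecycle.
func configureSessionOnce() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .default,                                    // assumption
                            options: [.allowBluetoothA2DP, .defaultToSpeaker]) // assumption
    try session.setActive(true)
}

// Option B: switch the category per phase.
func prepareForFilePlayback() throws {
    let session = AVAudioSession.sharedInstance()
    // .spokenAudio is an assumption, chosen because the files are mostly speech.
    try session.setCategory(.playback, mode: .spokenAudio, options: [])
    try session.setActive(true)
}

func prepareForSpeechRecognition() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord,
                            mode: .default,                                    // assumption
                            options: [.allowBluetoothA2DP, .defaultToSpeaker]) // assumption
    try session.setActive(true)
}
```

With option B, each category change can cause a route change, so the switch would need to happen between the playback and recognition phases rather than during them.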
Thanks.
Audio
Dive into the technical aspects of audio on your device, including codecs, format support, and customization options.
Is there a way to destroy MIDIUMPMutableEndpoint again?
In my app, the user has a setting to enable and disable MIDI 2.0. If MIDI 2.0 should not be supported (or if iOS version < 18), it creates a virtual destination and a virtual source. And if MIDI 2.0 should be enabled, it instead creates a MIDIUMPMutableEndpoint, which itself creates the virtual destination and source automatically.
So here is my problem: I didn't find any way to destroy the MIDIUMPMutableEndpoint again. There is a method to disable it (setEnabled:NO), but that doesn't destroy or hide the virtual destination and source. So when the user turns MIDI 2.0 support off, I will have two virtual destinations and sources, and cannot get rid of the 2.0 ones.
What is the correct way to get rid of the MIDIUMPMutableEndpoint once it is created?
I'm encountering errors while using AVAudioEngine with voice processing enabled (setVoiceProcessingEnabled(true)) in scenarios where the input and output audio devices are not the same. This issue arises specifically with mismatched devices, preventing the application from functioning as expected.
Works: Paired devices (e.g., MacBook Pro mic → MacBook Pro speakers)
Fails: Mismatched devices (e.g., AirPods mic → MacBook Pro speakers)
When using paired input and output devices:
The setup works as expected.
Example: MacBook Pro microphone → MacBook Pro speakers.
When using mismatched devices:
AVAudioEngine setup fails during aggregate device construction.
Example: AirPods microphone → MacBook Pro speakers.
Error logs indicate a channel count mismatch.
Here are the partial logs. Due to the content limit, I cannot post the entire logs.
AUVPAggregate.cpp:1000 client-side input and output formats do not match (err=-10875)
AUVPAggregate.cpp:1036 err=-10875
AVAEInternal.h:109 [AVAudioEngineGraph.mm:1344:Initialize: (err = PerformCommand(*outputNode, kAUInitialize, NULL, 0)): error -10875
AggregateDevice.mm:329 Failed expectation of constructed aggregate (312): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (312): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
AggregateDevice.mm:182 error fetching default pair
AggregateDevice.mm:329 Failed expectation of constructed aggregate (336): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (336): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
AUHAL.cpp:1782 ca_verify_noerr: [AudioDeviceSetProperty(mDeviceID, NULL, 0, isInput, kAudioDevicePropertyIOProcStreamUsage, theSize, theStreamUsage), 560227702]
AudioHardware-mac-imp.cpp:3484 AudioDeviceSetProperty: no device with given ID
AUHAL.cpp:1782 ca_verify_noerr: [AudioDeviceSetProperty(mDeviceID, NULL, 0, isInput, kAudioDevicePropertyIOProcStreamUsage, theSize, theStreamUsage), 560227702]
AggregateDevice.mm:182 error fetching default pair
AggregateDevice.mm:329 Failed expectation of constructed aggregate (348): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (348): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
Is it possible to use voice processing with different input/output devices?
If yes, are there any specific configurations required to handle mismatched devices? How can we resolve channel count mismatch errors during aggregate device construction?
Are there settings or API adjustments to enforce compatibility between input/output devices? Are there any workarounds or alternative approaches to achieve voice processing functionality with mismatched devices?
For instance, can we force an intermediate channel configuration or downmix input/output formats?
Hey folks, I'm running into an odd issue suddenly with an app that had a working MusicKit integration before.
I'm using ApplicationMusicPlayer to play Apple Music albums and songs. I'm testing on a physical device, signed in to Apple ID, and with a valid subscription. Apple Music via the first-party app works entirely fine on this device.
Attempting to play back any content at all gives the log:
<ICUserIdentityStoreACAccountBackend: 0x1070bf3e0> Failed to initialize primary apple account, error=Error Domain=ICError Code=-7013 "Client is not entitled to access account store" UserInfo={NSDebugDescription=Client is not entitled to access account store}
[ICUserIdentityStore] - initializing account histories with activeAccountDSID = nil, activeLockerAccountDSID = nil, timestamp = 14605951908
[ICUserIdentityStore] Failed to fetch local store account with error: Error Domain=ICError Code=-7013 "Client is not entitled to access account store" UserInfo={NSDebugDescription=Client is not entitled to access account store}.
The album artwork, track names, etc, all appear in the control center playback controls, but the music doesn't play. Trying to trigger playback with control center just results in it skipping to the next track, which doesn't play either.
This exact code used to work. I have the MusicKit service selected in Apple Connect. Since this isn't entitlement-based, I'm not sure how else to check that I'm set up correctly.
I've tried deleting/reinstalling the app, restarting the device, cleaning/rebuilding, and deleting DerivedData, to no avail.
Any help?
Running Xcode 16.4 (16F6), testing on iOS 18.5 (22F76)
Hello!
I'm experiencing an issue with iOS's audio routing system when trying to use Bluetooth headphones for audio output while also recording environmental audio from the built-in microphone.
Desired behavior:
Play audio through Bluetooth headset (AirPods)
Record unprocessed environmental audio from the iPhone's built-in microphone
Actual behavior:
When explicitly selecting the built-in microphone, iOS reports it's using it (in currentRoute.inputs)
However, the actual audio data received is clearly still coming from the AirPods microphone
The audio is heavily processed with voice isolation/noise cancellation, removing environmental sounds
Environment Details
Device: iPhone 12 Pro Max
iOS Version: 18.4.1
Hardware: AirPods
Audio Framework: AVAudioEngine (also tried AudioQueue)
Code Attempted
I've tried multiple approaches to force the correct routing:
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()

    // Configure to allow Bluetooth output but use built-in mic
    try? session.setCategory(.playAndRecord,
                             options: [.allowBluetoothA2DP, .defaultToSpeaker])
    try? session.setActive(true)

    // Explicitly select built-in microphone
    if let inputs = session.availableInputs,
       let builtInMic = inputs.first(where: { $0.portType == .builtInMic }) {
        try? session.setPreferredInput(builtInMic)
        print("Selected input: \(builtInMic.portName)")
    }

    // Log the current route
    let route = session.currentRoute
    print("Current input: \(route.inputs.first?.portName ?? "None")")

    // Configure audio engine with native format
    let inputNode = audioEngine.inputNode
    let nativeFormat = inputNode.inputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: nativeFormat) { buffer, time in
        // Process audio buffer
        // Despite showing "Built-in Microphone" in route, audio appears to be
        // coming from AirPods with voice isolation applied - welp!
    }
    try? audioEngine.start()
}
I've also tried various combinations of:
Different audio session modes (.default, .measurement, .voiceChat)
Different option combinations (with/without .allowBluetooth, .allowBluetoothA2DP)
Setting session.setPreferredInput() both before and after activation
Diagnostic Observations
When AirPods are connected:
AVAudioSession.currentRoute.inputs correctly shows "Built-in Microphone" after setPreferredInput()
The actual audio data received shows clear signs of AirPods' voice isolation processing
Background/environmental sounds are actively filtered out...
When recording a test audio played near the phone (not through the app), the recording is nearly silent. Only headset voice goes through.
Questions
Is there a workaround to force iOS to actually use the built-in microphone while maintaining Bluetooth output?
Are there any lower-level configurations that might resolve this issue?
Any insights, workarounds, or suggestions would be greatly appreciated. This is blocking a critical feature in my application that requires environmental audio recording while providing audio feedback through headphones 😅
Hello everyone,
I've written an audio unit plugin that needs to be aware of any upstream latency caused by heavy plugins before it on the channel. Is there any way to query this? I know that Logic applies PDC at the channel's output (summing point), but I need to know what the accumulated latency is at the point the audio enters my plugin. Thanks!
Topic:
Media Technologies
SubTopic:
Audio
Hello,
I've discovered a buffer initialization bug in AVAudioUnitSampler that happens when loading presets with multiple zones referencing different regions in the same audio file (monolith/concatenated samples approach).
Almost all zones output silence (i.e. zeros) at the beginning of playback instead of starting with actual audio data.
The Problem
Setup:
Single audio file (monolith) containing multiple concatenated samples
Multiple zones in an .aupreset, each with different sample start and sample end values pointing to different regions of the same file
All zones load successfully without errors
Expected Behavior:
All zones should play their respective audio regions immediately from the first sample.
Actual Behavior:
Last zone in the zone list: Works perfectly - plays audio immediately
All other zones: Output [0, 0, 0, 0, ..., _audio_data] instead of [real_audio_data]
The number of zeros varies from event to event for each zone. It can be a couple of samples (<30) up to several buffers.
After the initial zeros, the correct audio plays normally, so there is no shift in audio playback, just missing samples at the beginning.
Minimal Reproduction
1. Create Test Monolith Audio File
Create a single WAV file with 3 concatenated 1-second samples (44.1kHz):
Sample 1: frames 0-44099 (constant amplitude 0.3)
Sample 2: frames 44100-88199 (constant amplitude 0.6)
Sample 3: frames 88200-132299 (constant amplitude 0.9)
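The test signal described in this step can be generated in a few lines. This is a sketch assuming a mono signal; the function names `makeMonolithSamples` and `zoneRange` are hypothetical, and writing the samples to a WAV file (for example with AVAudioFile) is left out.

```swift
/// Builds the mono test signal: one second of constant amplitude per entry
/// in `amplitudes`, concatenated back to back.
func makeMonolithSamples(sampleRate: Int = 44_100,
                         amplitudes: [Float] = [0.3, 0.6, 0.9]) -> [Float] {
    return amplitudes.flatMap { [Float](repeating: $0, count: sampleRate) }
}

/// Inclusive frame range of the 1-second zone at `index` (0-based).
func zoneRange(index: Int, sampleRate: Int = 44_100) -> ClosedRange<Int> {
    let start = index * sampleRate
    return start...(start + sampleRate - 1)
}
```

With the default arguments, the three ranges match the frame boundaries listed above (0-44099, 44100-88199, 88200-132299).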
2. Create Test Preset
Create an .aupreset with 3 zones all referencing the same file:
Pseudo code
<Zone array>
  <zone 1> start sample: 0, end sample: 44099, note: 60, waveform: ref_to_monolith.wav;
  <zone 2> start sample: 44100, end sample: 88199, note: 62, waveform: ref_to_monolith.wav;
  <zone 3> start sample: 88200, end sample: 132299, note: 64, waveform: ref_to_monolith.wav;
</Zone array>
3. Load and Test
import AVFoundation

// Load preset into AVAudioUnitSampler
let sampler = AVAudioUnitSampler()
try sampler.loadInstrument(at: presetURL)

// Play each zone (MIDI notes C4=60, D4=62, E4=64)
sampler.startNote(60, withVelocity: 64, onChannel: 0) // Zone 1
sampler.startNote(62, withVelocity: 64, onChannel: 0) // Zone 2
sampler.startNote(64, withVelocity: 64, onChannel: 0) // Zone 3
4. Observed Result
Zone 1 (C4): [0, 0, 0, ..., 0.3, 0.3, 0.3] ❌ Zeros at beginning
Zone 2 (D4): [0, 0, 0, ..., 0.6, 0.6, 0.6] ❌ Zeros at beginning
Zone 3 (E4): [0.9, 0.9, 0.9, ...] ✅ Works correctly (last zone)
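To quantify the zeros per note, the rendered output can be scanned for leading silence. The helper below is a sketch; `leadingSilentSampleCount` is a hypothetical name, and it assumes the rendered samples have already been copied into a `[Float]` (for example from a tap on the sampler's output).

```swift
/// Counts the samples at the start of `samples` whose magnitude is at most
/// `tolerance`. A tolerance of 0 counts exact zeros only.
func leadingSilentSampleCount(in samples: [Float], tolerance: Float = 0) -> Int {
    return samples.prefix(while: { abs($0) <= tolerance }).count
}
```

For the constant-amplitude test file, any nonzero result for a zone indicates missing samples at the start of that note.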
What I've Extensively Tested
What DOES Work
Separate files per zone:
Each zone references its own individual audio file
All zones play correctly without zeros
Problem: Not viable for iOS apps with 500+ sample libraries due to file handle limitations
What DOESN'T Work (All Tested)
1. Different Audio Formats:
CAF (Float32 PCM, Int16 PCM, both interleaved and non-interleaved)
M4A (AAC compressed)
WAV (uncompressed)
SF2 (SoundFont2)
Bug persists across all formats
2. CAF Region Chunks:
Created CAF files with embedded region chunks defining zone boundaries
Set zones with no sampleStart/sampleEnd in preset (nil values)
AVAudioUnitSampler completely ignores CAF region metadata
Bug persists
3. Unique Waveform IDs:
Gave each zone a unique waveform ID (268435456, 268435457, 268435458)
Each ID has its own file reference entry (all pointing to same physical file)
Hypothesized this might trigger separate buffer initialization
Bug persists - no improvement
4. Different Sample Rates:
Tested: 44.1kHz, 48kHz, 96kHz
Bug occurs at all sample rates
5. Mono vs Stereo:
Bug occurs with both mono and stereo files
Environment
macOS: Sonoma 14.x (tested across multiple minor versions)
iOS: Tested on iOS 17.x with same results
Xcode: 16.x
Frameworks: AVFoundation, AudioToolbox
Reproducibility: 100% reproducible with setup described above
Impact & Use Case
This bug severely impacts professional music applications that need:
Small file sizes: Monolith files allow sharing compressed audio data (AAC/M4A)
iOS file handle limits: Opening 400+ individual sample files is not viable on iOS
Performance: Single file loading is much faster than hundreds of individual files
Standard industry practice: Monolith/concatenated samples are used by EXS24, Kontakt, and most professional samplers
Current Impact:
Cannot use monolith files with AVAudioUnitSampler on iOS
Forced to choose between: unusable audio (zeros at start) OR hitting iOS file limits
No viable workaround exists
Root Cause Hypothesis
The bug appears to be in AVAudioUnitSampler's internal buffer initialization when:
Multiple zones share the same source audio file
Each zone specifies different sampleStart/sampleEnd offsets
Key observation: The last zone in the zone array always works correctly.
This is NOT related to:
File permissions or security-scoped resources (separate files work fine)
Audio codec issues (happens with uncompressed PCM too)
Preset parsing (preset loads correctly, all zones are valid)
Questions
Is this a known issue? I couldn't find any documentation, bug reports, or discussions about this.
Is there ANY workaround that allows monolith files to work with AVAudioUnitSampler?
Alternative APIs? Is there a different API or approach for iOS that properly supports monolith sample files?
Hi everyone!
I’ve developed a location-based Audio AR app in Unity with FMOD & Resonance Audio and AirPods Pro Head-Tracking to create a ubiquitous augmented soundscape experience. Think of it as an audio version of Pokémon Go, but with a more precise location requirement to ensure spatial audio is placed correctly.
I want this experience to run in the background on iOS, but from what I’ve gathered, it seems Unity doesn’t support this well. So, I’m considering developing a Swift version instead.
Since this is primarily for research purposes, privacy concerns are not a major issue in my case. However, I’ve come across some potential challenges:
Real-time precise location updates – Can iOS provide fully instantaneous, high-accuracy location updates in the background?
Continuous real-time data processing – Can an app continuously process spatial audio, head-tracking, and location data while running in the background?
I’m not sure if newer iOS versions have improved in these areas or if there are workarounds to achieve this.
Would this kind of experience be feasible to run in the background on iOS? Any insights or pointers would be greatly appreciated!
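For reference, the Core Location settings that govern background, high-accuracy updates look roughly like this. It is a sketch only: the class name is hypothetical, it assumes the "location" entry is present in UIBackgroundModes and the usual usage-description keys are in Info.plist, and it says nothing about how "instantaneous" the updates will be in practice.

```swift
import CoreLocation

final class BackgroundLocationTracker: NSObject, CLLocationManagerDelegate {
    private let manager = CLLocationManager()

    func start() {
        manager.delegate = self
        manager.desiredAccuracy = kCLLocationAccuracyBestForNavigation
        manager.distanceFilter = kCLDistanceFilterNone
        manager.pausesLocationUpdatesAutomatically = false
        // Assumption: UIBackgroundModes contains "location"; without it,
        // enabling background updates raises an exception.
        manager.allowsBackgroundLocationUpdates = true
        manager.requestAlwaysAuthorization()
        manager.startUpdatingLocation()
    }

    func locationManager(_ manager: CLLocationManager,
                         didUpdateLocations locations: [CLLocation]) {
        guard let latest = locations.last else { return }
        // horizontalAccuracy is the reported uncertainty radius in meters.
        print("lat \(latest.coordinate.latitude), lon \(latest.coordinate.longitude), accuracy \(latest.horizontalAccuracy) m")
    }
}
```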
I’m very new to iOS development, so apologies if this is a basic question. Thanks in advance!
I'm working on a project to support spatial audio editing, using this sample project as a reference: https://developer.apple.com/documentation/Cinematic/editing-spatial-audio-with-an-audio-mix
This sample works well on an unedited capture, but does not work for a capture that has already been edited.
The failure is occurring at "let audioInfo = try await CNAssetSpatialAudioInfo(asset: myAsset)", which is throwing "no eligible audio tracks in asset".
I also find that for already edited captures, if I use CNAssetSpatialAudioInfo.assetContainsSpatialAudio, it returns false.
What I mean by "already edited" is that if I take a spatial capture with my iPhone 16, then edit that capture in the Photos app using the Cinematic effect, and then save the edited output (e.g. edited_capture.mov), I can't import that edited_capture.mov into my project as a spatial audio asset.
Is this intentional behavior or a bug?
If it's intentional, can you describe why?
hi,
Is there an Audio Unit logo I can show on my website? I would love to show that my application is able to host Audio Unit plugins.
regards, Joël
I have an app that displays artwork via MPMediaItem.artwork, requesting an image with a specific size. How do I get a media item's MPMediaItemAnimatedArtwork, and how do I get the preview image and video to display to the user?
Environment
Device: iPhone 16e
iOS Version: 18.4.1 - 18.7.1
Framework: AVFoundation (AVAudioEngine)
Problem Summary
On iPhone 16e (iOS 18.4.1-18.7.1), the installTap callback stops being invoked after resuming from a phone call interruption. This issue is specific to phone call interruptions and does not occur on iPhone 14, iPhone SE 3, or earlier devices.
Expected Behavior
After a phone call interruption ends and audioEngine.start() is called, the previously installed tap should continue receiving audio buffers.
Actual Behavior
After resuming from phone call interruption:
Tap callback is no longer invoked
No audio data is captured
No errors are thrown
Engine appears to be running normally
Note: Normal pause/resume (without phone call interruption) works correctly.
Steps to Reproduce
Start audio recording on iPhone 16e
Receive or make a phone call (triggers AVAudioSession interruption)
End the phone call
Resume recording with audioEngine.start()
Result: Tap callback is not invoked
Tested devices:
iPhone 16e (iOS 18.4.1-18.7.1): Issue reproduces ✗
iPhone 14 (iOS 18.x): Works correctly ✓
iPhone SE 3 (iOS 18.x): Works correctly ✓
Code
Initial Setup (Works)
let inputNode = audioEngine.inputNode
inputNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, time in
    self.processAudioBuffer(buffer, at: time)
}
audioEngine.prepare()
try audioEngine.start()
Interruption Handling
NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: AVAudioSession.sharedInstance(),
    queue: nil
) { notification in
    guard let userInfo = notification.userInfo,
          let typeValue = userInfo[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else {
        return
    }
    if type == .began {
        self.audioEngine.pause()
    } else if type == .ended {
        try? self.audioSession.setActive(true)
        try? self.audioEngine.start()
        // Tap callback doesn't work after this on iPhone 16e
    }
}
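The handler above reads only the interruption type. As a sketch, the notification also carries AVAudioSessionInterruptionOptionKey, which can be parsed alongside it. The helper name `parseInterruption` is hypothetical, and whether honoring `.shouldResume` changes the behavior on iPhone 16e is untested here.

```swift
import AVFoundation

/// Hypothetical helper: extracts the interruption type and whether the
/// system indicates that playback/recording should resume.
func parseInterruption(_ notification: Notification)
    -> (type: AVAudioSession.InterruptionType, shouldResume: Bool)? {
    guard let userInfo = notification.userInfo,
          let typeValue = userInfo[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else {
        return nil
    }
    // The options key is only meaningful for .ended; default to no options.
    let optionsValue = userInfo[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
    let options = AVAudioSession.InterruptionOptions(rawValue: optionsValue)
    return (type, options.contains(.shouldResume))
}
```

In the `.ended` branch, `shouldResume` could gate the calls to `setActive(true)` and `audioEngine.start()`.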
Workaround
Full engine restart is required on iPhone 16e:
func resumeAfterInterruption() throws {
    audioEngine.stop()
    inputNode.removeTap(onBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, time in
        self.processAudioBuffer(buffer, at: time)
    }
    audioEngine.prepare()
    try audioSession.setActive(true)
    try audioEngine.start()
}
This works but adds latency and complexity compared to simple resume.
Questions
Is this expected behavior on iPhone 16e?
What is the recommended way to handle phone call interruptions?
Why does this only affect iPhone 16e and not iPhone 14 or SE 3?
Any guidance would be appreciated!
According to the header file, the outputVolume property's supported range is 0.0-1.0:
/*! @property outputVolume
    @abstract The mixer's output volume.
    @discussion
        This accesses the mixer's output volume (0.0-1.0, inclusive).
*/
@property (nonatomic) float outputVolume;
However, when setting the volume to 2.0, the audio does indeed play louder. Is the header file out of date, and if so, what is the supported range for outputVolume?
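For a sense of scale, and assuming outputVolume is applied as a linear amplitude gain (an assumption; the header does not say so), a value of 2.0 would correspond to roughly +6 dB. The helper name below is hypothetical.

```swift
import Foundation

/// Converts a linear amplitude gain to decibels (gain must be > 0).
func decibels(fromLinearGain gain: Double) -> Double {
    return 20 * log10(gain)
}
```

Under that assumption, `decibels(fromLinearGain: 2.0)` is about 6.02 and `decibels(fromLinearGain: 1.0)` is 0.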
Thanks
Is there any way for me to use an AutoMix API in my iOS apps? I would play tracks using the Apple Music API and use AutoMix to attempt to merge tracks.
Is this feature/API available to developers?
Hello,
The search functionality of the coreaudio-api mailing list archive has been broken for a very long time. Several of the lower-level audio APIs have only been discussed on this mailing list, making it critical for those of us maintaining old audio code.
Steps to reproduce:
Open https://lists.apple.com/archives/list/coreaudio-api@lists.apple.com/ in your web browser.
Enter a search term in the "Search this list" field in the top-right corner of the page.
The search will eventually time out with "502 Bad Gateway"
Can somebody please forward this information to the current maintainer? I've tried to contact developer support but they weren't sure what to do.
Thanks!
I bought two "Apple USB-C to Headphone Jack Adapters". Upon closer inspection, they seem to be of different generations:
The one with product ID 0x110a on top is working fine. The one with product ID 0x110b has two issues:
There is a short but loud click noise on the headphone when I connect it to the iPad.
When I play audio using AVAudioPlayer the first half of a second or so is cut off.
Here's how I'm playing the audio:
audioPlayer = try AVAudioPlayer(contentsOf: url)
audioPlayer?.delegate = self
audioPlayer?.prepareToPlay()
audioPlayer?.play()
Is this a known issue? Am I doing something wrong?
Hi everyone,
I’m trying to use AVAssetResourceLoaderDelegate to handle a live radio stream (e.g. Icecast/HTTP stream). My goal is to have access to the last 30 seconds of audio data during playback, so I can analyze it for specific audio patterns in near-real-time.
I’ve implemented a custom resource loader that works fine for podcasts and static files, where the file size and content length are known. However, for infinite live streams, my current implementation stops receiving new loading requests after the first one is served. As a result, the playback either stalls or fails to continue.
Has anyone successfully used AVAssetResourceLoaderDelegate with a continuous radio stream? Or maybe you can suggest a better approach for buffering and analyzing live audio?
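Independently of how the decoded audio is obtained, keeping "the last 30 seconds" comes down to a fixed-capacity ring buffer. The sketch below assumes mono Float samples; the type name `RecentSamples` is hypothetical, and it is not thread-safe, so access from an audio callback and an analysis thread would need synchronization.

```swift
/// Fixed-capacity ring buffer that retains only the most recent samples.
struct RecentSamples {
    private var storage: [Float]
    private var writeIndex = 0
    private(set) var count = 0

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        storage = [Float](repeating: 0, count: capacity)
    }

    var capacity: Int { return storage.count }

    /// Appends samples, overwriting the oldest ones once the buffer is full.
    mutating func append(contentsOf samples: [Float]) {
        for sample in samples {
            storage[writeIndex] = sample
            writeIndex = (writeIndex + 1) % storage.count
        }
        count = min(count + samples.count, storage.count)
    }

    /// Returns the retained samples in chronological order (oldest first).
    func snapshot() -> [Float] {
        if count < storage.count {
            return Array(storage[0..<count])
        }
        return Array(storage[writeIndex...]) + Array(storage[..<writeIndex])
    }
}
```

For example, `RecentSamples(capacity: 30 * 44_100)` would hold 30 seconds of mono audio at 44.1 kHz, about 5.3 MB of Float samples.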
Any tips, examples, or advice would be appreciated. Thanks!
I am a graduate student conducting research in speech/audio signal processing and multimodal interaction.
Apple Vision Pro is widely recognized as a multimodal interactive system supporting voice, eye, and gesture inputs. However, I could not find detailed specifications or documentation about the audio input sampling rate used by the device’s built-in microphone array when capturing user audio.
Specifically, I would like to understand:
What is the default audio input sampling rate (e.g., 16 kHz, 44.1 kHz, 48 kHz, etc.) for the Vision Pro’s microphones?
When developing with visionOS / AVAudioSession / AVAudioEngine, is there a documented or recommended sampling rate for audio capture?
Are there any best practices or settings for enabling high-quality voice capture on Vision Pro (especially for voice research tasks)?
For context, my work involves voice processing, analysis, and possibly on-device real-time speech recognition. Any pointers to relevant APIs, documentation or examples (especially regarding audio capture buffer size or available formats on visionOS) would be very helpful.
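The rate actually in effect can at least be read at runtime. This is a sketch assuming microphone permission has been granted; the function name is hypothetical, the category and mode are placeholder choices, and the values it prints are whatever the device reports, not documented specifications.

```swift
import AVFoundation

func logCaptureFormat() throws {
    let session = AVAudioSession.sharedInstance()
    // Category and mode are assumptions chosen for illustration.
    try session.setCategory(.playAndRecord, mode: .default, options: [])
    try session.setActive(true)

    print("Session sample rate: \(session.sampleRate) Hz")
    print("IO buffer duration: \(session.ioBufferDuration) s")

    // The input node's hardware-side format as seen by AVAudioEngine.
    let engine = AVAudioEngine()
    let format = engine.inputNode.inputFormat(forBus: 0)
    print("Input node: \(format.sampleRate) Hz, \(format.channelCount) channel(s)")
}
```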
Thank you in advance!
Best regards.
Hello.
My team and I think we have an issue where our app is asked to shut down gracefully, followed by a SIGTERM. As we’ve learned, this is normally not an issue. However, it seems to also be happening while our app (an audio streamer) is actively playing in the background.
From our perspective, starting playback is indicating strong user intent. We understand that there can be extreme circumstances where the background audio needs to be killed, but should it be considered part of normal operation? We hope that’s not the case.
All we see in the logs is the graceful shutdown request. We can say with high certainty that it’s happening though, as we know that playback is running within 0.5 seconds of the crash, without any other tracked user interaction.
Can you verify whether this is intended behavior, and whether there’s something we can do about it from our end? From our logs, it doesn’t look to be related to memory usage, either within the app or in the system as a whole.
Best,
John
I’m running HomePod OS 26 on two HomePod minis and OS 18.6 on the main HomePod (original).
I’ve enabled Crossfade in the Home app.
I’m playing Apple Music directly on the HomePod mini.
Crossfade just doesn’t work on any HomePod.
I can understand it not working on the original HomePod, but why isn’t it working on the minis running OS 26?
I’ve tried disabling and re-enabling Crossfade, rebooting the HomePods, etc., but nothing helps.