Is there a way to destroy MIDIUMPMutableEndpoint again?
In my app, the user has a setting to enable and disable MIDI 2.0. If MIDI 2.0 should not be supported (or if iOS version < 18), it creates a virtual destination and a virtual source. And if MIDI 2.0 should be enabled, it instead creates a MIDIUMPMutableEndpoint, which itself creates the virtual destination and source automatically.
So here is my problem: I didn't find any way to destroy the MIDIUMPMutableEndpoint again. There is a method to disable it (setEnabled:NO), but that doesn't destroy or hide the virtual destination and source. So when the user turns MIDI 2.0 support off, I will have two virtual destinations and sources, and cannot get rid of the 2.0 ones.
What is the correct way to get rid of the MIDIUMPMutableEndpoint once it is created?
Audio
RSS for tagDive into the technical aspects of audio on your device, including codecs, format support, and customization options.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Your draft looks great! Here's a refined version with the iOS 17 comparison emphasized and slightly better flow:
Hi Apple Engineers and fellow developers,
I'm experiencing a critical regression with ShazamKit's background operation on iOS 18. ShazamKit's SHManagedSession stops identifying songs in the background after approximately 20 seconds on iOS 18, while the exact same code works perfectly on iOS 17.
The behavior is consistent: the app works perfectly in the foreground, but when backgrounded or device is locked, it initially works for about 20 seconds then stops identifying new songs. The microphone indicator remains active suggesting audio access is maintained, but ShazamKit doesn't send identified songs in the background until you open the app again. Detection immediately resumes when bringing the app to foreground.
My technical setup uses SHManagedSession for continuous matching with background modes properly configured in Info.plist including audio mode, and Background App Refresh enabled. I've tested this on physical devices running iOS 18.0 through 18.5 with the same results across all versions. The exact same code running on iOS 17 devices works flawlessly in the background.
To reproduce: initialize SHManagedSession and start matching, begin song identification in foreground, background the app or lock device, play different songs which are initially detected for about 20 seconds, then after the timeout period new songs are no longer identified until you bring the app to foreground.
This regression has impacted my production app as users who rely on continuous background music identification are experiencing a broken feature. I submitted this as Feedback ID FB15255903 last September with no solution so far.
I've created a minimal demo project that reproduces this issue: https://github.com/tfmart/ShazamKitBackground
Has anyone else experienced this ShazamKit background regression on iOS 18? Are there any known workarounds or alternative approaches? Given the time this issue has persisted, could we please get acknowledgment of this regression, expected timeline for a fix, or any recommended workarounds?
Testing environment is Xcode 16.0+ on iOS 18.0-18.5 across multiple physical device models.
Any guidance would be greatly appreciated.
I am trying to get MIDI output from the AU Host demo app using the recent MIDI processor example. The processor works correctly in Logic Pro, but I cannot send MIDI from the AUv3 extension in standalone mode using the default host app to another program (e.g., Ableton).
The MIDI manager, which is part of the standalone host app, works fine, and I can send MIDI using it directly—Ableton receives it without issues. I have already set the midiOutputNames in the extension, and the midiOutBlock is mapped. However, the MIDI data from the AUv3 extension does not reach Ableton in standalone mode. I suspect the issue is that midiOutBlock might never be called in the plugin, or perhaps an input to the plugin is missing, which prevents it from sending MIDI. I am currently using the default routing.
I have modified the MIDI manager such that it works well as described above. Here is a part of my code for SimplePlayEngine.swift and my MIDIManager.swift for reference:
@MainActor
@Observable
public class SimplePlayEngine {
private let midiOutBlock: AUMIDIOutputEventBlock = { sampleTime, cable, length, data in return noErr }
var scheduleMIDIEventListBlock: AUMIDIEventListBlock? = nil
public init() {
engine.attach(player)
engine.prepare()
setupMIDI()
}
private func setupMIDI() {
if !MIDIManager.shared.setupPort(midiProtocol: MIDIProtocolID._2_0, receiveBlock: { [weak self] eventList, _ in
if let scheduleMIDIEventListBlock = self?.scheduleMIDIEventListBlock {
_ = scheduleMIDIEventListBlock(AUEventSampleTimeImmediate, 0, eventList)
}
}) {
fatalError("Failed to setup Core MIDI")
}
}
func initComponent(type: String, subType: String, manufacturer: String) async -> ViewController? {
reset()
guard let component = AVAudioUnit.findComponent(type: type, subType: subType, manufacturer: manufacturer) else {
fatalError("Failed to find component with type: \(type), subtype: \(subType), manufacturer: \(manufacturer))" )
}
do {
let audioUnit = try await AVAudioUnit.instantiate(
with: component.audioComponentDescription, options: AudioComponentInstantiationOptions.loadOutOfProcess)
self.avAudioUnit = audioUnit
self.connect(avAudioUnit: audioUnit)
return await audioUnit.loadAudioUnitViewController()
} catch {
return nil
}
}
private func startPlayingInternal() {
guard let avAudioUnit = self.avAudioUnit else { return }
setSessionActive(true)
if avAudioUnit.wantsAudioInput { scheduleEffectLoop() }
let hardwareFormat = engine.outputNode.outputFormat(forBus: 0)
engine.connect(engine.mainMixerNode, to: engine.outputNode, format: hardwareFormat)
do { try engine.start() } catch {
isPlaying = false
fatalError("Could not start engine. error: \(error).")
}
if avAudioUnit.wantsAudioInput { player.play() }
isPlaying = true
}
private func resetAudioLoop() {
guard let avAudioUnit = self.avAudioUnit else { return }
if avAudioUnit.wantsAudioInput {
guard let format = file?.processingFormat else { fatalError("No AVAudioFile defined.") }
engine.connect(player, to: engine.mainMixerNode, format: format)
}
}
public func connect(avAudioUnit: AVAudioUnit?, completion: @escaping (() -> Void) = {}) {
guard let avAudioUnit = self.avAudioUnit else { return }
engine.disconnectNodeInput(engine.mainMixerNode)
resetAudioLoop()
engine.detach(avAudioUnit)
func rewiringComplete() {
scheduleMIDIEventListBlock = auAudioUnit.scheduleMIDIEventListBlock
if isPlaying { player.play() }
completion()
}
let hardwareFormat = engine.outputNode.outputFormat(forBus: 0)
engine.connect(engine.mainMixerNode, to: engine.outputNode, format: hardwareFormat)
if isPlaying { player.pause() }
let auAudioUnit = avAudioUnit.auAudioUnit
if !auAudioUnit.midiOutputNames.isEmpty { auAudioUnit.midiOutputEventBlock = midiOutBlock }
engine.attach(avAudioUnit)
if avAudioUnit.wantsAudioInput {
engine.disconnectNodeInput(engine.mainMixerNode)
if let format = file?.processingFormat {
engine.connect(player, to: avAudioUnit, format: format)
engine.connect(avAudioUnit, to: engine.mainMixerNode, format: format)
}
} else {
let stereoFormat = AVAudioFormat(standardFormatWithSampleRate: hardwareFormat.sampleRate, channels: 2)
engine.connect(avAudioUnit, to: engine.mainMixerNode, format: stereoFormat)
}
rewiringComplete()
}
}
and my MIDI Manager
@MainActor
class MIDIManager: Identifiable, ObservableObject {
func setupPort(midiProtocol: MIDIProtocolID,
receiveBlock: @escaping @Sendable MIDIReceiveBlock) -> Bool {
guard setupClient() else { return false }
if MIDIInputPortCreateWithProtocol(client, portName, midiProtocol, &port, receiveBlock) != noErr {
return false
}
for source in self.sources {
if MIDIPortConnectSource(port, source, nil) != noErr {
print("Failed to connect to source \(source)")
return false
}
}
setupVirtualMIDIOutput()
return true
}
private func setupVirtualMIDIOutput() {
let virtualStatus = MIDISourceCreate(client, virtualSourceName, &virtualSource)
if virtualStatus != noErr {
print("❌ Failed to create virtual MIDI source: \(virtualStatus)")
} else {
print("✅ Created virtual MIDI source: \(virtualSourceName)")
}
}
func sendMIDIData(_ data: [UInt8]) {
print("hey")
var packetList = MIDIPacketList()
withUnsafeMutablePointer(to: &packetList) { ptr in
let pkt = MIDIPacketListInit(ptr)
_ = MIDIPacketListAdd(ptr, 1024, pkt, 0, data.count, data)
if virtualSource != 0 {
let status = MIDIReceived(virtualSource, ptr)
if status != noErr {
print("❌ Failed to send MIDI data: \(status)")
} else {
print("✅ Sent MIDI data: \(data)")
}
}
}
}
}
Hi all,
I’ve implemented the new Core Audio Tap API (AudioHardwareCreateProcessTap with CATapDescription) and I’m seeing consistent level attenuation that scales with the number of stereo output pairs exposed by the target device.
What I observe
Device with 4 stereo pairs (8 outs) → tap shows −12.04 dB relative to source.
True 2-ch devices (built-in speakers, AirPods) → ~0 dB attenuation.
The attenuation appears regardless of whether I:
Create a global (default-output) tap via initStereoGlobalTapButExcludeProcesses:
Or create a per-process/per-device tap via initWithProcesses:andDeviceUID:withStream:
Additionally, the routing choice inside the sending app matters:
App output to “System/Default Output” → I often see no attenuation.
App output directly to a multi-out interface (e.g., RME Fireface) → I see the pair-count-scaled attenuation.
I can query Core Audio for the number of output channels/pairs and gain-compensate (+20·log10(N_pairs) dB) and that matches my measurements for many cases. However, this compensation is not universally correct because it seems to depend on where each process routes its audio (Default Output vs. direct device), even when those processes are included in the same tap aggregate.
Question
Is there a supported way to obtain the raw, unattenuated streams for all processes through the Tap API—i.e., to bypass this automatic headroom/attenuation behavior entirely? If this attenuation is expected by design:
Is there a documented rule for when it applies (global vs. device taps, per-process taps, stream selection, etc.)?
Is there a property/flag to disable it, or a reliable, official method to compute the exact compensation (beyond counting stereo pairs)?
Any guidance on ensuring consistent levels when multiple processes route differently (Default Output vs. direct device) but are captured by the same tap?
Environment
API: AudioHardwareCreateProcessTap + CATapDescription
Devices: built-in output (2-ch), RME Fireface (8+ outs / 4+ stereo pairs)
Behavior reproducible with both global and per-process/per-device tap descriptions.
Attenuation example: 4 stereo pairs → −12.04 dB observed.
Happy to provide a minimal sample, measurements, and device logs. Thanks!
— David
Session player regions populate blank, with no sound media when tracks or regions are created.
I bought two "Apple USB-C to Headphone Jack Adapters". Upon closer inspection, they seems to be of different generations:
The one with product ID 0x110a on top is working fine. The one with product ID 0x110b has two issues:
There is a short but loud click noise on the headphone when I connect it to the iPad.
When I play audio using AVAudioPlayer the first half of a second or so is cut off.
Here's how I'm playing the audio:
audioPlayer = try AVAudioPlayer(contentsOf: url)
audioPlayer?.delegate = self
audioPlayer?.prepareToPlay()
audioPlayer?.play()
Is this a known issue? Am I doing something wrong?
I'm developing a TTS Audio Unit Extension that needs to write trace/log files to a shared App Group container. While the main app can successfully create and write files to the container, the extension gets sandbox denied errors despite having proper App Group entitlements configured.
Setup:
Main App (Flutter) and TTS Audio Unit Extension share the same App Group
App Group is properly configured in developer portal and entitlements
Main app successfully creates and uses files in the container
Container structure shows existing directories (config/, dictionary/) with populated files
Both targets have App Group capability enabled and entitlements set
Current behavior:
Extension can access/read the App Group container
Extension can see existing directories and files
All write attempts are blocked with "sandbox deny(1) file-write-create" errors
Code example:
const char* createSharedGroupPathWithComponent(const char* groupId, const char* component) {
NSString* groupIdStr = [NSString stringWithUTF8String:groupId];
NSString* componentStr = [NSString stringWithUTF8String:component];
NSURL* url = [[NSFileManager defaultManager]
containerURLForSecurityApplicationGroupIdentifier:groupIdStr];
NSURL* fullPath = [url URLByAppendingPathComponent:componentStr];
NSError *error = nil;
if (![[NSFileManager defaultManager] createDirectoryAtPath:fullPath.path
withIntermediateDirectories:YES
attributes:nil
error:&error]) {
NSLog(@"Unable to create directory %@", error.localizedDescription);
}
return [[fullPath path] UTF8String];
}
Error output:
Sandbox: simaromur-extension(996) deny(1) file-write-create /private/var/mobile/Containers/Shared/AppGroup/36CAFE9C-BD82-43DD-A962-2B4424E60043/trace
Key questions:
Are there additional entitlements required for TTS Audio Unit Extensions to write to App Group containers?
Is this a known limitation of TTS Audio Unit Extensions?
What is the recommended way to handle logging/tracing in TTS Audio Unit Extensions?
If writing to App Group containers is not supported, what alternatives are available?
Current entitlements:
<dict>
<key>com.apple.security.application-groups</key>
<array>
<string>group.com.<company>.<appname></string>
</array>
</dict>
Hi everyone 👋
I’m building an iOS app in Swift where I want to do the following:
Record the user’s voice
Transcribe the spoken sentence (speech-to-text)
Auto-detect the spoken language
Translate it to another language selected by the user (e.g., English → Spanish or Hindi → English)
Speak back (text-to-speech) the translated text on the same device
Is this possible to record via phone mic and play the transcribe voice into headphone's audio?
I'm working on adding CarPlay support to an audio app and am running into an issue. Occasionally, when a user opens the app from CarPlay while the main app scene is either not connected or is currently in the background, I will receive an error when attempting to activate the audio session. The code below mimics my setup:
do {
try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio)
try AVAudioSession.sharedInstance().setActive(true)
} catch {
print(error) // NSOSStatusErrorDomain - 560557684: Session activation failed
}
That error code maps to AVAudioSession.ErrorCode.cannotInterruptOthers.
Once in this state, all subsequent attempts to play different pieces of content will fail. However, things will start working normally if the user opens the app on their phone and tries again from CarPlay (while the app is in the foreground on their phone).
I'm not sure why it would behave this way and want to note that I do have the audio background mode capability enabled.
Has anyone else encountered this? Are there any workarounds or changes I could make to prevent this from happening?
I developed an educational app that implements audio-video communication through RTC, while using WebView to display course materials during classes. However, some users are experiencing an issue where the audio playback from WebView is very quiet. I've checked that the AVAudioSessionCategory is set by RTC to AVAudioSessionCategoryPlayAndRecord, and the AVAudioSessionCategoryOption also includes AVAudioSessionCategoryOptionMixWithOthers. What could be causing the WebView audio to be suppressed, and how can this be resolved?
Hi,
when a CoreMIDI driver controls physical HW it is probably quite commune to have to control the amount of MIDI data received from the system.
What comes to mind is to just delay returning control of the MIDIDriverInterface::Send() callback to the calling process. While the application trying to send MIDI really stalls until the callback returns it seems only to be a side effect of a generally stalled CoreMIDI server. Between the callbacks the application can send as much MIDI data as it wants to CoreMIDI, it's buffering seems to be endless... However the HW might not be able to play out all the data.
It seems there is no way to indicate an overflow/full buffer situation back the application/CoreMIDI. How is this supposed to work?
Thanks, any hints or pointers are highly appreciated!
Hagen.
My audio app shows a control bar at the bottom of the window. The controls show nicely, but there is a black "slab" appearing behind the inline controls, the same size as the playerView. Setting the player view background color does nothing:
playerView.wantsLayer = true playerView.layer?.backgroundColor = NSColor.clear.cgColor
How can I clear the background?
If I use .floating controlsStyle, I don't get the background "slab".
Topic:
Media Technologies
SubTopic:
Audio
Hi,
I'm still stuck getting a basic record-with-playthrouh pipeline to work.
Has anyone a sample of setting up a AVAudioEngine pipeline for recording with playthrough?
Plkaythrough works with AVPlayerNode as input but not with any microphone input. The docs mention the "enabled state" of the outputNode of the engine without explaining the concept, i.e. how to enable an output.
When the engine renders to and from an audio device, the AVAudioSession category and the availability of hardware determines whether an app performs output. Check the output node’s output format (specifically, the hardware format) for a nonzero sample rate and channel count to see if output is in an enabled state.
Well, in my setup the output is NOT enabled, and any attempt to switch (e.g. audioEngine.outputNode.auAudioUnit.setDeviceID(deviceID) )/ attach a dedicated device / ... results in exceptions / errors
We are currently working on a CarPlay navigation app and so far everything is working well except for speaking turn notifications.
Our TTS implementation works fine on the phone and works fine on CarPlay if the voice is spoken over the speaker in the car. If users connect a BT headset to the car and listen through that headset, then the voice commands are chopped up / stutter.
Why would users use BT headset? Well, we are working on a motorcycle app, and there are no speakers usually on a motorcycle.
It sounds like the BT channel is opened and closed repeatedly for every character / word spoken. This happens on different CarPlay devices and different Bluetooth headsets, we have reports from multiple users that they find this behavior annoying and that other apps work fine.
Is this a known issue? Are there possible workaround?
Hi. I am working on an audio app for iOS. I have added the CPNowPlayingPlaybackRateButton to my CPNowPlayingTemplate.
When the button is clicked, my handler changes the rate in the AVPlayer and updates the MPNowPlayingInfoCenter to the new rate, for example, 2.0.
Throughout, the Carplay button always displays "0x". I am wondering how to get this UI to accurately reflect the playback rate the user has selected, as always displaying 0x is a poor user experience.
You may suggest MPChangePlaybackRateCommand is relevant here, but I have not been able to get that to work either, and judging by posts online, not many other people have either. I have made a post about that here: https://developer.apple.com/forums/thread/773099
Is this a known Apple bug? Is there a way to get the UI to accurately reflect the playback rate of my audio?
Kind regards.
Topic:
Media Technologies
SubTopic:
Audio
I'm encountering errors while using AVAudioEngine with voice processing enabled (setVoiceProcessingEnabled(true)) in scenarios where the input and output audio devices are not the same. This issue arises specifically with mismatched devices, preventing the application from functioning as expected.
Works: Paired devices (e.g., MacBook Pro mic → MacBook Pro speakers) Fails: Mismatched devices (e.g., AirPods mic → MacBook Pro speakers)
When using paired input and output devices:
The setup works as expected. Example: MacBook Pro microphone → MacBook Pro speakers. When using mismatched devices:
AVAudioEngine setup fails during aggregate device construction. Example: AirPods microphone → MacBook Pro speakers. Error logs indicate a channel count mismatch.
Here are the partial logs. Due to the content limit, I cannot post the entire logs.
AUVPAggregate.cpp:1000 client-side input and output formats do not match (err=-10875)
AUVPAggregate.cpp:1036 err=-10875
AVAEInternal.h:109 [AVAudioEngineGraph.mm:1344:Initialize: (err = PerformCommand(*outputNode, kAUInitialize, NULL, 0)): error -10875
AggregateDevice.mm:329 Failed expectation of constructed aggregate (312): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (312): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
AggregateDevice.mm:182 error fetching default pair
AggregateDevice.mm:329 Failed expectation of constructed aggregate (336): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (336): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
AUHAL.cpp:1782 ca_verify_noerr: [AudioDeviceSetProperty(mDeviceID, NULL, 0, isInput, kAudioDevicePropertyIOProcStreamUsage, theSize, theStreamUsage), 560227702]
AudioHardware-mac-imp.cpp:3484 AudioDeviceSetProperty: no device with given ID
AUHAL.cpp:1782 ca_verify_noerr: [AudioDeviceSetProperty(mDeviceID, NULL, 0, isInput, kAudioDevicePropertyIOProcStreamUsage, theSize, theStreamUsage), 560227702]
AggregateDevice.mm:182 error fetching default pair
AggregateDevice.mm:329 Failed expectation of constructed aggregate (348): mInput.streamChannelCounts == inputStreamChannelCounts
AggregateDevice.mm:331 Failed expectation of constructed aggregate (348): mInput.totalChannelCount == std::accumulate(inputStreamChannelCounts.begin(), inputStreamChannelCounts.end(), 0U)
Is it possible to use voice processing with different input/output devices?
If yes, are there any specific configurations required to handle mismatched devices? How can we resolve channel count mismatch errors during aggregate device construction?
Are there settings or API adjustments to enforce compatibility between input/output devices? Are there any workarounds or alternative approaches to achieve voice processing functionality with mismatched devices?
For instance, can we force an intermediate channel configuration or downmix input/output formats?
Hello,
i can successfully match music using shazamkit on Apple using SwiftUI, a simple app that let user to load an audio file and exctracts the relative match, while i am unable to match music using shamzamkit on Android. I am trying to make the same simple app but i cannot match music as i get MATCH_ATTEMPT_FAILED every time i try to. I don't know what i am doing wrong but the shazam part in the kotlin Android code is in this method :
suspend fun processAudioFileInBackground(
filePath: String,
developerTokenProvider: DeveloperTokenProvider
) = withContext(Dispatchers.IO) {
val bufferSize = 1024 * 1024
val audioFile = FileInputStream(filePath)
val byteBuffer = ByteBuffer.allocate(bufferSize)
byteBuffer.order(ByteOrder.LITTLE_ENDIAN)
var bytesRead: Int
while (audioFile.read(byteBuffer.array()).also { bytesRead = it } != -1) {
val signatureGenerator = (ShazamKit.createSignatureGenerator(AudioSampleRateInHz.SAMPLE_RATE_44100) as ShazamKitResult.Success).data
signatureGenerator.append(byteBuffer.array(), bytesRead, System.currentTimeMillis())
val signature = signatureGenerator.generateSignature()
println("Signature: ${signature.durationInMs}")
val catalog = ShazamKit.createShazamCatalog(developerTokenProvider, Locale.ENGLISH)
val session = (ShazamKit.createSession(catalog) as ShazamKitResult.Success).data
val matchResult = session.match(signature)
println("MatchResult : $matchResult")
setMatchResult(matchResult)
byteBuffer.clear()
}
audioFile.close()
}
I noticed that changing Locale in catalog creation results in different result as i get NoMatch without exception. Can you please help me with this? Do i need to create a custom catalog?
I am work an app development on an app which request an audio function in background as an alert sound.
during debug testing , the function work fine,
but once I testing standalone without debugging , The function not work , it will play out the sound when I back to app.
does any way to trace the issues ?
Since iOS 18, the system setting “Allow Audio Playback” (enabled by default) allows third-party app audio to continue playing while the user is recording video with the Camera app. This has created a problem for the app I’m developing.
➡️ The problem:
My app plays continuous audio in both foreground and background states. If the user starts recording video using the iOS Camera app, the app’s audio — still playing in the background — gets captured in the video — obviously an unintended behavior.
Yes, the user could stop the app manually before starting the video recording, but that can’t be guaranteed. As a developer, I need a way to stop the app’s audio before the video recording begins.
So far, I haven’t found a reliable way to detect when video recording starts if ‘Allow Audio Playback’ is ON.
➡️ What I’ve tried:
— AVAudioSession.interruptionNotification → doesn’t fire
— devicesChangedEventStream → not triggered
I don’t want to request mic permission (app doesn’t use mic). also, disabling the app from playing audio in the background isn’t an option as it is a crucial part of the user experience
➡️ What I need:
A reliable, supported way to detect when the Camera app begins video recording, without requiring mic access — so I can stop audio and avoid unintentional overlap with the user’s recordings.
Any official guidance, workarounds, or AVFoundation techniques would be greatly appreciated.
Thanks.
Hello,
I've discovered a buffer initialization bug in AVAudioUnitSampler that happens when loading presets with multiple zones referencing different regions in the same audio file (monolith/concatenated samples approach).
Almost all zones output silence (i.e. zeros) at the beginning of playback instead of starting with actual audio data.
The Problem
Setup:
Single audio file (monolith) containing multiple concatenated samples
Multiple zones in an .aupreset, each with different sample start and sample end values pointing to different regions of the same file
All zones load successfully without errors
Expected Behavior:
All zones should play their respective audio regions immediately from the first sample.
Actual Behavior:
Last zone in the zone list: Works perfectly - plays audio immediately
All other zones: Output [0, 0, 0, 0, ..., _audio_data] instead of [real_audio_data]
The number of zeros varies from event to event for each zone. It can be a couple of samples (<30) up to several buffers.
After the initial zeros, the correct audio plays normally, so there is no shift in audio playback, just missing samples at the beginning.
Minimal Reproduction
1. Create Test Monolith Audio File
Create a single Wav file with 3 concatenated 1-second samples (44.1kHz):
Sample 1: frames 0-44099 (constant amplitude 0.3)
Sample 2: frames 44100-88199 (constant amplitude 0.6)
Sample 3: frames 88200-132299 (constant amplitude 0.9)
2. Create Test Preset
Create an .aupreset with 3 zones all referencing the same file:
Pseudo code
<Zone array>
<zone 1> start : 0, end: 44099, note: 60, waveform: ref_to_monolith.wav;
<zone 2> start sample: 44100, note: 62, end sample: 88199, waveform: ref_to_monolith.wav;
<zone 3> start sample: 88200, note: 64, end sample: 132299, waveform: ref_to_monolith.wav;
</Zone array>
3. Load and Test
// Load preset into AVAudioUnitSampler
let sampler = AVAudioUnitSampler()
try sampler.loadAudioFiles(from: presetURL)
// Play each zone (MIDI notes C4=60, D4=62, E4=64)
sampler.startNote(60, withVelocity: 64, onChannel: 0) // Zone 1
sampler.startNote(62, withVelocity: 64, onChannel: 0) // Zone 2
sampler.startNote(64, withVelocity: 64, onChannel: 0) // Zone 3
4. Observed Result
Zone 1 (C4): [0, 0, 0, ..., 0.3, 0.3, 0.3] ❌ Zeros at beginning
Zone 2 (D4): [0, 0, 0, ..., 0.6, 0.6, 0.6] ❌ Zeros at beginning
Zone 3 (E4): [0.9, 0.9, 0.9, ...] ✅ Works correctly (last zone)
What I've Extensively Tested
What DOES Work
Separate files per zone:
Each zone references its own individual audio file
All zones play correctly without zeros
Problem: Not viable for iOS apps with 500+ sample libraries due to file handle limitations
What DOESN'T Work (All Tested)
1. Different Audio Formats:
CAF (Float32 PCM, Int16 PCM, both interleaved and non-interleaved)
M4A (AAC compressed)
WAV (uncompressed)
SF2 (SoundFont2)
Bug persists across all formats
2. CAF Region Chunks:
Created CAF files with embedded region chunks defining zone boundaries
Set zones with no sampleStart/sampleEnd in preset (nil values)
AVAudioUnitSampler completely ignores CAF region metadata
Bug persists
3. Unique Waveform IDs:
Gave each zone a unique waveform ID (268435456, 268435457, 268435458)
Each ID has its own file reference entry (all pointing to same physical file)
Hypothesized this might trigger separate buffer initialization
Bug persists - no improvement
4. Different Sample Rates:
Tested: 44.1kHz, 48kHz, 96kHz
Bug occurs at all sample rates
5. Mono vs Stereo:
Bug occurs with both mono and stereo files
Environment
macOS: Sonoma 14.x (tested across multiple minor versions)
iOS: Tested on iOS 17.x with same results
Xcode: 16.x
Frameworks: AVFoundation, AudioToolbox
Reproducibility: 100% reproducible with setup described above
Impact & Use Case
This bug severely impacts professional music applications that need:
Small file sizes: Monolith files allow sharing compressed audio data (AAC/M4A)
iOS file handle limits: Opening 400+ individual sample files is not viable on iOS
Performance: Single file loading is much faster than hundreds of individual files
Standard industry practice: Monolith/concatenated samples are used by EXS24, Kontakt, and most professional samplers
Current Impact:
Cannot use monolith files with AVAudioUnitSampler on iOS
Forced to choose between: unusable audio (zeros at start) OR hitting iOS file limits
No viable workaround exists
Root Cause Hypothesis
The bug appears to be in AVAudioUnitSampler's internal buffer initialization when:
Multiple zones share the same source audio file
Each zone specifies different sampleStart/sampleEnd offsets
Key observation: The last zone in the zone array always works correctly.
This is NOT related to:
File permissions or security-scoped resources (separate files work fine)
Audio codec issues (happens with uncompressed PCM too)
Preset parsing (preset loads correctly, all zones are valid)
Questions
Is this a known issue? I couldn't find any documentation, bug reports, or discussions about this.
Is there ANY workaround that allows monolith files to work with AVAudioUnitSampler?
Alternative APIs? Is there a different API or approach for iOS that properly supports monolith sample files?