Dive into the technical aspects of audio on your device, including codecs, format support, and customization options.

Audio Documentation

Posts under Audio subtopic


On iOS 18, Mandarin is read aloud as Cantonese
Please include the line below in follow-up emails for this request. Case-ID: 11089799
When using AVSpeechUtterance with a Mandarin voice on iOS 18, and Siri is set to Cantonese, the text is spoken in Cantonese instead. There is no such issue on iOS 17 or 16.
1. Set a Mandarin voice on the utterance:
let utterance = AVSpeechUtterance(string: textView.text)
let voice = AVSpeechSynthesisVoice(language: "zh-CN")
utterance.voice = voice
2. In the phone settings, Siri is set to Cantonese.
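A minimal, self-contained sketch of the reported setup (the sample string is illustrative; in the post the text comes from a text view):

import AVFoundation

// Request a Mandarin (zh-CN) voice explicitly; the synthesizer must be kept
// alive for the duration of playback.
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "你好，今天天气很好。")
utterance.voice = AVSpeechSynthesisVoice(language: "zh-CN")
synthesizer.speak(utterance)
// Reported behavior: on iOS 18 with Siri set to Cantonese, this utterance is
// spoken in Cantonese; on iOS 16/17 it is spoken in Mandarin as expected.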
Replies: 3 · Boosts: 1 · Views: 568 · Activity: Feb ’25
MPNowPlayingInfoCenter nowPlayingInfo throttled
Hello, I have been running into issues with setting nowPlayingInfo information, specifically updating information for CarPlay and the CPNowPlayingTemplate. When I start playback for an item, I see the lock screen information update as expected, along with the CarPlay now-playing information. However, the playing items are books with collections of tracks. When I select a new track (chapter) within the book, I set MPMediaItemPropertyTitle to the new chapter name. This change is reflected correctly on the lock screen, but almost never appears correctly on the CarPlay CPNowPlayingTemplate. The previous chapter title remains set and never updates. I see "Application exceeded audio metadata throttle limit." in the debug console fairly frequently. From that I figured that I need to minimize updates to the nowPlayingInfo dictionary.
What I did:
I store the metadata in a local dictionary and only set values in the main nowPlayingInfo dictionary when they differ from the current value.
I kick off the nowPlayingInfo update via a Task that initially sleeps for around 2 seconds (not a final value, just for my current testing). If a previous Task is active, it gets cancelled, so that only one update can happen within that time window.
Neither of these has been sufficient. I can switch between different titles entirely and the information updates (including cover art). But when I switch chapters within a title, the MPMediaItemPropertyTitle update continues to get dropped. I know the value is getting set, because it updates on the lock screen correctly. In total, I update 12 keys, though with the above changes usually only 2-4 of them actually change with any frequency. I am running out of ideas to satisfy the throttling thresholds and still display metadata accurately. I could use some advice. Thanks.
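A rough sketch of the debounced, diff-based update described above (the type and property names are illustrative, not from the app):

import MediaPlayer

final class NowPlayingUpdater {
    private var cachedInfo: [String: Any] = [:]
    private var updateTask: Task<Void, Never>?

    // Merge only the keys whose values actually changed, then push a single
    // nowPlayingInfo update after a short debounce window.
    func apply(_ changes: [String: Any]) {
        for (key, value) in changes where !isEqual(cachedInfo[key], value) {
            cachedInfo[key] = value
        }
        updateTask?.cancel()
        updateTask = Task { [info = self.cachedInfo] in
            try? await Task.sleep(nanoseconds: 2_000_000_000) // ~2 s, as in the post
            guard !Task.isCancelled else { return }
            MPNowPlayingInfoCenter.default().nowPlayingInfo = info
        }
    }

    private func isEqual(_ lhs: Any?, _ rhs: Any) -> Bool {
        guard let lhs = lhs as? NSObject, let rhs = rhs as? NSObject else { return false }
        return lhs.isEqual(rhs)
    }
}

Even with throttling of this shape, the post reports that MPMediaItemPropertyTitle changes still fail to reach the CarPlay template.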
Replies: 4 · Boosts: 1 · Views: 180 · Activity: May ’25
How can third-party iOS apps obtain real-time waveform / spectrogram data for Apple Music tracks (similar to djay & other DJ apps)?
Hi everyone, I’m working on an iOS MusicKit app that overlays a metronome on top of Apple Music playback. To line the clicks up perfectly I’d like access to low-level audio analysis data—ideally a waveform / spectrogram or beat grid—while the track is playing. I’ve noticed that several approved DJ apps (e.g. djay, Serato, rekordbox) can already:
• Display detailed scrolling waveforms of Apple Music songs
• Scratch, loop or time-stretch those tracks in real time
That implies they receive decoded PCM frames, or at least high-resolution analysis data, from Apple Music under a special entitlement. My questions:
1. Does MusicKit (or any public framework) expose real-time audio buffers, FFT bins, or beat markers for streaming Apple Music content?
2. If not, is there an Apple program or entitlement that developers can apply for—similar to the “DJ with Apple Music” initiative—to gain that deeper access?
3. Where can I find official documentation or a point of contact for this kind of request?
I’ve searched the docs and forums but only see standard MusicKit playback APIs, which don’t appear to expose raw audio for DRM-protected songs. Any guidance, links or insider tips on the proper application process would be hugely appreciated! Thanks in advance.
Replies: 1 · Boosts: 1 · Views: 637 · Activity: Dec ’25
Improving Speech Analyzer Transcription for technical terms
I am developing an app with transcription, and I am exploring ways to improve the transcription from SpeechAnalyzer/Transcriber for technical terms. SFSpeech... recognition could be augmented by contextualStrings. Does something similar exist for SpeechAnalyzer/Transcriber? If so, please point me towards the documentation and any sample code that may exist for this. If there are other options, please let me know.
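For reference, a sketch of the SFSpeechRecognizer-era mechanism the post refers to, biasing recognition toward domain-specific vocabulary via contextualStrings (the term list is illustrative):

import Speech

let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["SpeechAnalyzer", "AVAudioConverter", "nowPlayingInfo"] // example terms

if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")) {
    let task = recognizer.recognitionTask(with: request) { result, _ in
        if let result {
            print(result.bestTranscription.formattedString)
        }
    }
    _ = task // audio buffers are appended to `request` from a microphone tap elsewhere
}

Whether SpeechAnalyzer/Transcriber offers an equivalent hook is exactly the open question.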
Replies: 1 · Boosts: 1 · Views: 269 · Activity: Sep ’25
Indicate Packet Loss With AVAudioConverter for OPUS Decoding
I'm using an AVAudioConverter object to decode an OPUS stream for VoIP. The decoding itself works well; however, whenever the stream stalls (no more audio packets are available to decode because of network instability), this can be heard as crackling or an abrupt stop in the decoded audio. OPUS can mitigate this if the decoder is told about packet loss, by passing a null data pointer to the C library's
int opus_decode_float(OpusDecoder *st, const unsigned char *data, opus_int32 len, float *pcm, int frame_size, int decode_fec)
see https://opus-codec.org/docs/opus_api-1.2/group__opus__decoder.html#ga9c554b8c0214e24733a299fe53bb3bd2.
However, with AVAudioConverter in Swift I'm constructing an AVAudioCompressedBuffer like so:
let compressedBuffer = AVAudioCompressedBuffer(
    format: VoiceEncoder.Constants.networkFormat,
    packetCapacity: 1,
    maximumPacketSize: data.count
)
compressedBuffer.byteLength = UInt32(data.count)
compressedBuffer.packetCount = 1
compressedBuffer.packetDescriptions!.pointee.mDataByteSize = UInt32(data.count)
data.copyBytes(
    to: compressedBuffer.data.assumingMemoryBound(to: UInt8.self),
    count: data.count
)
where data: Data contains the raw OPUS frame to be decoded. How can I indicate packet loss in this context and make the AVAudioConverter output PCM data whenever no more input data is available?
More context: I'm specifying the audio format like this:
static let frameSize: UInt32 = 960
static let sampleRate: Float64 = 48000.0
static var networkFormatStreamDescription = AudioStreamBasicDescription(
    mSampleRate: sampleRate,
    mFormatID: kAudioFormatOpus,
    mFormatFlags: 0,
    mBytesPerPacket: 0,
    mFramesPerPacket: frameSize,
    mBytesPerFrame: 0,
    mChannelsPerFrame: 1,
    mBitsPerChannel: 0,
    mReserved: 0
)
static let networkFormat = AVAudioFormat(streamDescription: &networkFormatStreamDescription)!
I've tried 1) setting byteLength and packetCount to zero and 2) returning nil while setting .haveData in the AVAudioConverterInputBlock I'm using, with no success.
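A sketch of the kind of AVAudioConverterInputBlock the last paragraph describes; nextCompressedBuffer() is a hypothetical stand-in for the app's packet queue, and the status handling shown is one of the attempts mentioned, not a confirmed fix:

import AVFoundation

// Hypothetical jitter-buffer dequeue; stands in for the app's own packet queue.
func nextCompressedBuffer() -> AVAudioCompressedBuffer? { nil }

// Hand the converter one OPUS packet per call, and signal the "no data" case
// when the network stalls.
let inputBlock: AVAudioConverterInputBlock = { _, outStatus in
    if let buffer = nextCompressedBuffer() {
        outStatus.pointee = .haveData
        return buffer
    } else {
        // One of the post's attempts was returning nil while reporting .haveData;
        // .noDataNow is the other status available for "nothing to feed right now".
        outStatus.pointee = .noDataNow
        return nil
    }
}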
Replies: 1 · Boosts: 1 · Views: 899 · Activity: May ’25
Start and stop recording Voice Memos with Siri
Using iOS 26.2 with AirPods 4:
Long press the stem to launch Siri, then say "Record Voice Memo" -> recording starts.
With the recording in progress, long press the stem to launch Siri again -> nothing happens. To stop the recording I need to use the phone.
Is this intended behaviour? I would like to be able to stop the recording with Siri as well. I am able to launch Siri from the phone while recording, but the point is to keep the phone in my pocket and start/stop recordings only via the AirPods.
Replies: 1 · Boosts: 0 · Views: 164 · Activity: Dec ’25
AVSpeechSynthesizer system voices (SLA clarification)
Hello, I am building an iOS-only, commercial app that uses AVSpeechSynthesizer with system voices, strictly using the APIs provided by Apple. Before distributing the app, I want to ensure that my current implementation does not conflict with the iOS Software License Agreement (SLA) and is aligned with Apple’s intended usage.
For a better playback experience (more accurate estimation of utterance duration and smoother skip forward/backward during playback), I currently synthesize speech using:
AVSpeechSynthesizer.write(_:toBufferCallback:)
Converting the received AVAudioPCMBuffer buffers into audio data
Storing the audio inside the app sandbox
Playing it back using AVAudioPlayer / AVAudioEngine
The cached audio is:
Generated fully on-device using system voices
Stored only inside the app’s private container
Used only for internal playback controls (timeline, seek, skip ±5 seconds)
Never shared, exported, uploaded, or exposed outside the app
The alternative approaches would be:
Keeping the generated audio entirely in memory (RAM) for playback purposes, without writing it to the file system at any point
Or using AVSpeechSynthesizer.speak(_:) and playing speech strictly in real time, which has a poorer user experience compared to my approach
I have reviewed the current iOS Software License Agreement: https://www.apple.com/legal/sla/docs/iOS18_iPadOS18.pdf In particular, section (f) mentions restrictions around System Characters, Live Captions, and Personal Voice, including the following excerpt:
“…use … only for your personal, non-commercial use… No other creation or use of the System Characters, Live Captions, or Personal Voice is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution in a … commercial context.”
I do not see a specific reference in the SLA to system text-to-speech voices used via AVSpeechSynthesizer, and I want to be certain that temporarily caching synthesized speech for internal, non-exported playback is acceptable in a commercial app.
My question is: Is caching AVSpeechSynthesizer system-voice output inside the app sandbox for internal playback acceptable, or is Apple’s recommended approach to rely only on real-time playback (speak(_:)) or strictly in-memory buffering without file storage?
If this question falls outside DTS technical scope and is instead a policy or licensing matter, I would appreciate guidance on the authoritative Apple documentation or the correct Apple team/contact. Thank you.
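A condensed sketch of the caching pipeline described above (the file name, sample text, and error handling are illustrative):

import AVFoundation

// Render system-voice speech to a file in the app sandbox, then play it back
// with AVAudioPlayer for seek/skip support.
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "Chapter one.")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

let fileURL = FileManager.default.temporaryDirectory.appendingPathComponent("speech.caf")
var outputFile: AVAudioFile?

synthesizer.write(utterance) { buffer in
    guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else {
        return // a zero-length buffer marks the end of the utterance
    }
    do {
        if outputFile == nil {
            outputFile = try AVAudioFile(forWriting: fileURL,
                                         settings: pcmBuffer.format.settings)
        }
        try outputFile?.write(from: pcmBuffer)
    } catch {
        print("Failed to write speech buffer: \(error)")
    }
}
// Later: AVAudioPlayer(contentsOf: fileURL) drives the timeline and ±5 s skips.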
Replies: 0 · Boosts: 1 · Views: 289 · Activity: 3w
Process to request the restricted entitlement behind “DJ with Apple Music” (tempo control / time-stretch on Apple Music streams)?
Hi, I’m an iOS developer building an app with a use case that needs advanced playback on Apple Music subscription streams, specifically:
• Real-time tempo change (BPM) during playback — i.e., time-stretch with key-lock, not just crossfade.
• Beat-matched transitions between tracks.
From what I can tell, this capability seems to exist only for approved partners and isn’t available through public MusicKit.
Question: What’s the official request path to be evaluated for that restricted partner entitlement (application form, questionnaire, NDA, or internal team/BD contact)? If the entitlement identifier is internal, how can I get my account routed to the right Apple Music team?
For reference, publicly announced partners include Algoriddim djay, Serato DJ Pro, rekordbox (AlphaTheta), and Engine DJ—all of which appear to implement mixing features that imply advanced playback (tempo/beat-matching) on Apple Music content. I’d prefer not to share product details publicly for the moment and can provide specifics privately if needed. Thanks in advance!
Replies: 0 · Boosts: 1 · Views: 280 · Activity: Oct ’25
AVAudioRecorder loses audio recorded before interruption
Hi everyone, I'm running into an issue with AVAudioRecorder when handling interruptions such as phone calls or alarms.
Problem: When the app is recording audio and an interruption occurs:
I handle the interruption with audioRecorder?.pause() inside AVAudioSession.interruptionNotification (on .began).
On .ended, I check for .shouldResume and call audioRecorder?.record() again.
The recorder resumes successfully, but only the audio recorded after the interruption is saved. The audio recorded before the interruption is lost, even though I'm using the same file URL and not recreating the recorder.
Repro:
Start a recording with AVAudioRecorder
Simulate a system interruption (e.g., incoming call)
Resume recording after the interruption
Stop and inspect the output audio file
Expected: Full audio (before and after interruption) should be saved.
Actual: Only the audio after interruption is saved; the earlier part is missing.
Notes:
According to the documentation, calling .record() after .pause() should resume recording into the same file.
I confirmed that the file URL does not change, and I do not recreate the recorder instance.
No error is thrown by the system during this process.
This behavior happens consistently when the app is interrupted and resumed.
Question: Is this a known issue? Is there a recommended workaround for preserving the full recording when interruptions happen? Thanks in advance!
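A sketch of the interruption handling described above (recorder setup omitted; the notification parsing follows the standard AVAudioSession pattern, registered elsewhere for AVAudioSession.interruptionNotification):

import AVFoundation

func handleInterruption(_ notification: Notification, recorder: AVAudioRecorder) {
    guard let info = notification.userInfo,
          let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue) else { return }

    switch type {
    case .began:
        recorder.pause()
    case .ended:
        let optionsValue = info[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
        let options = AVAudioSession.InterruptionOptions(rawValue: optionsValue)
        if options.contains(.shouldResume) {
            try? AVAudioSession.sharedInstance().setActive(true)
            recorder.record() // reported to resume, but the earlier audio is lost
        }
    @unknown default:
        break
    }
}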
Replies: 1 · Boosts: 1 · Views: 303 · Activity: Dec ’25
CMFormatDescription.audioStreamBasicDescription has wrong or unexpected sample rate for audio channels with different sample rates
In my app I use AVAssetReaderTrackOutput to extract PCM audio from a user-provided video or audio file and display it as a waveform. Recently a user reported that the waveform is not in sync with his video, and after receiving the video I noticed that the waveform is in fact double as long as the video duration, i.e. it shows the audio in slow-motion, so to speak.
Until now I was using CMFormatDescription.audioStreamBasicDescription.mSampleRate, which for this particular user video returns 22'050. But in this case it seems that this value is wrong... because the audio file has two audio channels with different sample rates, as returned by CMFormatDescription.audioFormatList.map({ $0.mASBD.mSampleRate }). The first channel has a sample rate of 44'100, the second one 22'050. If I use the first sample rate, the waveform is perfectly in sync with the video.
The problem is given by the fact that the ratio between the audio data length and the sample rate multiplied by the audio duration is 8, double the ratio for the first audio file (4). In the code below this ratio is given by Double(length) / (sampleRate * asset.duration.seconds). When commenting out the line with the sampleRate variable definition in the code below and uncommenting the following line, the ratios for both audio files are 4, which is the expected result.
I would expect audioStreamBasicDescription to return the correct sample rate, i.e. the one used by AVAssetReaderTrackOutput, which (I think) somehow merges the stereo tracks. The documentation is sparse, and in particular it’s not documented whether the lower or higher sample rate is used; in this case, it seems like the higher one is used, but audioStreamBasicDescription for some reason returns the lower one.
Does anybody know why this is the case or how I should extract the sample rate of the produced PCM audio data? Should I always take the higher one? I created FB19620455.
let openPanel = NSOpenPanel()
openPanel.allowedContentTypes = [.audiovisualContent]
openPanel.runModal()
let url = openPanel.urls[0]
let asset = AVURLAsset(url: url)
let assetTrack = asset.tracks(withMediaType: .audio)[0]
let assetReader = try! AVAssetReader(asset: asset)
let readerOutput = AVAssetReaderTrackOutput(track: assetTrack, outputSettings: [
    AVFormatIDKey: Int(kAudioFormatLinearPCM),
    AVLinearPCMBitDepthKey: 16,
    AVLinearPCMIsBigEndianKey: false,
    AVLinearPCMIsFloatKey: false,
    AVLinearPCMIsNonInterleaved: false
])
readerOutput.alwaysCopiesSampleData = false
assetReader.add(readerOutput)
let formatDescriptions = assetTrack.formatDescriptions as! [CMFormatDescription]
let sampleRate = formatDescriptions[0].audioStreamBasicDescription!.mSampleRate
//let sampleRate = formatDescriptions[0].audioFormatList.map({ $0.mASBD.mSampleRate }).max()!
print(formatDescriptions[0].audioStreamBasicDescription!.mSampleRate)
print(formatDescriptions[0].audioFormatList.map({ $0.mASBD.mSampleRate }))
if !assetReader.startReading() {
    preconditionFailure()
}
var length = 0
while assetReader.status == .reading {
    guard let sampleBuffer = readerOutput.copyNextSampleBuffer(), let blockBuffer = sampleBuffer.dataBuffer else {
        break
    }
    length += blockBuffer.dataLength
}
print(Double(length) / (sampleRate * asset.duration.seconds))
Replies: 0 · Boosts: 1 · Views: 125 · Activity: Aug ’25
[AVPlayerItemVideoOutput initWithPixelBufferAttributes:] output attributes setting does not work
My app wants to convert iPhone 12 HDR video to SDR for editing, following the Apple-HDR-Convert documentation. My code sets the pixel buffer attributes:
[pixBuffAttributes setObject:(id)(kCVImageBufferYCbCrMatrix_ITU_R_709_2) forKey:(id)kCVImageBufferYCbCrMatrixKey];
[pixBuffAttributes setObject:(id)(kCVImageBufferColorPrimaries_ITU_R_709_2) forKey:(id)kCVImageBufferColorPrimariesKey];
[pixBuffAttributes setObject:(id)kCVImageBufferTransferFunction_ITU_R_709_2 forKey:(id)kCVImageBufferTransferFunctionKey];
playerItemOutput = [[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:pixBuffAttributes];
But when I inspect the pixel buffers from playerItemOutput's output:
CFTypeRef colorAttachments = CVBufferGetAttachment(pixelBuffer, kCVImageBufferYCbCrMatrixKey, NULL);
CFTypeRef colorPrimaries = CVBufferGetAttachment(pixelBuffer, kCVImageBufferColorPrimariesKey, NULL);
CFTypeRef colorTransFunc = CVBufferGetAttachment(pixelBuffer, kCVImageBufferTransferFunctionKey, NULL);
NSLog(@"colorAttachments = %@", colorAttachments);
NSLog(@"colorPrimaries = %@", colorPrimaries);
NSLog(@"colorTransFunc = %@", colorTransFunc);
the log output is:
colorAttachments = ITU_R_2020
colorPrimaries = ITU_R_2020
colorTransFunc = ITU_R_2100_HLG
The pixBuffAttributes settings don't seem to affect the output format. Please help!
Replies: 1 · Boosts: 1 · Views: 829 · Activity: Nov ’25
CoreAudio HAL plugin vs dext
The presentation "create audio drivers with DriverKit" from WWDC 2021 demonstrates how to use a dext to implement a virtual audio driver. It also says " If a virtual audio driver or device is all that is needed, the audio server plug-in driver model should continue to be used". Indeed, in AudioDriverKit/AudioDriverKitTypes.h, there is no IOUserAudioTransportType Virtual, although CoreAudio/AudioHardwareBase.h includes kAudioDeviceTransportTypeVirtual. For one of our products, we require virtual devices to implement a software loopback "cable". We've implemented this using the "traditional" HAL plugin, and as a proof-of-concept, also using a dext. In the dext, I tried setting the transport type to 'virt', which seems to only have the effect of changing the icon shown in Audio Midi Setup. HAL plugins require an installer, and the installer has to kill coreaudiod in a post-install script. You have to turn off SIP to debug them. Just like AudioDriverKit drivers, they are out-of-process and run in a process not owned by the hosting app. Our HAL plugin's interface is property based; we had to write a lot of boiler-plate code to implement required properties. Writing an AudioDriverKit driver is in most respects easier - a lot of the scaffolding is implemented in the base driver, which we only alter where required. Debugging and installation is much easier. The dext works just fine, as far as we can ascertain, just as well as a HAL plugin. So, my question is - is the advice to use a HAL plugin for a virtual device still correct in 2025? And if so, what's the objection? We'd really prefer to ship the AudioDriverKit virtual audio device.
Replies: 2 · Boosts: 1 · Views: 516 · Activity: Mar ’25
ShazamKit with AirPods
Hi guys, I'm using ShazamKit in my iOS app and successfully capturing the currently playing track details when using the device's (iPhone) built-in mic. When I test with AirPods, though, my app cannot both send the output through the AirPods and capture that same output with the AirPods mic for ShazamKit recognition. I believe this must be possible, because the ShazamKit widget on iOS can do this. Is it restricted in some way for third-party apps? If not, I'd appreciate some guidance on how to achieve this in Swift code. Thanks in advance.
Replies: 1 · Boosts: 1 · Views: 598 · Activity: Feb ’25
iOS Speech Error on Mobile Simulator (Error fetching voices)
I'm writing a simple app for iOS and I'd like to be able to do some text to speech in it. I have a basic audio manager class with a "speak" function:
import Foundation
import AVFoundation

class AudioManager {
    static let shared = AudioManager()
    var audioPlayer: AVAudioPlayer?
    var isPlaying: Bool {
        return audioPlayer?.isPlaying ?? false
    }
    var playbackPosition: TimeInterval = 0

    func playSound(named name: String) {
        guard let url = Bundle.main.url(forResource: name, withExtension: "mp3") else {
            print("Sound file not found")
            return
        }
        do {
            if audioPlayer == nil || !isPlaying {
                audioPlayer = try AVAudioPlayer(contentsOf: url)
                audioPlayer?.currentTime = playbackPosition
                audioPlayer?.prepareToPlay()
                audioPlayer?.play()
            } else {
                print("Sound is already playing")
            }
        } catch {
            print("Error playing sound: \(error.localizedDescription)")
        }
    }

    func stopSound() {
        if let player = audioPlayer {
            playbackPosition = player.currentTime
            player.stop()
        }
    }

    func speak(text: String) {
        let synthesizer = AVSpeechSynthesizer()
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
        synthesizer.speak(utterance)
    }
}
And my app shows text in a ScrollView:
ScrollView {
    Text(self.description)
        .padding()
        .foregroundColor(.black)
        .font(.headline)
        .background(Color.gray.opacity(0))
}.onAppear {
    AudioManager.shared.speak(text: self.description)
}
However, the text doesn't get read out (in the simulator). I see some output in the console:
Error fetching voices: Swift.DecodingError.dataCorrupted(Swift.DecodingError.Context(codingPath: [], debugDescription: "Invalid container metadata for _UnkeyedDecodingContainer, found keyedGraphEncodingNodeID", underlyingError: nil)). Using fallback voices.
I'm probably doing something wrong here, but not sure what.
Replies: 5 · Boosts: 1 · Views: 669 · Activity: Dec ’25
Lock screen media controls for MusicKit/ ApplicationMusicPlayer
Hi, when using ApplicationMusicPlayer from MusicKit my app automatically gets the media controls on the lock screen: Play/Pause, skip buttons, playback position, etc. I would like to customize these. I've tried a bunch of things, e.g. using MPRemoteCommandCenter, but so far I haven't had any success. Does anyone know how I can customize the media controls of ApplicationMusicPlayer? Thank you.
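For context, customizing remote commands is normally done through MPRemoteCommandCenter, roughly as below; whether these handlers take effect alongside ApplicationMusicPlayer is exactly the open question in the post, and the 15-second interval is illustrative:

import MediaPlayer

// Sketch: swap the default next-track button for a +15 s skip command.
let center = MPRemoteCommandCenter.shared()

center.nextTrackCommand.isEnabled = false
center.skipForwardCommand.isEnabled = true
center.skipForwardCommand.preferredIntervals = [15]
_ = center.skipForwardCommand.addTarget { _ in
    // Move the playback position forward here, then report the result.
    return .success
}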
Replies: 2 · Boosts: 0 · Views: 512 · Activity: Sep ’25
AVAudioPlayer/SKAudioNode audio no longer plays after interruption
Hi 👋! We have a SpriteKit-based app where we play AVAudio sounds in three different ways:
1. Effects (incl. UI sounds) with AVAudioPlayer.
2. Long looping tracks with AVAudioPlayer.
3. Short animation effects on the timeline of SpriteKit's SKScene files (effectively SKAudioNode nodes).
We've found that when you exit the app or otherwise interrupt audio plays, future audio plays often fail. For example, there's a WebKit-based video trailer inside the app, and if you play it, our looping background music track (2.) will stop playing and won't resume as you close the trailer (return from WebKit). This is probably due to us not manually restarting the track (so may well be easily fixed). Periodically played AVAudioPlayer audio (1.) is not affected.
However, the more concerning thing is that the audio tracks on SKScene file timelines (3.) will no longer play. My hypothesis is that AVAudioEngine gets interrupted and needs to be restarted for those AVAudioNode elements to regain functionality. Thing is, we don't deal with AVAudioEngine at all currently in the app, meaning it is never initiated to begin with. Obviously things return to normal when you remove the app from short-term memory and restart it. However, it seems many of our users aren't doing this, and often report audio failing, presumably due to some interruption in the past without the app ever being cleared from memory.
Any idea why timeline-run SKAudioNodes would fail like this? Should the app react to app backgrounding/foregrounding regarding audio? Any help would be very much appreciated ✌️!
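One direction worth sketching, along the lines of the hypothesis in the post (an assumption, not a confirmed fix): SKAudioNode playback is backed by the scene's audioEngine, which could be restarted once an interruption ends. The scene reference and error handling here are illustrative.

import AVFoundation
import SpriteKit

// Observe session interruptions and try restarting the scene's audio engine.
func observeInterruptions(for scene: SKScene) {
    NotificationCenter.default.addObserver(
        forName: AVAudioSession.interruptionNotification,
        object: AVAudioSession.sharedInstance(),
        queue: .main
    ) { notification in
        guard let typeValue = notification.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
              AVAudioSession.InterruptionType(rawValue: typeValue) == .ended else { return }
        try? AVAudioSession.sharedInstance().setActive(true)
        if !scene.audioEngine.isRunning {
            try? scene.audioEngine.start()
        }
    }
}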
Replies: 0 · Boosts: 1 · Views: 133 · Activity: May ’25
Is there a way to get lossless music playback on macOS?
I noticed that while playing back the same tracks via MusicKit on different OSes, I get different results regarding the audio files being streamed. Playing back a lossless file with 24 bit / 48 kHz and watching the Console for RemotePlayerService I get:
on iPadOS: Lossless; groupID: audio-alac-stereo-48000-24; bitDepth: 24-bit; sampleRate: 48khz; codec: alac; channels: 2; layout: Stereo;
on macOS: Creating AudioQueue with format:'paac', framesPerPacket:1024, sampleRate:44100
While the iPad looks perfect, the Mac does not. Is there a way to fix this issue on macOS?
BTW: I switched the Audio MIDI Setup settings before, after and while the macOS app was launched. I also switched to different output devices. I wasn't able to change the bad audio output on the Mac. I tested this under Sequoia 15.5 and Tahoe beta 1, Xcode 16.4 and 26 beta 1. The AudioVariants of the album/tracks are .dolbyAtmos, .lossless, .lossyStereo. Apple Music displays Lossless 24 bit / 48 kHz ALAC when clicking on the player control icon on macOS.
I hope there are only some missing or misconfigured properties to get macOS up to par. Thanks :-)
Replies: 0 · Boosts: 1 · Views: 139 · Activity: Jun ’25
iOS 17 camera capture assertions and issues
Hello, Starting in iOS 17, our application started having some issues publishing to our video session. More specifically, the video capture seems to be broken in some, but not all, sessions. What's troubling is that we're seeing it fail consistently every 4 sessions. It also fails silently, without reporting any problems to the app. We only notice that there are no frames being rendered or sent to the remote devices. Here's what shows up in the console:
<<<< FigCaptureSourceRemote >>>> Fig assert: "! storage->connectionDied" at bail (FigCaptureSourceRemote.m:235) - (err=0)
<<<< FigCaptureSourceRemote >>>> Fig assert: "err == 0 " at bail (FigCaptureSourceRemote.m:253) - (err=-16453)
Anyone seeing this? Any idea what could be the cause? Our sessions work perfectly on iOS 16 and below. Thanks
Replies: 3 · Boosts: 1 · Views: 1.4k · Activity: Oct ’25
Audio always comes from the most recently connected external USB mic
Hello! I have two mics connected to a USB hub. The USB hub is then connected to my iPad. Both mics are part of the audio session's list of available inputs. The problem is that regardless of which mic I select in my app (using setPreferredInput() on the audio session), the audio keeps coming from the mic that was most recently connected to the USB hub. Does anyone know if this is a limitation in iPadOS/iOS?
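The selection described above typically looks like this (a sketch; the session category choice and the first-match port lookup are assumptions):

import AVFoundation

// Pick a specific USB input port and ask the session to prefer it.
func selectUSBMic() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord)
    try session.setActive(true)
    if let usbMic = session.availableInputs?.first(where: { $0.portType == .usbAudio }) {
        // The post reports that audio still comes from the most recently
        // connected mic even after this call succeeds.
        try session.setPreferredInput(usbMic)
    }
}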
Replies: 1 · Boosts: 1 · Views: 205 · Activity: Jul ’25
No audio in screen recordings when using AVAudioEngine Voice Processing
Hello, We are developing a real-time speech recognition application and are utilizing AVAudioEngine with voice processing enabled on the input node. However, we have observed that enabling this mode interferes with the built-in iOS screen recording feature - specifically, the recorded video does not capture any audio when this mode is active. Since we want users to be able to record their experience within our app, this issue significantly impacts our functionality. Is there a known workaround or recommended approach to ensure that both voice processing and screen recording can function simultaneously? Any guidance would be greatly appreciated. Thank you!
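For reference, the voice-processing setup described above is typically just the following (a sketch, not the app's actual code; buffer size and tap handling are illustrative):

import AVFoundation

// Enable Apple's voice processing (echo cancellation, etc.) on the engine's
// input node; the post reports that, with this enabled, built-in screen
// recordings capture no audio.
let engine = AVAudioEngine()
do {
    try engine.inputNode.setVoiceProcessingEnabled(true)
} catch {
    print("Could not enable voice processing: \(error)")
}
engine.inputNode.installTap(onBus: 0,
                            bufferSize: 1024,
                            format: engine.inputNode.outputFormat(forBus: 0)) { buffer, _ in
    // Feed `buffer` to the speech recognizer here.
}
try? engine.start()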
Replies: 2 · Boosts: 1 · Views: 363 · Activity: Oct ’25