Is it possible to play WebM audio on iOS? Either with AVPlayer, AVAudioEngine, or some other API?
Safari has supported this for a few releases now, and I'm wondering if I missed something about how to do this. By default these APIs don't seem to work (nor does ExtAudioFileOpen).
Our use case is letting iOS users play back audio recorded in our web app (desktop versions of Chrome and Firefox only support WebM as a destination format for MediaRecorder).
I’m facing a strange audio routing issue that seems specific to iPhone 14 Pro / Pro Max.
I’m using LiveKit (WebRTC) in a React Native app, which uses AVAudioSession internally for audio capture (VoIP / call-style usage).
🔍 What’s happening:
I’m using an external USB microphone.
On these devices:
iPhone 11 → ✅ USB mic works
iPhone 13 → ✅ USB mic works
iPhone 17 Pro → ✅ USB mic works
iPhone 14 Pro Max → ❌ USB mic does NOT work
On iPhone 14 Pro Max:
The same USB mic:
✅ Works in Voice Memos
✅ Works in Instagram Live
❌ Does NOT appear as an input option in my app
❌ Does NOT work in WhatsApp / Instagram calls
Also:
In my app on iPhone 14 Pro Max, iOS does not show the audio input selector UI
On iPhone 17 Pro, the same app and same build does show the selector and the USB mic works
⚙️ My audio session config ( LiveKit ):
await AudioSession.setAppleAudioConfiguration({
  audioCategory: 'playAndRecord',
  audioMode: 'default',
  audioCategoryOptions: ['allowBluetooth', 'defaultToSpeaker'],
});
await AudioSession.startAudioSession();
❓ My questions:
Is this a known limitation or behavior specific to iPhone 14 Pro / Pro Max?
Does iPhone 14 Pro have different audio routing rules for call / VoIP mode compared to other devices?
Why does the same USB mic work in recording apps (Voice Memos, Instagram Live) but not in call-style apps (LiveKit, WhatsApp, Instagram call)?
Is there any documented difference in AVAudioSession behavior on iPhone 14 Pro regarding external USB audio inputs?
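For anyone trying to narrow this down, a native-side diagnostic along the following lines might help show whether iOS offers the USB input to the session at all on the affected device. This is only a sketch: the function name is hypothetical, it assumes the audio session has already been configured and activated (for example by LiveKit), and in a React Native app it would have to be called from native code.

```swift
import AVFoundation

// Diagnostic sketch (iOS only, hypothetical helper): log the inputs iOS
// exposes to this audio session and request the USB input if one is listed.
func logInputsAndPreferUSB() {
    let session = AVAudioSession.sharedInstance()
    let inputs = session.availableInputs ?? []

    for port in inputs {
        print("available input:", port.portName, port.portType.rawValue)
    }
    print("current inputs:", session.currentRoute.inputs.map(\.portName))

    // .usbAudio is the port type reported for USB audio devices.
    guard let usb = inputs.first(where: { $0.portType == .usbAudio }) else {
        print("No USB input is offered to this session")
        return
    }
    do {
        try session.setPreferredInput(usb)
    } catch {
        print("setPreferredInput failed:", error)
    }
}
```

Comparing the logged list on the iPhone 14 Pro Max with the list on a device where the mic works would at least show whether the port is missing from availableInputs or is listed but not selected.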
The device is connected to two Bluetooth devices, A and B, and audio is currently playing through Bluetooth A. When the user taps a button in the interface, how can I switch playback to Bluetooth B in code?
Hello,
Has anyone else experienced variations in the accuracy of the playbackTime value? After a few seconds of playback, the reported time adjusts by a fraction of a second, making it difficult to calculate the actual playbackTime of the audio.
This can be recreated by playing a song in MusicKit, recording the start time of the audio, playing for at least 10-20 seconds, and then comparing the playbackTime value to one calculated using the start time of the audio. In my experience this jump occurs after about 10 seconds of playback.
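The comparison described above can be sketched roughly as follows. The type and its property names are hypothetical and not part of MusicKit; the reported value would come from the player's playbackTime, and the numbers in the usage lines are made up.

```swift
import Foundation

/// Hypothetical helper (not a MusicKit type): compares a reported playback
/// time with one derived from the wall-clock moment playback started.
struct PlaybackDriftProbe {
    let startDate: Date                  // when playback started
    let startPlaybackTime: TimeInterval  // reported playbackTime at that moment

    /// Reported time minus expected time, in seconds. A sudden change in this
    /// value between two samples corresponds to the adjustment described above.
    func drift(reportedPlaybackTime: TimeInterval, at now: Date = Date()) -> TimeInterval {
        let expected = startPlaybackTime + now.timeIntervalSince(startDate)
        return reportedPlaybackTime - expected
    }
}

// Usage with made-up numbers: 12 seconds after the start, the player reports 12.25.
let start = Date(timeIntervalSinceReferenceDate: 0)
let probe = PlaybackDriftProbe(startDate: start, startPlaybackTime: 0)
let later = start.addingTimeInterval(12)
let drift = probe.drift(reportedPlaybackTime: 12.25, at: later)
print(drift)
```

Sampling the drift once per second and logging it should make the moment of the adjustment visible.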
Any help would be appreciated.
Thanks!
AVAudioFormat has no Swift concurrency annotations but the documentation states "Instances of this class are immutable."
This made me always assume it was safe to pass AVAudioFormat instances around. Is this the case? If so can it be marked as Sendable? Am I missing something?
I am trying to use AVAudioEngine for recording and playback in a voice-chat-style app, but when the speaker plays audio while I'm recording, the recording picks up the speaker output as input. I want to filter that out. Are there any suggestions for the Swift code?
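One thing that may be worth trying is the voice processing mode on the engine's input node, which is intended for this kind of echo cancellation. The following is a minimal sketch, not a full recording pipeline: it assumes the audio session (on iOS) is already set up for simultaneous play and record, and it leaves out connecting nodes and starting the engine.

```swift
import AVFoundation

let engine = AVAudioEngine()

do {
    // Voice processing has to be enabled while the engine is stopped.
    // Enabling it on the input node also applies it to the output node.
    try engine.inputNode.setVoiceProcessingEnabled(true)
    print("voice processing enabled:", engine.inputNode.isVoiceProcessingEnabled)
} catch {
    print("Could not enable voice processing:", error)
}

// Connect the player and tap nodes as usual, then start the engine.
```

Whether this removes enough of the speaker signal depends on the device and route, so it should be tested on real hardware.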
I am developing an app with transcription and I am exploring ways to improve the transcription from the SpeechAnalyzer/Transcriber for technical terms. SFSpeech... recognition had the capability of being augmented by contextualStrings. Does something similar exist for SpeechAnalyzer/Transcriber? If so please point me towards the documentation and any sample code that may exist for this. If there are other options, please let me know.
I have an app that records a health provider’s conversation with a patient. I am using Audio Queue Services for this. If a phone call comes in while recording, the doctor wants to be able to ignore the call and continue the conversation without touching the phone. If the doctor answers the call, that’s fine – I will stop the recording. I can detect when the call comes in and ends using CXCallObserver and AVAudioSession.interruptionNotification. Unfortunately, when a call comes in and before it is answered or dismissed, the audio is suppressed. After the call is dismissed, the audio continues to be suppressed. How can I continue to get audio from the mic as long as the user does not answer the phone call?
Topic:
Media Technologies
SubTopic:
Audio
I'm working in Swift/SwiftUI, running Xcode 16.3 on macOS 15.4, and I've seen this both in the iOS simulator and in a macOS app run from Xcode. I've also seen this behaviour with 3 different audio files.
Nothing in the documentation says that the speechRecognitionMetadata property on an SFSpeechRecognitionResult will be nil until isFinal, but that's the behaviour I'm seeing.
I've stripped my class down to the following:
private var isAuthed = false

// I call this in a .task {} in my SwiftUI View
public func requestSpeechRecognizerPermission() {
    SFSpeechRecognizer.requestAuthorization { authStatus in
        Task {
            self.isAuthed = authStatus == .authorized
        }
    }
}

public func transcribe(from url: URL) {
    guard isAuthed else { return }
    let locale = Locale(identifier: "en-US")
    let recognizer = SFSpeechRecognizer(locale: locale)
    let recognitionRequest = SFSpeechURLRecognitionRequest(url: url)
    // the behaviour occurs whether I set this to true or not, I recently set
    // it to true to see if it made a difference
    recognizer?.supportsOnDeviceRecognition = true
    recognitionRequest.shouldReportPartialResults = true
    recognitionRequest.addsPunctuation = true
    recognizer?.recognitionTask(with: recognitionRequest) { (result, error) in
        // Unwrap once instead of force-unwrapping result below
        guard let result else { return }
        if result.isFinal {
            // speechRecognitionMetadata is not nil
        } else {
            // speechRecognitionMetadata is nil
        }
    }
}
} // closes the enclosing class (its declaration is not part of this excerpt)
Further, and this isn't documented either, the SFTranscriptionSegment values don't have correct timestamp and duration values until isFinal. The values aren't all zero, but they don't align with the timing in the audio and they change to accurate values when isFinal is true.
The transcription otherwise "works", in that I get transcription text before isFinal and if I wait for isFinal the segments are correct and speechRecognitionMetadata is filled with values.
The context here is I'm trying to generate a transcription that I can then highlight the spoken sections of as audio plays and I'm thinking I must be just trying to use the Speech framework in a way it does not work. I got my concept working if I pre-process the audio (i.e. run it through until isFinal and save the results I need to json), but being able to do even a rougher version of it 'on the fly' - which requires segments to have the right timestamp/duration before isFinal - is perhaps impossible?
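For context, the pre-processing approach looks roughly like the sketch below. The TimedSegment type and the activeSegment function are hypothetical names for this illustration; the fields mirror the substring, timestamp, and duration values read from each SFTranscriptionSegment once isFinal is true, and the sample values are made up.

```swift
import Foundation

/// Hypothetical record for one transcribed segment, filled from the
/// substring, timestamp, and duration of an SFTranscriptionSegment.
struct TimedSegment: Codable, Equatable {
    let text: String
    let timestamp: TimeInterval
    let duration: TimeInterval
}

/// Returns the segment being spoken at `time`, or nil between segments.
func activeSegment(at time: TimeInterval, in segments: [TimedSegment]) -> TimedSegment? {
    segments.first { time >= $0.timestamp && time < $0.timestamp + $0.duration }
}

// Made-up values standing in for the final recognition result.
let segments = [
    TimedSegment(text: "Hello", timestamp: 0.5, duration: 0.25),
    TimedSegment(text: "world", timestamp: 1.0, duration: 0.5),
]

// Save to JSON once, then reload it for playback-time highlighting.
let data = try JSONEncoder().encode(segments)
let restored = try JSONDecoder().decode([TimedSegment].self, from: data)
let current = activeSegment(at: 1.2, in: restored)
print(current?.text ?? "no segment")
```

During playback, the player's current time is passed to activeSegment to decide which part of the text to highlight.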
I'm unable to record audio in AAC format at a 96 kHz sample rate using AVAudioRecorder or Extended Audio File Services, even though the input device delivers 96 kHz audio. The recording settings used are:
let settings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
    AVSampleRateKey: sampleRate,
    AVNumberOfChannelsKey: 1,
    AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
]
When I tried AVAudioEngine with AVAudioFile,
AVAudioFile(forWriting: fileURL, // file extension .m4a
    settings: fileSettings,
    commonFormat: AVAudioCommonFormat.pcmFormatFloat32,
    interleaved: interleaved) else { return }
I got this error:
CodecConverterFactory.cpp:977 unable to select compatible encoder sample rate
AudioConverter.cpp:1017 Failed to create a new in process converter -> from 1 ch, 96000 Hz, Float32 to 1 ch, 96000 Hz, aac (0x00000000) 0 bits/channel, 0 bytes/packet, 0 frames/packet, 0 bytes/frame, with status 1718449215
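As a side note on reading that status value: Core Audio error codes are often four-character codes packed into a 32-bit integer. The sketch below (the helper name is hypothetical) unpacks the value from the log. It comes out as fmt?, which matches kAudioFormatUnsupportedDataFormatError, so the encoder seems to be rejecting the requested format rather than failing for an unrelated reason.

```swift
import Foundation

/// Interprets a 32-bit status value as a four-character code when all four
/// bytes are printable ASCII; otherwise returns nil.
func fourCharCode(from status: Int32) -> String? {
    let value = UInt32(bitPattern: status)
    let shifts: [UInt32] = [24, 16, 8, 0]
    let bytes = shifts.map { UInt8((value >> $0) & 0xFF) }
    guard bytes.allSatisfy({ (0x20...0x7E).contains($0) }) else { return nil }
    return String(decoding: bytes, as: UTF8.self)
}

// The status from the AudioConverter log line above.
let code = fourCharCode(from: 1718449215)
print(code ?? "not a four-character code")
```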
I ran 5.1 audio tests in both YouTube and Apple Music, and I noticed that when sound is supposed to play from the rear or front surround speakers, it’s also duplicated in the front left and right channels. I’m absolutely sure the issue is with the Apple TV, because I played the same video directly through my TV’s native system, and the channel separation was correct.
Everything used to work perfectly before, so this must be a software issue. I’m currently on tvOS 26 Developer Beta 5, but I’m certain the problem also existed on the stable tvOS 18.5.
I’ve already reset and updated my Apple TV, and I also tried switching the audio format to forced Dolby Atmos 5.1. On the forums, I mostly see complaints about Dolby Atmos not working at all — in my case, everything technically works, but not the way it’s supposed to.
Hi,
I'm working on a project that uses the AVSpeechSynthesizer and AVSpeechUtterance.
I discovered by chance that the AVSpeechSynthesizer automatically completes some words instead of just outputting what it's supposed to.
These are abbreviations for days of the week or months, but not all of them. I don't want any of them expanded automatically; only the specified text should be spoken. The expansion happens across languages.
I have written a short example program for demonstration purposes.
import SwiftUI
import AVFoundation
import Foundation

let synthesizer: AVSpeechSynthesizer = AVSpeechSynthesizer()

struct ContentView: View {
    var body: some View {
        VStack {
            Button {
                utter("mon")
            } label: {
                Text("mon")
            }
            .buttonStyle(.borderedProminent)
            Button {
                utter("tue")
            } label: {
                Text("tue")
            }
            .buttonStyle(.borderedProminent)
            Button {
                utter("thu")
            } label: {
                Text("thu")
            }
            .buttonStyle(.borderedProminent)
            Button {
                utter("feb")
            } label: {
                Text("feb")
            }
            .buttonStyle(.borderedProminent)
            Button {
                utter("feb", lang: "de-DE")
            } label: {
                Text("feb DE")
            }
            .buttonStyle(.borderedProminent)
            Button {
                utter("wed")
            } label: {
                Text("wed")
            }
            .buttonStyle(.borderedProminent)
        }
        .padding()
    }

    private func utter(_ text: String, lang: String = "en-US") {
        let utterance = AVSpeechUtterance(string: text)
        let voice = AVSpeechSynthesisVoice(language: lang)
        utterance.voice = voice
        synthesizer.speak(utterance)
    }
}

#Preview {
    ContentView()
}
Thank you
Christian
I'm developing an application that uses a UVC camera (a webcam) with the AVFoundation library. When I run "[self.mCaptureSession startRunning]", I don't receive any sample buffers, even though I have already set the delegate. Any answer will help.
I'm using an AVAudioConverter object to decode an OPUS stream for VoIP. The decoding itself works well, however, whenever the stream stalls (no more audio packet is available to decode because of network instability) this can be heard in crackling / abrupt stop in decoded audio. OPUS can mitigate this by indicating packet loss by passing a null pointer in the C-library to
int opus_decode_float (OpusDecoder * st, const unsigned char * data, opus_int32 len, float * pcm, int frame_size, int decode_fec), see https://opus-codec.org/docs/opus_api-1.2/group__opus__decoder.html#ga9c554b8c0214e24733a299fe53bb3bd2.
However, with AVAudioConverter using Swift I'm constructing an AVAudioCompressedBuffer like so:
let compressedBuffer = AVAudioCompressedBuffer(
    format: VoiceEncoder.Constants.networkFormat,
    packetCapacity: 1,
    maximumPacketSize: data.count
)
compressedBuffer.byteLength = UInt32(data.count)
compressedBuffer.packetCount = 1
compressedBuffer.packetDescriptions!
    .pointee.mDataByteSize = UInt32(data.count)
data.copyBytes(
    to: compressedBuffer.data
        .assumingMemoryBound(to: UInt8.self),
    count: data.count
)
where data: Data contains the raw OPUS frame to be decoded.
How can I specify data loss in this context and cause the AVAudioConverter to output PCM data whenever no more input data is available?
More context:
I'm specifying the audio format like this:
static let frameSize: UInt32 = 960
static let sampleRate: Float64 = 48000.0
static var networkFormatStreamDescription =
    AudioStreamBasicDescription(
        mSampleRate: sampleRate,
        mFormatID: kAudioFormatOpus,
        mFormatFlags: 0,
        mBytesPerPacket: 0,
        mFramesPerPacket: frameSize,
        mBytesPerFrame: 0,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 0,
        mReserved: 0
    )
static let networkFormat =
    AVAudioFormat(
        streamDescription:
            &networkFormatStreamDescription
    )!
I've tried 1) setting byteLength and packetCount to zero and 2) returning nil but setting .haveData in the AVAudioConverterInputBlock I'm using with no success.
How does a third party developer go about supporting the new Enhanced Dialogue option for video apps in tvOS 18?
If an app is using the standard AVPlayerViewController, I had assumed it would be a simple-ish matter of building against the tvOS 18 SDK but apparently not, the options don't appear, not even greyed out.
Hi,
I am creating an app that can include videos or images in its data. While
@Attribute(.externalStorage)
helps with images, for AVAssets I would actually like access to the URL behind that data, since it would be wasteful to load the data and then save it again just to get a URL.
One key component is to keep all of this clean enough so that I can use (private) CloudKit syncing with the resulting model.
All the best
Christoph
I’ve been researching how to achieve a recording playback effect in iOS similar to the hands-free calling effect in the system’s phone app. How can this be implemented? I tried using the voice chat recording method, but found that the volume of the speaker output is too low. How should this issue be addressed? I couldn’t find a suitable API. Could you provide me with some documentation or sample code? Thank you.
Let's consider the following code.
I've created an actor that loads a list of .mp3 files from a Bundle and then makes it available for audio reproduction.
Unfortunately, I'm seeing a memory leak at the call to the play method:
player.play()
From Instruments I get
_malloc_type_malloc_outlined libsystem_malloc.dylib
start_wqthread libsystem_pthread.dylib
import AVFoundation

private actor AudioActor {
    enum Failure: Error {
        case soundsNotLoaded([AudioPlayerClient.Sound: Error])
    }

    enum Player {
        case music(AVAudioPlayer)
    }

    var players: [Sound: Player] = [:]
    let bundles: [Bundle]

    init(bundles: UncheckedSendable<[Bundle]>) {
        self.bundles = bundles.wrappedValue
    }

    func load(sounds: [Sound]) throws {
        try AVAudioSession.sharedInstance().setActive(true, options: [])
        var errors: [Sound: Error] = [:]
        for sound in sounds {
            // Look for the file in each bundle passed to init
            for bundle in bundles {
                guard let url = bundle.url(forResource: sound.name, withExtension: "mp3")
                else { continue }
                do {
                    self.players[sound] = try .music(AVAudioPlayer(contentsOf: url))
                } catch {
                    errors[sound] = error
                }
            }
        }
        guard errors.isEmpty
        else { throw Failure.soundsNotLoaded(errors) }
    }

    func play(sound: Sound, loops: Int?) throws {
        guard let player = self.players[sound]
        else { return }
        switch player {
        case let .music(player):
            // -1 loops indefinitely
            player.numberOfLoops = loops ?? -1
            player.play()
        }
    }

    func stop(sound: Sound) throws {
        guard let player = self.players[sound]
        else { throw Failure.soundsNotLoaded([:]) }
        switch player {
        case let .music(player):
            player.stop()
        }
    }
}
I found that the aggregated device correctly obtains input channels in the standard microphone mode. However, in voice isolation mode, it only retrieves channels from the first sub-device in the aggregated device's list. If I want to properly obtain channel information in voice isolation mode, how should I do it?
When using the Apple Devices app to sync Apple Music to an iPhone, where is the Apple Devices backup being written to?
The path I'm using is Apple Devices -> Music -> Sync.
I'm not trying to back up the iPhone via the Apple Devices app.