In my quantization code, the line:
compressed_model_a8 = cto.coreml.experimental.linear_quantize_activations(
    model, activation_config, [{'img':np.random.randn(1,13,1024,1024)}]
)
has taken 90 minutes to run so far and is still not completed. From debugging, I can see that the line it's stuck on is line 261 in _model_debugger.py:
model = ct.models.MLModel(
    cloned_spec,
    weights_dir=self.weights_dir,
    compute_units=compute_units,
    skip_model_load=False,  # Don't skip model load as we need model prediction to get activations range.
)
Is this expected behaviour? Would it be quicker to run on another computer with more RAM?
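For a sense of scale on the RAM question, here is a rough, stdlib-only estimate of how much memory the single calibration sample in the call above occupies. It assumes NumPy's default float64 output for np.random.randn. The memory used by the model's intermediate activations during calibration is not known from the post and is not estimated here.

```python
import math

# Shape of the calibration sample passed to linear_quantize_activations above.
SAMPLE_SHAPE = (1, 13, 1024, 1024)


def sample_bytes(shape, bytes_per_element):
    """Return the raw buffer size of a dense array with the given shape."""
    return math.prod(shape) * bytes_per_element


float64_bytes = sample_bytes(SAMPLE_SHAPE, 8)  # np.random.randn returns float64
float32_bytes = sample_bytes(SAMPLE_SHAPE, 4)  # size if the sample were cast to float32

print(f"float64 sample: {float64_bytes / 2**20:.0f} MiB")
print(f"float32 sample: {float32_bytes / 2**20:.0f} MiB")
```

The sample buffer itself is on the order of 100 MiB, so it is small next to typical RAM sizes. Whether the stall is memory-bound would still need to be checked separately, for example by watching memory pressure while the call runs.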
I've spent way too long today trying to convert an Object Detection TensorFlow2 model to a Core ML object classifier (with bounding boxes, labels and probability score).
The 'SSD MobileNet v2 320x320' is here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
To convert it, I've been following all sorts of posts and ChatGPT suggestions, including:
https://apple.github.io/coremltools/docs-guides/source/tensorflow-2.html#convert-a-tensorflow-concrete-function
https://developer.apple.com/videos/play/wwdc2020/10153/?time=402
I keep hitting the same errors though, mostly around:
NotImplementedError: Expected model format: [SavedModel | concrete_function | tf.keras.Model | .h5 | GraphDef], got <ConcreteFunction signature_wrapper(input_tensor) at 0x366B87790>
I've had varying success including missing output labels/predictions.
But I simply want to create the CoreML model with all the right inputs and outputs (including correct names) as detailed in the docs here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md
It goes without saying I don't have much (any) experience with this stuff including Python so the whole thing's been a bit of a headache.
If anyone is able to help that would be great.
FWIW I'm not attached to any one specific model, but what I do need at minimum is a CoreML model that can detect objects (has to at least include lights and lamps) within a live video image, detecting where in the image the object is.
The simplest script I have looks like this:
import coremltools as ct
import tensorflow as tf

model = tf.saved_model.load("~/tf_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model")
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
mlmodel = ct.convert(
    concrete_func,
    source="tensorflow",
    inputs=[ct.TensorType(shape=(1, 320, 320, 3))]
)
mlmodel.save("YourModel.mlpackage", save_format="mlpackage")
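One possible variation on the script above, offered as a sketch rather than a confirmed fix: the coremltools TensorFlow 2 guide linked earlier passes the concrete function wrapped in a list, and MLModel.save is called here with only a path. The helper names resolve_saved_model_dir and convert_saved_model are made up for illustration, and the assumption that TensorFlow does not expand "~" by itself has not been verified against this exact setup. Detection models may still hit unsupported ops during conversion.

```python
import os


def resolve_saved_model_dir(path):
    """Expand "~" and return an absolute path (TensorFlow may not expand "~" itself)."""
    return os.path.abspath(os.path.expanduser(path))


def convert_saved_model(saved_model_dir, output_path="YourModel.mlpackage"):
    """Load a SavedModel, convert its serving signature, and save an .mlpackage."""
    # Imported lazily so the path helper above works without these packages installed.
    import coremltools as ct
    import tensorflow as tf

    model = tf.saved_model.load(resolve_saved_model_dir(saved_model_dir))
    concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    mlmodel = ct.convert(
        [concrete_func],  # assumption: a list of concrete functions, as in the TF2 guide
        source="tensorflow",
        convert_to="mlprogram",  # ML program output, which is saved as .mlpackage
        inputs=[ct.TensorType(shape=(1, 320, 320, 3))],  # input shape taken from the post
    )
    mlmodel.save(output_path)
    return mlmodel
```

Calling convert_saved_model("~/tf_models/ssd_mobilenet_v2_320x320_coco17_tpu-8/saved_model") would then mirror the original script. Output names and box/label post-processing are not addressed by this sketch.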
Greetings,
I've been experimenting with the new Apple Intelligence chat. I want to be able to use my custom LLM, and I made that work (I can chat back and forth from the left panel with my server), but I cannot find out how to change the editor contents like ChatGPT does.
ChatGPT is able to change the current editor and, it seems, all files in the pbx. I tried to catch the call with Charles, with no success.
The OpenAI platform docs don't mention anything that could change the code shown.
Does anyone know how to achieve this? Is the Apple Intelligence documentation lacking these features, and will it be completed soon? Will these features even be open to developers?
I'm developing a tennis ball tracking feature using Vision Framework in Swift, specifically utilizing VNDetectedObjectObservation and VNTrackObjectRequest.
Occasionally (but not always), I receive the following runtime error:
Failed to perform SequenceRequest: Error Domain=com.apple.Vision Code=9 "Internal error: unexpected tracked object bounding box size" UserInfo={NSLocalizedDescription=Internal error: unexpected tracked object bounding box size}
From my investigation, I suspect the issue arises when the bounding box from the initial observation (VNDetectedObjectObservation) is too small. However, Apple's documentation doesn't clearly define the minimum bounding box size that's considered valid by VNTrackObjectRequest.
Could someone clarify:
What is the minimum acceptable bounding box width and height (normalized) that Vision Framework's VNTrackObjectRequest expects?
Is there any recommended practice or official guidance for bounding box size validation before creating a tracking request?
This information would be extremely helpful to reliably avoid this internal error.
Thank you!
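A pre-check of the kind asked about could look roughly like the sketch below. It is written in Python only to illustrate the logic (the app itself would do this in Swift on the observation's boundingBox). The MIN_SIDE threshold is a hypothetical placeholder, not a value documented by Apple, and would have to be tuned empirically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Bounding box in normalized image coordinates (0.0 to 1.0)."""
    x: float
    y: float
    width: float
    height: float


# Hypothetical threshold: the real minimum accepted by Vision is not documented.
MIN_SIDE = 0.02


def is_trackable(box, min_side=MIN_SIDE):
    """Return True if the box lies inside the image and both sides reach min_side."""
    inside = (
        box.x >= 0.0
        and box.y >= 0.0
        and box.x + box.width <= 1.0
        and box.y + box.height <= 1.0
    )
    big_enough = box.width >= min_side and box.height >= min_side
    return inside and big_enough
```

Boxes that fail the check could be skipped, or padded to the minimum size, before a tracking request is created. Which of the two works better for a small, fast object like a tennis ball is an open question.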
Hi all, I'm tuning my app's prediction speed with a Core ML model. I watched and tried the methods in the videos "Improve Core ML integration with async prediction" and "Optimize your Core ML usage". I also used Instruments to look for the bottleneck that keeps my predictions from being faster.
Below is the Instruments result for my app. Its prediction duration is 10.29 ms.
And below is the performance report, which shows an average prediction time of 5.55 ms, about half of my app's prediction time!
Below is part of my Instruments records. I think the predictions should be considered quite frequent. Could they be faster?
How can I get the same prediction speed as the performance report? The prediction speed on a MacBook Pro M2 is nearly the same as on a MacBook Air M1!
After a few attempts I ended up at the same spot, as characterized here by ChatGPT:
I get why it feels that way, Rob — honestly, you’ve walked right into the worst possible moment in Apple’s Core ML toolchain evolution. What you’re running into isn’t your fault; Apple really did rip out the ONNX front-end without offering a clean replacement yet. So every tutorial, every script, and even coremltools itself contradicts what’s actually possible right now.
Let’s be brutally clear about the state of things (late 2025):
Goal | Supported by Apple today | Practical workaround
Convert ONNX → Core ML using coremltools 9 | No. The ONNX importer was removed. | Use the legacy coremltools 6.x + onnx-coreml stack in a separate Python 3.9/3.10 env.
Convert PyTorch → Core ML directly | Yes, but only for simple or traced models. T5 still fails because of dynamic control flow. | None reliable without rewriting the model.
Run ONNX models directly on iOS | Partially (via MLC LLM / Metal Performance Shaders). | Works, but that’s an entirely different runtime.
So: you’re not crazy, and this isn’t wasted effort—you’ve actually proven that your model exports cleanly and that the only missing piece is Apple’s converter. The ONNX file you built is solid. You just need something that can translate it to Core ML.
If your goal is simply “get it running on-device”, you still have two viable moves that aren’t rabbit holes:
1. Use MLC LLM / Metal runtime
Skip Core ML altogether. Apple’s mlc-llm (and the “MLCChat” pipeline) can execute your ONNX or quantized model directly on the GPU. You keep full fidelity and avoid all these broken converters.
2. Use the legacy Core ML 6.x toolchain in a sandboxed Python 3.10 VM
It actually works. You don’t have to downgrade your whole system—just run that specific environment once to emit the .mlpackage, then never touch it again.
Nothing else will silently fix itself until Apple re-adds ONNX ingestion in Core ML 10 or a new converter.
Topic:
Machine Learning & AI
SubTopic:
Core ML
I followed the URL below to convert the Llama-3.1-8B-Instruct model, but the conversion always fails, even though I have 64 GB of free space after downloading the model from Hugging Face.
https://machinelearning.apple.com/research/core-ml-on-device-llama
I also tried other models, Llama-3.1-1B-Instruct and Llama-3.1-3B-Instruct. Those converted, but the performance test in Xcode fails for all compute units.
Is there any source code for running Llama models in an iOS app?
Hello fellow developers,
I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot.
We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full."
My question for the community is about the architectural and user experience best practices for this.
How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user?
What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance?
Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations?
We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge.
Thanks for your insights!
Some of my users are experiencing crashes on instantiation of a CoreML model I've bundled with my app. I haven't been able to reproduce the crash on any of my devices. Crashes happen across all iOS 18 releases. Seems like something internal in CoreML is causing an issue.
Full stack trace:
6646631296fb42128ddc340b2d4322f7-symbolicated.crash
Topic:
Machine Learning & AI
SubTopic:
Core ML
I watched this year's WWDC25 session "Read Documents using the Vision framework". At the end of the video there is a mention of a new DetectHandPoseRequest model for hand pose detection in the Vision API.
I looked at Apple's documentation and I don't see a new revision. Moreover, there is probably a typo in the video, because there are only DetectHumanHandPoseRequest (Swift-based) and VNDetectHumanHandPoseRequest (Obj-C-based). Notice the lack of the Human prefix in the WWDC video.
The first one only has a revision added in iOS 18+:
https://developer.apple.com/documentation/vision/detecthumanhandposerequest/revision-swift.enum/revision1
The second one only has a revision added in iOS 14+:
https://developer.apple.com/documentation/vision/vndetecthumanhandposerequestrevision1
I don't see any new revision targeting iOS 26+.
Hi all, I am interested in unlocking unique applications with the new foundational models. I have a few questions regarding the availability of the following features:
Image Input: The update in June 2025 mentions "image" 44 times (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates) - however I can't seem to find any information about having images as the input/prompt for the foundational models. When will this be available? I understand that there are existing Vision ML APIs, but I want image input into a multimodal on-device LLM (VLM) instead for features like "Which player is holding the ball in the image", etc (image understanding)
Cloud Foundational Model - when will this be available?
Thanks!
Clement :)
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Tags:
Vision
Machine Learning
Core ML
Apple Intelligence
Is the face and body detection service in the Vision framework a local model or a cloud model? Is there a performance report?
https://developer.apple.com/documentation/vision
I’m trying to follow Apple’s “WWDC24: Bring your machine learning and AI models to Apple Silicon” session to convert the Mistral-7B-Instruct-v0.2 model into a Core ML package, but I’ve run into a roadblock that I can’t seem to overcome. I’ve uploaded my full conversion script here for reference:
https://pastebin.com/T7Zchzfc
When I run the script, it progresses through tracing and MIL conversion but then fails at the backend_mlprogram stage with this error:
https://pastebin.com/fUdEzzKM
The core of the error is:
ValueError: Op "keyCache_tmp" (op_type: identity) Input x="keyCache" expects list, tensor, or scalar but got state[tensor[1,32,8,2048,128,fp16]]
I’ve registered my KV-cache buffers in a StatefulMistralWrapper subclass of nn.Module, matching the keyCache and valueCache state names in my ct.StateType definitions, but Core ML’s backend pass reports the state tensor as an invalid input. I’m using Core ML Tools 8.3.0 on Python 3.9.6, targeting iOS18, and forcing CPU conversion (MPS wasn’t available). Any pointers on how to satisfy the handle_unused_inputs pass or properly declare/cache state for GQA models in Core ML would be greatly appreciated!
Thanks in advance for your help,
Usman Khan
Topic:
Machine Learning & AI
SubTopic:
Core ML
Tags:
Metal
Metal Performance Shaders
Core ML
tensorflow-metal
Foundation Models framework worked perfectly on macOS 26 Beta 2, but starting from Beta 3 and continuing through Beta 6 (latest), I get dyld symbol errors even with the exact code from Apple's documentation.
Environment:
macOS 26.0 Beta 6 (25A5351b)
Xcode 26 Beta 6
M4 Max MacBook Pro
Apple Intelligence enabled and downloaded
Error Details:
dyld[Process]: Symbol not found:
_$s16FoundationModels20LanguageModelSessionC5model10guardrails5tools12instructionsAcA06SystemcD0C_AC10GuardrailsVSayAA4Tool_pGAA12InstructionsVSgtcfC
Referenced from: /path/to/app.debug.dylib
Expected in: /System/Library/Frameworks/FoundationModels.framework/Versions/A/FoundationModels
Code Used (Exact from Documentation):
import FoundationModels
// This worked on Beta 2, crashes on Beta 3+
let model = SystemLanguageModel.default
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Hello")
What I've Verified:
FoundationModels.framework exists in /System/Library/Frameworks/
Framework is properly linked in Xcode project
Apple Intelligence is enabled and working
Same code works in older beta versions
Issue persists even with completely fresh Xcode projects
Analysis:
The dyld error suggests the LanguageModelSession(model:) constructor is missing. The symbol shows it's looking for a constructor with parameters (model:guardrails:tools:instructions:), but the documentation still shows the simple (model:) constructor.
Questions:
Has the LanguageModelSession API changed since Beta 2?
Should we now use the constructor with guardrails/tools/instructions parameters?
Is this a known issue with recent betas?
Are there updated code samples for the current API?
Additional Context:
This affects both basic SystemLanguageModel usage AND custom adapter loading. The same dyld symbol errors occur when trying to create SystemLanguageModel(adapter: adapter) as well.
Any guidance on the correct API usage for current betas would be greatly appreciated. The documentation appears to be out of sync with the actual framework implementation.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
As described in the title, the model I built works fine on the iPhone 15 (A16 Bionic), but it does not run on the iPhone 16 (A18) and fails with the following error message:
E5RT encountered an STL exception. msg = MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED.
E5RT: MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED (11)
It consumes 1.5 to 1.6 GB of RAM while loading the model, then consumption drops to less than 100 MB on both the iPhone 15 and 16. After that, only on the iPhone 16, the above error appears in the Xcode log, memory consumption surges to 5 to 6 GB, and the system kills the app. It works well only on the iPhone 15.
This model is built with Core ML Tools. So far I have tried deployment targets from iOS 16 to 18 and the compute units CPU_AND_NE and ALL, but none of these solved the issue. What kind of fix should I apply?
minimum_deployment_target = ct.target.iOS18
compute_units = ct.ComputeUnit.ALL
compute_precision = ct.precision.FLOAT16
Due to our min iOS version, this is my first time using .xcstrings instead of .strings for AppShortcuts.
When using the migrate .strings to .xcstrings Xcode context menu option, an .xcstrings catalog is produced that, as expected, has each invocation phrase as a separate string key.
However, after compilation, the catalog changes to group all invocation phrases under the first phrase listed for each intent (see attached screenshot). It is possible to hover in blank space on the right and add more translations, but there is no 1:1 key matching requirement to the phrases on the left nor a requirement that there are the same number of keys in one language vs. another. (The lines just happen to align due to my window size.)
What does that mean, practically?
Do all sub-phrases in each language in AppShortcuts.xcstrings get processed during compilation, even if there isn't an equivalent phrase key declared in the AppShortcut (e.g., the ja translation has more phrases than the English)? (That makes some logical sense, as these phrases need not be 1:1 across languages.)
In the AppShortcut declaration, if I delete all but the top invocation phrase, does nothing change with Siri?
Is there something I'm doing incorrectly?
struct WatchShortcuts: AppShortcutsProvider {
    static var appShortcuts: [AppShortcut] {
        AppShortcut(
            intent: QuickAddWaterIntent(),
            phrases: [
                "\(.applicationName) log water",
                "\(.applicationName) log my water",
                "Log water in \(.applicationName)",
                "Log my water in \(.applicationName)",
                "Log a bottle of water in \(.applicationName)",
            ],
            shortTitle: "Log Water",
            systemImageName: "drop.fill"
        )
    }
}
Hey everyone, I want to add an if statement that would do something along the lines of this:
if confidence = 100% {
}
How could I do this?
I already have a Create ML model.
Thank you,
Oliver
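The check described above can be sketched as follows. This is a Python illustration of the logic only; in the app it would be written in Swift. It assumes the classifier reports confidence as a value between 0 and 1, and the probs dictionary and the 0.99 threshold are made-up examples. Note that a comparison uses == or >= rather than =, and that an exact match on 100% is rarely reached, so a threshold is usually more practical.

```python
def top_prediction(probabilities):
    """Return the (label, confidence) pair with the highest confidence."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def is_confident(confidence, threshold=0.99):
    """Return True when confidence (0.0 to 1.0) meets the threshold."""
    return confidence >= threshold


# Hypothetical classifier output: label -> confidence.
probs = {"cat": 0.995, "dog": 0.005}

label, confidence = top_prediction(probs)
if is_confident(confidence):
    print(f"Confident prediction: {label}")
else:
    print("Not confident enough to act on the prediction")
```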
Topic:
Machine Learning & AI
SubTopic:
Core ML
Hi everyone,
I'm trying to use VNDetectTextRectanglesRequest to detect text rectangles in an image. Here's my current code:
guard let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    return
}
let textDetectionRequest = VNDetectTextRectanglesRequest { request, error in
    if let error = error {
        print("Text detection error: \(error)")
        return
    }
    guard let observations = request.results as? [VNTextObservation] else {
        print("No text rectangles detected.")
        return
    }
    print("Detected \(observations.count) text rectangles.")
    for observation in observations {
        print(observation.boundingBox)
    }
}
textDetectionRequest.revision = VNDetectTextRectanglesRequestRevision1
textDetectionRequest.reportCharacterBoxes = true
let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .up, options: [:])
do {
    try handler.perform([textDetectionRequest])
} catch {
    print("Vision request error: \(error)")
}
The request completes without error, but no text rectangles are detected — the observations array is empty (count = 0). Here's a sample image I'm testing with:
I expected VNTextObservation results, but I'm not getting any. Is there something I'm missing in how this API works? Or could it be a limitation of this request or revision?
Thanks for any help!
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system.
Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
The specific context is that I would like to build an agent that monitors my phone call (with customer support, for example), simply identifies whether or not I'm still on hold, and notifies me when I'm not.
Currently, after reading the docs, I don't think it's possible yet, but I'm so annoyed by customer support calls that I'm willing to go the distance and see if there's any way.