Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

Is there an API to check if a Core ML compiled model is already cached?
Hello Apple Developer Community, I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an app update, the first run still triggers an "uncached load" process. This seems to impact user experience with unnecessary delays. Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached in the system? If such an API exists, we'd like to use it for pre-loading decision logic: only perform a background pre-load when the model isn't cached. Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!
0
0
122
May ’25
AI and ML
Hello. I would like to hire a game developer for a card game called Baloot. My question is: can the developer implement an AI for the computer player that improves its skill level over time on its own, without any interaction? 🌹
0
0
84
Jun ’25
Is it possible to let Siri monitor phone calls and notify me when a certain trigger happens?
The specific context is that I would like to build an agent that monitors my phone call (with customer support, for example), simply identifies whether or not I'm still on hold, and notifies me when I'm not. After reading the docs, I don't think it's possible yet, but I'm so annoyed by customer support calls that I'm willing to go the distance and see if there's any way.
0
0
128
Jun ’25
Core ML model for news scoring
Is it possible to train a model using Create ML to infer a numeric relevance score for a news article based on similar training data, something like a sentiment score? I created a Text Classifier that assigns a category label, which works perfectly, but I would like a solution that calculates a numeric value, not a label.
2
0
103
Mar ’25
Can I Perform Hybrid Execution on Neural Engine and CPU with 16-bit Precision?
Hello, I have a question regarding hybrid execution for deep learning models on Apple's Neural Engine and CPU. I am aware that setting the precision of some layers to 32-bit allows hybrid execution across both the Neural Engine and the CPU. However, I would like to know if it is possible to achieve the same with 16-bit precision. Is there any specific configuration or workaround to enable hybrid execution in this case? Any guidance or documentation references would be greatly appreciated. Thank you!
0
0
437
Jan ’25
How to create updatable models using Create ML app
I've built a model using Create ML, but I can't make it, for the love of God, updatable. I can't find any checkbox or anything related. It's an Activity Classifier, if it matters. I want to continue training it on-device using MLUpdateTask, but the model, as exported from Create ML, fails with error: Domain=com.apple.CoreML Code=6 "Failed to unarchive update parameters. Model should be re-compiled." UserInfo={NSLocalizedDescription=Failed to unarchive update parameters. Model should be re-compiled.}
0
0
255
Nov ’25
Can MPSGraphExecutable automatically leverage Apple Neural Engine (ANE) for inference?
Hi, I'm currently using Metal Performance Shaders Graph (MPSGraphExecutable) to run neural network inference as part of a Metal rendering pipeline. I also tried to profile Neural Engine usage while running inference with MPSGraphExecutable, but the profile shows no sign of Neural Engine usage. However, when I used the Core ML model inspection tool in Xcode and ran a performance report, the model was able to use the ANE. Does MPSGraphExecutable automatically utilize the Apple Neural Engine (ANE) when running inference operations, or does it only execute on the GPU? My model (a Core ML package) was converted from a PyTorch model using coremltools with the ML program type and targets iOS 17.0+. Any insights or documentation references would be greatly appreciated!
0
0
358
Nov ’25
Visual Intelligence -- Make OpenIntent show a sheet rather than open my App
The developer tutorial for visual intelligence indicates that the way to detect and handle taps on a displayed entity from the Search section is via an "OpenIntent" associated with your entity. However, running this intent executes code from within my app. If I have the perform() method display UI, it always displays UI from within my app. I noticed that the Google app's integration with visual intelligence behaves differently: tapping on an entity does not take you to the Google app. Instead, a web view is presented sheet-style WITHIN the Visual Intelligence environment (see below). How is that accomplished?
0
0
548
Sep ’25
JAX Metal: Random Number Generation Performance Issue on M1 Max
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon.

Performance Comparison
• NVIDIA GPU: 0.475s for 12.6M random elements
• M1 Max Metal: 26.3s for same workload
• Performance gap: 55x slower

Environment
• Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1
• JAX 0.4.34, jax-metal latest
• Backend: Metal

Reproduction Code

```python
import time
import jax
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(42)
start_time = time.time()
random_array = random.normal(key, (50000, 252))
duration = time.time() - start_time
print(f"Duration: {duration:.3f}s")
```
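One caveat when reading a single wall-clock measurement like the one above: JAX compiles a function on first use and dispatches work asynchronously, so one timed call can mix compilation, dispatch, and execution. The following is a minimal sketch of a harness that warms up first and blocks on the result before stopping the clock. The names `benchmark` and `workload` are hypothetical, the stdlib fallback exists only so the sketch runs where JAX is not installed, and nothing here is claimed to explain the reported 55x gap.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=5):
    """Time fn() after warm-up calls. fn must block until its result is ready."""
    for _ in range(warmup):
        fn()  # the first call may include compilation or cache setup
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"min": min(samples), "median": statistics.median(samples)}


try:
    import jax
    from jax import random

    key = random.PRNGKey(42)
    # Same shape as the reproduction code: 50000 x 252 = 12.6M elements.
    sample = jax.jit(lambda k: random.normal(k, (50000, 252)))

    def workload():
        # block_until_ready() waits for the device to finish the computation.
        return sample(key).block_until_ready()

except ImportError:
    import random as pyrandom

    def workload():
        # Assumption: stand-in workload used only when JAX is unavailable.
        return [pyrandom.gauss(0.0, 1.0) for _ in range(10_000)]


stats = benchmark(workload)
print(f"min: {stats['min']:.4f}s  median: {stats['median']:.4f}s")
```

Reporting the warm-up call separately from the steady-state median makes it easier to tell whether a gap comes from compilation or from the generator itself.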
0
0
342
Aug ’25
ANE Performance for on-device Foundation model
I'm running macOS 26 Beta 5. I noticed that I can no longer achieve 100% usage on the ANE as I could before with Apple's on-device Foundation model. Has Apple activated some kind of throttling or power limiting of the ANE? I cannot get above 3 W or 40% usage since upgrading. I'm on the High Power energy mode. Is there an API rate limit being applied? I have an M4 Pro Mac mini with 64 GB of memory.
0
0
321
Aug ’25
WWDC25 combining Metal and ML
The WWDC25 session "Combine Metal 4 machine learning and graphics" demonstrated a way to run neural networks in the graphics pipeline directly from shaders, using texture compression as an example. However, it does not mention which ML technique is used to compress the textures. Can anyone point me to some well-known models for the use case shown in that session?
2
0
427
Jul ’25
Will Apple Intelligence Support Third-Party LLMs or Custom AI Agent Integrations?
Hi everyone, I’m an AI engineer working on autonomous AI agents and exploring ways to integrate them into the Apple ecosystem, especially via Siri and Apple Intelligence. I was impressed by Apple’s integration of ChatGPT and its privacy-first design, but I’m curious to know:
• Are there plans to support third-party LLMs?
• Could Siri or Apple Intelligence call external AI agents or allow extensions to plug in alternative models for reasoning, scheduling, or proactive suggestions?

I’m particularly interested in building event-driven, voice-triggered workflows where Apple Intelligence could act as a front-end for more complex autonomous systems (possibly local or cloud-based). This kind of extensibility would open up incredible opportunities for personalized, privacy-friendly use cases while aligning with Apple’s system architecture.

Is anything like this on the roadmap? Or is there a suggested way to prototype such integrations today? Thanks in advance for any thoughts or pointers!
4
0
457
May ’25
iOS 18 new RecognizeTextRequest DEADLOCKS if more than 2 are run in parallel
Following the recommendations in the WWDC24 video "Discover Swift enhancements in the Vision framework" (see the video at 10:41), I used the following code to run multiple new iOS 18 `RecognizeTextRequest` instances in parallel. Problem: if more than 2 requests are run in parallel, the requests hang, leaving the app in a state where no more requests can be started, i.e. a deadlock. I tried other ways to run the requests, but no matter the method employed or the device used, no more than 2 requests can ever be run in parallel.

```swift
func triggerDeadlock() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        // See: WWDC 2024 "Discover Swift enhancements in the Vision framework" at 10:41

        // ############## THIS IS KEY
        // On a real device, if more than 2 RecognizeTextRequest are launched
        // in parallel using tasks, the request hangs
        let maxOCRTasks = 5
        // ############## THIS IS KEY

        for idx in 0..<maxOCRTasks {
            let url = ... // URL to some image
            group.addTask {
                // Perform OCR
                let _ = try await performOCRRequest(on: url)
            }
        }

        var nextIndex = maxOCRTasks
        for try await _ in group {
            // Wait for the result of the next child task that finished
            if nextIndex < pageCount {
                group.addTask {
                    let url = ... // URL to some image
                    // Perform OCR
                    let _ = try await performOCRRequest(on: url)
                }
                nextIndex += 1
            }
        }
    }
}

// MARK: - ASYNC/AWAIT version with iOS 18
@available(iOS 18, *)
func performOCRRequest(on url: URL) async throws -> [RecognizedText] {
    // Create request
    var request = RecognizeTextRequest() // Single request: no need for ImageRequestHandler

    // Configure request
    request.recognitionLevel = .accurate
    request.automaticallyDetectsLanguage = true
    request.usesLanguageCorrection = true
    request.minimumTextHeightFraction = 0.016

    // Perform request
    let textObservations: [RecognizedTextObservation] = try await request.perform(on: url)

    // Convert [RecognizedTextObservation] to [RecognizedText]
    return textObservations.compactMap { observation in
        observation.topCandidates(1).first
    }
}
```

I also found this Swift forums post mentioning something very similar. I also opened a feedback: FB17240843
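The scheduling pattern used in this post (start a fixed number of tasks, then start one more each time a task finishes) can be sketched independently of Vision. The following is an illustration in Python's `asyncio`, not Vision code: `fake_ocr`, `run_sliding_window`, and `max_in_flight` are hypothetical names, and the sleep stands in for the OCR request. The default limit of 2 only mirrors the highest parallelism reported to work above. Whether that number is an actual framework limit is not confirmed here.

```python
import asyncio

# Tracks how many stand-in requests run at once, to check the cap holds.
in_flight = {"now": 0, "peak": 0}


async def fake_ocr(page: int) -> str:
    """Hypothetical stand-in for one OCR request."""
    in_flight["now"] += 1
    in_flight["peak"] = max(in_flight["peak"], in_flight["now"])
    await asyncio.sleep(0.01)
    in_flight["now"] -= 1
    return f"text-{page}"


async def run_sliding_window(page_count: int, max_in_flight: int = 2) -> dict:
    results = {}
    pending = {}  # maps each running task to its page index
    next_page = 0
    while next_page < page_count or pending:
        # Top up the window until the cap is reached or no pages remain.
        while next_page < page_count and len(pending) < max_in_flight:
            task = asyncio.create_task(fake_ocr(next_page))
            pending[task] = next_page
            next_page += 1
        # Wait for at least one task to finish before starting another.
        done, _ = await asyncio.wait(
            pending.keys(), return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            results[pending.pop(task)] = task.result()
    return results


results = asyncio.run(run_sliding_window(page_count=7))
print(sorted(results))
```

Keeping the cap as a parameter makes it easy to raise the limit later if the hang turns out to be fixed in a newer OS release.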
7
0
242
Aug ’25
Is there an API to check if a Core ML compiled model is already cached?
Hello Apple Developer Community, I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an app update, the first run still triggers an "uncached load" process. This seems to impact user experience with unnecessary delays. Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached in the system? If such an API exists, we'd like to use it for pre-loading decision logic: only perform a background pre-load when the model isn't cached. Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!
2
0
220
May ’25
Is it possible to create a virtual NPU device on macOS using Hypervisor.framework + CoreML?
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM, likely using QEMU backed by Hypervisor.framework? The guest would see this device as something like /dev/npu0, and it would use a kernel driver + userspace library to submit inference requests. On the macOS host, these requests would be executed using Core ML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC. Does macOS allow this kind of "fake" NPU / GPU?
1
0
375
Aug ’25
Difference between compiling a Model using CoreML and Swift-Transformers
Hello, I was successfully able to compile TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML using Core ML, and it's working well. However, I’m now trying to compile the same model using Swift Transformers. With the limited documentation available on the swift-chat and Hugging Face repositories, I’m finding it difficult to understand the correct process for compiling a model via Swift Transformers. I attempted the following approach, but I’m fairly certain it’s not the recommended or correct method. Could someone guide me on the proper way to compile and use models like TinyLlama with Swift Transformers? Any official workflow, example, or best practice would be very helpful. Thanks in advance!

This is the approach I have used:

```swift
import Foundation
import CoreML
import Tokenizers

@main
struct HopeApp {
    static func main() async {
        print(" Running custom decoder loop...")
        do {
            let tokenizer = try await AutoTokenizer.from(pretrained: "PY007/TinyLlama-1.1B-Chat-v0.3")
            var inputIds = tokenizer("this is the test of the prompt")
            print("🧠 Prompt token IDs:", inputIds)

            let model = try float16_model(configuration: .init())
            let maxTokens = 30

            for _ in 0..<maxTokens {
                let input = try MLMultiArray(shape: [1, 128], dataType: .int32)
                let mask = try MLMultiArray(shape: [1, 128], dataType: .int32)
                for i in 0..<inputIds.count {
                    input[i] = NSNumber(value: inputIds[i])
                    mask[i] = 1
                }
                for i in inputIds.count..<128 {
                    input[i] = 0
                    mask[i] = 0
                }

                let output = try model.prediction(input_ids: input, attention_mask: mask)
                let logits = output.logits // shape: [1, seqLen, vocabSize]
                let lastIndex = inputIds.count - 1
                let lastLogitsStart = lastIndex * 32003 // vocab size = 32003

                var nextToken = 0
                var maxLogit: Float32 = -Float.greatestFiniteMagnitude
                for i in 0..<32003 {
                    let logit = logits[lastLogitsStart + i].floatValue
                    if logit > maxLogit {
                        maxLogit = logit
                        nextToken = i
                    }
                }

                inputIds.append(nextToken)
                if nextToken == 32002 { break }

                let partialText = try await tokenizer.decode(tokens:inputIds)
                print(partialText)
            }
        } catch {
            print("❌ Error: \(error)")
        }
    }
}
```
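The index arithmetic in the decoder loop treats the logits as one flat buffer of shape [1, seqLen, vocabSize], so the scores for position p start at offset p * vocabSize. Below is a small Python sketch of that greedy (argmax) step, using a toy vocabulary of 4 instead of 32003. The function name `greedy_next_token` and the toy values are hypothetical, and the sketch assumes a contiguous row-major layout, which is worth verifying against the strides of the actual output array.

```python
def greedy_next_token(flat_logits, position, vocab_size):
    """Return the index of the largest logit for one sequence position."""
    start = position * vocab_size
    row = flat_logits[start:start + vocab_size]
    if len(row) != vocab_size:
        raise ValueError("position is outside the logits buffer")
    # max() keeps the first maximum on ties, like a strict `>` comparison loop.
    return max(range(vocab_size), key=row.__getitem__)


# Toy logits for seq_len = 3 and vocab_size = 4, flattened row by row.
flat_logits = [
    0.1, 0.9, 0.0, 0.2,   # position 0
    0.3, 0.1, 0.2, 0.8,   # position 1
    2.0, -1.0, 0.5, 1.5,  # position 2
]

prompt_length = 3
# The next token is predicted from the last filled position, not from padding.
next_token = greedy_next_token(flat_logits, prompt_length - 1, vocab_size=4)
print(next_token)
```

Reading from position `prompt_length - 1` matters when the input is padded to a fixed length such as 128: the rows after the prompt correspond to padding and should not drive the prediction.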
1
0
163
Jun ’25