Hi, I'm currently creating a model to identify car plates (object detection). I use asitop to monitor my MacBook Pro and I see that only the CPU is used for the training, and I wanted to know why.
I keep looking through the documentation and sample code, but I still haven't found an example of how this type of Network Regressor is used.
Does it take any special parameters to perform on the ANE, and what size and format of DataFrame does it expect?
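For reference, a minimal sketch of the tabular workflow I could piece together, using MLLinearRegressor as a stand-in (I haven't found a documented sample for the network regressor either). It assumes a hypothetical "training.csv" with numeric feature columns and a numeric "target" column; nothing here controls ANE placement, which Core ML decides on its own at prediction time.

import CreateML
import Foundation

// Sketch: train a plain tabular regressor from a CSV and export it (stand-in, not the network regressor itself).
let table = try MLDataTable(contentsOf: URL(fileURLWithPath: "training.csv")) // hypothetical file
let regressor = try MLLinearRegressor(trainingData: table, targetColumn: "target")
print("Training RMSE: \(regressor.trainingMetrics.rootMeanSquaredError)")
try regressor.write(to: URL(fileURLWithPath: "Regressor.mlmodel"))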
Hello Apple Developer Community,
I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an app update, the first run still triggers an "uncached load". This seems to hurt the user experience with an unnecessary delay.
Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached by the system?
If such an API exists, we'd like to use it for pre-loading decisions: only perform a background pre-load when the model isn't cached.
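For context, the kind of pre-load we have in mind is just an unconditional background MLModel.load, roughly like the sketch below, where compiledModelURL stands for the .mlmodelc path in our bundle. Since we haven't found a public cache-check API, this loads regardless of cache state.

import CoreML

// Sketch: unconditional background pre-load; no public API we know of reports whether the cache is warm.
func preloadModel(at compiledModelURL: URL) {
    Task.detached(priority: .utility) {
        do {
            let model = try await MLModel.load(contentsOf: compiledModelURL, configuration: MLModelConfiguration())
            // Keep `model` around (e.g. in an actor or singleton) so the first real prediction reuses it.
            _ = model
        } catch {
            print("Pre-load failed: \(error)")
        }
    }
}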
Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!
Hello. I would like to hire a game developer for a card game called Baloot. My question is: can the developer implement an AI for the computer player that, while playing, improves and raises its skill level on its own, without any interaction?
🌹
Topic:
Machine Learning & AI
SubTopic:
General
The specific context is that I would like to build an agent that monitors my phone call (with customer support, for example), simply identifies whether or not I'm still on hold, and notifies me when I'm not.
After reading the docs, I don't think it's possible yet, but I'm so annoyed by customer support calls that I'm willing to go the distance and see if there's any way.
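The closest thing I can sketch stays entirely on the audio side: with the call on speakerphone, feed the microphone into SoundAnalysis' built-in classifier and watch for the "music" label (hold music) versus speech. This does not tap the call's own audio stream, which I don't believe is accessible, so it's only a rough illustration of the classification half.

import AVFoundation
import SoundAnalysis

// Sketch: classify microphone audio with the built-in sound classifier and log the top label.
final class HoldMusicObserver: NSObject, SNResultsObserving {
    func request(_ request: SNRequest, didProduce result: SNResult) {
        guard let result = result as? SNClassificationResult,
              let top = result.classifications.first else { return }
        print("Heard: \(top.identifier) (confidence \(top.confidence))") // e.g. "music" while on hold
    }
}

let engine = AVAudioEngine()
let input = engine.inputNode
let format = input.outputFormat(forBus: 0)
let analyzer = SNAudioStreamAnalyzer(format: format)
let observer = HoldMusicObserver()
let request = try SNClassifySoundRequest(classifierIdentifier: .version1)
try analyzer.add(request, withObserver: observer)

input.installTap(onBus: 0, bufferSize: 8192, format: format) { buffer, time in
    analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
}
try engine.start()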
Is it possible to train a model using Create ML to infer a numeric relevance score for a news article based on similar training data, something like a sentiment score? I created a Text Classifier that assigns a category label, which works perfectly, but I would like a solution that calculates a numeric value, not a label.
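One route I'm considering, which is a sketch rather than an official Create ML template: turn each article into a fixed-length sentence-embedding vector with NLEmbedding, then train an ordinary tabular regressor (e.g. MLLinearRegressor or MLBoostedTreeRegressor) on those vectors with the numeric relevance score as the target column.

import NaturalLanguage

// Sketch: a fixed-length feature vector per article, suitable as input columns for a tabular regressor.
func embeddingVector(for text: String) -> [Double]? {
    guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else { return nil }
    return embedding.vector(for: text)
}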
Topic:
Machine Learning & AI
SubTopic:
Create ML
Hello,
I have a question regarding hybrid execution for deep learning models on Apple's Neural Engine and CPU. I am aware that setting the precision of some layers to 32-bit allows hybrid execution across both the Neural Engine and the CPU. However, I would like to know if it is possible to achieve the same with 16-bit precision.
Is there any specific configuration or workaround to enable hybrid execution in this case? Any guidance or documentation references would be greatly appreciated.
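For what it's worth, the only related knob I know of is restricting the allowed compute units and letting Core ML place each layer itself; a minimal sketch, assuming compiledModelURL points at the .mlmodelc:

import CoreML

// Sketch: limit dispatch to CPU + Neural Engine; per-layer placement is still decided by Core ML.
let configuration = MLModelConfiguration()
configuration.computeUnits = .cpuAndNeuralEngine // iOS 16 / macOS 13 and later
let model = try MLModel(contentsOf: compiledModelURL, configuration: configuration)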
Thank you!
Topic:
Machine Learning & AI
SubTopic:
Core ML
I've built a model using Create ML, but I can't make it, for the love of God, updatable. I can't find any checkbox or anything related. It's an Activity Classifier, if it matters.
I want to continue training it on-device using MLUpdateTask, but the model, as exported from Create ML, fails with error: Domain=com.apple.CoreML Code=6 "Failed to unarchive update parameters. Model should be re-compiled." UserInfo={NSLocalizedDescription=Failed to unarchive update parameters. Model should be re-compiled.}
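For reference, this is roughly the on-device update call I'm aiming for, assuming the model has first been re-exported as updatable (as far as I can tell this requires coremltools; the Create ML app has no such option). trainingBatch is an MLBatchProvider whose features match the model's training inputs.

import CoreML

// Sketch of the on-device update; only valid once the model itself is marked updatable.
func updateModel(at modelURL: URL, with trainingBatch: MLBatchProvider) throws {
    let task = try MLUpdateTask(forModelAt: modelURL,
                                trainingData: trainingBatch,
                                configuration: nil,
                                completionHandler: { context in
        // Persist the updated model so later loads pick it up.
        let updatedURL = FileManager.default.temporaryDirectory.appendingPathComponent("Updated.mlmodelc")
        try? context.model.write(to: updatedURL)
    })
    task.resume()
}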
Hi, I'm currently using Metal Performance Shaders Graph (MPSGraphExecutable) to run neural network inference operations as part of a metal rendering pipeline.
I also tried to profile Neural Engine usage while running inference through MPSGraphExecutable, but the trace shows no sign of Neural Engine activity. However, when I used the Core ML model inspection tool in Xcode and ran a performance report, the model was able to use the ANE.
Does MPSGraphExecutable automatically utilize the Apple Neural Engine (ANE) when running inference operations, or does it only execute on GPU?
My model (Core ML package) was converted from a PyTorch model using coremltools with the ML Program type and supports iOS 17.0+.
Any insights or documentation references would be greatly appreciated!
The developer tutorial for visual intelligence indicates that the method to detect and handle taps on a displayed entity from the Search section is via an "OpenIntent" associated with your entity.
However, running this intent executes code from within my app. If I have the perform() method display UI, it always displays UI from within my app.
I noticed that the Google app's integration with Visual Intelligence has a different behavior: tapping on an entity does not take you to the Google app; instead, a WebView is presented sheet-style WITHIN the Visual Intelligence environment (see below).
How is that accomplished?
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
JAX Metal shows 55x slower random number generation compared to NVIDIA CUDA on equivalent workloads. This makes Monte Carlo simulations and scientific computing impractical on Apple Silicon.
Performance Comparison
NVIDIA GPU: 0.475s for 12.6M random elements
M1 Max Metal: 26.3s for same workload
Performance gap: 55x slower
Environment
Apple M1 Max, 64GB RAM, macOS Sequoia Version 15.6.1
JAX 0.4.34, jax-metal latest
Backend: Metal
Reproduction Code
import time
import jax
import jax.numpy as jnp
from jax import random

key = random.PRNGKey(42)
start_time = time.time()
random_array = random.normal(key, (50000, 252))
# JAX dispatches work asynchronously; block so the timing covers the actual computation.
random_array.block_until_ready()
duration = time.time() - start_time
print(f"Duration: {duration:.3f}s")
I'm running macOS 26 Beta 5. I noticed that I can no longer achieve 100% usage on the ANE as I could before with the Apple Foundation Models on-device model. Has Apple activated some kind of throttling or power limiting of the ANE? I cannot get above 3 W or 40% usage since upgrading. I'm on the high power energy mode. Is there an API rate limit being applied?
I have an M4 Pro Mac mini with 64 GB of memory.
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
WWDC25: Combine Metal 4 machine learning and graphics
The session demonstrated a way to run a neural network inside the graphics pipeline directly from the shaders, using texture compression as the example. However, there is no mention of which ML technique is used to compress the texture.
Can anyone point me to some well-known model(s) for this particular use case shown at WWDC25?
When using Core ML for VAE model prediction, the prediction output is visually distorted, with no error messages. How can this issue be addressed?
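A debugging sketch rather than a fix, assuming vaeURL is the compiled VAE .mlmodelc: distorted VAE output is often a float16 range problem, so comparing an all-CPU run (which typically executes in float32) against the default GPU/ANE dispatch can confirm or rule that out.

import CoreML

// Sketch: load the same model twice with different compute units and compare the outputs.
let fp32Config = MLModelConfiguration()
fp32Config.computeUnits = .cpuOnly           // CPU path typically runs in float32
let defaultConfig = MLModelConfiguration()   // default lets Core ML use GPU/ANE (often float16)

let cpuModel = try MLModel(contentsOf: vaeURL, configuration: fp32Config)
let fastModel = try MLModel(contentsOf: vaeURL, configuration: defaultConfig)
// If only the CPU result looks correct, the distortion is likely a precision/overflow issue
// introduced during conversion rather than a decoding bug in the app.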
We are using VNRecognizeTextRequest to detect text in documents, and we have noticed that even in some very clear and well-formatted documents, there are still instances where text blocks are missed. Live Text has the same issue.
Hi everyone,
I’m an AI engineer working on autonomous AI agents and exploring ways to integrate them into the Apple ecosystem, especially via Siri and Apple Intelligence.
I was impressed by Apple’s integration of ChatGPT and its privacy-first design, but I’m curious to know:
• Are there plans to support third-party LLMs?
• Could Siri or Apple Intelligence call external AI agents or allow extensions to plug in alternative models for reasoning, scheduling, or proactive suggestions?
I’m particularly interested in building event-driven, voice-triggered workflows where Apple Intelligence could act as a front-end for more complex autonomous systems (possibly local or cloud-based).
This kind of extensibility would open up incredible opportunities for personalized, privacy-friendly use cases — while aligning with Apple’s system architecture.
Is anything like this on the roadmap? Or is there a suggested way to prototype such integrations today?
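The closest thing I can prototype today is an App Intent that Siri and Shortcuts can trigger by voice and that forwards the request to my own agent backend. A minimal sketch, where MyAgentClient is a hypothetical wrapper around that backend, not an Apple API:

import AppIntents

// Sketch: a voice-triggerable intent that delegates to a custom agent backend.
struct AskAgentIntent: AppIntent {
    static var title: LocalizedStringResource = "Ask My Agent"

    @Parameter(title: "Prompt")
    var prompt: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        let reply = try await MyAgentClient.shared.respond(to: prompt) // hypothetical client
        return .result(dialog: "\(reply)")
    }
}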
Thanks in advance for any thoughts or pointers!
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Tags:
SiriKit
Machine Learning
Apple Intelligence
Following the recommendations of the WWDC24 video "Discover Swift enhancements in the Vision framework" (cf. the video at 10'41"), I used the following code to perform multiple of the new iOS 18 RecognizeTextRequest requests in parallel.
Problem: if more than 2 requests are run in parallel, the requests hang, leaving the app in a state where no more requests can be started -> deadlock.
I tried other ways to run the requests, but no matter the method employed, or what device I use: no more than 2 requests can ever be run in parallel.
func triggerDeadlock() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        // See: WWDC 2024 "Discover Swift enhancements in the Vision framework" at 10:41
        // ############## THIS IS KEY
        let maxOCRTasks = 5 // On a real device, if more than 2 RecognizeTextRequests are launched in parallel using tasks, the requests hang
        // ############## THIS IS KEY
        for _ in 0..<maxOCRTasks {
            let url = ... // URL to some image
            group.addTask {
                // Perform OCR
                let _ = try await performOCRRequest(on: url)
            }
        }
        var nextIndex = maxOCRTasks
        for try await _ in group { // Wait for the result of the next child task that finished
            if nextIndex < pageCount { // pageCount: total number of images to process (defined elsewhere)
                group.addTask {
                    let url = ... // URL to some image
                    // Perform OCR
                    let _ = try await performOCRRequest(on: url)
                }
                nextIndex += 1
            }
        }
    }
}
// MARK: - ASYNC/AWAIT version with iOS 18
@available(iOS 18, *)
func performOCRRequest(on url: URL) async throws -> [RecognizedText] {
    // Create request
    var request = RecognizeTextRequest() // Single request: no need for ImageRequestHandler

    // Configure request
    request.recognitionLevel = .accurate
    request.automaticallyDetectsLanguage = true
    request.usesLanguageCorrection = true
    request.minimumTextHeightFraction = 0.016

    // Perform request
    let textObservations: [RecognizedTextObservation] = try await request.perform(on: url)

    // Convert [RecognizedTextObservation] to [RecognizedText]
    return textObservations.compactMap { observation in
        observation.topCandidates(1).first
    }
}
I also found this Swift forums post mentioning something very similar.
I also opened a feedback: FB17240843
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM, likely using QEMU backed by Hypervisor.framework? The guest would see this device as something like /dev/npu0 and would use a kernel driver + userspace library to submit inference requests.
On the macOS host, these requests would be executed using Core ML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC.
Does macOS allow this kind of "fake" NPU / GPU?
Hello,
I was successfully able to compile TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML using Core ML, and it's working well. However, I’m now trying to compile the same model using Swift Transformers.
With the limited documentation available on the swift-chat and Hugging Face repositories, I’m finding it difficult to understand the correct process for compiling a model via Swift Transformers. I attempted the following approach, but I’m fairly certain it’s not the recommended or correct method.
Could someone guide me on the proper way to compile and use models like TinyLlama with Swift Transformers? Any official workflow, example, or best practice would be very helpful.
Thanks in advance!
This is the approach I have used:
import Foundation
import CoreML
import Tokenizers

@main
struct HopeApp {
    static func main() async {
        print(" Running custom decoder loop...")
        do {
            let tokenizer = try await AutoTokenizer.from(pretrained: "PY007/TinyLlama-1.1B-Chat-v0.3")
            var inputIds = tokenizer("this is the test of the prompt")
            print("🧠 Prompt token IDs:", inputIds)

            let model = try float16_model(configuration: .init())
            let maxTokens = 30

            for _ in 0..<maxTokens {
                // Pad/mask the current token sequence to the model's fixed [1, 128] input shape.
                let input = try MLMultiArray(shape: [1, 128], dataType: .int32)
                let mask = try MLMultiArray(shape: [1, 128], dataType: .int32)
                for i in 0..<inputIds.count {
                    input[i] = NSNumber(value: inputIds[i])
                    mask[i] = 1
                }
                for i in inputIds.count..<128 {
                    input[i] = 0
                    mask[i] = 0
                }

                let output = try model.prediction(input_ids: input, attention_mask: mask)
                let logits = output.logits // shape: [1, seqLen, vocabSize]

                // Greedy decoding: pick the highest-logit token at the last prompt position.
                let lastIndex = inputIds.count - 1
                let lastLogitsStart = lastIndex * 32003 // vocab size = 32003
                var nextToken = 0
                var maxLogit: Float32 = -Float.greatestFiniteMagnitude
                for i in 0..<32003 {
                    let logit = logits[lastLogitsStart + i].floatValue
                    if logit > maxLogit {
                        maxLogit = logit
                        nextToken = i
                    }
                }
                inputIds.append(nextToken)
                if nextToken == 32002 { break } // end-of-sequence token for this model
                let partialText = try await tokenizer.decode(tokens: inputIds)
                print(partialText)
            }
        } catch {
            print("❌ Error: \(error)")
        }
    }
}
Topic:
Machine Learning & AI
SubTopic:
Core ML