Problem Description:
I am developing an application that runs in the Shared Space on Apple Vision Pro using Unity. When using the UI ScrollView (Scroll View) component, I found that the Mask / RectMask2D does not function in the Shared Space.
Scrolling content is not masked or cropped; it extends beyond the view boundary and is displayed directly.
The same UI works correctly across platforms such as Unity Editor, iOS, and macOS, but the issue only occurs in the shared space of Vision Pro.
Reproduction steps:
Create a ScrollView in Unity.
Add a Mask or RectMask2D to the viewport.
Deploy the application to Apple Vision Pro and run it in Shared Space mode.
Sliding content will not be clipped by the mask, and the masked area is entirely ineffective.
Expected behavior:
The content of ScrollView should be properly clipped by Mask / RectMask2D and should not render outside the mask boundary.
Actual results:
In the shared space of Vision Pro, the mask is ineffective, causing scrolling content to extend beyond the designated area and resulting in severe UI distortion.
Environmental Information:
Device: Apple Vision Pro
Mode: Shared Space
Unity Version: 6000.0.40f1
visionOS version: visionOS 26.0
Unity PolySpatial Version: 2.0.4
Impact
This issue causes Unity UI to fail to display correctly on Vision Pro, preventing ScrollView from properly clipping content, which impacts the UI experience and interaction effects in practical applications.
Expected Result: When running a Unity app in the shared space of visionOS, the Mask / RectMask2D of ScrollView functions correctly
Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi team,
I believe I’ve found a registration issue between ARFrame.sceneDepth and ARFrame.capturedImage when using high-resolution frame capture on a 2022 iPad Pro (6th gen).
When enabling high-resolution capture:
if let highResFormat = ARWorldTrackingConfiguration.recommendedVideoFormatForHighResolutionFrameCapturing {
config.videoFormat = highResFormat
}
…
arView.session.captureHighResolutionFrame { ... }
the depth map provided by ARFrame.sceneDepth no longer aligns correctly with the corresponding high-resolution capturedImage.
This misalignment results in consistently over-estimated distance measurements in my app (which relies on mapping depth to 2D pixel coordinates).
iPad Pro (6th gen): misalignment occurs only when capturing high-resolution frames.
iPhone 16 Pro: depth is correctly registered for both standard and high-resolution captures.
It appears the camera intrinsics, specifically the FOV, change between the “regular” resolution stream and the high-resolution capture on the iPad. My suspicion is that the depth data continues using the intrinsics of the lower resolution stream, resulting in an unregistered depth-to-RGB mapping.
Once I have the iPad in hand again, I will confirm whether camera.intrinsics or FOV differ between the low-res and high-res frames.
Is this a known issue with high-resolution frame capture on the 2022 iPad Pro? If not, I’m happy to provide some more thorough sample code.
Thanks for your time!
I have an iOS app that can display a USDZ model downloaded from the Internet (and cached locally) via an ARView.
I would like to light that model with an image based light (IBL) also downloaded from the Internet.
However, as far as I can tell, ARView can only create an IBL from a resource that has been compiled into the Xcode project and loaded with EnvironmentResource(named:in:) or EnvironmentResource.load(named:in:).
Is there a way to create an EnvironmentResource from an HDRI via a file URL to use in ARView in iOS?
How can I create 180-degree apple immersive videos using game engine
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
Hi Apple Developer Community,
I'm developing an eye-tracking application using ARKit's ARFaceTrackingConfiguration and ARFaceAnchor.blendShapes for gaze detection using Xcode. I'm experiencing several calibration and accuracy issues and would appreciate insights from the community.
Current Implementation
Using ARFaceAnchor.blendShapes (.eyeLookUpLeft, .eyeLookDownLeft, .eyeLookInLeft, .eyeLookOutLeft, etc.)
Implementing custom sensitivity curves and smoothing algorithms
Applying baseline correction and coordinate mapping
Using quadratic regression for calibration point mapping
Issues I'm Facing
1. Calibration Mismatch
Red dot position doesn't align with where I'm actually looking
Significant offset between intended gaze point and actual cursor position
Calibration seems to drift or become inaccurate over time
2. Extreme Eye Movement Requirements
Need to make exaggerated eye movements to reach screen edges/corners
Natural eye movements don't translate to proportional cursor movement
Difficulty reaching certain screen regions even with calibration
3. Sensitivity and Stability Issues
Cursor jitters or jumps around when looking at center
Too much sensitivity to micro-movements
Inconsistent behavior between calibration and normal operation
4. I also noticed that tracking on calibration screen as well as tracking on reading screen works better as expected when head movement is there, but I do not want much head movement. I want tracking with normal eye movement while reading an Ebook.
Primary Question: Word-Level Eye Tracking Feasibility
Is word-level eye tracking (tracking gaze as users read through individual words in an ebook) technically feasible with current iPhone/iPad hardware?
I understand that Apple's built-in eye tracking is primarily an accessibility feature for UI navigation. However, I'm wondering if the TrueDepth camera and ARKit's eye tracking capabilities are sufficient for:
Tracking natural reading patterns (left-to-right, line-by-line progression)
Detecting which specific words a user is looking at
Maintaining accuracy for sustained reading sessions (15-30 minutes)
Working reliably across different users and lighting conditions
Questions for the Community
Hardware Limitations: Are iPhone/iPad TrueDepth cameras capable of the precision needed for word-level tracking, or is this beyond current hardware capabilities?
Calibration Best Practices: What calibration strategies have worked best for accurate gaze mapping? How many calibration points are typically needed?
Reading-Specific Challenges: Are there particular challenges when tracking reading behavior vs. general gaze tracking?
Alternative Approaches: Are there better approaches than ARKit blend shapes for this use case?
Current Setup
Devices: iPhone 14 Pro
iOS Version: iOS 18.3
ARKit Version: Latest available
Any insights, experiences, or technical guidance would be greatly appreciated. I'm particularly interested in hearing from developers who have worked on similar eye tracking applications or have experience with the limitations and capabilities of ARKit's eye tracking features.
Thank you for your time and expertise!
When I run my app from Xcode on a device running iOS 26, the roomplan capture is corrupted and the recording is green and purple. This issue does not occur when I use an older version of iOS or when I run the app via testFlight or the App Store.
I'm placing sphere at finger tip and updating its position as hand move.
Finger joint tracking functions correctly, but I’ve observed noticeable latency in hand tracking updates whenever a UITextView becomes active. This lag happens intermittently during app usage, lasting about 5–10 seconds, after which the latency disappears and the sphere starts following the finger joints immediately.
When I open the immersive space for the first time, the profiler shows a large performance spike upto 328%. After that, it stabilizes and runs smoothly.
Note: I don’t observe any lag when CPU usage spikes to 300% (upon immersive view load)
yet the lag still occurs even when CPU usage remains below 100%.
I’m using the following code for hand tracking:
private func processHandTrackingUpdates() async {
for await update in handTracking.anchorUpdates {
let handAnchor = update.anchor
if handAnchor.isTracked {
switch handAnchor.chirality {
case .left:
leftHandAnchor = handAnchor
updateHandJoints(for: handAnchor, with: leftHandJointEntities)
case .right:
rightHandAnchor = handAnchor
updateHandJoints(for: handAnchor, with: rightHandJointEntities)
}
} else {
switch handAnchor.chirality {
case .left:
leftHandAnchor = nil
hideAllJoints(in: leftHandJointEntities)
case .right:
rightHandAnchor = nil
hideAllJoints(in: rightHandJointEntities)
}
}
await MainActor.run {
handTrackingData.processNewHandAnchors(
leftHand: self.leftHandAnchor,
rightHand: self.rightHandAnchor
)
}
}
}
And here’s the function I’m using to update the joint positions:
private func updateHandJoints(
for handAnchor: HandAnchor,
with jointEntities: [HandSkeleton.JointName: Entity]
) {
guard handAnchor.isTracked else {
hideAllJoints(in: jointEntities)
return
}
// Check if the little finger tip and intermediate base are both tracked.
if let tipJoint = handAnchor.handSkeleton?.joint(.littleFingerTip),
let intermediateBaseJoint = handAnchor.handSkeleton?.joint(.littleFingerIntermediateTip),
tipJoint.isTracked,
intermediateBaseJoint.isTracked,
let pinkySphere = jointEntities[.littleFingerTip] {
// Convert joint transforms to world space.
let tipTransform = handAnchor.originFromAnchorTransform * tipJoint.anchorFromJointTransform
let intermediateBaseTransform = handAnchor.originFromAnchorTransform * intermediateBaseJoint.anchorFromJointTransform
// Extract positions from the transforms.
let tipPosition = SIMD3<Float>(tipTransform.columns.3.x,
tipTransform.columns.3.y,
tipTransform.columns.3.z)
let intermediateBasePosition = SIMD3<Float>(intermediateBaseTransform.columns.3.x,
intermediateBaseTransform.columns.3.y,
intermediateBaseTransform.columns.3.z)
// Calculate the midpoint.
let midpointPosition = (tipPosition + intermediateBasePosition) / 2.0
// Position the sphere at the midpoint and make it visible.
pinkySphere.isEnabled = true
pinkySphere.transform.translation = midpointPosition
} else {
// If either joint is not tracked, hide the sphere.
jointEntities[.littleFingerTip]?.isEnabled = false
}
// Update the positions of all other hand joint spheres.
for (jointName, entity) in jointEntities {
if jointName == .littleFingerTip {
// Already handled the pinky above.
continue
}
guard let joint = handAnchor.handSkeleton?.joint(jointName),
joint.isTracked else {
entity.isEnabled = false
continue
}
entity.isEnabled = true
let jointTransform = handAnchor.originFromAnchorTransform * joint.anchorFromJointTransform
entity.transform.translation = SIMD3<Float>(jointTransform.columns.3.x,
jointTransform.columns.3.y,
jointTransform.columns.3.z)
}
}
I’ve attached both a profiler trace and a video recording from Vision Pro that clearly demonstrate the issue.
Profiler: https://drive.google.com/file/d/1fDWyGj_fgxud2ngkGH_IVmuH_kO-z0XZ
Vision Pro Recordings:
https://drive.google.com/file/d/17qo3U9ivwYBsbaSm26fjaOokkJApbkz-
https://drive.google.com/file/d/1LxTxgudMvWDhOqKVuhc3QaHfY_1x8iA0
Has anyone else experienced this behavior? My thought is that there might be some background calculations happening at the OS level causing this latency. Any guidance would be greatly appreciated.
Thanks!
Can we constrain or clamp translation with the new ManipulationComponent? For example, allow free movement within certain bounds.
I would like to know if there have been any updates that improved the image tracking speed. I submitted feedback a while ago, mentioning that the image tracking rate was only one frame per second on Vision Pro.
Thank you very much. FB15688516 and FB16745373is feedback we previously submitted, but we have not received any response yet.
Entity.animate() makes entity animation much easier, but in many cases, I want to break the progress because of some gestures, I couldn't find any way to do this, including tried entity.stopAllAnimations(), I have to wait till Entity.animate() completes.
iOS 26 / visionOS 26
Hello,
I am currently working on a Unity project for the Apple Vision Pro. I would like to have people passing in front of the virtual objects occlude the virtual objects that are behind. Something similar to this: https://developer.apple.com/documentation/arkit/occluding-virtual-content-with-people
I could unfortunately not find any documentation about this. Is it possible to implement body segmentation or occlusion on the Apple Vision Pro? If it's not currently supported, are there plans to add it? Any ideas on how to achieve this with existing tools?
Thanks!
Mehdi
Is there a suitable
UTType type to satisfy the need to pick up only SpatialVideo in UIDocumentPickerViewController?
I already know that PHPickerFilter in PHPickerViewController can do this, but not in UIDocumentPickerViewController.
Our app needs to adapt both of these ways to pick spatial videos
So is there anything that I can try in UIDocumentPickerViewController to fulfill such picker functionality?
Topic:
Spatial Computing
SubTopic:
General
Tags:
Files and Storage
Photos and Imaging
PhotoKit
visionOS
iOS currently restricts background Bluetooth advertising and scanning in order to preserve battery life and protect user privacy. While these restrictions serve important purposes, they also limit legitimate use cases where users have explicitly opted in to proximity-based experiences.
The core challenge is that modern social applications need a way to detect when users are physically present at the same location or event without requiring every participant to keep their app in the foreground. Under the current system, background BLE advertising is heavily throttled and can only transmit a limited payload, background scanning intervals are sparse and unpredictable, peer-to-peer proximity detection cannot be maintained reliably when apps are in the background, and Background App Refresh is non-deterministic, making any kind of time-based proximity validation impossible.
A proposed enhancement would be to introduce an “Enhanced Proximity Permission.” This would allow developers to enable reliable background BLE advertising and scanning for declared time windows, such as a maximum of eight hours. It would also allow devices running the same app to detect each other’s proximity using ephemeral, rotating identifiers that preserve privacy, with clear user consent and prominent indicators whenever the feature is active.
Unlocking this capability would open up new categories of applications. Live events could offer automatic attendance tracking at concerts, conferences, or sports venues. Retail environments could support opt-in foot traffic analysis and dwell-time insights. Social apps could allow users to find friends at festivals, campuses, or other large venues. Safety applications could extend to crowd density monitoring and contact tracing beyond COVID-era needs. Gaming could offer real-world multiplayer experiences based on physical proximity, and transportation providers could verify rideshare pickups or measure public transit flows automatically.
Privacy safeguards would remain central. Permissions would be time-boxed and expire after an event or session. A mandatory visual indicator would be displayed whenever proximity tracking is active. A user-facing dashboard would show all apps granted enhanced proximity access. Permissions would automatically be revoked after a period of non-use, and only ephemeral tokens not permanent identifiers would be broadcast.
The industry impact would be significant. With this enhancement, iOS could power the next generation of location-aware social platforms while maintaining Apple’s leadership in privacy through explicit user control and transparency. Current alternatives, such as requiring users to keep apps in the foreground or deploying dedicated hardware beacons, produce poor user experiences and constrain innovation in spatial computing and social applications.
Can anyone from Apple consider this change? Having to buy iBeacons is brutal and means slower adoption. Please reconsider this for users who opt in.
I am wondering, is it possible to somehow configure a 3D object to respond to the gaze of a person, like change colors of some parts of the 3D-Model where a person is looking, i.e. where a person's gaze lands on the surface of the 3D-model ?
For example, if there is a 3D model of a Cool Dragon 🐉 in the physical space of a person, when seen through with the mixed reality view, of a VisionPro. Now, it would be really cool to change only the color, or make some specific parts of the dragon skin shimmer, but only in the areas where a person is looking. Is this possible ? Is it do-able with eye-tracking of VisionPro ?
Any advice would be appreciated. 🤝 🤓 I am new to iOS and VisionOS development.
Topic:
Spatial Computing
SubTopic:
General
I have been trying to implement this look where a component looks "pushed in" but I could not find any resources regarding this effect. The closest I got was a combination of a RoundedRectangle and .glassBackgroundEffect(), but this makes the view look pushed out, instead of pushed in.
I was wondering if this is achievable in SwiftUI level, or even in UIKit level.
For visionOS 2.0+, it has been announced the object tracking feature. Is there any support for PolySpatial in Unity or is it only available in Swift and Xcode?
Can you help to write a code able to pick an element a bit far from me, then bring it near to me, flick it a bit and then send it back to its original position when I release it?
Thanks a lot,
Christophe
Hello,
we have a RealityKit app that also runs on macOS via Catalyst.
For specific USD assets containing particle systems we have observed a reproducible crash.
Steps to reproduce:
Open Reality Composer Pro
Create new file
Create simple particle system (default one is fine)
export as USDZ
Create project in Xcode
Call Entity.load(… and pass in your USD
Running this on an Intel iMac with macOS Sequoia 15.3 will lead to a crash with the following console log:
validateWithDevice:4704: failed assertion `Render Pipeline DescrvalidateWithDevice:4704: failed assertion `Render Pipeline Descriptor Validation
depthAttachmentPixelFormat (MTLPixelFormatDepthvalidateWithDevice:4704: failed assertion `Render Pipeline Descriptor Validation
depthAttachmentPixelFormat (MTLPixelFormatDepthvalidateWithDevice:4704: failed assertion `Render Pipeline Descriptor Validation
depthAttachmentPixelFormat (MTLPixelFormatDepth32Float) and stencilAttachmentPixelFormat (MTLPixelFormatStencil8) must match.
'
iptor Validation
depthAttachmentPixelFormat (MTLPixelFormatDepth32Float) and stencilAttachmentPixelFormat (MTLPixelFormatStencil8) must match.
'
32Float) and stencilAttachmentPixelFormat (MTLPixelFormatStencil32Float) and stencilAttachmentPixelFormat (MTLPixelFormatStencil8) must match.
'
8) must match.
'
Xcode version: 16.2.0
iMac 2020 3,8GHz Intel Core i7
macOS Sequoia 15.3
FB16477373
It would be great if this could be fixed quickly or a workaround provided since it affects or production app. Thank you!
I want to record animation with entity, then export it to .usd without using Reality Composer Pro, how to achieve that?
I developed an app on VisionPro and created a button that allows users to exit the app instead of being forced to exit. I use the ”exit (0)“ scheme to exit the application, but when I re-enter, the loaded window is not the initial window, so is there any relevant code that can be used? Thank you