Hi,
I’m testing Unity’s Spaceship HDRP demo on iPhone 17 Pro Max and iPad Pro M4 (iOS 26.1).
Everything renders correctly, and my custom MetalFX Spatial plugin initializes successfully — it briefly reports active scaling (e.g. 1434×660 → 2868×1320 at 50% scaling), then reverts to native rendering a few frames later.
Setup:
Xcode 16.1 (targeting iOS 18)
Unity 2022.3.62f3 (HDRP)
Metal backend
Dynamic Resolution enabled in HDRP assets and cameras
Relevant Xcode console excerpt:
[MetalFXPlugin] MetalFX_Enable(True) called.
[SpaceshipOptions] MetalFX enabled with HDRP dynamic resolution integration.
[SpaceshipOptions] Disabled TAA for MetalFX Spatial.
[SpaceshipOptions] Created runtime RenderTexture: 1434x660
[MetalFX] Spatial scaler created (1434x660 → 2868x1320).
[MetalFX] Processed frame with scaler.
[MetalFXPlugin] Sent RenderTexture (1434x660) to MetalFX. Output target 2868x1320.
[SpaceshipOptions] MetalFX target set: 1434x660
[SpaceshipOptions] Camera targetTexture cleared after MetalFX handoff.
It looks like HDRP clears the camera’s target texture right after MetalFX submits the frame, which causes it to revert to native rendering.
Is there a recommended way to persist or rebind the MetalFX output texture when using HDRP on iOS?
Unity doesn’t appear to support MetalFX in the Editor either:
Thanks!
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
My app has a number of heterogeneous GPU workloads that all run concurrently. Some of these should be executed with the highest priority because the app’s responsiveness depends on them, while others are triggered by file imports and the like which should have a low priority. If this was running on the CPU I’d assign the former User Interactive QoS and the latter Utility QoS. Is there an equivalent to this for GPU work?
Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics
Hello,
I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and complementary indicators to TestU01).
To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle to test the methodology).
During these evaluations, I observed something interesting on Apple Silicon (Mac M1):
• bit-exact reproducibility between M1 ARM CPU and M1 Metal GPU,
• full BigCrush pass on both CPU and Metal backends,
• excellent p-values,
• stable behaviour across multiple seeds and runs.
This was not the intended objective, the goal was mainly to validate the diagnostic concepts, but these results raised some questions about deterministic compute behaviour in Metal.
My question: Is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically:
• Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads?
• Are there recommended patterns or best practices to ensure reproducibility across GPU generations (M1 → M2 → M3 → M4)?
• Are there known Metal features that can introduce non-determinism?
I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful.
Thank you for any insight, very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon.
Pascal ENINCA Consulting
Topic:
Graphics & Games
SubTopic:
Metal
Tags:
ML Compute
Metal
Metal Performance Shaders
Apple Silicon
Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics
Hello,
I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and complementary indicators to TestU01).
To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle to test the methodology).
During these evaluations, I observed something interesting on Apple Silicon (Mac M1): • bit-exact reproducibility between M1 ARM CPU and M1 Metal GPU, • full BigCrush pass on both CPU and Metal backends, • excellent p-values, • stable behaviour across multiple seeds and runs.
This was not the intended objective, the goal was mainly to validate the diagnostic concepts, but these results raised some questions about deterministic compute behaviour in Metal.
My question: Is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically:
• Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads?
• Are there recommended patterns or best practices to ensure reproducibility across GPU generations (M1 → M2 → M3 → M4)? • Are there known Metal features that can introduce non-determinism?
I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful.
Thank you for any insight, very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon.
Pascal ENINCA Consulting
Topic:
Graphics & Games
SubTopic:
Metal
I am puzzled by the setAddress(_:attributeStride:index:) of MTL4ArgumentTable. Can anyone please explain what the attributeStride parameter is for? The doc says that it is "The stride between attributes in the buffer." but why?
Who uses this for what? On the C++ side in the shaders the stride is determined by the C++ type, as far as I know. What am I missing here?
Thanks!
I have something like this drawing in an MTKView (see at bottom).
I am finding it difficult to figure out when can the Swift-land resources used in making the MTLBuffer(s) be released? Below, for example, is it ok if args goes out of scope (or is otherwise deallocated) at point 1, 2, or 3? Or perhaps even earlier, as soon as argsBuffer has been created?
I have been reading through various articles such as
Setting resource storage modes
Choosing a resource storage mode for Apple GPUs
Copying data to a private resource
but it's a lot to absorb and I haven't been really able to find an authoritative description of the required lifetime of the resources in CPU land.
I should mention that this is Metal 4 code. In previous versions of Metal, the MTLCommandBuffer had the ability to add a completion handler to be called by the GPU after it has finished running the commands in the buffer but in Metal 4 there is no such thing (it it were even needed for the purpose I am interested in).
Any advice and/or pointers to the definitive literature will be appreciated.
guard let argsBuffer = device.makeBuffer(bytes: &args,...
argumentTable.setAddress(argsBuffer.gpuAddress, ...
encoder.setArgumentTable(argumentTable, stages: .vertex)
// encode drawing
renderEncoder.draw...
...
encoder.endEncoding() // 1
commandBuffer.endCommandBuffer() // 2
commandQueue.waitForDrawable(drawable)
commandQueue.commit([commandBuffer]) // 3
commandQueue.signalDrawable(drawable)
drawable.present()
Topic:
Graphics & Games
SubTopic:
Metal
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size.
Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right?
Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)?
func onWindowResized(newSize: CGSize) {
// Called on the Main thread
metalLayer.drawableSize = newSize
}
func onVsync(drawable: MTLDrawable) {
// Called on a background rendering thread
renderer.renderInto(drawable: drawable)
}