Metal

CAMetalDisplayLink in metal-cpp

The title is self-exploratory. I wasn't able to find the CAMetalDisplayLink on the most recent metal-cpp release (metal-cpp_macOS15_iOS18-beta). Are there any plans to include it in the next release?

Graphics & Games Metal Metal metal-cpp

1

1k

Feb ’25

In Metal compute kernels, when do thread variables get spilled into the device memory?

How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into the device memory? Alternatively: how many 32-bit registers does a single thread have available for itself? Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve maximum possible performance, I do not want to the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space. To make my question more concrete, let's say there is an array thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into the device memory? I searched in the Metal feature set tables, but I could not find any details.

Graphics & Games Metal Metal iOS

1

0

626

Mar ’25

Xcode Metal geometry inspector uses wrong NDC space?

When inspecting the geometry in Xcode's metal debugger, I noticed that the shown "frustrum box" didn't make sense. Since Metal uses depth range 0,1 in NDC space, I would expect a vertex that is projected to z:0 to be on the front clipping plane of the frustrum shown in the geometry inspector. This is however not the case. A vertex with ndc z:0 is shown halfway inside the frustrum. Vertices with ndc z less than 0 are correctly culled during rendering, while the geometry inspector's frustrum shows that the vertex is stil inside the frustrum. The image shows vertices that are visually in the middle of the frustrum on z axis, but at the same time the out position shows that they are projected to z:0. How is this possible, unless there's a bug in the geometry inspector?

Graphics & Games Metal Xcode Graphical Debugger

1

0

586

Feb ’25

Why is depth/stencil buffer loaded/stored twice in xcode gpu capture?

I used xcode gpu capture to profile render pipeline's bandwidth of my game.Then i found depth buffer and stencil buffer use the same buffer whitch it's format is Depth32Float_Stencil8. But why in a single pass of pipeline, this buffer was loaded twice, and the Load Attachment Size of Encoder Statistics was double. Is there any bug with xcode gpu capture?Or the pass really loaded the buffer twice times?

Graphics & Games Metal

1

0

369

Mar ’25

Implementing Scalable Order-Independent Transparency (OIT) in Metal

Hi, Apple’s documentation on Order-Independent Transparency (OIT) describes an approach using image blocks, where an array of size 4 is allocated per fragment to store depth and color in a tile shading compute pass. However, when increasing the scene’s depth complexity by adding more overlapping quads, the OIT implementation fails due to the fixed array size. Is there a way to dynamically allocate storage for fragments based on actual depth complexity encountered during rasterization, rather than using a fixed-size array? Specifically, can an adaptive array of fragments be maintained and sorted by depth, where the size grows as needed instead of being limited to 4 entries? Any insights or alternative approaches would be greatly appreciated. Thank you!

Graphics & Games Metal Metal MetalKit Core Graphics

1

0

570

Mar ’25

Is Metal usable from Swift 6?

Hello ladies and gentlemen, I'm writing a simple renderer on the main actor using Metal and Swift 6. I am at the stage now where I want to create a render pipeline state using asynchronous API: @MainActor class Renderer { let opaqueMeshRPS: MTLRenderPipelineState init(/*...*/) async throws { let descriptor = MTLRenderPipelineDescriptor() // ... opaqueMeshRPS = try await device.makeRenderPipelineState(descriptor: descriptor) } } I get a compilation error if try to use the asynchronous version of the makeRenderPipelineState method: Non-sendable type 'any MTLRenderPipelineState' returned by implicitly asynchronous call to nonisolated function cannot cross actor boundary Which is understandable, since MTLRenderPipelineState is not Sendable. But it looks like no matter where or how I try to access this method, I just can't do it - you have this API, but you can't use it, you can only use the synchronous versions. Am I missing something or is Metal just not usable with Swift 6 right now?

Graphics & Games Metal Metal Swift Concurrency

1

0

636

Mar ’25

3D Skeletal animation in metal-cpp?

Hey all! I'm got my hands on a refurbished mac mini m1 and already diving into metal. At the moment, i'm currently studying graphics programming with opengl and got to a point where I can almost create a 3d cube. However, I noticed there aren't many tutorials for metal cpp but rather demos. One thing I love about graphic programming, is skinning/skeletal animation. At the moment, I can't find any sources or tutorials on how to load skeletal animations into metal-cpp. So, if I create my character in blender and had all types of animations all loaded into a .FBX or maybe .DAE and load this into metal api with metal-cpp, how can I go on about how this works?

Graphics & Games Metal MetalKit metal-cpp MetalFX

1

0

479

Mar ’25

why GLDContextRec::flushContextInternal() leads to abort

The flushContextInternal function in glr_sync.mm:262 called abort internally. What caused this? Was it due to high device temperature or some other reason? Date/Time: 2024-08-29 09:20:09.3102 +0800 Launch Time: 2024-08-29 08:53:11.3878 +0800 OS Version: iPhone OS 16.7.10 (20H350) Release Type: User Baseband Version: 8.50.04 Report Version: 104 Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Triggered by Thread: 0 Thread 0 name: Thread 0 Crashed: 0 libsystem_kernel.dylib 0x00000001ed053198 __pthread_kill + 8 (:-1) 1 libsystem_pthread.dylib 0x00000001fc5e25f8 pthread_kill + 208 (pthread.c:1670) 2 libsystem_c.dylib 0x00000001b869c4b8 abort + 124 (abort.c:118) 3 AppleMetalGLRenderer 0x00000002349f574c GLDContextRec::flushContextInternal() + 700 (glr_sync.mm:262) 4 DiSpecialDriver 0x000000010824b07c Di::RHI::onRenderFrameEnd() + 184 (RHIDevice.cpp:118) 5 DiSpecialDriver 0x00000001081b85f8 Di::Client::drawFrame() + 120 (Client.cpp:155) 2024-08-27_14-44-10.8104_+0800-07d9de9207ce4c73289507e608e5de4320d02ccf.crash

Graphics & Games Metal

1

0

133

Mar ’25

How can I get pixel coordinates in the fragment tile function?

In this video, tile fragment shading is recommended for image processing. In this example, the unpack function takes two arguments, one of which is RasterizerData. As I understand it, this is the data passed to us from the previous stage (Vertex) of the graphics pipeline. However, the properties of MTLTileRenderPipelineDescriptor do not include an option for specifying a Vertex function. Therefore, in this render pass, a mix of commands is used: first, a draw command is executed to obtain UV coordinates, and then threads are dispatched. My question is: without using a draw command, only dispatch, how can I get pixel coordinates in the fragment tile function? For the kernel tile function, everything is clear. typedef struct { float4 OPTexture [[ color(0) ]]; float4 IntermediateTex [[ color(1) ]]; } FragmentIO; fragment FragmentIO Unpack(RasterizerData in [[ stage_in ]], texture2d<float, access::sample> srcImageTexture [[texture(0)]]) { FragmentIO out; //... // Run necessary per-pixel operations out.OPTexture = // assign computed value; out.IntermediateTex = // assign computed value; return out; }

Graphics & Games Metal Metal

1

0

181

Mar ’25

Metal triangle strips uniform opacity.

I have this drawing app that I have been working on for the past few years when I have free time. I recently rebuilt the app in Metal to build out other brushes and improve performance, need to render 10000s of lines in realtime. I’m running into this issue trying to create a uniform opacity per path. I have a solution but do not love it - as this is a realtime app and the solution could have some bottlenecks. If I just generate a triangle strip from touch points and do my best to smooth, resample, and handle miters I will always get some overlaps. See: To create a uniform opacity I render to an offscreen texture with blending disabled. I then pre-multiply the color and draw that texture to a composite texture with blending on (I do this per path). This works but gets tricky when you introduce a textured brush, the edges of the texture in the frag shader cut out the line. Pasted Graphic 1.png Solution: I discard below a threshold fragment float4 fragment_line(VertexOut in [[stage_in]], texture2d<float> texture [[ texture(0) ]]) { constexpr sampler s(coord::normalized, address::mirrored_repeat, filter::linear); float2 texCoord = in.texCoord; float4 texColor = texture.sample(s, texCoord); if (texColor.a < 0.01) discard_fragment(); // may be slow (from what I read) return in.color * texColor; } Better but still not perfect. Question: I'm looking for better ways to create a uniform opacity per path. I tried .max blending but that will cause no blending of other paths. Any tips, ideas, much appreciated. If this is too detailed of a question just achieve.

Graphics & Games Metal Metal UIKit

1

0

105

Mar ’25

Cannot download Deferred Lighting in C++ Sample Code

Anyone else unable to download the "Rendering a Scene with Deferred Lighting in C++" (https://developer.apple.com/documentation/metal/rendering-a-scene-with-deferred-lighting-in-c++?language=objc)? I just an error page: Is there another place to download this sample?

Graphics & Games Metal

1

0

104

Mar ’25

Threadgroup memory for fragment shader

Hello I am trying to get thread group memory access in fragment shader. In essence, I would like to have all the fragments in a tile to bitwiseOR some value. My idea was to use simd_or across the SIMD group, then make each SIMD group thread 0 to atomic or the value into thread group memory. Finally very first thread of the tile would be tasked with writing the value down to texture with write access. Now, I can allocate the thread group memory argument to the fragment function all right. MTLRenderEncoder has setThreadgroupMemoryLength call, which I am using the following way [renderEncoder setThreagroupMemoryLength: 16 offset: 0 atIndex:0] Unfortunately, all I am getting is the following error (runtime assertion) -[MTLDebugRenderCommandEncoder setThreadgroupMemoryLength:offset:atIndex:]:3487: failed assertion Set Threadgroup Memory Length Validation offset + length(16) must be <= threadgroupMemoryLength(0).` What I am doing wrong? How I can get thread group memory in the fragment shader? I know I could use tile shading and compute function but the problem is that here I really like to use fragment stuff. Will be grateful for help.

Graphics & Games Metal Metal

1

0

112

Apr ’25

Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions

Hi, seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example.. useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU.. also useful for MoltenVK to support that extensions.. thanks..

Graphics & Games Metal Graphics and Games Metal

1

0

150

Apr ’25

CMake unable to generate the Xcode file described in this tutorial

In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command: cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ I keep getting an error: CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?

Graphics & Games Metal Metal MetalKit USDZ OpenGL

1

0

164

May ’25

Sparse Texture Writes

Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps: Set up a sparse heap and create a texture from it Map the whole area of the sparse texture using updateTextureMapping(..) Overwrite every value with the value "4" in a compute shader Blit the texture to a shared buffer Assert that the values in the buffer are "4". I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated. import Metal func sparseTexture64x64Demo() throws { // ── Metal objects guard let device = MTLCreateSystemDefaultDevice() else { throw NSError(domain: "SparseNotSupported", code: -1) } let queue = device.makeCommandQueue()! let lib = device.makeDefaultLibrary()! let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!) // ── Texture descriptor let width = 64, height = 64 let format: MTLPixelFormat = .r32Uint // 4 B per texel let desc = MTLTextureDescriptor() desc.textureType = .type2D desc.pixelFormat = format desc.width = width desc.height = height desc.storageMode = .private desc.usage = [.shaderWrite, .shaderRead] // ── Sparse heap let bytesPerTile = device.sparseTileSizeInBytes let meta = device.heapTextureSizeAndAlign(descriptor: desc) let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile let heapDesc = MTLHeapDescriptor() heapDesc.type = .sparse heapDesc.storageMode = .private heapDesc.size = heapBytes let heap = device.makeHeap(descriptor: heapDesc)! let tex = heap.makeTexture(descriptor: desc)! // ── CPU buffers let bytesPerPixel = MemoryLayout<UInt32>.stride let rowStride = width * bytesPerPixel let totalBytes = rowStride * height let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)! let cb = queue.makeCommandBuffer()! let fence = device.makeFence()! // 2. Map the sparse tile, then signal the fence let rse = cb.makeResourceStateCommandEncoder()! rse.updateTextureMapping( tex, mode: .map, region: MTLRegionMake2D(0, 0, width, height), mipLevel: 0, slice: 0) rse.update(fence) // ← capture all work so far rse.endEncoding() let ce = cb.makeComputeCommandEncoder()! ce.waitForFence(fence) ce.setComputePipelineState(pipeline) ce.setTexture(tex, index: 0) let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1) let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1) ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG) ce.updateFence(fence) ce.endEncoding() // Blit texture into shared buffer let blit = cb.makeBlitCommandEncoder()! blit.waitForFence(fence) blit.copy( from: tex, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: MTLSize(width: width, height: height, depth: 1), to: dstBuf, destinationOffset: 0, destinationBytesPerRow: rowStride, destinationBytesPerImage: totalBytes) blit.endEncoding() cb.commit() cb.waitUntilCompleted() assert(cb.error == nil, "GPU error: \(String(describing: cb.error))") // ── Verify a few texels let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height) print("first three texels:", out[0], out[1], out[width]) // 0 1 64 assert(out[0] == 4 && out[1] == 4 && out[width] == 4) } Metal shader: #include <metal_stdlib> using namespace metal; kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]], uint2 gid [[thread_position_in_grid]]) { tex.write(4, gid); }

Graphics & Games Metal Metal

1

0

130

May ’25

Distortion Artifacts on VisionOS When Rendering Opaque/Alpha Clipped Foliage in URP (Unity 6.0, Metal)

I'm running into a persistent visual issue while deploying a floral corridor scene to Apple Vision Pro using Unity 6.0 with URP and Metal. The issue only appears on the Vision Pro device — everything looks fine in the Unity Editor. Issue Description When the frame rate drops to around 60–70 FPS, noticeable distortion artifacts appear around the edges of foliage models. It seems like the background meshes (behind the plants) get warped and leak through the edges of the foliage. Although this is most visible around the leaves, even solid objects like standard URP wall or box models show distorted edges when the issue occurs. All the foliage uses Opaque or Alpha Clipping materials. Things I've Tried Changing the foliage materials to Transparent mode —distortion around edges disappears, but using Transparent for a large number of foliage assets is not ideal for performance or sorting complexity. Reducing the number of foliage objects — with only a few plants in the scene and the frame rate staying around 100 FPS, the distortion disappears. However, this isn’t a practical solution for a full environment. Possible Cause? I came across this note in the Unity documentation: "Ensure depth-buffer for each pixel is non-zero - on visionOS, the depth buffer is used for reprojection. To ensure visual effects like skyboxes and shaders are displayed beautifully, ensure that some value is written to the depth for each pixel." Could this be related to the issue? Is it possible that Alpha Clipping with low pixel coverage leads to some pixels not writing to the depth buffer, which then causes problems during Vision Pro’s reprojection or foveated rendering? However, even when I disable Alpha Clipping entirely, the distortion issue still persists, so it may not be solely caused by clipping itself. Project Setup Unity 6.0 (URP) Depth Texture: Enable Using Metal as the graphics backend Running on real Vision Pro hardware (not simulator) Any advice on how to avoid these distortion issues on Vision Pro would be greatly appreciated. Thanks!

Graphics & Games Metal visionOS

1

0

132

Jul ’25

打开显示HUD图形后，应用崩溃

hi everyone, 我们发现了一个和Metal相关崩溃。应用中使用了Metal相关的接口，在进行性能测试时，打开了设置-开发者-显示HUD图形。运行应用后，正常展示HUD，但应用很快发生了崩溃，日志主要信息如下： Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd Hardware Model: iPhone14,3 Process: waimai [16584] Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai Identifier: Version: 61488 (8.53.0) Code Type: ARM-64 Parent Process: ? [1] Date/Time: 2025-06-12 14:41:45.296 +0800 OS Version: iOS 18.0 (22A3354) Report Version: 104 Monitor Type: Mach Exception Exception Type: EXC_BAD_ACCESS (SIGBUS) Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00 Crashed Thread: 57 Thread 57 Crashed: 0 libMTLHud.dylib esfm_GenerateTriangesForString + 408 1 libMTLHud.dylib esfm_GenerateTriangesForString + 92 2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204 3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656 4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72 5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56 6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104 7 Metal MTLDispatchListApply + 52 8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312 9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136 10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64 11 libdispatch.dylib _dispatch_client_callout4 + 20 12 libdispatch.dylib _dispatch_mach_msg_invoke + 464 13 libdispatch.dylib _dispatch_lane_serial_drain + 368 14 libdispatch.dylib _dispatch_mach_invoke + 456 15 libdispatch.dylib _dispatch_lane_serial_drain + 368 16 libdispatch.dylib _dispatch_lane_invoke + 432 17 libdispatch.dylib _dispatch_lane_serial_drain + 368 18 libdispatch.dylib _dispatch_lane_invoke + 380 19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288 20 libdispatch.dylib _dispatch_workloop_worker_thread + 540 21 libsystem_pthread.dylib _pthread_wqthread + 288 我们测试了几个不同的机型，只有iPhone 13 Pro Max会发生崩溃。 Q1：为什么会发生这个崩溃？ Q2：相同的逻辑，为什么仅在iPhone 13 Pro Max机型上出现崩溃？期待您的解答。

Graphics & Games Metal Metal MetalKit

1

0

231

Jul ’25

Show MetalFX scaling in Metal HUD iOS

Hi there, I'm wondering if it's possible under iOS 28 developer beta to enable MetalFX scaling info with '{"MTL_HUD_ENABLED": "1" for my App. This information has been added to Mac, but looks to be absent on iPhone / iPad

Graphics & Games Metal Metal MetalFX

1

0

170

Jun ’25

Combining render encoders

When I take a frame capture of my application in Xcode, it shows a warning that reads "Your application created separate command encoders which can be combined into a single encoder. By combining these encoders you may reduce your application's load/store bandwidth usage." In the minimal reproduction case I've identified for this warning, I have two render pipeline states: The first writes to the current drawable, the depth buffer, and a secondary color buffer. The second writes only to the current drawable. Because these are writing to a different set of outputs, I was initially creating two separate render command encoders to handle the draws under each of these states. My understanding is that Xcode is telling me I could only create one, however when I try to do that, I get runtime asserts when attempting to apply the second render pipeline state since it doesn't have a matching attachment configured for the second color buffer or for the depth buffer, so I can't just combine the encoders. Is the only solution here to detect and propagate forward the color/depth attachments from the first state into the creation of the second state? Is there any way to suppress this specific warning in Xcode?

Graphics & Games Metal

1

0

307

Jul ’25

Compute kernel fails to compile when calling texture.read()

If I compile a compute kernel with a call to texture.read(), it fails with the following error: "Error Domain=AGXMetalG13X Code=3 "Encountered unlowered function call to air.get_read_sampler" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.get_read_sampler}." This error occurs on both macOS and iOS 26 Beta 5, but not when running on a simulator or in a playground. It does not occur on a macOS Sequoia VM. It occurs whether I use the old metal 3 or new metal 4 compilation method. A workaround would be to use a sampler, but according to the feature tables, all platforms support reading from textures of all formats. Below is a minimal example which produces the error: let device = MTLCreateSystemDefaultDevice()! let library = device.makeDefaultLibrary()! let computeFunction = library.makeFunction(name: "compute_test")! do { let pipeline = try device.makeComputePipelineState(function: computeFunction) debugPrint(pipeline) } catch { debugPrint("Metal 3 failed with error:\n\(error)") } #import <metal_stdlib> using namespace metal; kernel void compute_test(uint2 gid [[thread_position_in_grid]], texture2d<float, access::read> in [[texture(0)]], texture2d<float, access::write> out [[texture(1)]]) { out.write(in.read(gid), gid); } I filed feedback FB19530049.

Graphics & Games Metal Metal Beta

1

0

212

Aug ’25

Post

Replies

Boosts

Views

Activity