Render advanced 3D graphics and perform data-parallel computations using graphics processors with Metal.

Metal Documentation

Posts under Metal subtopic

Post | Replies | Boosts | Views | Created

Metal texture allocated size versus actual image data size
Hello. In the iOS app I'm working on we are very tight on memory budget, and I was looking at ways to reduce our texture memory usage. However, I noticed that comparing ASTC8x8 to ASTC12x12, there is no actual difference in allocated memory for most of our textures, despite ASTC12x12 having less than half the bpp of 8x8. The difference between the two only becomes apparent for textures 1024x1024 and larger, and even in that case the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. For an example scene in my app with ASTC12x12 for most textures, there is over a 100 MB difference between the ASTC size on disk and the size when loaded, so I would love to be able to recover even a portion of that memory. Here is some test code with some measurements I've taken on an iPhone 11:

for(int i = 0; i < 11; i++) {
    MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
    texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
    int dim = 12;
    int n = 2 << i;
    int mips = i+1;
    texDesc.width = n;
    texDesc.height = n;
    texDesc.mipmapLevelCount = mips;
    texDesc.resourceOptions = MTLResourceStorageModeShared;
    texDesc.usage = MTLTextureUsageShaderRead;

    // Calculate the equivalent astc texture size
    int blocks = 0;
    if(mips == 1) {
        blocks = n/dim + (n%dim>0? 1 : 0);
        blocks *= blocks;
    } else {
        for(int j = 0; j < mips; j++) {
            int a = 2 << j;
            int cur = a/dim + (a%dim>0? 1 : 0);
            blocks += cur*cur;
        }
    }

    auto tex = [objCObj newTextureWithDescriptor:texDesc];
    printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks*16, (int)tex.allocatedSize);
}

MTLPixelFormatASTC_12x12_LDR
128x128, mips 7, Astc: 2768, Metal: 6016
256x256, mips 8, Astc: 10512, Metal: 32768
512x512, mips 9, Astc: 40096, Metal: 98304
1024x1024, mips 10, Astc: 158432, Metal: 262144
128x128, mips 1, Astc: 1936, Metal: 4096
256x256, mips 1, Astc: 7744, Metal: 16384
512x512, mips 1, Astc: 29584, Metal: 65536
1024x1024, mips 1, Astc: 118336, Metal: 147456

MTLPixelFormatASTC_8x8_LDR
128x128, mips 7, Astc: 5488, Metal: 6016
256x256, mips 8, Astc: 21872, Metal: 32768
512x512, mips 9, Astc: 87408, Metal: 98304
1024x1024, mips 10, Astc: 349552, Metal: 360448
128x128, mips 1, Astc: 4096, Metal: 4096
256x256, mips 1, Astc: 16384, Metal: 16384
512x512, mips 1, Astc: 65536, Metal: 65536
1024x1024, mips 1, Astc: 262144, Metal: 262144

I also tried using MTLHeaps (placement and automatic), hoping they might be better, but saw nearly the same numbers. Is there any way to have Metal allocate these textures in a more compact way to save on memory?
8 | 0 | 2.8k | Feb ’22
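A minimal sketch related to the question above (not from the post; written in Swift rather than Objective-C, with illustrative texture dimensions) of one way to inspect the allocation size and alignment Metal wants for a given descriptor, using MTLDevice.heapTextureSizeAndAlign(descriptor:), and to compare it against the raw ASTC payload:

import Metal

// Compare Metal's required allocation against the hand-computed ASTC payload.
// Assumes a square, mipmapped ASTC 12x12 LDR texture; the sizes are illustrative.
let device = MTLCreateSystemDefaultDevice()!

let desc = MTLTextureDescriptor()
desc.pixelFormat = .astc_12x12_ldr
desc.width = 1024
desc.height = 1024
desc.mipmapLevelCount = 10
desc.storageMode = .shared
desc.usage = .shaderRead

// Size and alignment Metal would use if this texture were sub-allocated from a heap.
let sizeAlign = device.heapTextureSizeAndAlign(descriptor: desc)

// Raw ASTC payload: one 16-byte block per 12x12 texel tile, summed over all mip levels.
var astcBytes = 0
for level in 0..<desc.mipmapLevelCount {
    let w = max(desc.width >> level, 1)
    let h = max(desc.height >> level, 1)
    astcBytes += ((w + 11) / 12) * ((h + 11) / 12) * 16
}

print("ASTC payload: \(astcBytes) bytes, heap size: \(sizeAlign.size) bytes, alignment: \(sizeAlign.align)")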
Metal and Swift Concurrency
Hi, introducing Swift Concurrency to my Metal app has been a bit challenging, as Swift Concurrency is limited by the cooperative thread pool. GPU work is obviously not CPU-bound and can block forward progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of underutilizing the CPU and even creating deadlocks. My question is: what is the Metal team's general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way to schedule Metal-bound tasks for maximum performance. To integrate with Swift Concurrency, my idea is to use continuations that kick off render jobs via Dispatch or OperationQueues. Would this be the best way to bridge async tasks with Metal work? Thanks!
5 | 0 | 1.1k | Jun ’24
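A minimal sketch of the continuation-based bridging idea described in the post above, resuming from the command buffer's completion handler instead of blocking on waitUntilCompleted; the function name, dispatch queue label, and error type are illustrative assumptions, not an official recommendation:

import Dispatch
import Metal

// Illustrative error for the sketch; not part of any Metal API.
enum GPUJobError: Error { case commandBufferCreationFailed }

// Dedicated queue for CPU-side encoding work, outside the cooperative pool.
let encodeQueue = DispatchQueue(label: "com.example.metal-encode")  // illustrative label

// Bridge a Metal command buffer into Swift Concurrency: the awaiting task suspends,
// and the GPU completion handler resumes it, so no cooperative-pool thread is blocked.
func runGPUJob(on queue: MTLCommandQueue,
               encode: @escaping (MTLCommandBuffer) -> Void) async throws {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, Error>) in
        encodeQueue.async {
            guard let commandBuffer = queue.makeCommandBuffer() else {
                continuation.resume(throwing: GPUJobError.commandBufferCreationFailed)
                return
            }
            encode(commandBuffer)
            // Resume when the GPU finishes instead of calling waitUntilCompleted.
            commandBuffer.addCompletedHandler { buffer in
                if let error = buffer.error {
                    continuation.resume(throwing: error)
                } else {
                    continuation.resume()
                }
            }
            commandBuffer.commit()
        }
    }
}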
In Metal compute kernels, when do thread variables get spilled into the device memory?
How many 32-bit variables can I use concurrently in a single thread of a Metal compute kernel without worrying about the variables getting spilled into device memory? Alternatively: how many 32-bit registers does a single thread have available for itself? Let's say that each thread of my compute kernel needs to store and work with its own array of N float variables, where N can be 128, 256, 512 or more. To achieve maximum possible performance, I do not want the local thread variables to get spilled into the slow device memory. I want all N variables to be stored "on-chip", in the thread memory space. To make my question more concrete, let's say there is an array declared as thread float localArray[N]. Assuming an unrealistic hypothetical scenario where localArray is the only variable in the whole kernel, what is the maximum value of N for which no portion of localArray would get spilled into device memory? I searched the Metal feature set tables, but I could not find any details.
1 | 0 | 600 | Aug ’24
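Metal does not publish per-thread register counts, but a rough, indirect signal is available after compiling a pipeline: maxTotalThreadsPerThreadgroup tends to drop as per-thread register pressure rises, and actual spills only show up in the GPU profiling tools rather than in any API. A minimal sketch of inspecting those occupancy-related properties, assuming a kernel named localArrayKernel exists in the default library (the kernel and function names are illustrative):

import Foundation
import Metal

// Inspect occupancy-related properties of a compiled compute pipeline as an
// indirect indicator of register pressure; spilling itself is visible only in profiling tools.
func inspectOccupancy(device: MTLDevice) throws {
    let library = try device.makeDefaultLibrary(bundle: .main)
    guard let function = library.makeFunction(name: "localArrayKernel") else { return }  // illustrative name

    let pipeline = try device.makeComputePipelineState(function: function)

    // A maximum well below the device limit usually means the compiler needed
    // more registers (or threadgroup memory) per thread for this kernel.
    print("maxTotalThreadsPerThreadgroup: \(pipeline.maxTotalThreadsPerThreadgroup)")
    print("threadExecutionWidth: \(pipeline.threadExecutionWidth)")
    print("staticThreadgroupMemoryLength: \(pipeline.staticThreadgroupMemoryLength)")
}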
Metal runtime shader library compilation and linking issue
In my project I need to do the following:

At runtime, create a Metal dynamic library from source.
At runtime, create a Metal executable library from source and link it with the previously created dynamic library.
Create a compute pipeline using the two libraries created above.

But I get the following error at the third step:

Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel" UserInfo={NSLocalizedDescription=Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel}

import Foundation
import Metal

class MetalShaderCompiler {
    let device = MTLCreateSystemDefaultDevice()!
    var pipeline: MTLComputePipelineState!

    func compileDylib() -> MTLDynamicLibrary {
        let source = """
        #include <metal_stdlib>
        using namespace metal;

        half3 noise() {
            return half3(1, 0, 1);
        }
        """

        let option = MTLCompileOptions()
        option.libraryType = .dynamic
        option.installName = "@executable_path/libFoundation.metallib"

        let library = try! device.makeLibrary(source: source, options: option)
        let dylib = try! device.makeDynamicLibrary(library: library)
        return dylib
    }

    func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary {
        let source = """
        #include <metal_stdlib>
        using namespace metal;

        extern half3 noise();

        kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]],
                                   texture2d<half, access::write> dst [[texture(1)]],
                                   ushort2 gid [[thread_position_in_grid]]) {
            half4 rgba = src.read(gid);
            rgba.rgb += noise();
            dst.write(rgba, gid);
        }
        """

        let option = MTLCompileOptions()
        option.libraryType = .executable
        option.libraries = [dylib]

        let library = try! self.device.makeLibrary(source: source, options: option)
        return library
    }

    func runtime() {
        let dylib = self.compileDylib()
        let exlib = self.compileExlib(dylib: dylib)

        let pipelineDescriptor = MTLComputePipelineDescriptor()
        pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel")
        pipelineDescriptor.preloadedLibraries = [dylib]

        pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil)
    }
}
4 | 0 | 889 | Sep ’24
Can't link metal-cpp to Modern Rendering With Metal sample
There is a sample project from Apple here. It has a scene of a city at night and you can move around in it. It basically has two parts:

application code written in what looks like Objective-C (I am more familiar with C++), which inherits from things like NSObject, MTKView, NSViewController and so on; it processes input and all app-related and window-related stuff.

rendering code that also looks like Objective-C.

By the way, both parts are mostly in .mm files (Objective-C++, as far as I know). The application part directly uses only one class from the rendering part, AAPLRenderer. I want to move the rendering part to C++ using metal-cpp. For that I need to link metal-cpp to the project. I did it successfully with blank projects several times before, using this tutorial. But with this sample project, Xcode can't find Foundation/Foundation.hpp (and other metal-cpp headers). The error says this:

Did not find header 'Foundation.hpp' in framework 'Foundation' (loaded from '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk/System/Library/Frameworks')

Please help.
2 | 0 | 849 | Sep ’24
OS choosing performance state poorly for GPU use case
I am building a macOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU. However, it seems that with the way I am using the GPU, the OS never increases the power/performance state, even when my app is fully compute-limited. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds the GPU is capable of, because the app can't keep up with real time. I discovered this issue while profiling the app using Instruments' Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away. For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well. I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds! I think part of why the OS is confused is that in most use cases my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state. I'm not sure what to do here; I'm in a bit of a bind. One possibility is to intentionally schedule busy work, spinning threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work. Is there any other (better) way for my app to signal to the OS that it is doing real-time, latency-sensitive computation on the GPU and needs the clock speeds scaled up? Note that Game Mode is not really an option, as my app also runs as an AU plugin inside hosts like GarageBand, so it can't be made fullscreen, etc.
6 | 0 | 904 | Nov ’24
MDLAsset loads texture in usdz file loaded with wrong colorspace
I have a very basic usdz file from this repo. I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object, I can tell it is assigning a colorspace of linear RGB instead of sRGB, although the image file in the usdz is sRGB. This causes the textures to ultimately render as oversaturated. In the code I later convert the MDLTexture to an MTLTexture via MTKTextureLoader, but if I set the sRGB option it seems to be ignored. This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something? Thanks!
3 | 3 | 850 | Nov ’24
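One workaround sometimes used for this kind of colorspace mismatch is to reinterpret the loaded texture with the sRGB variant of its pixel format via a texture view, so sampling applies the sRGB-to-linear conversion. A minimal sketch, assuming the loaded texture comes back as rgba8Unorm; the function and variable names are illustrative, and this is not an official Model I/O fix:

import MetalKit
import ModelIO

// Load an MDLTexture through MTKTextureLoader, then reinterpret the result as sRGB
// with a texture view if the loader's .SRGB option did not take effect.
func makeSRGBTexture(from mdlTexture: MDLTexture, device: MTLDevice) throws -> MTLTexture {
    let loader = MTKTextureLoader(device: device)
    let texture = try loader.newTexture(texture: mdlTexture, options: [.SRGB: true])

    // Only the gamma interpretation changes; no pixel data is copied or converted.
    if texture.pixelFormat == .rgba8Unorm,
       let srgbView = texture.makeTextureView(pixelFormat: .rgba8Unorm_srgb) {
        return srgbView
    }
    return texture
}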
Jurassic World Evolution 2 Likely Fails Due to Missing Tiled Resources Support
I’ve been trying to run Jurassic World Evolution 2 using the Game Porting Toolkit on macOS, but the game doesn’t launch and crashes immediately. Based on the error and research, it seems the issue is related to missing support for D3D12_TILED_RESOURCES_TIER_2 in the Metal API. If this is the case, does anyone know if support for tiled resources is planned for future updates of the toolkit? Or are there any potential workarounds for bypassing this limitation?
1 | 0 | 731 | Dec ’24
Metal-cpp-extensions isn't working inside frameworks
I am making a framework in C++ using metal-cpp, basically a small game engine. I am also, consequently, using the metal-cpp-extensions provided in LearnMetalCPP to make applications work. For one of my classes I needed to add AppKit.hpp inside a public header file, so I moved it and its associated headers (NSApplication.hpp, NSMenu.hpp, etc.) from Project to Public in the Build Phases' Headers section. However, it started giving me the error "cast of C pointer type 'void *' to Objective-C pointer type 'Class' requires a bridged cast" at several points in the AppKit headers. The errors don't appear when AppKit.hpp and its associated headers are in the Project headers, or when they are in the Private headers and no headers import them. I imagined that disabling Objective-C ARC and enabling "Use __bridge casts outside of ARC" in Build Settings would solve it, but it didn't budge. I assumed the answer wouldn't involve actively changing the headers, but even when I tried putting __bridge before the problematic casts, the compiler didn't recognize __bridge. How do I solve this? And why is it only happening with Public and not Project headers?
2 | 0 | 806 | Dec ’24
How to use MTKTextureLoader to load png data
I am trying to load some PNG data with MTKTextureLoader newTextureWithData, but the result shows wrong in the alpha area. Here is the code. I have an image URL; after it downloads successfully, I try to use the data directly or UIImagePNGRepresentation(image), and they all show wrong.

UIImage *tempImg = [UIImage imageWithData:data];
CGImageRef cgRef = tempImg.CGImage;
MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device];

id<MTLTexture> temp1 = [loader newTextureWithData:data
                                           options:@{MTKTextureLoaderOptionSRGB: @(NO),
                                                     MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead),
                                                     MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)}
                                             error:nil];

NSData *tempData = UIImagePNGRepresentation(tempImg);
id<MTLTexture> temp2 = [loader newTextureWithData:tempData
                                           options:@{MTKTextureLoaderOptionSRGB: @(NO),
                                                     MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead),
                                                     MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)}
                                             error:nil];

id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef
                                              options:@{MTKTextureLoaderOptionSRGB: @(NO),
                                                        MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead),
                                                        MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)}
                                                error:nil];
}] resume];
5 | 0 | 625 | Dec ’24
M1 GPU violates atomic_thread_fence across threadgroups
I have an M1 Pro with a 16-core GPU. When I run a shader with 8193 threads, atomic_thread_fence is violated across the boundary between thread 8191 (the last thread in the 8th threadgroup) and thread 8192 (the first thread in the 9th threadgroup). I've attached the Metal and Swift files, but I'll repost the relevant kernel here. It's a function that launches N threads to iterate through a binary tree from the leaves, where the first thread to reach a parent terminates and the second one populates it with the sum of the node's two children.

// clang-format off
void sum(device const int& size,
         device const int* __restrict__ in,
         device int* __restrict__ out,
         device atomic_int* visited,
         uint i [[thread_position_in_grid]]) {
  // clang-format on
  int val = in[i];
  uint cur = (size + i - 1);
  out[cur] = val;
  atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
  cur = (cur - 1) / 2;
  int proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);

  while (proceed == 1) {
    uint left = 2 * cur + 1;
    uint right = 2 * cur + 2;
    uint val_left = out[left];
    uint val_right = out[right];
    uint val_cur = val_left + val_right;
    out[cur] = val_cur;

    if (cur == 0) {
      break;
    }

    cur = (cur - 1) / 2;
    atomic_thread_fence(mem_flags::mem_device, memory_order_seq_cst);
    proceed = atomic_fetch_add_explicit(&visited[cur], 1, memory_order_relaxed);
  }
}

What I'm observing is that thread 8192 hits the atomic_fetch_add first and terminates, while thread 8191 hits it second (observes that thread 8192 had incremented it by 1) and proceeds into the loop. Thread 8191 reads out[16383] (which it populated with 8191) and out[16384] (which thread 8192 populated with 8192 prior to the atomic_thread_fence). Instead of reading 8192 from out[16384], though, it reads 0. Maybe I'm missing something, but this seems like a pretty clear violation of the atomic_thread_fence, which (I thought) was supposed to guarantee that the write from thread 8192 to out[16384] would be visible to any thread observing the effects of the following atomic_fetch_add. Is atomic_fetch_add not a store operation? Modifying it to an atomic_store or atomic_exchange still results in the bug. Adding another atomic_thread_fence between the atomic_fetch_add and the reading of out also doesn't change anything. I only begin to observe this at grid sizes of 8193 and upwards. That's 9 threadgroups per grid, which I assume could be related to my M1 Pro GPU having 16 cores. Running the same example on an A17 Pro GPU doesn't show any of this behavior up through a tested grid size of 4194303 (2^22-1), at which point testing larger grid sizes starts to run into other issues, so I can't test anything larger. Removing the atomic_thread_fences on both the M1 and A17 causes the test to fail at much smaller grid sizes, as expected.

sum.metal
main.swift
2 | 0 | 532 | Dec ’24
Tile Shaders performance when writing to tile texture vs. resolve texture
I am working on a custom resolve tile shader for a client. I see a big difference in performance depending on where we write to:

1. the resolve texture of the color attachment
2. a read-write tile shader texture set via [renderEncoder setTileTexture: myResolvedTexture]

Option 2 is more than twice as slow as option 1. Our compute shader writes to 4 UAVs, so just using the resolve texture entry is not possible. Why is there such a difference when there is no more data being written? Can option 2 be made as fast as option 1? I can demonstrate the issue in a modified version of the Multisample code sample.
5 | 0 | 568 | Dec ’24
How to use imageblock_slice
Is there a working example of imageblock_slice with implicit layout somewhere? I get a compilation error when I write this:

imageblock_slice color_slice = img_blk.slice(frag->color);

Error:
No matching member function for call to 'slice'
candidate template ignored: couldn't infer template argument 'E'
candidate function template not viable: requires 2 arguments, but 1 was provided
Too few template arguments for class template 'imageblock_slice'

It seems the syntax has changed since the Imageblocks presentation https://developer.apple.com/videos/play/tech-talks/603/ I tried supplying the struct type of the image block between <>, but it still does not work.
1 | 0 | 671 | Dec ’24
What are the CAMetalLayer.nextDrawable threading rules?
What evidence exists that it's safe to call nextDrawable() on CAMetalLayer off the main thread? I have seen developers claiming that it's OK, but the official docs are silent on the topic. Attempting to do so with Strict Concurrency Checking set to Complete complains that CAMetalLayer is not Sendable. I want to call it off the main thread since there doesn't seem to be any way to prevent it from blocking the UI for up to a second. I have read hints and allegations that this won't happen if you avoid asking for too many drawables, but that doesn't seem to be true 100% of the time in my experience. Supposing it is allowed, I wonder how races are handled, such as when the layer's size is changed on the main thread or the layer is removed from the layer hierarchy.
0 | 0 | 515 | Dec ’24
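A minimal sketch of the off-main-thread pattern the post above is asking about: rendering on a private serial queue and snapshotting the drawable size on the main thread before dispatching. This assumes such use is permitted, which, as the post notes, the documentation does not explicitly confirm; the class name, method names, and queue label are illustrative:

import Dispatch
import Metal
import QuartzCore

final class OffMainRenderer {
    private let renderQueue = DispatchQueue(label: "com.example.render")  // illustrative label
    private let layer: CAMetalLayer
    private let commandQueue: MTLCommandQueue

    init(layer: CAMetalLayer, device: MTLDevice) {
        self.layer = layer
        self.commandQueue = device.makeCommandQueue()!
    }

    // Called on the main thread; capture main-thread state before hopping queues.
    func renderFrame() {
        let targetSize = layer.drawableSize
        renderQueue.async { [layer, commandQueue] in
            // nextDrawable() may block for up to a second; off the main thread it
            // at least does not stall the UI while it waits.
            guard layer.drawableSize == targetSize,
                  let drawable = layer.nextDrawable(),
                  let commandBuffer = commandQueue.makeCommandBuffer() else { return }

            // ... encode render passes targeting drawable.texture here ...

            commandBuffer.present(drawable)
            commandBuffer.commit()
        }
    }
}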
Texture Definitions for MPSSVGF Denoise
I am trying to use the SVGF denoiser to denoise my ray-traced shadows (and later other textures as well). I do get a smoothed image, but with wonky denoising. I need the depth-normal textures and motion textures for the SVGF and assume that these are badly filled in my case. However, neither the linked documentation nor the WWDC19 video explains how they should be defined. I am looking for answers to:

Is depth in the red or alpha channel of the depth-normal texture?
Are the normals in screen space?
Is depth linear? Is it distance or the z coordinate in view space? Or even logarithmically scaled or something else?
Are the motion vectors supposed to be in pixels per frame? What is the orientation of the axes? Is y up or down?
Are there other restrictions on the formats?

Also the linked code did not help me (I have not found any SVGF so far; also, all the code is in Objective-C++, not Swift, but that's a different topic). So how should I fill these textures? Can someone point me to documentation where these kinds of questions are answered?
0 | 0 | 558 | Dec ’24
metal-cpp syntax for MTL::Buffer float2 parameter
I'm trying to pass a buffer of float2 items from the CPU to the GPU. In the kernel, I can provide a parameter for the buffer, for example device const float2* values. How do I specify float2 as the type for the MTL::Buffer? I managed to get the code to work by "cheating": defining a simple class that has the same data members as a float2, but there is probably a better way.

class Coord_f {
public:
    float x{0.0f};
    float y{0.0f};
};

Then allocating like this:

NS::TransferPtr(device->newBuffer(n_elements * sizeof(Coord_f), MTL::ResourceStorageModeManaged))

The headers for metal-cpp do not appear to define vector types like float2, but I'm doubtless missing something. Thanks.
2 | 0 | 653 | Dec ’24
Shader compiler crash on MacOS Sequoia+Radeon
Hello, Apple! This post is a bug report for the Metal driver in macOS Sequoia. I'm working on an open-source game engine, and one of my users reported a bug on macOS Sequoia with an AMD Radeon RX 6900 XT. The engine crashes when linking a compute pipeline, with the following NSError:

"Compiler encountered an internal error: I"

Offending shader (depth-aware blur): https://shader-playground.timjones.io/27565de40391f62f078c891077ba758c

On my end, compiling the same shader on M1 (Sonoma) or with the offline compiler doesn't reproduce the issue. Post with the bug report on GitHub: https://github.com/Try/OpenGothic/issues/712 Looking forward to your help and a driver fix ;)
2 | 0 | 695 | Jan ’25