Description
Inside a GitHub-hosted runner, calls to the macOS API MTLCreateSystemDefaultDevice return nil. This prevents use of Metal, is not generally anticipated to happen on macOS, and can break arbitrary software, which is more likely to occur over time. This appears to be caused by the GPU configuration in the guest environment.
Area for Triage:
Apple
Question, Bug, or Feature?:
?
Virtual environments affected
Expected behavior
MTLCreateSystemDefaultDevice() should return a non-nil value
Actual behavior
MTLCreateSystemDefaultDevice() returns nil
Repro steps
In the linked action run this API is called in both the macOS and iOS Simulator environments, producing:
Metal device is nil
all devices []
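The check in the linked run amounts to something like the following minimal Swift sketch (the print labels are illustrative, not the exact code in the run):

```swift
import Metal

// On real Mac hardware this returns the default GPU.
// In a GitHub-hosted runner it returns nil.
let device = MTLCreateSystemDefaultDevice()
print("Metal device is \(device.map { $0.name } ?? "nil")")

// Enumerate every Metal device visible to the process (macOS only).
// In the runner this list comes back empty.
let allDevices = MTLCopyAllDevices()
print("all devices \(allDevices.map { $0.name })")
```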
What is this API?
This API is a chokepoint for use of Metal, the only non-deprecated graphics library on macOS. In addition, Metal is a general-purpose compute framework that may be doing the heavy lifting when you call some other system API. It's increasingly likely over time that some software you use or test in a CI environment on Apple platforms is trying to do this.
What is the significance of the current behavior?
Errors that look related to this appear in other reports, so other open macOS issues may share this root cause.
It is generally imagined that this API returning nil is not really possible on modern macOS. A brief survey of usage on GitHub supports this view, the predominant pattern being force-unwrapping the API (!) which crashes in a virtual environment. A minority of results generate a soft error, and I wasn't immediately able to turn up any examples that would function correctly in a GitHub runner.
Developers assume it works because a GPU supporting Metal has been a minimum system requirement for macOS since 10.14, and iOS for even longer. So this API working (e.g., slowly with integrated graphics) is imagined to be part of the macOS 10.14+ platform, rather than a question of availability on specific hardware. This is a very different expectation than Windows/Linux.
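The two usage patterns from that survey look roughly like this (illustrative Swift; the error message and exit code are my own):

```swift
import Foundation
import Metal

// Predominant pattern seen on GitHub: force-unwrap. On real Mac
// hardware this always succeeds; in a GitHub-hosted runner the nil
// result crashes the process with a fatal "unexpectedly found nil".
//
//     let device = MTLCreateSystemDefaultDevice()!

// Minority pattern: surface a soft error instead of crashing.
guard let device = MTLCreateSystemDefaultDevice() else {
    print("error: no Metal device available")
    exit(1)
}
print("using Metal device: \(device.name)")
```

Note that even the soft-error variant only fails more politely in a runner; neither pattern actually functions there.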
I asked someone with knowledge of the implementation for this API if there is any reason a developer today ought to handle a nil response, and they suggested nil probably indicates a serious OS fault, so not really.
Isn't there a software fallback for this?
Not for Metal itself. Codebases that predate widespread Metal availability may have kept around their old codepath which incidentally supported a fallback. These are increasingly not maintained or actively developed, and so if they exist they usually aren't the priority for running or testing/CI workflows.
Roblox recently wrote that
Today, for our audience, [OpenGL] is ~2% – which means our OpenGL backend barely matters anymore. We still maintain it but this will not continue for long.
Of course, new code written today is likely to skip this entirely and assume Metal is available.
What can be done about this?
The method I'm aware of is to pass through the host GPU to the guest environment. I don't know whether this can be done for multiple guests, or whether it would be sensible in GitHub's environment (I'm guessing not).
For virtualizing macOS 11, Apple requires a new set of low-level virtualization APIs. Some VMware products have experimental support for using these APIs to paravirtualize the host GPU into the guest environment, which fixes this issue. So the situation on macOS 11 should be better, though it may require additional or experimental configuration to make it work.