Skip to content

Conversation

@cameron314
Copy link
Contributor

Fixed processor count detection on Windows when multiple process groups (e.g. >64 processors) are present.

On our 80-core system, ninja was defaulting to a parallelism of 42 (only detecting 40 cores). It turns out that this is because the Win32 GetActiveProcessorCount function only returns the number of cores in the current core group. Since ninja dispatches jobs across the entire system, it makes sense to count the total number of cores across all core groups.

With this fix, ninja correctly detects all 80 cores on our system (even when built as a 32-bit binary) and has a default parallelism level of 82.

@jhasse
Copy link
Collaborator

jhasse commented Nov 12, 2019

I thought a3a5d60 fixed this already?

@cameron314
Copy link
Contributor Author

In a word, no; that fix only helps up to 64 cores. However, you're correct in that I was using an old version without that fix to test with. On our 80 core machine:

  • ninja with GetNativeSystemInfo: detects 40 cores
  • ninja with GetActiveProcessorCount(ALL_PROCESSOR_GROUPS): detects 64 cores
  • ninja with GetLogicalProcessorInformationEx (this patch): Detects 80 cores

@jhasse
Copy link
Collaborator

jhasse commented Nov 13, 2019

There's no need for the check as GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) only works with Windows 7 and up IIRC.

@cameron314
Copy link
Contributor Author

Ping? :-)

@nico
Copy link
Collaborator

nico commented Dec 9, 2019

I tried to check this on a machine with 72 cores.

I put the 4 methods ninja has used over time to detect core count in a small stand-alone program at https://github.com/nico/hack/blob/master/nproc_win.cc

If I build this as a 64-bit binary and run it (actually, @atetubou ran it because I don't have a machine with that many cores), I get:

GetSystemInfo(): 36
GetNativeSystemInfo(): 36
GetActiveProcessorCount(): 72
GetLogicalProcessorInformationEx(): 72

If I build it as a 32-bit binary, I get:

GetSystemInfo(): 32
GetNativeSystemInfo(): 36
GetActiveProcessorCount(): 64
GetLogicalProcessorInformationEx(): 72

Are you perchance using a 32-bit ninja binary?

If not, could you paste the output of that program in 32-bit and 64-bit mode on your system?

@jhasse jhasse added the windows label Dec 9, 2019
@cameron314
Copy link
Contributor Author

I'm on vacation this week, but I'll report back the output when I get back.

I had checked before, and I believe it was a 64-bit binary, but I'll have to double-check.

@jessemckenna
Copy link
Contributor

When run on a 32-bit process, GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) returns a maximum of 32 processors per group, i.e. the maximum number of processors that a 32-bit affinity mask can hold. If the 80-logical-processor machine in question has 2 groups of 40 processors each, that would be capped to 2 * 32 = 64 on a 32-bit process.

When I submitted a3a5d60, I thought that it was desirable not to return more processors than the process's affinity mask could hold, but I don't really know what the implications of doing that are. If there isn't any issue with that, it sounds like GetLogicalProcessorInformationEx() is an improvement over GetActiveProcessorCount(ALL_PROCESSOR_GROUPS).

@cameron314
Copy link
Contributor Author

I can't speak for everyone, but in our case we bought a dual-socket 80-virtual-core server specifically so that we could speed up our massively parallel C++ compilation times by building across all cores with ninja.

I don't know about the processor group implications, but I can attest that the build does go much faster with an explicit -j80. My guess is that since ninja is starting independent processes, it's normal to have them consume all cores across groups.

@cameron314
Copy link
Contributor Author

@nico, here are the results from running (as 64-bit) on our 80-core server:

> nproc_win.exe
GetSystemInfo(): 40
GetNativeSystemInfo(): 40
GetActiveProcessorCount(): 80
GetLogicalProcessorInformationEx(): 80

After double-checking, it seems that our ninja is 32-bit after all. With a 64-bit build, even the master version detects all cores. So, this pull request will only change the result for 32-bit builds.

I would suggest that it's useful to report the correct number of cores in 32-bit builds as well, since packagers like chocolatey only include the 32-bit build.

@scivision
Copy link
Contributor

scivision commented Apr 30, 2020

I didn't look into the code, but coincidentally the Jan 2020 release of GNU Make increased Windows maximum CPU count beyond 64: https://lists.gnu.org/archive/html/info-gnu/2020-01/msg00004.html just in case there was some correlation.

Copy link
Contributor

@ColinFinck ColinFinck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see calls to the Win7 APIs GetActiveProcessorCount and GetLogicalProcessorInformationEx being guarded by GetProcAddress, and falling back to GetNativeSystemInfo if both of them don't exist.

This GetProcessorCount function is currently the only one preventing ninja 1.10.0 from being used on ReactOS/Windows XP/Server 2003. Ninja is an integral part of the ReactOS Build Environment, and for our latest release, we had to revert a3a5d60 to still be able to build our regression tests for Windows Server 2003 on that same platform. I also know of a few people still using XP/Server 2003 as a build platform for various reasons.

I agree that both Windows versions are out of support and the ReactOS argument only holds until these APIs are implemented there. But in this case, it would be so easy to restore support for them without any sacrifices for newer platforms.

I was about to open my own PR for that, but it would conflict with this one, which is why I'm asking here.

@jonesmz

This comment was marked as abuse.

@cameron314
Copy link
Contributor Author

Maybe we could merge this pull request first and then you could apply your patch on that? :-)

@jhasse
Copy link
Collaborator

jhasse commented Jun 8, 2020

When run on a 32-bit process, GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) returns a maximum of 32 processors per group, i.e. the maximum number of processors that a 32-bit affinity mask can hold. If the 80-logical-processor machine in question has 2 groups of 40 processors each, that would be capped to 2 * 32 = 64 on a 32-bit process.

So the cap would be 2 * 64 = 128 with a 64-bit process?

This GetProcessorCount function is currently the only one preventing ninja 1.10.0 from being used on ReactOS/Windows XP/Server 2003.

I thought because Ninja is build using Visual Studio 2019 the binary doesn't run on Windows XP anyway?

@cameron314
Copy link
Contributor Author

So the cap would be 2 * 64 = 128 with a 64-bit process?

I'm not sure, this doesn't seem to be documented. I suspect that GetActiveProcessorCount will correctly sum the processor counts across all groups, but I don't know whether Windows permits an individual group to have more than 64 cores (from what I gather groups were introduced at the same time as the maximum core count was increased from 32 to 256).

I don't have a >128 core server to try it out :-)

@waruqi
Copy link

waruqi commented Jun 10, 2020

Does ninja not support xp? GetActiveProcessorCount and GetLogicalProcessorInformationEx do not seem to be provided on xp.

@jonesmz

This comment was marked as abuse.

…nly needed on 32-bit builds (following code review)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants