Disentangle kernel-manager include tree#3544
Disentangle kernel-manager include tree#3544JanVogelsang wants to merge 146 commits intonest:mainfrom
Conversation
…cular include chains
|
Important to note: I kept the internal API identical. All I've done essentially was to use references to the managers instead which allowed to forward-declare the managers in the kernel_manager.h file. The reason there are that many changes is that including the kernel_manager.h no longer means that one also includes ALL other managers. So each file using a specific manager has to include that manager explicitly. This is much cleaner and will also speed up compilation a bit. |
|
I have no idea what's supposed to be wrong with the 2 copyright headers. @terhorstd Could you take a look please? |
|
@JanVogelsang This looks very interesting! Just to make sure I understand your solution correctly: Instead of having the individual managers as actual data members of the This is certainly a great code cleanup. I am just a tiny bit concerned about performance, since we now have essentially pointers to the individual managers. Have you run any benchmarks yet? |
Very strange. Have you tried to run the copyright check locally on your machine? |
|
@heplesser Yes, this is correct! I benchmarked the performance and the PR's solution is indeed around 5% slower (locally and on Juwels). I have to admit that this surprises me a lot, as while this introduces another pointer, I would have assumed that all these pointers (or mostly just the simulation_manager and event_delivery_manager pointers during simulation) would be cached anyway. This seems to not be possible and I am not sure why. Then again, the cause of the performance degradation might also be something entirely different, for example growing code size which harms instruction cache efficiency due to more function inlining. I implemented two other solutions now, which both require the internal API to change, though:
namespace kernel {
template<typename ManagerT, std::enable_if_t<std::is_base_of_v<ManagerInterface, ManagerT>>* = nullptr>
ManagerT& manager()
{
static ManagerT static_manager;
return static_manager;
}
}
namespace kernel {
template <class T>
inline T g_manager_instance;
template <class T>
T& manager() noexcept {
return g_manager_instance<T>;
}
}I'm currently benchmarking both solutions, but local benchmarks suggest that the second approach performs best. |
I managed to solve this, it was just two empty files which I deleted but somehow they were added back in again, just empty. |
|
@JanVogelsang Thanks for checking and the new solutions! I slightly prefer the second option, but that is mainly based on the fact that I still haven't entirely wrapped my mind around the Is it necessary to place this into a separate namespace |
| */ | ||
| #include "grid_layer.h" | ||
| #include "layer.h" | ||
|
|
There was a problem hiding this comment.
Did these lines move the wrong way around? layer.h was deleted, but the _impl file now has all code?
What is the reason for dropping the include-guards?
There was a problem hiding this comment.
Thanks for catching this, it definitely wasn't intentional. I think I first removed the layer_impl.h, but then added it back again as there was a circular dependency with the grid_layer.h. So I must have accidentally copied too much from the layer.h over to the layer_impl.h
There was a problem hiding this comment.
@terhorstd Then again, it might actually make more sense to move all template function definitions to the impl file (in general, not only here), as the header files should ideally contain declarations only. Finn is currently doing that for all other files as well (at least header-to-cpp, not header-to-impl yet).
There was a problem hiding this comment.
Just to be clear: This is fixed now, the .cpp file contains all non-template function definitions and the .impl file contains the template function definitions.
| #include <algorithm> | ||
| #include <iostream> | ||
| #include <vector> |
There was a problem hiding this comment.
@jessica-mitchell, this would be a perfect place to link into the Doxygen documentation instead of copy-pasting some random code file into the RST.
For now however some reference syntax could shorten this somewhat unrelated change in this PR. Isn't there a possibility to link to a code file to be included? Instead of this PR updating the docs with the new file, it would be preferable to have something like:
.. code: cpp
file: ../nestkernel/stopwatch.h
Would avoid this excessive change in an rst file.
There was a problem hiding this comment.
Talked with @terhorstd, so not to prevent this PR from moving forward, we will discuss this documentation issue in issue #3607 and make a decision in a separate PR.
There was a problem hiding this comment.
Perfect, thanks!
|
@JanVogelsang, this is big work! Thanks a lot for this big cleaning effort! Could you also check the relation to #3465 ? |
That's totally fine as well. The additional/nested namespace ( |
Done in JanVogelsang#14. Note that after merging these changes there is a todo left for @JanVogelsang to
|
|
@JanVogelsang I now tried to build with LTO on macOS using |
Merge flush events into PR 3544
|
It turned out that I used CMake's |
…rnel-manager-refactoring
So do you use llvm-ar/ranlib to link under Ubuntu even when compiling with GCC? |
|
No, for GCC the default ar and ranlib are used ( |
Fix warnings about undefined template member when building with Clang
|
@JanVogelsang I sent you a new PR with much simplified ProcessOptions code which now works (with limits) on macOS (with Clang) and under Linux (with GCC; Clang not tested). Under macOS, gcc-15 from Homebrew does not work on my system (macOS 26.4.1, Apple Silicon). This seems to be due to a bug in the newest Apple SDK (https://github.com/orgs/Homebrew/discussions/6778#discussioncomment-16557516). Under Linux with gcc-13, I get a curious error from I find this curious because |
|
@heplesser The PR unfortunately doesn't work for Clang 18 on Ubuntu. But strangely, I also don't get the spin_detector error with GCC 13. |
|
Interestingly, the spin_detector issue also occurs when building with CI-beNNch. But also not here in our CI testsuite. |
|
@JanVogelsang Thanks for the update. Here a few more observations (and a question) from my side:
with my 3544-branch (ie minimal settings for LTO) and everything built nicely out of the box. The duration of the final linking stage and the size of the
Any idea what might be going on here? |
|
@heplesser But in the CMake call you shared you specified "-Dwith-lto=OFF", so are you sure that LTO ran? For the nest::sort issue, I have no idea, but an LLM suggested replacing the line by |
I am quite sure, I think I just copy-pasted the wrong CMake call, but I will double-check later. For Clang, I turned off MPI, OpenMP and HDF5 since that cause some library trouble.
The LLM took a rather shallow look, I fear. |
|
Strange, I wonder how LTO works for you on Ubuntu with clang but not for me. As I also have Ubuntu 24.04 and Clang 18. And yes you are totally right, the LLM's suggestion didn't make any sense, I should have seen that immediately. I also had some issues with the iterator_pair logic in the past, but can't remember the details. |
|
I just managed to reproduced the spin_detector problem on my mac with gcc13, investigating. |
|
A slight step forward with the spin_detector problem: I can reproduce it with gcc15 under macOS 26.4 now and it is rather curious: It occurs only when building NEST with Using the
With HDF5 on, on the other hand
So the question now is why the order gets messed up when HDF5 is off. |
|
I think I am a bit further. Looking at the dependencies file while without HDF5 it begins with Note the last line: here we include And indeed, looking at the Now why does HDF5 make a difference? Looking at the actual compile commands executed by make, we have without HDF5 and with HDF5 Note the |
|
The actual include flags used by the compiler are given by the |
|
Oh wow, that's some incredible detective work. Should we then delay this PR to the next hackathon and put "CMake modernizations" high up on the ToDo-List for the hackathon? Or how would you say we should continue? |
|
We also have the linker problem under Ubuntu 24 when building with, LTO, Clang 18 and OpenMP. The Clang-22 problem I mentioned earlier also exists in the main branch, so at least that is not related to this PR. So postponing to the hackathon might be an idea. At least, I see no way to resolve this within a few days. |
Summary of changes
This PR fixes the horrible kernel-manager singleton design and cleans up all circular include chains which were just hidden behind the kernel-manager and plenty "*_impl.h" files.
Even though this PR touches 250+ files, the only functional changes were done in the kernel_manager.h/.cpp and the other changes are just follow-up refactorings enabled by the changes. I also got rid of most of the impl-files, which are not required anymore and just complicated things.
Why this was necessary
Anyone who ever worked on the kernel will immediately know about the pain the kernel-manager caused. As the kernel-manager included ALL other manager in the kernel_manager.h file, any other class using the kernel-manager therefore also included ALL other managers. This resulted in either of two things:
2.1. The kernel-manager could only be included in the .cpp file, which meant that the function using the kernel-manager (and thus any of the other managers) could not be inlined, which in some cases is detrimental for performance.
2.2. We had to create those impl.h files and use the kernel-manager in there. However, impl.h files and correctly including them is a nightmare and leads to various linker issues which are extremely difficult to debug.
The only time when one should use impl.h files is for some template specializations.