7 changes: 7 additions & 0 deletions clang/docs/ReleaseNotes.rst
@@ -157,6 +157,13 @@ here. Generic improvements to Clang as a whole or to its underlying
infrastructure are described first, followed by language-specific
sections with improvements to Clang's support for those languages.

- Clang implemented improvements to the BMI of C++20 Modules that can reduce
  the number of rebuilds during incremental recompilation. We are seeking
  feedback from Build System authors and other interested users, especially
  when you feel Clang changes the BMI and misses an opportunity to avoid
  recompilations or causes correctness issues. See
  `StandardCPlusPlusModules <StandardCPlusPlusModules.html>`_ for more details.

- The ``\par`` documentation comment command now supports an optional
  argument, which denotes the header of the paragraph started by
  an instance of the ``\par`` command comment. The implementation
135 changes: 135 additions & 0 deletions clang/docs/StandardCPlusPlusModules.rst
@@ -652,6 +652,141 @@ in the future. The expected roadmap for Reduced BMIs as of Clang 19.x is:
comes, the term BMI will refer to the Reduced BMI and the Full BMI will only
be meaningful to build systems which elect to support two-phase compilation.

Experimental Non-Cascading Changes
----------------------------------

This section is primarily for build system vendors. For end compiler users,
Contributor

This may have been discussed somewhere that I missed, please point me to that if that's the case.

I was putting a little more thought into it and am wondering what would be a way for the build systems to take advantage of that?

When compiling something that depends on modules, the build system will need to check if any of the (transitive) dependencies of the modules changed, right? Because any BMI that was changed in the transitive set can potentially affect the result of the current compilation.

So even if direct dependencies did not change (because of this optimization), despite a change in the deeper transitive dependency, the build system still can't reuse the result and has to rerun the compile.

We only know that the output BMI wasn't affected by any of the transitive changes after we finish compiling it and can compare the outputs, right?
But the build system needs to know about it before the compilation happens to avoid recompilations.

I suspect I'm missing something, but don't know what...

Member Author

@ChuanqiXu9 ChuanqiXu9 Jun 27, 2024

> This may have been discussed somewhere that I missed, please point me to that if that's the case.
>
> I was putting a little more thought into it and am wondering what would be a way for the build systems to take advantage of that?
>
> When compiling something that depends on modules, the build system will need to check if any of the (transitive) dependencies of the modules changed, right? Because any BMI that was changed in the transitive set can potentially affect the result of the current compilation.
>
> So even if direct dependencies did not change (because of this optimization), despite a change in the deeper transitive dependency, the build system still can't reuse the result and has to rerun the compile.

Yes, and this is the reason why it is experimental.

> We only know that the output BMI wasn't affected by any of the transitive changes after we finish compiling it and can compare the outputs, right? But the build system needs to know about it before the compilation happens to avoid recompilations.
>
> I suspect I'm missing something, but don't know what...

It is not about avoiding the recompilation that produces the unchanged BMI, but about avoiding recompilation of other TUs that depend only on the unchanged BMI. For example,

// a.cppm
export module a;
// intentionally empty

// b.cppm
export module b;
import a;
// intentionally empty

// c.cpp
import b;
...

In this example, every time a.cppm changes, we need to recompile b.cppm. But if the BMI of module b doesn't change at all, it should be fine not to recompile c.cpp.

So the guarantee the compiler provides is that all the changes that matter are propagated into the BMI:

for (Module *M : TouchedTopLevelModules)
  Hasher.update(M->Signature);

With that, the build system can try to ignore changes in transitively imported modules.

The theory is that a user can only access the entities in indirectly imported modules via the directly imported modules. The directly imported modules therefore have full control over which entities of the indirectly imported modules are accessible to their own users.


It may be a good idea for build systems to provide a verification option that checks whether the skippable compilations would produce the same result if they were not skipped.

Contributor

That's what I thought is happening, thanks for confirming my intuition.
I think it'd be great to spell this out in the documentation, because I believe this mode of operation is something that the build system vendors might need to infer from the whole text right now.

Adding something like the following should do the trick:

We encourage build systems to add an experimental mode that
reuses the cached BMI when **direct** dependencies did not change,
even if **transitive** dependencies did change.

PS My 2 cents on Bazel: I suspect this actually goes against its design. Bazel has very strong capabilities to avoid recompilations when inputs don't change, but it relies on stable hashes for all inputs (including transitive dependencies).
At the same time I suspect that other build systems (CMake + ccache / sccache) are more flexible in that regard.

PPS I am not a build system expert, so take that with a grain of salt. I'm also happy to loop in Bazel folks I work with to confirm or rebut my claims about Bazel's design, if that's useful.

Member Author

Yeah, added suggested text.

For Bazel, I think it might be fine since, as far as I know, it should be configurable whether or not to recompile a target.

if you don't want to read it all, this is helpful to reduce recompilations.
We encourage build system vendors and end users to try this out and give feedback.

Before Clang 19, a change in the BMI of any (transitive) dependency would cause the
output BMI to change. Starting with Clang 19, changes to non-direct
dependencies should not directly affect the output BMI, unless they affect the
results of the compilation. We expect that there are many more opportunities
for this optimization than we have currently realized and would appreciate
feedback about missed optimization opportunities. For example,

.. code-block:: c++

  // m-partA.cppm
  export module m:partA;

  // m-partB.cppm
  export module m:partB;
  export int getB() { return 44; }

  // m.cppm
  export module m;
  export import :partA;
  export import :partB;

  // useBOnly.cppm
  export module useBOnly;
  import m;
  export int B() {
    return getB();
  }

  // Use.cc
  import useBOnly;
  int get() {
    return B();
  }

To compile the project (for brevity, some commands are omitted.):

.. code-block:: console

  $ clang++ -std=c++20 m-partA.cppm --precompile -o m-partA.pcm
  $ clang++ -std=c++20 m-partB.cppm --precompile -o m-partB.pcm
  $ clang++ -std=c++20 m.cppm --precompile -o m.pcm -fprebuilt-module-path=.
  $ clang++ -std=c++20 useBOnly.cppm --precompile -o useBOnly.pcm -fprebuilt-module-path=.
  $ md5sum useBOnly.pcm
  07656bf4a6908626795729295f9608da  useBOnly.pcm

If the interface of ``m-partA.cppm`` is changed to:

.. code-block:: c++

  // m-partA.v1.cppm
  export module m:partA;
  export int getA() { return 43; }

and the BMI for ``useBOnly`` is recompiled as in:

.. code-block:: console

  $ clang++ -std=c++20 m-partA.cppm --precompile -o m-partA.pcm
  $ clang++ -std=c++20 m-partB.cppm --precompile -o m-partB.pcm
  $ clang++ -std=c++20 m.cppm --precompile -o m.pcm -fprebuilt-module-path=.
  $ clang++ -std=c++20 useBOnly.cppm --precompile -o useBOnly.pcm -fprebuilt-module-path=.
  $ md5sum useBOnly.pcm
  07656bf4a6908626795729295f9608da  useBOnly.pcm

then the contents of ``useBOnly.pcm`` remain unchanged.
Consequently, if the build system only bases recompilation decisions on directly imported modules,
it becomes possible to skip the recompilation of ``Use.cc``.
This should be fine because the altered interfaces do not affect ``Use.cc`` in any way;
the changes do not cascade.

When ``Clang`` generates a BMI, it records the hash values of all potentially contributory BMIs
for the BMI being produced. This ensures that build systems are not required to consider
transitively imported modules when deciding whether to recompile.
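The recompilation decision described above could be sketched roughly as follows.
This is only an illustration under stated assumptions: ``RebuildMode``,
``TranslationUnit``, ``anyChanged`` and ``needsRecompile`` are hypothetical names
that belong neither to Clang nor to any existing build system, and each hash stands
for whatever content hash a build system records for a BMI (for instance the
``md5sum`` values shown earlier). Changes to the TU's own source file and to its
included headers must still trigger a recompilation and are deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical build-system setting: Transitive is the conservative behaviour,
// DirectOnly relies on the non-cascading property of the BMI.
enum class RebuildMode { Transitive, DirectOnly };

// Maps a BMI path to the content hash recorded for it.
using HashMap = std::map<std::string, std::string>;

// Hashes of the imported BMIs as seen by the last successful build of a TU.
struct TranslationUnit {
  HashMap DirectImports;
  HashMap TransitiveImports;
};

// Returns true if any BMI in Recorded is missing from Current
// or now has a different hash.
bool anyChanged(const HashMap &Recorded, const HashMap &Current) {
  for (const auto &[Path, OldHash] : Recorded) {
    auto It = Current.find(Path);
    if (It == Current.end() || It->second != OldHash)
      return true;
  }
  return false;
}

// Decides whether the TU has to be recompiled because of BMI changes alone.
bool needsRecompile(const TranslationUnit &TU, const HashMap &Current,
                    RebuildMode Mode) {
  if (anyChanged(TU.DirectImports, Current))
    return true;
  if (Mode == RebuildMode::Transitive)
    return anyChanged(TU.TransitiveImports, Current);
  return false; // DirectOnly: indirectly imported BMIs are ignored.
}
```

For ``Use.cc`` in the example, ``DirectImports`` would hold only ``useBOnly.pcm``,
so in the ``DirectOnly`` mode of this sketch a changed ``m-partA.pcm`` would not
cause a rebuild, while the ``Transitive`` mode would still rebuild it.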

Which BMIs are considered potentially contributory is currently unspecified.
However, it is a severe bug for a BMI to remain unchanged following an observable change
that affects its consumers.

We recommend that build systems support this feature as a configurable option so that users
can go back to the transitive change mode safely at any time.
Contributor

How does this happen? Is something missing in the example commands? They look "normal" to me.

Member Author

Do you mean in what cases users need to go back to the normal mode? The answer may be: when they hit compiler bugs.

Or, if you're asking about command line options, there is no such option on the compiler side. We refactored the BMI format, so at some level it is always "enabled". But users have to wait for build system support to benefit from the so-called "no-transitive-change". So the "option" here means a build system option.

Contributor

Ok, so what do I do differently? AFAIK, CMake is relying on the compiler's -MF reporting mechanisms here.

Member Author

At a higher level, after offering an experimental option to enable the mode, I think what build systems need to do is to stop considering indirectly imported modules when deciding on recompilations. As for -MF, as far as I know it collects the included headers, and headers shouldn't take part in this.

For example,

// a.cpp
#include "a.h"
import a;
...

If a.h changes, I think we should recompile a.cpp.

As for the implementation, I don't know how CMake is implemented. But I remember you said somewhere that it might be possible to do this with a compile-and-swap action.

Contributor

Ah, ok.

FWIW, I was thinking that even a.pcm would get reported in -MF for your example and elided if it didn't contribute. As for the restat = 1 with a swap, yes, that is build-system side. I think some docs saying something like:

Build systems may utilize this optimization by doing an update-if-changed operation to the BMI
that is consumed from the BMI that is output by the compiler.

Member Author

Suggestion Added


Interactions with Reduced BMI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With reduced BMI, non-cascading changes can be more powerful. For example,

.. code-block:: c++

  // A.cppm
  export module A;
  export int a() { return 44; }

  // B.cppm
  export module B;
  import A;
  export int b() { return a(); }

.. code-block:: console

  $ clang++ -std=c++20 A.cppm -c -fmodule-output=A.pcm -fexperimental-modules-reduced-bmi -o A.o
  $ clang++ -std=c++20 B.cppm -c -fmodule-output=B.pcm -fexperimental-modules-reduced-bmi -o B.o -fmodule-file=A=A.pcm
  $ md5sum B.pcm
  6c2bd452ca32ab418bf35cd141b060b9  B.pcm

Now let's change the implementation of ``A.cppm`` to:

.. code-block:: c++

  export module A;
  int a_impl() { return 99; }
  export int a() { return a_impl(); }

and recompile the example:

.. code-block:: console

  $ clang++ -std=c++20 A.cppm -c -fmodule-output=A.pcm -fexperimental-modules-reduced-bmi -o A.o
  $ clang++ -std=c++20 B.cppm -c -fmodule-output=B.pcm -fexperimental-modules-reduced-bmi -o B.o -fmodule-file=A=A.pcm
  $ md5sum B.pcm
  6c2bd452ca32ab418bf35cd141b060b9  B.pcm

We should find that the contents of ``B.pcm`` remain the same. In this case, the build system is
allowed to skip recompilations of TUs which solely and directly depend on module ``B``.

This only happens with reduced BMI. With reduced BMI, we won't record the function body
of ``int b()`` in the BMI for ``B``, so module ``A`` doesn't contribute to the BMI of ``B``
and we have fewer dependencies.
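One way a build system might notice that ``B.pcm`` did not change is an
update-if-changed step: the compiler writes the BMI to a scratch path, and the
published BMI that consumers depend on is only replaced when the bytes differ.
The following is a minimal sketch of that idea, not the mechanism of any
particular build system. ``updateIfChanged`` and ``readAll`` are hypothetical
helpers, and the sketch assumes that both paths live on the same filesystem
(so that a rename works) and that BMIs are small enough to compare in memory.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Reads a whole file into memory; good enough for a sketch.
std::string readAll(const fs::path &P) {
  std::ifstream In(P, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(In),
                     std::istreambuf_iterator<char>());
}

// Publishes Fresh as Published only if the contents differ.
// Returns true if Published was created or replaced. When nothing changed,
// Published keeps its old timestamp, so timestamp-based consumers stay
// up to date, and the scratch file is discarded.
bool updateIfChanged(const fs::path &Fresh, const fs::path &Published) {
  if (fs::exists(Published) &&
      fs::file_size(Fresh) == fs::file_size(Published) &&
      readAll(Fresh) == readAll(Published)) {
    fs::remove(Fresh);
    return false;
  }
  fs::rename(Fresh, Published); // replaces an existing regular file
  return true;
}
```

In the example above, the second compilation of ``B.cppm`` would write to a
scratch file, ``updateIfChanged`` would return ``false``, and TUs that import
only ``B`` would not be considered out of date.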

Performance Tips
----------------
