Commit 8a9d3b4

cleanup
1 parent b4440d6 commit 8a9d3b4

3 files changed: 28 additions, 66 deletions

cmake/utils.cmake

Lines changed: 1 addition & 1 deletion
@@ -248,7 +248,7 @@ endmacro()
 # For the given `SRC_CUDA_ARCHS` list of gencode versions in the form
 # `<major>.<minor>[letter]` compute the "loose intersection" with the
 # `TGT_CUDA_ARCHS` list of gencodes.
-# The loose interection is defined as:
+# The loose intersection is defined as:
 # { max{ x \in tgt | x <= y } | y \in src, { x \in tgt | x <= y } != {} }
 # where `<=` is the version comparison operator.
 # In other words, for each version in `TGT_CUDA_ARCHS` find the highest version
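
For readers unfamiliar with the set notation in the comment above, here is a minimal Python sketch of the formula exactly as written. It is not the CMake implementation: the helper names `parse_arch` and `loose_intersection` are hypothetical, the ordering of letter-suffixed versions (a bare `9.0` sorting below `9.0a`) is an assumption, and any special handling the macro may apply is not modeled.

```python
import re


def parse_arch(arch):
    """Split '<major>.<minor>[letter]' into a sortable (major, minor, letter) tuple.

    Assumption: an empty suffix sorts below a letter suffix, so 9.0 < 9.0a.
    """
    m = re.fullmatch(r"(\d+)\.(\d+)([a-z]?)", arch)
    if m is None:
        raise ValueError(f"unrecognized arch: {arch!r}")
    return int(m.group(1)), int(m.group(2)), m.group(3)


def loose_intersection(src, tgt):
    """Literal reading of { max{ x in tgt | x <= y } | y in src, set non-empty }."""
    result = []
    for y in src:
        # All target versions that do not exceed the source version y.
        candidates = [x for x in tgt if parse_arch(x) <= parse_arch(y)]
        if candidates:
            best = max(candidates, key=parse_arch)
            if best not in result:  # the formula describes a set, so skip duplicates
                result.append(best)
    return result


# Example inputs (illustrative values, not taken from the commit).
example = loose_intersection(["7.5", "8.0", "8.6", "9.0"], ["8.0", "8.9", "9.0"])
```

In this example `7.5` contributes nothing because no target version is at or below it, while `8.0` and `8.6` both map to `8.0`.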

tests/kernels/test_marlin_kernel.py

Lines changed: 0 additions & 22 deletions
This file was deleted.

tools/report_build_time_ninja.py

Lines changed: 27 additions & 43 deletions
@@ -6,51 +6,35 @@
 # Modified version of: https://chromium.googlesource.com/chromium/tools/depot_tools.git/+/refs/heads/main/post_build_ninja_summary.py
 """Summarize the last ninja build, invoked with ninja's -C syntax.
 
-This script is designed to be automatically run after each ninja build in
-order to summarize the build's performance. Making build performance information
-more visible should make it easier to notice anomalies and opportunities. To use
-this script on Windows just set NINJA_SUMMARIZE_BUILD=1 and run autoninja.bat.
-
-On Linux you can get autoninja to invoke this script using this syntax:
-
-$ NINJA_SUMMARIZE_BUILD=1 autoninja -C out/Default/ chrome
-
-You can also call this script directly using ninja's syntax to specify the
-output directory of interest:
-
-> python3 post_build_ninja_summary.py -C out/Default
+> python3 tools/report_build_time_ninja.py -C build/..
 
 Typical output looks like this:
-
->ninja -C out\debug_component base
-ninja.exe -C out\debug_component base -j 960 -l 48 -d keeprsp
-ninja: Entering directory `out\debug_component'
-[1 processes, 1/1 @ 0.3/s : 3.092s ] Regenerating ninja files
-Longest build steps:
-0.1 weighted s to build obj/base/base/trace_log.obj (6.7 s elapsed time)
-0.2 weighted s to build nasm.exe, nasm.exe.pdb (0.2 s elapsed time)
-0.3 weighted s to build obj/base/base/win_util.obj (12.4 s elapsed time)
-1.2 weighted s to build base.dll, base.dll.lib (1.2 s elapsed time)
-Time by build-step type:
-0.0 s weighted time to generate 6 .lib files (0.3 s elapsed time sum)
-0.1 s weighted time to generate 25 .stamp files (1.2 s elapsed time sum)
-0.2 s weighted time to generate 20 .o files (2.8 s elapsed time sum)
-1.7 s weighted time to generate 4 PEFile (linking) files (2.0 s elapsed
-time sum)
-23.9 s weighted time to generate 770 .obj files (974.8 s elapsed time sum)
-26.1 s weighted time (982.9 s elapsed time sum, 37.7x parallelism)
-839 build steps completed, average of 32.17/s
-
-If no gn clean has been done then results will be for the last non-NULL
-invocation of ninja. Ideas for future statistics, and implementations are
-appreciated.
-
-The "weighted" time is the elapsed time of each build step divided by the number
-of tasks that were running in parallel. This makes it an excellent approximation
-of how "important" a slow step was. A link that is entirely or mostly serialized
-will have a weighted time that is the same or similar to its elapsed time. A
-compile that runs in parallel with 999 other compiles will have a weighted time
-that is tiny."""
+```
+Longest build steps for .cpp.o:
+1.0 weighted s to build ...torch_bindings.cpp.o (12.4 s elapsed time)
+2.0 weighted s to build ..._attn_c.dir/csrc... (23.5 s elapsed time)
+2.6 weighted s to build ...torch_bindings.cpp.o (31.5 s elapsed time)
+3.2 weighted s to build ...torch_bindings.cpp.o (38.5 s elapsed time)
+Longest build steps for .so (linking):
+0.1 weighted s to build _core_C.abi3.so (0.7 s elapsed time)
+0.1 weighted s to build _moe_C.abi3.so (1.0 s elapsed time)
+0.5 weighted s to build ...flash_attn_c.abi3.so (1.1 s elapsed time)
+6.2 weighted s to build _C.abi3.so (6.2 s elapsed time)
+Longest build steps for .cu.o:
+15.3 weighted s to build ...machete_mm_... (183.5 s elapsed time)
+15.3 weighted s to build ...machete_mm_... (183.5 s elapsed time)
+15.3 weighted s to build ...machete_mm_... (183.6 s elapsed time)
+15.3 weighted s to build ...machete_mm_... (183.7 s elapsed time)
+15.5 weighted s to build ...machete_mm_... (185.6 s elapsed time)
+15.5 weighted s to build ...machete_mm_... (185.9 s elapsed time)
+15.5 weighted s to build ...machete_mm_... (186.2 s elapsed time)
+37.4 weighted s to build ...scaled_mm_c3x.cu... (449.0 s elapsed time)
+43.9 weighted s to build ...scaled_mm_c2x.cu... (527.4 s elapsed time)
+344.8 weighted s to build ...attention_...cu.o (1087.2 s elapsed time)
+1110.0 s weighted time (10120.4 s elapsed time sum, 9.1x parallelism)
+134 build steps completed, average of 0.12/s
+```
+"""
 
 import argparse
 import errno
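
The docstring removed in this commit defined "weighted" time as each step's elapsed time divided by the number of tasks running in parallel. A minimal sketch of that idea follows, assuming each step is a hypothetical `(name, start, end)` tuple with a unique name; the function name `weighted_durations` is illustrative and is not taken from the script.

```python
def weighted_durations(steps):
    """Return {name: weighted seconds} for steps given as (name, start, end).

    Assumption: step names are unique and times are in seconds.
    """
    # Every instant at which the set of running steps can change.
    boundaries = sorted({t for _, start, end in steps for t in (start, end)})
    weighted = {name: 0.0 for name, _, _ in steps}
    for lo, hi in zip(boundaries, boundaries[1:]):
        # Steps that run for the whole slice [lo, hi].
        running = [name for name, start, end in steps if start <= lo and end >= hi]
        for name in running:
            # Each running step is charged an equal share of the slice.
            weighted[name] += (hi - lo) / len(running)
    return weighted


# Two compiles run side by side, then one serialized link (made-up timings).
steps = [("a.o", 0.0, 4.0), ("b.o", 0.0, 4.0), ("link.so", 4.0, 6.0)]
weighted = weighted_durations(steps)   # each of the three steps gets 2.0
elapsed_sum = sum(end - start for _, start, end in steps)
parallelism = elapsed_sum / sum(weighted.values())
```

Under this definition the weighted times add up to the wall-clock time during which at least one step was running, so dividing the elapsed sum by the weighted total gives a parallelism figure. The sample output's 10120.4 / 1110.0 ≈ 9.1 is consistent with its reported 9.1x.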
