Commit 32e9ea6
ANN_BENCH: integrate NVTX statistics (rapidsai#1529)
Add aggregate reporting of NVTX ranges to the output of the benchmark executable.
### Usage
```bash
# Measure the CPU and GPU runtime of all NVTX ranges
nsys launch --trace=cuda,nvtx <ANN_BENCH with arguments>
# Measure only the CPU runtime of all NVTX ranges
nsys launch --trace=nvtx <ANN_BENCH with arguments>
# Do not measure/report any NVTX ranges
<ANN_BENCH with arguments>
# Do not measure/report any NVTX ranges within the benchmark, but use nsys profiling as usual
nsys profile ... <ANN_BENCH with arguments>
```
### Implementation
The PR adds a single module `nvtx_stats.hpp` to the benchmark executable; there are no changes to the library at all.
The program uses the NVIDIA Nsight Systems CLI to collect and export NVTX statistics, and then the SQLite API to aggregate them into the benchmark state:
1. Detect whether the program is run via `nsys launch`; if so, call `nsys start` / `nsys stop` around the benchmark loop; otherwise, do nothing.
2. If a report was generated, read it and query all NVTX events and the GPU correlation data using SQLite.
3. Aggregate the NVTX events by their short names (names without arguments, to reduce the number of columns).
4. Add them to the benchmark performance counters with the same averaging strategy as the global CPU/GPU runtime.
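Steps 3 and 4 can be sketched in C++ as below. This is a minimal, hypothetical illustration, not the code in `nvtx_stats.hpp`: the `nvtx_event` struct, the `short_name` and `aggregate_by_short_name` helpers, and the rule that a short name is the range name up to its first `(` are all assumptions. Durations are summed per short name and then divided by the iteration count, assuming the counters are averaged per iteration the way Google Benchmark's `benchmark::Counter::kAvgIterations` flag does.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one NVTX range read back from the nsys report.
struct nvtx_event {
  std::string name;    // full range name, possibly including arguments
  double cpu_seconds;  // CPU-side duration of the range
  double gpu_seconds;  // correlated GPU duration (0 if CUDA tracing is off)
};

// Assumption: the "short name" is the range name up to the first '(',
// with trailing spaces removed, i.e. the name without its argument list.
std::string short_name(const std::string& name) {
  std::string s = name.substr(0, name.find('('));
  while (!s.empty() && s.back() == ' ') { s.pop_back(); }
  return s;
}

struct nvtx_totals {
  double cpu = 0.0;
  double gpu = 0.0;
};

// Sum durations per short name, then divide by the number of benchmark
// iterations so each value is an average per iteration.
std::map<std::string, nvtx_totals> aggregate_by_short_name(
    const std::vector<nvtx_event>& events, std::int64_t iterations) {
  std::map<std::string, nvtx_totals> out;
  for (const auto& e : events) {
    auto& t = out[short_name(e.name)];
    t.cpu += e.cpu_seconds;
    t.gpu += e.gpu_seconds;
  }
  for (auto& kv : out) {
    kv.second.cpu /= static_cast<double>(iterations);
    kv.second.gpu /= static_cast<double>(iterations);
  }
  return out;
}
```

Grouping by short name means ranges such as `search(k = 10)` and `search(k = 100)` share one column, which keeps the number of reported counters small at the cost of merging calls that differ only in their arguments.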
### Performance cost
If the benchmark is **not** run using `nsys launch`, the new functionality adds virtually zero overhead.
Otherwise, there are two sources of overhead:
1. The usual nsys profiling overheads (minimized by internally disabling unused information via the `nsys start` CLI). These affect the reported performance the same way normal nsys profiling does (especially if CUDA tracing is enabled).
2. One or more data collection/export events per benchmark case. These add to the total run time of the benchmark, but do not affect the counters (they are not part of the benchmark loop).
Closes rapidsai#1367
Authors:
- Artem M. Chirkin (https://github.com/achirkin)
Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
URL: rapidsai#1529
1 parent 7c8645d, commit 32e9ea6
4 files changed
Lines changed: 590 additions & 2 deletions