Commit 3f77e46

Merge pull request #1934 from rapidsai/release/26.04
Forward-merge release/26.04 into main
2 parents 48c76c6 + 70dc032 commit 3f77e46

4 files changed

Lines changed: 851 additions & 0 deletions

File tree

docs/source/advanced_topics.rst

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
Advanced Topics
2+
===============
3+
4+
- `Just-in-Time Compilation`_
5+
6+
Just-in-Time Compilation
7+
------------------------
8+
cuVS uses Just-in-Time (JIT) `Link-Time Optimization (LTO) <https://developer.nvidia.com/blog/cuda-12-0-compiler-support-for-runtime-lto-using-nvjitlink-library/>`_ to compile certain kernels at runtime. When JIT compilation is triggered, cuVS compiles the kernel for your GPU architecture and automatically caches it both in memory and on disk. Each cache remains valid as follows:
9+
10+
1. The in-memory cache is valid for the lifetime of the process.
11+
2. The on-disk cache is valid until the CUDA driver is upgraded. The cache can be shared between machines through network or cloud storage, and we strongly recommend storing it in a persistent location. For details on configuring the on-disk cache, see the CUDA documentation on `JIT Compilation <https://docs.nvidia.com/cuda/cuda-programming-guide/05-appendices/environment-variables.html#jit-compilation>`_. The relevant environment variables are ``CUDA_CACHE_PATH`` and ``CUDA_CACHE_MAX_SIZE``.
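
As a minimal sketch of the cache configuration described above, the two environment variables can be set from Python before any CUDA library is loaded. The helper name ``configure_jit_cache``, the directory, and the size are hypothetical choices for illustration; only the variable names come from the CUDA documentation. The sketch assumes the driver reads these variables when it initializes, so the call must come first in the process.

```python
import os
from pathlib import Path


def configure_jit_cache(cache_dir, max_size_bytes):
    """Point the CUDA JIT cache at a persistent directory (hypothetical helper).

    Assumption: the variables are read at driver initialization, so call this
    before importing or calling anything that touches CUDA.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)  # create the cache directory if missing
    os.environ["CUDA_CACHE_PATH"] = str(path)
    os.environ["CUDA_CACHE_MAX_SIZE"] = str(max_size_bytes)  # size in bytes
    return path
```

For example, ``configure_jit_cache("~/.cache/cuvs_jit", 1024**3)`` would request a 1 GiB cache under the home directory. Exporting the same variables in the shell or a container definition achieves the same effect without code.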
12+
13+
14+
JIT compilation is therefore a one-time cost, and you can expect no performance loss after the first compilation. We recommend running a "warmup" call to trigger JIT compilation before actual use.
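
One way to apply the warmup recommendation when benchmarking is sketched below. ``time_with_warmup`` is a hypothetical helper, not part of cuVS: it calls the function once untimed so that any one-time cost, such as JIT compilation, is paid up front, then averages the following calls. It assumes the function passed in blocks until its GPU work has finished; otherwise the timings would only measure launch overhead.

```python
import time


def time_with_warmup(fn, *args, repeats=5, **kwargs):
    """Run fn once untimed, then return the mean wall time of `repeats` calls."""
    # Warmup call: one-time costs such as JIT compilation are paid here.
    fn(*args, **kwargs)

    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)  # assumed to synchronize before returning
    return (time.perf_counter() - start) / repeats
```

Here ``fn`` would be whatever wrapper your application uses around a search call; its name and arguments are up to you.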
15+
16+
Currently, the following capabilities will trigger a JIT compilation:
17+
- IVF Flat search APIs: :doc:`cuvs::neighbors::ivf_flat::search() <cpp_api/neighbors_ivf_flat>`
18+
19+
.. toctree::
20+
:maxdepth: 2
21+
22+
jit_lto_guide

docs/source/developer_guide.md

Lines changed: 10 additions & 0 deletions
@@ -406,3 +406,13 @@ void foo(const raft::resources& res, ...)
406406
...
407407
}
408408
```
409+
410+
## Using Just-in-Time Link-Time Optimization
411+
412+
cuVS is moving to link-time optimization for new kernels, which requires some changes to how kernels are written. Instead of compiling every kernel variant at build time (which inflates binary size), JIT LTO compiles kernel fragments separately and links them together at runtime according to the configuration requested.
413+
414+
This approach ultimately enables:
415+
- **Reduced binary size**: Compile fragments once and combine them in many ways
416+
- **User-defined functions**: Link UDFs into cuVS CUDA kernels
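
To make the binary-size argument concrete, the sketch below counts compiled units under both approaches. The axes and their options are hypothetical and do not describe any actual cuVS kernel, and the model is simplified: it assumes each option on each axis can be isolated into its own fragment.

```python
from itertools import product

# Hypothetical configuration axes for a single kernel (illustrative only).
axes = {
    "data_type": ["float", "half", "int8", "uint8"],
    "metric": ["l2", "inner_product", "cosine"],
    "filter": ["none", "bitset", "bitmap"],
}


def build_time_variants(axes):
    """Ahead-of-time approach: every combination is compiled into the binary."""
    return len(list(product(*axes.values())))


def jit_fragments(axes):
    """JIT LTO approach: one fragment per option, linked on demand at runtime."""
    return sum(len(options) for options in axes.values())


print(build_time_variants(axes))  # 36 full kernels (4 * 3 * 3)
print(jit_fragments(axes))        # 10 fragments (4 + 3 + 3)
```

The count of full variants grows with the product of the axis sizes, while the fragment count grows with their sum, which is why adding another axis is far cheaper under JIT LTO.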
417+
418+
For more information on JIT LTO, see [Advanced Topics](advanced_topics). For a complete guide on implementing JIT LTO kernels, including step-by-step examples, see the [JIT LTO Guide](jit_lto_guide.md).

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -87,5 +87,6 @@ Contents
8787
integrations.rst
8888
cuvs_bench/index.rst
8989
api_docs.rst
90+
advanced_topics.rst
9091
contributing.md
9192
developer_guide.md
