
Commit 4657974

docs(readme): consolidate app features; clarify Cargo features/Build/Tests
Co-authored-by: openhands <openhands@all-hands.dev>
1 parent f0afa90 commit 4657974

1 file changed

Lines changed: 24 additions & 14 deletions


README.md

````diff
@@ -7,6 +7,7 @@ Tesla M40–optimized Rust + CUDA LLM server/runtime. FP16 weights, FP32 compute
 - FP16 storage / FP32 compute (cuBLAS/cuBLASLt as available)
 - GGUF loader; C FFI symbols `m40llm_*` for embedding
 - Small, explicit codebase focused on M40 performance
+- Optional HTTP server (enable with `--features server`)
 
 ## Who it’s for
 - M40 owners who want maximum throughput/low latency on this specific card
@@ -27,25 +28,34 @@ Tesla M40–optimized Rust + CUDA LLM server/runtime. FP16 weights, FP32 compute
 - Streams/Hyper‑Q: high‑priority decode stream, concurrent lower‑priority prefill
 - Read‑only (`__ldg`) and texture caches for non-GEMM ops (norms, embeddings)
 
-## Application features
-- Single‑GPU inference server optimized for Tesla M40 (sm_52)
-- GGUF loader and stable C FFI (`m40llm_*`) for embedding
-- FP16 storage / FP32 compute path; initial focus on GEMM + KV cache correctness
-- Optional HTTP server (`--features server`)
-
 ## Build features (Cargo)
-- Feature flags: `cuda` (enable GPU path), `server` (HTTP API)
-- NVCC auto‑detect; stub build when missing (works with conda headers/libs)
-- cuBLAS header gating; tests and GEMM enabled only when available
+This project uses Cargo feature flags to switch between CPU‑only and GPU‑accelerated builds, and to include an optional HTTP server.
+
+- `cuda`: Enables the CUDA backend. When set:
+  - If `nvcc` is available, we compile CUDA kernels for sm_52 and link against CUDA runtime. If the cuBLAS header (`cublas_v2.h`) is found, we also link cuBLAS and enable GEMM paths and tests.
+  - If `nvcc` is not available, we build and link a stub library that provides the required symbols so the crate still compiles. This lets you develop on machines with only CUDA headers/libs (e.g., conda) or no CUDA toolchain.
+- `server`: Includes the HTTP server binary routes so you can run `m40-llm run ...`.
+
+Build script behavior:
+- Always exposes `cfg(have_cublas_header)` when the cuBLAS header is detected so tests can gate accordingly.
+- Always exposes `cfg(nvcc)` when `nvcc` is present so code/tests can detect a real CUDA toolchain.
 
 ## Build
-- Non-CUDA: `cargo build --no-default-features; cargo test --no-default-features`
-- CUDA, no NVCC (conda headers/libs): `cargo build --features cuda; cargo test --features cuda`
-- CUDA, with NVCC: `cargo build --features cuda; cargo test --features cuda`
+Build the project in one of these modes:
+
+- CPU only (no CUDA):
+  - Build: `cargo build --no-default-features`
+  - Test: `cargo test --no-default-features`
+- CUDA enabled, without nvcc (e.g., conda headers/libs available):
+  - Build: `cargo build --features cuda`
+  - Test: `cargo test --features cuda`
+- CUDA enabled, with nvcc installed:
+  - Build: `cargo build --features cuda`
+  - Test: `cargo test --features cuda`
 
 ## Tests
-- Non-CUDA tests run by default
-- CUDA tests require `--features cuda` and will run when NVCC/headers are present
+- CPU‑only mode: `cargo test --no-default-features` runs all non‑CUDA tests.
+- CUDA mode (`--features cuda`): CUDA smoke and GEMM tests run when the environment has CUDA headers, and additional GEMM/cuBLAS tests run when the build detects `cublas_v2.h`. If `nvcc` is present, tests will also be able to detect `cfg(nvcc)` paths.
 
 ## Server (feature = server)
 ```
````
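The new `cuda` feature bullets describe three outcomes: no CUDA at all, a stub library when `nvcc` is missing, and real sm_52 kernels (plus cuBLAS when `cublas_v2.h` is found) when `nvcc` is present. The project's build script is not part of this commit, so what follows is only a hypothetical sketch of how a `build.rs` could keep that decision explicit by computing a plan first and deriving link directives from it. `GpuBuild`, `plan`, `link_directives` and the library names `m40llm_stub` / `m40llm_kernels` are illustrative assumptions, and the `cargo::` directive prefix assumes Rust 1.77 or newer.

```rust
/// What a build script would compile and link for one configuration.
/// Hypothetical type mirroring the README's description of the `cuda` feature.
#[derive(Debug, PartialEq, Eq)]
enum GpuBuild {
    /// `cuda` feature off: CPU-only build, no CUDA objects linked.
    CpuOnly,
    /// `cuda` on, `nvcc` missing: link a stub library exporting the same symbols.
    Stub,
    /// `cuda` on, `nvcc` present: compile kernels for `arch` and link the CUDA
    /// runtime; `cublas` is true when `cublas_v2.h` was found (GEMM paths on).
    Kernels { arch: &'static str, cublas: bool },
}

/// Decide the build outcome from the feature flag and the two probes.
fn plan(cuda_feature: bool, has_nvcc: bool, has_cublas_header: bool) -> GpuBuild {
    match (cuda_feature, has_nvcc) {
        (false, _) => GpuBuild::CpuOnly,
        (true, false) => GpuBuild::Stub,
        (true, true) => GpuBuild::Kernels { arch: "sm_52", cublas: has_cublas_header },
    }
}

/// Cargo link directives for a plan. `m40llm_stub` and `m40llm_kernels` are
/// hypothetical static-library names; `cudart` and `cublas` are the usual
/// CUDA runtime and cuBLAS library names.
fn link_directives(p: &GpuBuild) -> Vec<String> {
    match p {
        GpuBuild::CpuOnly => Vec::new(),
        GpuBuild::Stub => vec!["cargo::rustc-link-lib=static=m40llm_stub".to_string()],
        GpuBuild::Kernels { cublas, .. } => {
            let mut out = vec![
                "cargo::rustc-link-lib=static=m40llm_kernels".to_string(),
                "cargo::rustc-link-lib=dylib=cudart".to_string(),
            ];
            if *cublas {
                out.push("cargo::rustc-link-lib=dylib=cublas".to_string());
            }
            out
        }
    }
}
```

Inside a `build.rs`, `cuda_feature` could come from `std::env::var_os("CARGO_FEATURE_CUDA").is_some()`, since Cargo sets `CARGO_FEATURE_<NAME>` for each enabled feature when it runs build scripts, and `arch` would be handed to `nvcc` as `-arch=sm_52`. Keeping the plan separate from the printing makes each of the three cases unit-testable without a GPU.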

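The "Build script behavior" bullets added in this commit say the build exposes `cfg(nvcc)` and `cfg(have_cublas_header)` when it finds a CUDA toolchain and the cuBLAS header. As a minimal sketch of how such detection could be written (not the project's actual `build.rs`), the code below assumes `nvcc` is looked up on `PATH` under that exact file name (it would be `nvcc.exe` on Windows), that headers are searched only under `$CUDA_HOME/include`, `$CUDA_PATH/include` and `$CONDA_PREFIX/include`, and that the toolchain is Rust 1.77 or newer for the `cargo::` prefix. The function names are hypothetical.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// True if a file named `name` exists directly inside any of `dirs`.
fn exists_in_any(name: &str, dirs: &[PathBuf]) -> bool {
    dirs.iter().any(|dir| dir.join(name).is_file())
}

/// Build-script directives for the two detection cfgs. The `rustc-check-cfg`
/// lines declare the names so the `unexpected_cfgs` lint accepts them; the
/// `rustc-cfg` lines switch them on only when the probe succeeded.
fn cfg_directives(has_nvcc: bool, has_cublas_header: bool) -> Vec<String> {
    let mut out = vec![
        "cargo::rustc-check-cfg=cfg(nvcc)".to_string(),
        "cargo::rustc-check-cfg=cfg(have_cublas_header)".to_string(),
    ];
    if has_nvcc {
        out.push("cargo::rustc-cfg=nvcc".to_string());
    }
    if has_cublas_header {
        out.push("cargo::rustc-cfg=have_cublas_header".to_string());
    }
    out
}

/// Probe the environment and print the directives; a build.rs `main` would call this.
/// Assumption: the env vars and search locations here are illustrative only.
fn probe_and_emit() {
    // Every directory on PATH is a candidate location for `nvcc`.
    let path_dirs: Vec<PathBuf> = std::env::var_os("PATH")
        .map(|p| std::env::split_paths(&p).collect())
        .unwrap_or_default();
    let has_nvcc = exists_in_any("nvcc", &path_dirs);

    // Candidate include directories derived from common CUDA/conda env vars.
    let include_dirs: Vec<PathBuf> = ["CUDA_HOME", "CUDA_PATH", "CONDA_PREFIX"]
        .iter()
        .filter_map(|var| std::env::var_os(var))
        .map(|root: OsString| PathBuf::from(root).join("include"))
        .collect();
    let has_cublas_header = exists_in_any("cublas_v2.h", &include_dirs);

    for line in cfg_directives(has_nvcc, has_cublas_header) {
        println!("{line}");
    }
}
```

Code and tests can then branch with `#[cfg(nvcc)]` or `cfg!(have_cublas_header)`. Declaring the names through `rustc-check-cfg` matters because Cargo enables `unexpected_cfgs` checking by default from Rust 1.80, and undeclared custom cfgs would otherwise produce warnings.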
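The rewritten Tests section ties what runs to the build mode: non-CUDA tests always, CUDA tests under `--features cuda`, extra cuBLAS GEMM tests when `cublas_v2.h` is detected, and `cfg(nvcc)` paths when a real toolchain is present. The sketch below models that mapping as plain data and shows the matching `#[cfg(...)]` gate on a test module. `TestTier`, `enabled_tiers` and the test module are hypothetical names, and the four tiers are a simplification of the README wording rather than the project's actual test layout.

```rust
/// Test groups implied by the README's Tests section (hypothetical names).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TestTier {
    Cpu,         // non-CUDA tests: always built
    CudaSmoke,   // requires `--features cuda`
    CublasGemm,  // requires `cuda` plus `cfg(have_cublas_header)`
    NvccKernels, // requires `cuda` plus `cfg(nvcc)` (a real toolchain)
}

/// Pure mapping from build configuration to the test tiers that should run.
fn enabled_tiers(cuda_feature: bool, have_cublas_header: bool, nvcc: bool) -> Vec<TestTier> {
    let mut tiers = vec![TestTier::Cpu];
    if cuda_feature {
        tiers.push(TestTier::CudaSmoke);
        if have_cublas_header {
            tiers.push(TestTier::CublasGemm);
        }
        if nvcc {
            tiers.push(TestTier::NvccKernels);
        }
    }
    tiers
}

/// The same mapping evaluated with this crate's compile-time configuration.
fn tiers_for_this_build() -> Vec<TestTier> {
    enabled_tiers(
        cfg!(feature = "cuda"),
        cfg!(have_cublas_header),
        cfg!(nvcc),
    )
}

// Compile-time gate for a cuBLAS-dependent test module (hypothetical contents):
// it is only compiled under `cargo test --features cuda` on a machine where the
// build script emitted `cfg(have_cublas_header)`.
#[cfg(all(test, feature = "cuda", have_cublas_header))]
mod cublas_gemm_tests {
    #[test]
    fn small_sgemm_matches_cpu_reference() {
        // Call the crate's GEMM wrapper here and compare against a CPU triple loop.
    }
}
```

Gating at the module level means a CPU-only `cargo test --no-default-features` never even compiles the cuBLAS tests, so missing headers cannot break that build.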