INT8 quantization doesn’t reduce disk/RSS; db size grows vs. FP32 #3143

**Summary**
INT8 quantization speeds up ingest but does not shrink the footprint. For MSMARCO-1M:
- Ingest time: ~1h07 (INT8) vs. ~1h51–2h00 (FP32).
- Recall: ~0.9072 (INT8) vs. ~0.8994 (FP32), i.e. comparable.
- Footprint: db size ~10.6 GB (INT8) vs. ~9.6 GB (FP32); RSS still ~9.5 GB.

It appears that quantized vectors are not reducing storage or memory. The likely cause is either duplicated storage (graph + bucket) or float copies being stored alongside the int8 data.

**Repro**
1) Build vector index with `quantization=INT8`, `storeVectorsInGraph=false`.
2) Ingest MSMARCO-1M, run one search to trigger graph build.
3) Measure db size and RSS during/after build.
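For step 3, a minimal sketch of how the two numbers could be collected from outside the database process. It assumes a Linux host (RSS is read from `/proc/<pid>/status`) and that the database lives in a single directory on disk; the helper names `dir_size_bytes`, `parse_vmrss_bytes`, and `rss_bytes` are hypothetical and not part of the project.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def parse_vmrss_bytes(status_text):
    """Extract VmRSS from the text of /proc/<pid>/status (reported in kB)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) * 1024
    return None  # kernel threads and some processes have no VmRSS line


def rss_bytes(pid):
    """Resident set size of a running process. Linux only (assumption)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_bytes(f.read())
```

Sampling `rss_bytes(pid)` periodically during the graph build, and calling `dir_size_bytes` once after it finishes, would give numbers comparable across the INT8 and FP32 runs.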

**Expected**
INT8 should materially reduce disk and memory vs. FP32.
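To make "materially" concrete, here is a rough back-of-the-envelope sketch of the raw vector payload under each encoding. The vector count of 1,000,000 is inferred from the dataset name, and the dimension `DIM` is a purely hypothetical placeholder because the issue does not state it. The calculation ignores graph edges, document payloads, and any per-vector quantization metadata (such as scale or offset values).

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_component):
    """Bytes needed to store the vector components alone, with no overhead."""
    return num_vectors * dim * bytes_per_component


NUM_VECTORS = 1_000_000  # assumption: "1M" in MSMARCO-1M
DIM = 384                # hypothetical; substitute the real embedding dimension

fp32_bytes = raw_vector_bytes(NUM_VECTORS, DIM, 4)  # 4 bytes per float32
int8_bytes = raw_vector_bytes(NUM_VECTORS, DIM, 1)  # 1 byte per int8

print(f"FP32 payload: {fp32_bytes / 1e9:.2f} GB")
print(f"INT8 payload: {int8_bytes / 1e9:.2f} GB")
print(f"Ratio: {fp32_bytes / int8_bytes:.0f}x")
```

Whatever the real dimension is, the vector payload itself should shrink by about 4x, so the total footprint should drop unless something else dominates or the float data is still being kept.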

**Actual**
Disk usage increases (~10.6 GB vs. ~9.6 GB) and RSS remains high (~9.5 GB).

**Possible causes**
- Vectors still serialized as float somewhere (graph or doc fetch).
- Duplication persists despite quantization (graph + bucket both storing vectors).
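One way to narrow down these hypotheses is to convert the reported db sizes into bytes per vector. The sketch below uses only the numbers from the summary and assumes 1,000,000 vectors and decimal gigabytes (both assumptions, not confirmed by the measurements).

```python
NUM_VECTORS = 1_000_000  # assumption: "1M" in MSMARCO-1M
GB = 10**9               # assumption: sizes were reported in decimal GB


def bytes_per_vector(total_gb, num_vectors=NUM_VECTORS):
    """Average on-disk bytes attributable to each vector."""
    return total_gb * GB / num_vectors


int8_per_vec = bytes_per_vector(10.6)  # reported INT8 db size
fp32_per_vec = bytes_per_vector(9.6)   # reported FP32 db size
extra_per_vec = int8_per_vec - fp32_per_vec

print(f"INT8: ~{int8_per_vec:.0f} B/vector")
print(f"FP32: ~{fp32_per_vec:.0f} B/vector")
print(f"Extra with INT8: ~{extra_per_vec:.0f} B/vector")
```

If the extra bytes per vector turn out to be close to the embedding dimension (one byte per component), that would be consistent with an int8 copy being written in addition to the float data rather than instead of it. This is only a heuristic; confirming it requires inspecting the actual storage paths.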

**Help**
I can investigate storage paths and submit a fix once root cause is confirmed.
