Summary
INT8 quantization speeds up ingest but doesn't shrink the storage or memory footprint. Results on MSMARCO-1M:
- Ingest time: ~1h07 (INT8) vs. ~1h51–2h00 (FP32).
- Recall: ~0.9072 vs. ~0.8994 (comparable).
- Footprint: db size ~10.6 GB (INT8) vs. ~9.6 GB (FP32); RSS stays at ~9.5 GB with INT8.
Quantized vectors do not appear to reduce storage or memory. The likely cause is duplicated storage (graph + bucket) or float copies stored alongside the INT8 vectors.
Repro
- Build the vector index with quantization=INT8, storeVectorsInGraph=false.
- Ingest MSMARCO-1M, then run one search to trigger the graph build.
- Measure the db size and RSS during and after the build.
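One way to take the measurements in the last step is sketched below. It assumes a Linux host (RSS is read from /proc/<pid>/status) and that the database keeps all of its files under a single data directory; both the directory path and the PID of the database process are inputs you supply, not anything the project defines.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path` (on-disk db size)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            # Skip symlinks so the same data is not counted twice.
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def rss_bytes(pid: str = "self") -> int:
    """Resident set size of a process, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) * 1024
    raise RuntimeError(f"VmRSS not found for pid {pid}")
```

Calling `dir_size_bytes(db_dir)` and `rss_bytes(str(db_pid))` periodically during ingest and again after the first search gives comparable FP32 and INT8 numbers from the same procedure.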
Expected
INT8 should materially reduce disk and memory vs. FP32.
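For a sense of scale, the raw vector payload can be estimated as vectors × dimension × bytes per component. The sketch below assumes a 768-dimensional embedding (the issue does not state the dimension) and, for INT8, two float32 values per vector for a scale and offset, which is common for scalar quantization but is an assumption here. Vectors are only part of the db (graph edges and documents also count), so the total would not shrink by the full ratio, but it should shrink rather than grow.

```python
# Bytes per vector component for each storage type.
BYTES_PER_COMPONENT = {"FP32": 4, "INT8": 1}


def raw_vector_bytes(num_vectors: int, dim: int, dtype: str,
                     per_vector_overhead: int = 0) -> int:
    """Bytes needed to store `num_vectors` vectors of `dim` components.

    `per_vector_overhead` models extra metadata kept per vector, e.g. the
    scale and offset of scalar INT8 quantization (assumed, not confirmed).
    """
    return num_vectors * (dim * BYTES_PER_COMPONENT[dtype] + per_vector_overhead)


N, D = 1_000_000, 768  # D = 768 is an assumed embedding dimension
fp32 = raw_vector_bytes(N, D, "FP32")
int8 = raw_vector_bytes(N, D, "INT8", per_vector_overhead=8)  # scale + offset as float32
print(f"FP32: {fp32 / 2**30:.2f} GiB, INT8: {int8 / 2**30:.2f} GiB")
```

Under these assumptions the vector payload alone drops from about 3.07 GB to about 0.78 GB, so a ~1 GB increase in db size points at extra data being written rather than at quantization overhead.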
Actual
Disk usage increases and RSS remains high.
Possible causes
- Vectors are still serialized as float somewhere (in the graph or on the doc-fetch path).
- Duplication persists despite quantization (both the graph and the bucket store vectors).
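To illustrate why either cause would make INT8 larger than FP32 on disk, the sketch below uses min-max scalar quantization (an assumed scheme; the issue does not say how the INT8 codes are computed) and counts per-vector bytes for a given number of INT8 and FP32 copies. The copy counts are hypothetical stand-ins for "graph" and "bucket" storage. One INT8 copy plus one leftover float copy costs more than a single FP32 copy.

```python
from array import array


def quantize_int8(vec: list[float]) -> tuple[array, float, float]:
    """Min-max scalar quantization of one vector to signed 8-bit codes."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    # Map [lo, hi] onto 0..255, then shift to the signed range -128..127.
    codes = array("b", (round((x - lo) / scale) - 128 for x in vec))
    return codes, scale, lo


def dequantize_int8(codes: array, scale: float, lo: float) -> list[float]:
    """Approximate reconstruction; error per component is at most scale / 2."""
    return [(c + 128) * scale + lo for c in codes]


def stored_bytes_per_vector(dim: int, int8_copies: int, fp32_copies: int) -> int:
    """Bytes stored for one vector given how many copies of each form exist."""
    # INT8 codes plus scale and offset kept as two float32 values.
    int8_size = dim * array("b").itemsize + 2 * array("f").itemsize
    fp32_size = dim * array("f").itemsize
    return int8_copies * int8_size + fp32_copies * fp32_size
```

With dim=768, a single INT8 copy is 776 bytes and a single FP32 copy is 3072 bytes, while one of each is 3848 bytes. Checking which combination the graph and bucket paths actually write would narrow down the root cause.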
Help
I can investigate the storage paths and submit a fix once the root cause is confirmed.