INT8 quantization doesn’t reduce disk/RSS; db size grows vs. FP32 #3143

@tae898

Description

Summary
INT8 quantization speeds up ingest but does not shrink the on-disk or in-memory footprint. For MSMARCO-1M:

  • Ingest time: ~1h07 (INT8) vs. ~1h51–2h00 (FP32).
  • Recall: ~0.9072 vs. ~0.8994 (comparable).
  • Footprint: db size ~10.6 GB (INT8) vs. ~9.6 GB (FP32); RSS still ~9.5 GB.

Quantized vectors do not appear to reduce storage or memory. The most likely explanations are duplicated storage (graph + bucket both holding vectors) or float copies being persisted alongside the int8 vectors.
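For scale, a back-of-envelope calculation suggests INT8 should cut the raw vector payload roughly 4x. The dimensionality below is a placeholder (768 is assumed for illustration; the actual MSMARCO embedding dimension depends on the model used):

```python
# Hypothetical sizing sketch: 1M vectors, assumed 768-dim embeddings.
N = 1_000_000          # number of vectors (MSMARCO-1M)
DIM = 768              # ASSUMED dimensionality, for illustration only

fp32_bytes = N * DIM * 4   # 4 bytes per float32 component
int8_bytes = N * DIM * 1   # 1 byte per int8 component

print(f"FP32 raw vectors: {fp32_bytes / 2**30:.2f} GiB")  # ~2.86 GiB
print(f"INT8 raw vectors: {int8_bytes / 2**30:.2f} GiB")  # ~0.72 GiB
```

Even with graph overhead on top, the ~10.6 GB (INT8) vs. ~9.6 GB (FP32) numbers above are the opposite of what this arithmetic predicts.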

Repro

  1. Build vector index with quantization=INT8, storeVectorsInGraph=false.
  2. Ingest MSMARCO-1M, run one search to trigger graph build.
  3. Measure db size and RSS during/after build.
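For step 3, this is a minimal Linux-only sketch of how the two measurements can be taken; the db directory path and server PID are placeholders, not values from this report:

```python
import os

def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under path (skips symlinks)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

def rss_bytes(pid: int) -> int:
    """Resident set size of a process, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024  # VmRSS is reported in kB
    raise RuntimeError("VmRSS not found")

# Example usage (placeholder path and PID):
# print(dir_size_bytes("/path/to/db") / 2**30, "GiB on disk")
# print(rss_bytes(server_pid) / 2**30, "GiB RSS")
```

Sampling both during and after the graph build shows whether the growth comes from the build itself or from what is persisted afterwards.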

Expected
INT8 should materially reduce disk and memory vs. FP32.

Actual
Disk usage increases and RSS remains essentially unchanged.

Possible causes

  • Vectors still serialized as float somewhere (graph or doc fetch).
  • Duplication persists despite quantization (graph + bucket both storing vectors).

Help
I can investigate storage paths and submit a fix once root cause is confirmed.
