Feature: support hash index for faster lookup when no range queries are needed

  Why ArcadeDB Needed a Hash Index

  ArcadeDB's existing LSM-Tree index is optimized for range scans and sequential access. However, the most common database operation — point lookups (primary key access, JOIN resolution, edge traversal) —
  doesn't need ordering. LSM pays for it anyway: multiple sorted levels, compaction overhead, and 2-3 page reads with binary searches per level.

  A hash index achieves O(1) lookups with 1-2 page reads, purpose-built for equality queries.

  Benchmark Results (10M records)

  ```
┌───────────────────────────┬─────────────┬─────────────┬─────────────┐
  │         Operation         │  LSM Tree   │ Hash Index  │ Improvement │
  ├───────────────────────────┼─────────────┼─────────────┼─────────────┤
  │ Bulk Insert               │ 18.6K/s     │ 43.8K/s     │ 2.4x faster │
  ├───────────────────────────┼─────────────┼─────────────┼─────────────┤
  │ Point Lookup              │ 16.5K ops/s │ 125K ops/s  │ 7.6x faster │
  ├───────────────────────────┼─────────────┼─────────────┼─────────────┤
  │ SQL Lookup (WHERE id = ?) │ 13.5K ops/s │ 38.5K ops/s │ 2.8x faster │
  ├───────────────────────────┼─────────────┼─────────────┼─────────────┤
  │ Update by Key             │ 5.8K ops/s  │ 13.2K ops/s │ 2.3x faster │
  ├───────────────────────────┼─────────────┼─────────────┼─────────────┤
  │ Delete + Re-insert        │ 3.9K ops/s  │ 4.7K ops/s  │ 1.2x faster │
  └───────────────────────────┴─────────────┴─────────────┴─────────────┘
```

  Design: Extendible Hashing

  The implementation uses extendible hashing — a proven disk-oriented algorithm with 40+ years of production history:

  - No compaction needed — unlike LSM, there are no background merge operations, no write amplification, no throughput degradation at scale
  - Constant-time lookups — hash key → read directory → read bucket page (2 page reads, both cacheable)
  - Local splits — when a bucket is full, only that bucket splits; no global reorganization
  - Space efficient — no tombstones, no duplicate entries across levels

  When to Use Which

```
  ┌───────────────────────────────────────┬───────────────────┐
  │               Use Case                │ Recommended Index │
  ├───────────────────────────────────────┼───────────────────┤
  │ WHERE id = ?, JOINs, edge traversal   │ Hash              │
  ├───────────────────────────────────────┼───────────────────┤
  │ WHERE age > 30, ORDER BY, range scans │ LSM Tree          │
  └───────────────────────────────────────┴───────────────────┘
```

  The two indexes are complementary. Hash is the better default for primary keys and any field accessed by equality, while LSM remains the right choice for range queries and ordered access.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature: support hash index for faster lookup when no range queries are needed #3527

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Feature: support hash index for faster lookup when no range queries are needed #3527

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions