Limiting and ordering results #11

@sebscholl

Problem

Unless I'm missing something, I believe that adding support for limiting and ordering is an important feature. Consider the model:

doc = Document.new(text: "Some text", embedding: [])

Currently, if I run a nearest_neighbors search on the doc, it returns all documents per the default ordering in Rails.

doc.nearest_neighbors(:embedding, distance: "inner_product").map(&:neighbor_distance)
=> [
  0.7474747,
  0.4638648,
  0.8382633,
  0.9837744,
  0.9237373,
  0.8366281
]

With a small number of records, searching and sorting the results in memory is not a problem, but on larger datasets it becomes a real performance issue.

Solution

I think adding limit, order, and threshold options would address this problem.

# Order results by specified columns
doc.nearest_neighbors(:embedding, distance: "inner_product", order: { neighbor_distance: :desc })

# Only return records with distance score > or < X (gte, gt, lte, lt)
doc.nearest_neighbors(:embedding, distance: "inner_product", threshold: { gte: 0.9 })

# Limit the number of records returned from the neighbor search
doc.nearest_neighbors(:embedding, distance: "inner_product", limit: 5)

While all these operations can obviously be performed with any returned result in memory, it would be way better to have them happen at the DB level.
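For reference, here is a minimal plain-Ruby sketch of what the in-memory version looks like today, using the distances from the example output above. The `refine` helper and its keyword arguments are hypothetical names made up for illustration; they are not part of the gem, and they only mirror the semantics I have in mind for the proposed options.

```ruby
# Distances copied from the example output in this issue.
distances = [0.7474747, 0.4638648, 0.8382633, 0.9837744, 0.9237373, 0.8366281]

# Hypothetical helper (not part of the gem): applies a threshold, an
# ordering, and a limit to an array of distances, entirely in memory.
def refine(distances, threshold: {}, order: nil, limit: nil)
  # Map the proposed threshold keys onto Ruby comparison operators.
  comparators = { gte: :>=, gt: :>, lte: :<=, lt: :< }

  # Keep only the distances that satisfy every threshold condition.
  result = distances.select do |d|
    threshold.all? { |op, bound| d.public_send(comparators.fetch(op), bound) }
  end

  # Sort ascending or descending when an order is requested.
  result = result.sort if order == :asc
  result = result.sort.reverse if order == :desc

  # Truncate to the requested number of results, if any.
  limit ? result.first(limit) : result
end

refine(distances, threshold: { gte: 0.9 })
# => [0.9837744, 0.9237373]

refine(distances, order: :desc, limit: 5)
# => [0.9837744, 0.9237373, 0.8382633, 0.8366281, 0.7474747]
```

This works, but every record has to be loaded and its distance computed before anything is filtered or truncated, which is exactly the cost I would like to push down to the database.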
