Skip to content

SQL: Index can return too many entries when used with ORDER BY #1760

@gramian

Description

@gramian

ArcadeDB Version:

ArcadeDB Server v24.11.1-SNAPSHOT (build 4207821d055e37cbd74b76683e45073da759c2de/1729083296142/console)

OS and JDK Version:

Running on Mac OS X 12.7.6 - OpenJDK 64-Bit Server VM 17.0.12 (Homebrew)

Querying a database can return the wrong number of records, in particular too many records. This problem is not easy to reproduce so I attach a small database (75 records) generated from public data. This behavior occurs when from certain databases, such as the one provided below, is queried via SQL, ie SELECT, where an (not-unique) indexed property is used in the projection and also as ordering quantity (in ORDER BY). The schema is given here: https://github.com/ulbmuenster/dataasee/blob/main/database/schema.sql

Expected behavior

Correct number of records returned.

Actual behavior

Too many records are returned.

Steps to reproduce

Restore this backup: metadatalake-backup-20240829-100753793.zip (337KB)

SELECT count(*) FROM metadata; -- There are 75 records in the database

SELECT name, publicationYear FROM metadata; -- Returns 75 records correctly

SELECT name FROM metadata ORDER BY publicationYear; -- Returns 75 records correctly

SELECT name, publicationYear FROM metadata ORDER BY publicationYear; -- Returns 117 records INCORRECTLY

Notes

End of August (2024-08-28 -- 2024-08-31) @lvca (and I) looked into this and potential sources of this behavior could be:

  • FetchFromIndexStep
  • The UPSERT entering the data.

Here is a sample UPSERT statement that generated the provided database:

UPDATE metadata MERGE {"creators":[{"@type":"pair","name":"Neumann, Georg"},{"@type":"pair","name":"Boivin, Odette"},{"@type":"pair","name":"Kleber, Kristin"},{"@type":"pair","name":"Neumann, Georg"},{"@type":"pair","name":"Boivin, Odette"},{"@type":"pair","name":"Kleber, Kristin"}],"dataLocation":"https://data-management.uni-muenster.de/direct-access/wwurdm/07988641231","dataSteward":"https://datastore.uni-muenster.de","description":"The dataset comprises the RTI-data of the cuneiform tablet VAT 17966 (N5:27).\\n\\nRTI = Reflectance Transformation Imaging (RTI) is a computational photographic method that captures a subject’s surface shape and color and enables the interactive re-lighting of the subject from any direction in a software viewer, revealing details not visible with the naked eye (cf. https://culturalheritageimaging.org/).","identifiers":[{"@type":"pair","data":"10.17879/07988641225","name":"DE-6"},{"@type":"pair","data":"10.17879/07988641225","name":"null"}],"keywords":"Babylon,Cuneiform,Neo-Babylonian","language":"#4:0","metadataQuality":"Incomplete","name":"VAT 17966 - RTI dataset","publicationYear":2024,"publisher":"Universität Münster","rawChecksum":"TcrJlnTQYfugD2rROs2Y3g==","rawType":"marc21","recordId":"MTA0MzQ4NDE4NDMwODAyNDg3NA","resourceType":null,"rights":"CC BY-NC-SA 4.0","source":"https://datastore.uni-muenster.de/oai","synonyms":[{"@type":"pair","data":"GoviB","name":"Alternative Title"}]} UPSERT WHERE recordId == 'MTA0MzQ4NDE4NDMwODAyNDg3NA';

As a workaround the projection can be renamed, ie:

SELECT name, publicationYear AS pubYear FROM metadata ORDER BY publicationYear;

but this also excludes the use of the index!

Metadata

Metadata

Labels

No labels
No labels

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions