Skip to content

[YSQL] Support Distinct Bitmap Scans #21039

@timothy-e

Description

@timothy-e

Jira Link: DB-10008

Description

Schema:

CREATE TABLE uniq(r1 INT, r2 INT, r3 INT, PRIMARY KEY(r1 ASC, r2 ASC));
CREATE INDEX idx on uniq(r3 ASC);
INSERT INTO uniq (SELECT 1, i FROM GENERATE_SERIES(1, 10) AS i);
INSERT INTO uniq (SELECT 2, i FROM GENERATE_SERIES(1, 10) AS i);
INSERT INTO uniq (SELECT 3, i FROM GENERATE_SERIES(1, 10) AS i);

To answer this query, we use a bitmap scan:

EXPLAIN (COSTS OFF) /*+Set(enable_hashagg false)*/ SELECT DISTINCT r1 FROM uniq WHERE r1 <= 5 OR r2 <= 5;
                       QUERY PLAN
--------------------------------------------------------
 Unique
   ->  Sort
         Sort Key: r1
         ->  YB Bitmap Table Scan on uniq
             Storage Table Rows Scanned: 30
               ->  BitmapOr
                     ->  Bitmap Index Scan on uniq_pkey
                           Index Cond: (r1 <= 5)
                           Storage Table Rows Scanned: 30
                     ->  Bitmap Index Scan on uniq_pkey
                           Index Cond: (r2 <= 5)
                           Storage Table Rows Scanned: 15
(9 rows)

If we break out the conditions, we only have to select 1 row per tablet with SELECT DISTINCT:

EXPLAIN ANALYZE /*+Set(enable_hashagg false)*/ SELECT DISTINCT r1 FROM uniq WHERE r1 <= 5;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..12.51 rows=82 width=4) (actual time=3.730..3.755 rows=3 loops=1)
   ->  Distinct Index Scan using uniq_pkey on uniq  (cost=0.00..12.51 rows=82 width=4) (actual time=3.722..3.736 rows=3 loops=1)
         Index Cond: (r1 <= 5)
         Distinct Prefix: 1
 Planning Time: 0.344 ms
 Execution Time: 3.883 ms
 Peak Memory Usage: 24 kB
(7 rows)

EXPLAIN ANALYZE /*+Set(enable_hashagg false)*/ SELECT DISTINCT r1 FROM uniq WHERE r2 <= 5;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..12.51 rows=82 width=4) (actual time=5.809..5.837 rows=3 loops=1)
   ->  Distinct Index Scan using uniq_pkey on uniq  (cost=0.00..12.51 rows=82 width=4) (actual time=5.802..5.818 rows=3 loops=1)
         Index Cond: (r2 <= 5)
         Distinct Prefix: 1
 Planning Time: 0.877 ms
 Execution Time: 6.046 ms
 Peak Memory Usage: 24 kB
(7 rows)

If we could support Distinct Index Scans inside Bitmap Operations, we could do this really efficiently:

EXPLAIN (COSTS OFF) /*+Set(enable_hashagg false)*/ SELECT DISTINCT r1 FROM uniq WHERE r1 <= 5 OR r2 <= 5;
                       QUERY PLAN
--------------------------------------------------------
 Unique
   ->  YB Bitmap Table Scan on uniq
       Storage Table Rows Scanned: 3
         ->  BitmapOr
              ->  Bitmap Distinct Index Scan on uniq_pkey
                    Index Cond: (r1 <= 5)
                    Distinct Prefix: 1
                    Storage Table Rows Scanned: 3
              ->  Bitmap Distinct Index Scan on uniq_pkey
                    Index Cond: (r2 <= 5) 
                    Distinct Prefix: 1
                    Storage Table Rows Scanned: 3
(9 rows)

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions