Add indexes for quarantined media lookups#19312
Conversation
| -- we should start at ordering 9305, but there's higher conflict if we steal 9306's ordering, so | ||
| -- we'll steal 9304's ordering instead. 9304 is where the applicable columns were added. |
There was a problem hiding this comment.
Where is the 9306 conflict?
Perhaps we should just separate the insert into background_updates so they each have their own 93/05_xxx.sql/93/06_xxx.sql files.
|
|
||
| -- Note: We *probably* should have an index on quarantined_ts, but we're going | ||
| -- to try to defer that to a future migration after seeing the performance impact. | ||
| -- Note: the index is added in 05_add_quarantined_by_index.sql |
There was a problem hiding this comment.
| -- Note: the index is added in 05_add_quarantined_by_index.sql | |
| -- Note: the index is added in 93/05_add_quarantined_by_index.sql |
| columns=[ | ||
| # We include columns in both the WHERE and ORDER BY clauses to make | ||
| # the resulting query a bit more efficient. | ||
| "quarantined_by", |
There was a problem hiding this comment.
I think the where_clause is sufficient for quarantined_by
| # We include columns in both the WHERE and ORDER BY clauses to make | ||
| # the resulting query a bit more efficient. |
There was a problem hiding this comment.
For my own reference, any good sources? Is this just from looking at the query planner?
This could also use more details. Something like:
| # We include columns in both the WHERE and ORDER BY clauses to make | |
| # the resulting query a bit more efficient. | |
| # We include columns in both the WHERE and ORDER BY clauses to make the | |
| # resulting query a bit more efficient (can allow the database to use a | |
| # single index that covers both the filtering and the sorting). |
| # We include columns in both the WHERE and ORDER BY clauses to make | ||
| # the resulting query a bit more efficient. |
There was a problem hiding this comment.
Should this detail be moved next to the query itself?
| # the resulting query a bit more efficient. | ||
| "quarantined_by", | ||
| "quarantined_ts", | ||
| "media_id", |
There was a problem hiding this comment.
Unclear if media_id is really beneficial here. I assume quarantined_ts gets us most of the way there and Postgres can easily figure out the tie-break without the index on this column.
| if local: | ||
| sql = "SELECT '' as media_origin, media_id FROM local_media_repository WHERE quarantined_by IS NOT NULL ORDER BY quarantined_ts, media_id ASC LIMIT ? OFFSET ?" | ||
| else: | ||
| sql = "SELECT media_origin, media_id FROM remote_media_cache WHERE quarantined_by IS NOT NULL ORDER BY quarantined_ts, media_origin, media_id ASC LIMIT ? OFFSET ?" |
There was a problem hiding this comment.
Why do we sort by media_origin? Is media_id not sufficient enough as a tie-break?
|
closing in light of #19351 |
This is intended to overlap with #19308
Lack of index was introduced in #19268
We require an index on these lookups because early
EXPLAINresults on matrix.org indicate that it'd likely destroy performance if we try to get the first page of results. The cost is far too high (>50k minimum).Pull Request Checklist
EventStoretoEventWorkerStore.".code blocks.