Add indexes for quarantined media lookups #19312

Closed
turt2live wants to merge 2 commits into develop from travis/create-quarantined-index

Conversation

@turt2live
Member

@turt2live turt2live commented Dec 16, 2025

This is intended to overlap with #19308

Lack of index was introduced in #19268

We need an index for these lookups: early EXPLAIN results on matrix.org indicate that, without one, fetching even the first page of results would likely destroy performance. The estimated cost is far too high (>50k minimum).
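
As a rough illustration of the kind of check described above, the sketch below compares query plans before and after adding a partial index. It is only a sketch under stated assumptions: it uses SQLite's `EXPLAIN QUERY PLAN` on an empty toy table rather than Postgres on matrix.org, the table is trimmed to three columns, and the index name `demo_quarantined_idx` and its column choice are hypothetical, not necessarily what this PR creates.

```python
import sqlite3

# Same shape as the local-media listing query quoted later in this review.
QUERY = (
    "SELECT '' AS media_origin, media_id FROM local_media_repository "
    "WHERE quarantined_by IS NOT NULL "
    "ORDER BY quarantined_ts, media_id ASC LIMIT ? OFFSET ?"
)


def query_plan(conn):
    """Return SQLite's plan for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (100, 0)).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Toy schema: only the columns the query touches (an assumption).
conn.execute(
    "CREATE TABLE local_media_repository ("
    " media_id TEXT, quarantined_by TEXT, quarantined_ts BIGINT)"
)
plan_before = query_plan(conn)

# Hypothetical partial index: the WHERE clause handles the filter and the
# indexed columns match the ORDER BY.
conn.execute(
    "CREATE INDEX demo_quarantined_idx ON local_media_repository "
    "(quarantined_ts, media_id) WHERE quarantined_by IS NOT NULL"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Postgres reports numeric cost estimates instead of this textual plan, so the output is not comparable to the >50k figure; the sketch only shows where the planner starts using the index.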

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Comment on lines +15 to +16
-- we should start at ordering 9305, but there's higher conflict if we steal 9306's ordering, so
-- we'll steal 9304's ordering instead. 9304 is where the applicable columns were added.
Contributor

Where is the 9306 conflict?

Perhaps we should just separate the inserts into background_updates so that each has its own file (93/05_xxx.sql and 93/06_xxx.sql).


-- Note: We *probably* should have an index on quarantined_ts, but we're going
-- to try to defer that to a future migration after seeing the performance impact.
-- Note: the index is added in 05_add_quarantined_by_index.sql
Contributor

Suggested change
- -- Note: the index is added in 05_add_quarantined_by_index.sql
+ -- Note: the index is added in 93/05_add_quarantined_by_index.sql

columns=[
# We include columns in both the WHERE and ORDER BY clauses to make
# the resulting query a bit more efficient.
"quarantined_by",
Contributor

I think the where_clause is sufficient for quarantined_by

Comment on lines +186 to +187
# We include columns in both the WHERE and ORDER BY clauses to make
# the resulting query a bit more efficient.
Contributor

For my own reference, any good sources? Is this just from looking at the query planner?

This could also use more details. Something like:

Suggested change
- # We include columns in both the WHERE and ORDER BY clauses to make
- # the resulting query a bit more efficient.
+ # We include columns in both the WHERE and ORDER BY clauses to make the
+ # resulting query a bit more efficient (can allow the database to use a
+ # single index that covers both the filtering and the sorting).

Comment on lines +186 to +187
# We include columns in both the WHERE and ORDER BY clauses to make
# the resulting query a bit more efficient.
Contributor

Should this detail be moved next to the query itself?

# the resulting query a bit more efficient.
"quarantined_by",
"quarantined_ts",
"media_id",
Contributor

Unclear if media_id is really beneficial here. I assume quarantined_ts gets us most of the way there and Postgres can easily figure out the tie-break without the index on this column.

if local:
    sql = "SELECT '' as media_origin, media_id FROM local_media_repository WHERE quarantined_by IS NOT NULL ORDER BY quarantined_ts, media_id ASC LIMIT ? OFFSET ?"
else:
    sql = "SELECT media_origin, media_id FROM remote_media_cache WHERE quarantined_by IS NOT NULL ORDER BY quarantined_ts, media_origin, media_id ASC LIMIT ? OFFSET ?"
Contributor

Why do we sort by media_origin? Is media_id not sufficient as a tie-break?
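
One possible reading, offered as an assumption rather than the author's stated reason: media IDs are only unique per origin, so two remote rows can share both quarantined_ts and media_id, and LIMIT/OFFSET paging is only stable when the ORDER BY defines a total order. The sketch below uses SQLite with a trimmed, hypothetical version of the remote_media_cache schema and made-up rows to show the three-column ordering returning every quarantined row exactly once, even with a page size of 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema (assumption): only the columns used by the query.
conn.execute(
    "CREATE TABLE remote_media_cache ("
    " media_origin TEXT, media_id TEXT,"
    " quarantined_by TEXT, quarantined_ts BIGINT)"
)
conn.executemany(
    "INSERT INTO remote_media_cache VALUES (?, ?, ?, ?)",
    [
        # Same media_id and timestamp, different origins: media_id alone
        # cannot break this tie.
        ("b.example", "abc", "@admin:example.org", 1000),
        ("a.example", "abc", "@admin:example.org", 1000),
        ("a.example", "xyz", "@admin:example.org", 1000),
        # Not quarantined, so it must never appear in the results.
        ("c.example", "abc", None, None),
    ],
)

SQL = (
    "SELECT media_origin, media_id FROM remote_media_cache "
    "WHERE quarantined_by IS NOT NULL "
    "ORDER BY quarantined_ts, media_origin, media_id ASC LIMIT ? OFFSET ?"
)


def fetch_all_pages(page_size):
    """Walk the result set page by page using LIMIT/OFFSET."""
    results, offset = [], 0
    while True:
        page = conn.execute(SQL, (page_size, offset)).fetchall()
        if not page:
            return results
        results.extend(page)
        offset += page_size


paged = fetch_all_pages(page_size=1)
print(paged)
```

With only `ORDER BY quarantined_ts, media_id`, the relative order of the two "abc" rows would be left to the database, so a row could in principle be repeated or skipped between pages.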

@turt2live
Member Author

Closing in light of #19351.
