@Amdrel Amdrel commented Mar 5, 2025

Description

These indexes improve the performance of the 'project_count' and 'editor_count' endpoints. The other slow endpoints, 'top_projects' and 'top_users', may need an alternative solution such as caching with memcached, which is outside the scope of this patch.

Changes

  • Added an index for (organisation, username) to UserAggregate.
  • Added an index for (collection, username) to UserAggregate.
  • Added an index for (organisation, project_name) to PageProjectAggregate.
  • Added an index for (collection, project_name, page_name) to PageProjectAggregate.
  • Improved the performance of a LinkAggregate table check.
    • Checking for the existence of LinkAggregates for a set of organisations was taking multiple seconds because a boolean check ran a large query where an .exists() check could be used instead.
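
For illustration only, here is a minimal sketch of the difference between a boolean check that fetches every matching row and an existence check. It uses plain SQL through Python's sqlite3 module rather than the project's actual ORM code; the `link_aggregate` table, its columns, and both helper functions are hypothetical stand-ins.

```python
import sqlite3

# Hypothetical, simplified stand-in for the LinkAggregate table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link_aggregate ("
    "id INTEGER PRIMARY KEY, organisation_id INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO link_aggregate (organisation_id, total) VALUES (?, ?)",
    [(org, n) for org in (1, 2, 3) for n in range(1000)],
)


def has_aggregates_slow(conn, org_ids):
    """Load every matching row, then test whether the result is non-empty."""
    if not org_ids:
        return False
    placeholders = ",".join("?" * len(org_ids))
    rows = conn.execute(
        f"SELECT * FROM link_aggregate WHERE organisation_id IN ({placeholders})",
        list(org_ids),
    ).fetchall()
    return bool(rows)


def has_aggregates_fast(conn, org_ids):
    """Ask the database only whether at least one matching row exists."""
    if not org_ids:
        return False
    placeholders = ",".join("?" * len(org_ids))
    (found,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM link_aggregate "
        f"WHERE organisation_id IN ({placeholders}))",
        list(org_ids),
    ).fetchone()
    return bool(found)
```

Both helpers return the same answer, but the second one lets the database stop at the first match and transfers a single value instead of every row. In Django, `queryset.exists()` plays the role of the second helper, while `if queryset:` evaluates the whole queryset.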

Rationale

Poor query performance has been causing problems, in particular on the projects pages, where requests were timing out before they could complete. This patch adds some indexes that partially alleviate this.
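
As a rough sketch of why a composite index helps here, the snippet below compares SQLite's query plan before and after adding an (organisation, username) index. It is not the project's migration or its real query: the `user_aggregate` table, the index name, and the distinct-username count (a guess at what an 'editor_count'-style query might look like) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical, simplified stand-in for the UserAggregate table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_aggregate ("
    "id INTEGER PRIMARY KEY, organisation_id INTEGER, username TEXT, total INTEGER)"
)

# Assumed shape of an 'editor_count'-style query: distinct editors per organisation.
QUERY = "SELECT COUNT(DISTINCT username) FROM user_aggregate WHERE organisation_id = ?"


def query_plan(conn):
    """Return the human-readable plan SQLite chooses for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    # The fourth column of each plan row holds the descriptive text.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)

# Composite index mirroring the (organisation, username) index from this patch.
conn.execute(
    "CREATE INDEX user_agg_org_username "
    "ON user_aggregate (organisation_id, username)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan has to scan the table; afterwards it can search the index on the organisation column and read the usernames from the index itself. The production database may plan queries differently, so this only shows the general mechanism.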

Phabricator Ticket

https://phabricator.wikimedia.org/T370533

How Has This Been Tested?

Manually.

Screenshots of your changes (if appropriate):

N/A

Types of changes

What types of changes does your code introduce? Add an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Amdrel commented Mar 6, 2025

While testing for this task, I found a query whose performance could be improved without an index change, so I pushed a commit for it. I can move it into another PR if this one isn't the right place for that kind of change.

Amdrel commented Mar 7, 2025

I think the test that failed is flaky, since I can't reproduce the failure 🤔

@Amdrel Amdrel marked this pull request as ready for review March 11, 2025 22:36

These indexes improve the performance of the 'project_count' and
'editor_count' endpoints. The other slow endpoints, 'top_projects' and
'top_users', may need an alternative solution.

* Added an index for (organisation, username) to UserAggregate.

* Added an index for (collection, username) to UserAggregate.

* Added an index for (organisation, project_name) to PageProjectAggregate.

* Added an index for (collection, project_name, page_name) to
  PageProjectAggregate.

* Improved the performance of a LinkAggregate table check. Checking for
  the existence of LinkAggregates for a set of organisations was taking
  multiple seconds because a boolean check was resulting in a large
  query being run where an `.exists()` check could be used instead.

Bug: T370533

Amdrel commented Mar 12, 2025

This PR is ready to review now

@suecarmol suecarmol left a comment

Thanks for adding the indices! I do see a performance improvement on loading the stats.

@katydidnot katydidnot merged commit b02b804 into WikipediaLibrary:master Mar 12, 2025
3 checks passed

3 participants