@Amdrel Amdrel commented Mar 31, 2025

Description

Added management commands for creating precomputed program totals from aggregate data.

  • Added 'fill_top_organisations_totals' command and 'ProgramTopOrganisationsTotal' table
  • Added 'fill_top_projects_totals' command and 'ProgramTopProjectsTotal' table
  • Added 'fill_top_users_totals' command and 'ProgramTopUsersTotal' table
  • Added cron jobs that run the totals commands daily after monthly aggregates are run
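
As a rough illustration of what precomputing a top-organisations total involves, the sketch below groups aggregate rows by program, organisation, and month, then sums their link counts. The row layout (`AggregateRow` and its fields) and the function name `compute_top_organisations_totals` are assumptions made for illustration. They are not the schema or code in this PR.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class AggregateRow(NamedTuple):
    # Hypothetical shape of one row in the aggregates table.
    program_id: int
    organisation_id: int
    month: str  # "YYYY-MM"
    links_added: int
    links_removed: int


def compute_top_organisations_totals(
    rows: Iterable[AggregateRow],
) -> Dict[Tuple[int, int, str], Tuple[int, int]]:
    """Sum added/removed link counts per (program, organisation, month)."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = (row.program_id, row.organisation_id, row.month)
        totals[key][0] += row.links_added
        totals[key][1] += row.links_removed
    # Convert to plain tuples so the result is easy to store or compare.
    return {key: (added, removed) for key, (added, removed) in totals.items()}
```

Storing one row per key in a totals table would let program pages read a small precomputed table instead of scanning the aggregates table on each request.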

Rationale

Querying program data from our aggregates table is causing performance issues. We plan to archive this data to help with performance, but for old program statistics to remain accurate, we need to compute program totals for the previous months that will be archived. As a byproduct, this also substantially improves program query performance.

Phabricator Ticket

https://phabricator.wikimedia.org/T370980

How Has This Been Tested?

Existing programs tests cover the accuracy of the new program totals command output.

Totals Command Examples

The totals commands can be tested like so:

python manage.py fill_top_organisations_totals -d 2025-02
python manage.py fill_top_projects_totals -d 2025-02
python manage.py fill_top_users_totals -d 2025-02

If no date option is passed, the earliest possible date is identified and totals are calculated for the whole dataset. The top organisations and top projects totals calculate quickly, but top users can take ~10 minutes for the entire dataset.
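
The date handling described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `parse_month` and `months_to_process` are hypothetical, and it assumes that `-d` takes a `YYYY-MM` value selecting a single month, while omitting it walks every month from the earliest available date to the current one.

```python
from datetime import date, datetime
from typing import Iterator, Optional


def parse_month(value: str) -> date:
    """Parse a 'YYYY-MM' string into the first day of that month."""
    return datetime.strptime(value, "%Y-%m").date()


def months_to_process(
    option: Optional[str], earliest: date, today: date
) -> Iterator[date]:
    """Yield the first day of each month that totals should be computed for.

    With a date option, only that month is yielded (an assumption).
    Without one, every month from `earliest` through `today` is yielded.
    """
    if option is not None:
        yield parse_month(option)
        return
    current = date(earliest.year, earliest.month, 1)
    while current <= today:
        yield current
        # Advance to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
```

For example, `list(months_to_process(None, date(2024, 11, 20), date(2025, 2, 3)))` yields the first days of November 2024 through February 2025.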

Screenshots of your changes (if appropriate):

N/A

Types of changes

What types of changes does your code introduce? Add an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

@Amdrel Amdrel force-pushed the T370980-program-totals branch 2 times, most recently from 2af10cf to 49d491b Compare April 1, 2025 00:09
@Amdrel Amdrel changed the title WIP: Added program totals management scripts and tables Added program totals management scripts and tables Apr 4, 2025
@Amdrel Amdrel marked this pull request as ready for review April 4, 2025 19:45
@Amdrel Amdrel force-pushed the T370980-program-totals branch from 3b87d46 to ac17618 Compare May 6, 2025 22:21
@suecarmol suecarmol left a comment

Thank you for all of the work on this PR. The UI now loads lightning-fast thanks to your changes 😄

I have a few minor comments and nits before we can merge this in.

Note that the tests for the management commands will be added in PR #439; I tested the commands locally and they work.

Amdrel added 3 commits June 11, 2025 09:39
* Program totals and statistics now use the new tables

* Program CSV downloads now use the new tables
@Amdrel Amdrel force-pushed the T370980-program-totals branch from 6af659a to 9912e28 Compare June 11, 2025 16:55
@suecarmol suecarmol left a comment

This looks good to me! Thank you so much for your efforts on making Wikilink load faster!

@katydidnot katydidnot merged commit 18d1233 into WikipediaLibrary:master Jun 25, 2025
3 checks passed
@katydidnot

Thanks for your work. I validated that this works as expected locally, with the exception that program stats will be unavailable until we start backfilling the new database tables.
