Conversation

@Amdrel
Contributor

@Amdrel Amdrel commented Mar 19, 2025

Description

Added management commands for archiving aggregates.

  • Added 'aggregate_archive_command' helper class that can be used for making archive commands
  • Added 'archive_link_aggregates' command
  • Added 'archive_user_aggregates' command
  • Added 'archive_pageproject_aggregates' command

Rationale

Our aggregates table is causing performance issues due to its large size. By moving old data out of it we hope to improve query performance.

Phabricator Ticket

https://phabricator.wikimedia.org/T370980

How Has This Been Tested?

The archival commands have been manually tested.

Archival Command Examples

python manage.py archive_link_aggregates dump -f 2019-07 -t 2019-08 -c extlinks
python manage.py archive_pageproject_aggregates dump -f 2019-07 -t 2019-08 -c extlinks
python manage.py archive_user_aggregates dump -f 2019-07 -t 2019-08 -c extlinks

The --from option, which specifies the month to archive, is required. The --to option is optional and only needed when a range of months is to be archived.

The --container flag is optional. When it is given, the archives are uploaded to that Swift container in object storage after they are generated. Swift must be configured via environment variables for this option to work.
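
To illustrate how these options fit together, here is a minimal sketch using plain argparse rather than the PR's actual Django command classes, which are not shown in this conversation. The short flags -f, -t, and -c are assumed to map to --from, --to, and --container, and --to is assumed to be inclusive. The helper names parse_month, months_in_range, and build_parser are hypothetical.

```python
import argparse
from datetime import date


def parse_month(value):
    """Parse a YYYY-MM string into a date on the first day of that month."""
    year, month = value.split("-")
    return date(int(year), int(month), 1)


def months_in_range(start, end=None):
    """List the first day of every month from start to end.

    Treating --to as inclusive is an assumption of this sketch.
    """
    end = end or start
    if end < start:
        raise ValueError("--to must not be earlier than --from")
    months = []
    current = start
    while current <= end:
        months.append(current)
        # Step to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return months


def build_parser():
    """Build a parser mirroring the flags used in the examples above."""
    parser = argparse.ArgumentParser(prog="archive_link_aggregates")
    subparsers = parser.add_subparsers(dest="action", required=True)
    dump = subparsers.add_parser("dump")
    # "from" is a Python keyword, so the value is stored under another name.
    dump.add_argument("-f", "--from", dest="from_month",
                      type=parse_month, required=True)
    dump.add_argument("-t", "--to", dest="to_month",
                      type=parse_month, default=None)
    dump.add_argument("-c", "--container", default=None)
    return parser
```

With this sketch, parsing `dump -f 2019-07 -t 2019-08 -c extlinks` and passing the two parsed months to months_in_range yields July and August 2019, while omitting -t yields only the --from month.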

Screenshots of your changes (if appropriate):

N/A

Types of changes

What types of changes does your code introduce? Add an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

@Amdrel Amdrel changed the title WIP: Added management commands for archiving aggregates Added management commands for archiving aggregates and creating totals Mar 27, 2025
@Amdrel Amdrel changed the title Added management commands for archiving aggregates and creating totals Added management commands for archiving aggregates and creating program totals Mar 27, 2025
@Amdrel Amdrel marked this pull request as ready for review March 27, 2025 19:21
@Amdrel Amdrel changed the title Added management commands for archiving aggregates and creating program totals WIP: Added management commands for archiving aggregates and creating program totals Mar 27, 2025
@Amdrel Amdrel marked this pull request as draft March 27, 2025 21:02
@Amdrel Amdrel changed the title WIP: Added management commands for archiving aggregates and creating program totals WIP: Added management commands for archiving aggregates Mar 31, 2025
@Amdrel Amdrel changed the title WIP: Added management commands for archiving aggregates Added management commands for archiving aggregates Apr 4, 2025
@Amdrel Amdrel marked this pull request as ready for review April 4, 2025 19:45
Contributor

@katydidnot katydidnot left a comment

This is looking like a great start, thanks for all your work.

I had a few comments in the swift.py file, but I think that's still WIP so go ahead and close those out if they're not helpful.

Just noting that we should also handle the case where the Swift variables in the .env file aren't set.

* Added 'aggregate_archive_command' helper class that can be used for
  making archive commands

* Added 'archive_link_aggregates' command

* Added 'archive_user_aggregates' command

* Added 'archive_pageproject_aggregates' command

Contributor

@katydidnot katydidnot left a comment

Not a full review, but wanted to get you what feedback I have so far.

Contributor

@katydidnot katydidnot left a comment

I tested this locally against a swift docker container and was seeing an exception being thrown. Let me know if I'm missing any other setup.

Contributor

@suecarmol suecarmol left a comment

Thanks for your work on this PR. I just have a minor comment and some nits, but otherwise this looks good!

Contributor

@suecarmol suecarmol left a comment

Thanks for all your work on this PR. I can confirm that this now works on my local environment.

@katydidnot
Contributor

> Thanks for all your work on this PR. I can confirm that this now works on my local environment.

@suecarmol, @Amdrel

I was able to get the connection working and the logs to state that the files are being uploaded successfully, but when I go to list the container: swift list inside the docker swift container nothing is showing up.

Also, do we need to increase storage capacity of our object storage before merging this in?

@Amdrel
Contributor Author

Amdrel commented Jun 6, 2025

> I was able to get the connection working and the logs to state that the files are being uploaded successfully, but when I go to list the container: swift list inside the docker swift container nothing is showing up.

I was able to reproduce this if I didn't set the OS_ environment variables. If you didn't set those these are the ones I used and it started working:

export OS_USERNAME=djangouser
export OS_PASSWORD=djangopass
export OS_PROJECT_NAME=wikilink

The swift list command silently fails otherwise. You have to set them in the bash session and not .env for it to work.
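
As a hedged illustration of how this kind of silent failure could be surfaced earlier, the sketch below checks for the three variables named in this comment before any Swift call is made. It is not code from the PR: the function names are hypothetical, and a real setup may well require more variables than these three.

```python
import os

# Variable names taken from the comment above. Treating this list as
# complete is an assumption; a real deployment may need more.
REQUIRED_SWIFT_VARS = ("OS_USERNAME", "OS_PASSWORD", "OS_PROJECT_NAME")


def missing_swift_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_SWIFT_VARS if not environ.get(name)]


def require_swift_config(environ=None):
    """Raise a descriptive error instead of failing silently."""
    missing = missing_swift_vars(environ)
    if missing:
        raise RuntimeError(
            "Swift is not configured; missing: " + ", ".join(missing)
        )
```

Calling require_swift_config() before an upload or listing step would turn an empty result into an explicit error that names the unset variables.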

@katydidnot
Contributor

katydidnot commented Jun 6, 2025

> > I was able to get the connection working and the logs to state that the files are being uploaded successfully, but when I go to list the container: swift list inside the docker swift container nothing is showing up.
>
> I was able to reproduce this if I didn't set the OS_ environment variables. If you didn't set those these are the ones I used and it started working:
>
> export OS_USERNAME=djangouser
> export OS_PASSWORD=djangopass
> export OS_PROJECT_NAME=wikilink
>
> The swift list command silently fails otherwise. You have to set them in the bash session and not .env for it to work.

Thanks for the tip, that did it. :)
I'm now seeing the archived aggregate files in the swift container.
Thanks for your work, I can merge this in once we've increased the storage capacity to account for aggregates.

Contributor

@katydidnot katydidnot left a comment

Thanks for all your work!

@katydidnot katydidnot merged commit f86118b into WikipediaLibrary:master Jun 6, 2025
3 checks passed
@suecarmol
Contributor

> Also, do we need to increase storage capacity of our object storage before merging this in?

I'm sorry for not getting back to you sooner. I had already requested the increase with these backups in mind. Let's monitor the next few days' worth of uploads. I estimated 2 to 3 aggregate files per day. If that estimate turns out to be inaccurate, we can ask for a storage increase again.

3 participants