Describe the bug
It seems that across 3 worker instances, tasks are not being executed efficiently. Looking at the active connections in my Postgres database, I can see several connections executing what appears to be the same DELETE statement.
This looks like multiple "remove expired objects" tasks end up in the task queue, and all 3 workers can pick up the same job, which is really inefficient.
I'm fairly sure this is a waste of my self-hosted compute resources, since the task queue is slowly growing while the 3 workers contend with each other to delete the same rows.
How to reproduce
As far as I'm aware, it's just my installation with multiple workers and, admittedly, a not very fast database.
I only noticed it when checking authentik's active connections in my Postgres installation.
Expected behavior
I expect the scheduler not to assign all workers simultaneously to cleaning up expired objects. What's the point of running multiple workers if they can all get stuck doing the same thing together?
Also, deleting each message row individually is really bad when my installation is somehow generating, on average, a thousand messages per minute. (A brief look through that table suggests they are all identical messages?)
Screenshots
Also note how they all block on each other when I run my own query to mass-delete expired messages.
Additional context
I've also observed authentik collectively holding a significant number of idle connections in my Postgres DB over time, so I enabled the idle session timeout and idle-in-transaction session timeout on my database. (It was keeling over from authentik holding 140+ connections against a 200-connection limit.)
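For reference, the two timeouts mentioned above can be set at the server level. A minimal sketch; the specific values here are assumptions, not recommendations, and `idle_session_timeout` requires PostgreSQL 14+:

```sql
-- Kill sessions that sit idle inside an open transaction for 5 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';

-- Kill fully idle sessions after 10 minutes (PostgreSQL 14 and later)
ALTER SYSTEM SET idle_session_timeout = '10min';

-- Apply the new settings without a restart
SELECT pg_reload_conf();
```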
There are also 7,860,119 messages in the django_channels_postgres_message table (I truncated it just a few hours ago!), which the 3 workers seem to be clearing very inefficiently, since all of them attempt to delete rows one at a time, often the same message across all 3 workers.
These instances are all connecting directly to the primary postgres instance, with no connection pooler in between.
For now I've somewhat mitigated it by using pg_cron to run a pair of cleanup queries that remove expired entries from the django_channels_postgres_message and django_channels_postgres_groupchannel tables.
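The pg_cron mitigation looks roughly like the sketch below. The `expire` column name and the 5-minute schedule are assumptions about my setup, not something taken from the authentik or channels_postgres docs, so check the actual table schema before using it:

```sql
-- Hypothetical pg_cron jobs; adjust the schedule and the "expire"
-- column name to match the real schema on your installation.
SELECT cron.schedule('purge-expired-messages', '*/5 * * * *',
  $$DELETE FROM django_channels_postgres_message WHERE expire < now()$$);

SELECT cron.schedule('purge-expired-groupchannels', '*/5 * * * *',
  $$DELETE FROM django_channels_postgres_groupchannel WHERE expire < now()$$);
```

A single set-based DELETE like this runs once per schedule interval, instead of three workers each deleting the same rows one at a time.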
Deployment Method
Kubernetes
Version
2026.2.0