Load testing of MM by @bisgaard-itis and @wvangeit caused an outage of osparc on 26 Sept at 10:20.
The celery-worker containers maxed out their CPU usage, which completely saturated the CPU of the machine they were running on.
Learnings:
- It is OK if celery-workers are slowed down or CPU-limited, but their downtime or slowness should never impact or spill over to core platform services. This can be achieved either by setting very tight CPU (container) limits on them or by placing them on dedicated celery-worker machines.
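
To make the two options above concrete, here is a minimal sketch, assuming the workers are deployed as Docker Compose / Swarm services. It only builds the `deploy` section of a service definition. The helper name `celery_worker_deploy`, the limit of 0.5 CPU, and the node label `celery-worker` are hypothetical and not taken from the osparc configuration.

```python
import json
from typing import Optional


def celery_worker_deploy(cpu_limit: float, node_label: Optional[str] = None) -> dict:
    """Build a Compose `deploy` section for a celery-worker service.

    cpu_limit:  hard CPU cap for each container (hypothetical value).
    node_label: if given, pin the service to nodes carrying this label
                (hypothetical label, must be set on the nodes beforehand).
    """
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")

    # Option 1: a tight CPU limit, so a busy worker cannot starve its host.
    deploy = {"resources": {"limits": {"cpus": f"{cpu_limit:.2f}"}}}

    # Option 2: a placement constraint, so workers only run on dedicated machines.
    if node_label is not None:
        deploy["placement"] = {"constraints": [f"node.labels.{node_label} == true"]}

    return deploy


if __name__ == "__main__":
    print(json.dumps(celery_worker_deploy(0.5, "celery-worker"), indent=2))
```

The two options are independent, so either can be applied on its own. Combining them caps each worker and also keeps any remaining contention away from the machines running core platform services.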
CC @YuryHrytsuk for HA
Mads and Werner, please comment if you have more information or opinions on this.