@spottsdd (Contributor)
Description

When running in K8s, the worker pod spikes CPU and memory usage enough to trigger a pod restart. A link to a discussion of this issue appears in the logs: RTT warning can signal CPU saturation · sidekiq/sidekiq · Discussion #5039

The worker service's concurrency setting in config/sidekiq.yml is set to 5. Despite being lower than the default of 10, the pod still restarts.

I found that setting concurrency to 2 and increasing the pod resource limits avoids the regular restarts. CPU and memory still spike, but not enough to trigger a restart.
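
For reference, a minimal sketch of the two changes described above. The concurrency value of 2 comes from this PR's description; the resource request/limit numbers and the exact field placement in the worker Deployment are placeholders, not necessarily what is committed here.

```yaml
# config/sidekiq.yml -- lower worker concurrency (value from this PR)
concurrency: 2
```

```yaml
# k8s-manifests worker Deployment excerpt -- illustrative limits only,
# not the exact values used in this change
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```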

How to test

K8s testing steps are outlined in the k8s-manifests README. Note that, unlike Docker Compose, K8s won't build container images; images must be pre-built and hosted in a registry. For development, you can run a local registry, then build and push the images to it. The images must be built and pushed on the worker node, since that is where the services run and where they will resolve localhost.
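
As a rough illustration of why the images have to be pushed on the worker node: the manifests pull from a localhost registry address, which only resolves on the node doing the pull. The image name, tag, and registry port below are assumptions for this sketch, not necessarily what k8s-manifests uses.

```yaml
# Hypothetical worker Deployment excerpt: the image reference points at a
# local registry, so "localhost:5000" must be reachable from the node
# that pulls and runs this container.
spec:
  containers:
    - name: worker
      image: localhost:5000/storedog-worker:latest
      imagePullPolicy: Always
```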

The development K8s Sandbox is currently set to test.

On the worker terminal:
1. cd to root/lab/storedog
2. Check out this branch (git clone runs during track setup)
3. Run the build command in the k8s README

On the control-plane terminal:
1. cd to root/lab/storedog
2. Check out this branch (git clone runs during track setup)
3. Follow the steps in the README to set up the Datadog Operator and start Storedog

Watch the pods run: `watch kubectl get pods -n storedog`

Previously, the worker pod would restart about every 3 minutes. Wait at least 10 minutes; I've let it run for a full hour to confirm.

@spottsdd spottsdd requested review from a team as code owners July 18, 2025 13:55
@arosenkranz arosenkranz merged commit 217c561 into main Jul 18, 2025
1 check passed
@arosenkranz arosenkranz deleted the TRAIN-3394-worker-restarts branch July 18, 2025 14:49