Replies: 3 comments 4 replies
-
We have been running ARC on GKE with GHES for about six months now. Recently we enabled Dependabot, and those runs were taking forever because each job pulls directly from a fairly large container image (1.5+ GiB, if I remember correctly), with lots of pulls from ghcr.io stalling out and, I believe, some rate limiting. I migrated those jobs to a RunnerSet, following the ARC documentation you linked, to cache the images, and that eliminated our ghcr.io Dependabot image-pull woes. I haven't seen the kind of behavior you're describing yet. We are running GKE … Also, what storage provisioner are you using? I PoC'd this out with …
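For anyone following along, this is a trimmed sketch of the RunnerSet-based image-caching setup the ARC documentation describes: a PVC per runner mounted at `/var/lib/docker` so pulled layers survive across jobs. Names, the repository, the storage class, and the size here are all illustrative; check the ARC docs for the full manifest.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: dependabot-runners            # illustrative name
spec:
  repository: my-org/my-repo          # placeholder
  dockerdWithinRunnerContainer: true  # dockerd runs in the runner container
  template:
    spec:
      containers:
        - name: runner
          volumeMounts:
            - name: var-lib-docker
              mountPath: /var/lib/docker   # docker's layer/image cache lives here
  volumeClaimTemplates:
    - metadata:
        name: var-lib-docker
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard-rwo     # GKE PD CSI class; adjust to your provisioner
        resources:
          requests:
            storage: 10Gi
```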
-
Thanks for digging out the logs. I didn't see any improvement from moving to a single zone; to be honest, I have more questions than answers. I found that you can get more information about the volume attach/mount process by SSHing onto the node and running …

[three log excerpts elided]

Notice the time difference between the three logs. I've tried following the advice in the link above (fsGroupChangePolicy: "OnRootMismatch"), but it didn't improve performance either.
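As a sketch of where that attach/mount information typically lives on GKE (the exact command was cut off above, so these are common starting points rather than the ones actually used):

```
# On the node (e.g. via `gcloud compute ssh <node-name>`):
# the kubelet's view of volume attach/mount operations
journalctl -u kubelet --no-pager | grep -iE 'attach|mount' | tail -n 50

# From a workstation: the pod's event timeline shows the gap
# between SuccessfulAttachVolume and the first image Pulling event
kubectl describe pod <runner-pod> -n <namespace>
kubectl get events -n <namespace> --sort-by=.lastTimestamp
```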
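For reference, `fsGroupChangePolicy` is set in the pod-level `securityContext` (in a RunnerSet that sits under `spec.template.spec`). A minimal fragment, with the `fsGroup` value as a placeholder; with "OnRootMismatch", the kubelet skips the recursive chown/chmod of the volume when the root directory's ownership already matches, which is the usual fix for slow mounts of volumes with many files:

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000                          # placeholder GID
        fsGroupChangePolicy: "OnRootMismatch"  # only relabel when the volume root mismatches
```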
-
🕒 Discussion Activity Reminder 🕒

This Discussion has been labeled as dormant by an automated system for having no activity in the last 60 days. Please consider one of the following actions:

1️⃣ Close as Out of Date: If the topic is no longer relevant, close the Discussion as …
2️⃣ Provide More Information: Share additional details or context, or let the community know if you've found a solution on your own.
3️⃣ Mark a Reply as Answer: If your question has been answered by a reply, mark the most helpful reply as the solution.

Note: This dormant notification will only apply to Discussions with the …

Thank you for helping bring this Discussion to a resolution! 💬
-
We are deploying ARC on GKE and have followed the guide for Docker layer caching. We've noticed that, usually when the cluster is "cold" (i.e. not many jobs are running), it can take some time after the PV is mounted on the pod before it starts pulling the ARC image (45s in this case):

[pod event timeline elided]

ARC-related logs for this pod show many entries like:

runnerpersistentvolumeclaim Retrying sync until statefulset gets removed

It seems like this loop is affecting the speed at which the pod picks up the job.

We have `terminationGracePeriodSeconds: 600` on the RunnerSet and also `scaleDownDelaySecondsAfterScaleOut: 600` on the HRA; I'm not sure whether those could be affecting it. Has anybody experienced anything similar, or does anyone have ideas for how to mitigate it?
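For context, a sketch of where those two settings live: `terminationGracePeriodSeconds` is a pod-spec field on the RunnerSet template, while `scaleDownDelaySecondsAfterScaleOut` belongs to the HorizontalRunnerAutoscaler. Resource names and replica counts here are illustrative:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example-runnerset          # illustrative name
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 600   # pod-level grace period
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-hra
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: example-runnerset
  minReplicas: 0                   # placeholder
  maxReplicas: 5                   # placeholder
  scaleDownDelaySecondsAfterScaleOut: 600  # delay before scaling back down
```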