fix: speed up provisioning even more by 10s with localdns enabled #8338
awesomenix commented Apr 17, 2026
- Configure skipping WAagent hold for scriptless case.
- Capture localdns logs for debugging.
- Pre-warm CoreDNS so it's faster by 6s.
Pull request overview
This PR aims to reduce Linux node provisioning time in scriptless scenarios (with LocalDNS) by skipping walinuxagent hold work, capturing LocalDNS logs for easier debugging, and pre-warming CoreDNS to reduce startup latency.
Changes:
- Expose a new template helper (`GetSkipWaAgentHold`) and plumb `SKIP_WAAGENT_HOLD` into the CSE command environment.
- Pre-warm the LocalDNS CoreDNS binary during `basePrep` when LocalDNS is enabled.
- Extend e2e VM log collection to include `localdns` systemd unit logs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/agent/baker.go | Adds GetSkipWaAgentHold template func to drive SKIP_WAAGENT_HOLD in rendered scripts. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Adds CoreDNS version invocation to pre-warm when LocalDNS is enabled. |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Exports SKIP_WAAGENT_HOLD into the CSE execution environment. |
| e2e/vmss.go | Collects journalctl -u localdns into test failure artifacts. |
CSE_TIMEOUT="{{GetCSETimeout}}"
SKIP_WAAGENT_HOLD="{{GetSkipWaAgentHold}}"
/usr/bin/nohup /bin/bash -c "/bin/bash /opt/azure/containers/provision_start.sh"
Setting SKIP_WAAGENT_HOLD to true causes cse_main.sh to skip unholding walinuxagent at the end of provisioning. However, Ubuntu's installDeps (provision_installs_distro.sh) still unconditionally runs aptmarkWALinuxAgent hold, so in scenarios where FULL_INSTALL_REQUIRED=true this can leave walinuxagent permanently held. Consider also guarding the hold inside installDeps (or only skipping unhold when a hold was actually performed).
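One way to read the second suggestion (only unhold when a hold was actually performed) is sketched below. This is a minimal illustration, not the code in this PR: `WAAGENT_HELD`, `mark_waagent`, `hold_waagent_if_needed` and `unhold_waagent_if_held` are hypothetical names, and `mark_waagent` is a stand-in that only echoes instead of calling the real hold/unhold helper. It also assumes the hold and the unhold run in the same shell process, which may not match how the real scripts are split.

```shell
#!/bin/bash
# Hypothetical sketch: only SKIP_WAAGENT_HOLD comes from the PR; every other
# name here is illustrative.
SKIP_WAAGENT_HOLD="${SKIP_WAAGENT_HOLD:-false}"
WAAGENT_HELD="false"   # assumed marker: records whether a hold was placed

# Stand-in for the real hold/unhold helper. It only prints the action so the
# sketch can run without touching the package manager.
mark_waagent() {
    echo "walinuxagent $1"
}

# Place the hold unless skipping was requested, and remember that we did.
hold_waagent_if_needed() {
    if [ "${SKIP_WAAGENT_HOLD}" = "true" ]; then
        return 0
    fi
    mark_waagent hold
    WAAGENT_HELD="true"
}

# Release the hold only if this run actually placed one.
unhold_waagent_if_held() {
    if [ "${WAAGENT_HELD}" = "true" ]; then
        mark_waagent unhold
        WAAGENT_HELD="false"
    fi
}
```

With this shape the hold and the unhold are always paired: when `SKIP_WAAGENT_HOLD` is `true` neither runs, and when it is `false` both run, so the package should not be left held by a skipped unhold.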
systemctl restart systemd-timesyncd
fi

# pre-warm coredns by checking its version.
I'm a bit puzzled. Do you know what this is doing in the background? How is this supposed to speed things up or get rid of the 6s delay you saw?
It just reads the binary from Azure storage into memory, or pages it into memory. Considering the 6s slowness, my guess is that the cost is the hydration of the binary itself.
# pre-warm coredns by checking its version.
if [ "${SHOULD_ENABLE_LOCALDNS}" = "true" ]; then
    nohup /bin/sh -c '/opt/azure/containers/localdns/binary/coredns --version >/dev/null 2>&1' >/dev/null 2>&1 &
If there is an error with this command, should we care? Maybe log it to the local filesystem?
We don't care, since we are only using it to warm up the binary: either it is warmed from Azure storage or it is just paged into memory.
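For reference, if the failure ever did need to be visible, a best-effort variant could look roughly like the sketch below. It is an illustration under assumptions, not what the PR does: `prewarm_binary` and its optional log-path argument are hypothetical, and it assumes that running the binary with `--version` is enough to pull it into memory, as described in the replies above.

```shell
#!/bin/bash
# Hypothetical helper: start "<binary> --version" in the background so the
# binary is read before the real service starts. It never fails the caller;
# any stderr output goes to an optional log file (default: /dev/null).
prewarm_binary() {
    local bin="$1"
    local log="${2:-/dev/null}"

    if [ ! -x "${bin}" ]; then
        # Nothing to warm; note it and carry on with provisioning.
        echo "prewarm skipped: ${bin} is not executable" >>"${log}"
        return 0
    fi

    # Background the call so provisioning is not blocked while it runs.
    nohup "${bin}" --version >/dev/null 2>>"${log}" &
}
```

A call such as `prewarm_binary /opt/azure/containers/localdns/binary/coredns /tmp/prewarm.log` would keep the fire-and-forget behaviour while leaving a trace on the local filesystem; the log path here is only an example.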
LOCALDNS_GENERATED_COREFILE="{{GetGeneratedLocalDNSCoreFile}}"
PRE_PROVISION_ONLY="{{GetPreProvisionOnly}}"
CSE_TIMEOUT="{{GetCSETimeout}}"
SKIP_WAAGENT_HOLD="{{GetSkipWaAgentHold}}"
Should this be part of another PR? I'm confused about how we can make GetSkipWaAgentHold work. Isn't this risky?
We already disable the waagent hold for aks-node-controller. Because of phase 2, I just reimplemented it in CSE, where it will only be enabled (that is, the hold skipped) in phase 2.
walinuxagent is already centralized in components.json. I remember that weeks ago there was a PR that handled this logic, and it should have had sufficient test coverage even on lower-end VMs (Nishchay can confirm). If that is the case, then it should be safe to set SKIP_WAAGENT_HOLD to true across the board in Scriptless phase 2 (EnableScriptlessNBCCSECmd = true).
I confirmed this on the lowest-end A and B series as well.