Migrate from ntpd to Chrony #1852
Signed-off-by: Saikrishna Arcot <[email protected]>
> ### Monitoring
>
> For the purpose of making time synchronization issues more visible, a Monit check will be added to verify that the time is currently synchronized to one or
It would be good if the minimum number of active servers required for synchronization to be considered correct could be defined. For high-accuracy requirements, 3 servers should be the minimum.
It would also be good if the time synchronization details (e.g. offset, jitter, delay, count) could be monitored over the network. As far as I know, SONiC currently includes no Prometheus exporter, so SNMP would probably be the only option here.
For the first part, while specifying 3 servers is required for a high level of time accuracy, I'd rather not make it a minimum requirement within SONiC, because if SONiC is used in an environment that either doesn't need a high level of time accuracy or doesn't have 3 servers available, that requirement won't be met.
For the second part, I agree that the metrics should be exposed somehow. I think the current standard is to publish the data into STATE_DB, which should make it easier to export elsewhere. However, this would involve (at minimum) another daemon that frequently polls chrony and publishes the data into STATE_DB, and the scope of this effort is already large enough. Can you open an enhancement request for this?
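As a sketch of the polling side (usable either for the Monit check from the HLD excerpt above or for a STATE_DB publisher), the Python below parses the human-readable `chronyc tracking` output into fields and treats the clock as synchronized unless chronyc reports `Leap status : Not synchronised`. It assumes that output format (one `Key : Value` pair per line, with chrony's British spelling "synchronised"); the function names are hypothetical and not part of SONiC or sonic-utilities.

```python
import subprocess
import sys
from typing import Dict


def parse_tracking(output: str) -> Dict[str, str]:
    """Parse `chronyc tracking` text into a {field: value} dict."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so values that contain colons
        # (timestamps, IPv6 addresses) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_synchronized(fields: Dict[str, str]) -> bool:
    """Treat a missing or "Not synchronised" leap status as unsynchronized."""
    return fields.get("Leap status", "Not synchronised") != "Not synchronised"


def main() -> int:
    """Exit-code style check: 0 = synchronized, 1 = not synchronized, 2 = error."""
    try:
        result = subprocess.run(
            ["chronyc", "tracking"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        print("chronyc not found", file=sys.stderr)
        return 2
    if result.returncode != 0:
        print(result.stderr.strip() or "chronyc tracking failed", file=sys.stderr)
        return 2
    fields = parse_tracking(result.stdout)
    if is_synchronized(fields):
        print(f"synchronized to {fields.get('Reference ID', 'unknown source')}")
        return 0
    print("clock is not synchronized", file=sys.stderr)
    return 1
```

A Monit `check program` entry could run a small wrapper that calls `sys.exit(main())` and alert on a non-zero exit status; the same parsed dict could also be what a polling daemon writes into STATE_DB.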
About the minimum number of servers: I fully agree that there are environments where having accurate time is not that important. My recommendation was to make the minimum number of servers configurable; then it's up to the operator to decide what is needed.
Regarding the metrics topic:
Yes, of course; see #1857.
saiarcot895 left a comment
Add show command for showing hardware clock sync status
> Examples:
>
> ```
Can you make sure Chrony works with the mgmt VRF as well? Please add examples with the mgmt VRF.
Chrony running in the mgmt VRF:
admin@vlab-01:~$ systemctl status chrony
● chrony.service - chrony, an NTP client/server
Loaded: loaded (/lib/systemd/system/chrony.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/chrony.service.d
└─override.conf
Active: active (running) since Fri 2025-01-24 01:25:41 UTC; 18h ago
Docs: man:chronyd(8)
man:chronyc(1)
man:chrony.conf(5)
Process: 61323 ExecStartPre=/usr/bin/chrony-config.sh (code=exited, status=0/SUCCESS)
Process: 61327 ExecStart=/usr/local/sbin/chronyd-starter.sh (code=exited, status=0/SUCCESS)
Main PID: 61332 (chronyd)
Tasks: 2 (limit: 4570)
Memory: 1.5M
CGroup: /system.slice/chrony.service
└─vrf
└─mgmt
├─61332 /usr/sbin/chronyd -F 1
└─61333 /usr/sbin/chronyd -F 1
With the following config blocks:
"MGMT_VRF_CONFIG": {
"vrf_global": {
"mgmtVrfEnabled": "true"
}
},
"NTP": {
"global": {
"admin_state": "enabled",
"authentication": "disabled",
"dhcp": "enabled",
"server_role": "disabled",
"src_intf": "eth0",
"vrf": "mgmt"
}
}
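One way to double-check which VRF chronyd ended up in, independent of `systemctl status`, is to read the process's cgroup path, which in the output above ends in `vrf/mgmt`. The sketch assumes the cgroup v2 `/proc/<pid>/cgroup` format (`hierarchy-ID:controllers:path`) and that `ip vrf exec` places the process under a `vrf/<name>` sub-cgroup as shown above; iproute2's `ip vrf identify <pid>` reports the same association. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def vrf_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the VRF name from a '.../vrf/<name>' cgroup path, else None."""
    for line in cgroup_text.splitlines():
        # Each line is "hierarchy-ID:controllers:path"; keep only the path.
        path = line.split(":", 2)[-1]
        parts = path.strip("/").split("/")
        if "vrf" in parts:
            idx = parts.index("vrf")
            if idx + 1 < len(parts):
                return parts[idx + 1]
    return None


def vrf_of_pid(pid: int) -> Optional[str]:
    """Look up the VRF of a running process (e.g. the chronyd main PID)."""
    return vrf_from_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

For the status output above, `vrf_of_pid(61332)` would be expected to return `"mgmt"` on that system; a result of `None` would suggest chronyd was started outside the VRF.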
Hi @saiarcot895 and @venkatmahalingam,
After configuring the mgmt VRF, we were able to observe the same information as described above. However, we are unable to synchronize time with the NTP server.
admin@sonic:~$ systemctl status chrony
● chrony.service - chrony, an NTP client/server
Loaded: loaded (/lib/systemd/system/chrony.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/chrony.service.d
└─override.conf
Active: active (running) since Mon 2025-10-27 09:23:43 UTC; 12s ago
Docs: man:chronyd(8)
man:chronyc(1)
man:chrony.conf(5)
Process: 25888 ExecStartPre=/usr/bin/chrony-config.sh (code=exited, status=0/SUCCESS)
Process: 25944 ExecStart=/usr/local/sbin/chronyd-starter.sh (code=exited, status=0/SUCCESS)
Main PID: 26001 (chronyd)
Tasks: 2 (limit: 18567)
Memory: 1.3M
CGroup: /system.slice/chrony.service
└─vrf
└─mgmt
├─26001 /usr/sbin/chronyd -F 1
└─26002 /usr/sbin/chronyd -F 1
We have verified with `show mgmt-vrf` that eth0 is correctly bound to the mgmt VRF, and that the NTP configuration in CONFIG_DB uses it. We can also reach the NTP server with `sudo ip vrf exec mgmt ping <NTP_Server_IP>`. However, the system cannot synchronize time with the NTP server, because no NTP packets are being sent.
admin@sonic:~$ show mgmt-vrf
ManagementVRF : Enabled
Management VRF interfaces in Linux:
47: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether ba:6d:31:a2:79:7a brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 1280 maxmtu 65575
vrf table 5000 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt state UP mode DEFAULT group default qlen 1000
link/ether e8:c7:cf:b1:36:a2 brd ff:ff:ff:ff:ff:ff
48: lo-m: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master mgmt state UNKNOWN mode DEFAULT group default qlen 1000
link/ether be:26:35:6f:a7:5f brd ff:ff:ff:ff:ff:ff
admin@sonic:~$ redis-cli -n 4 hgetall "NTP|global"
1) "admin_state"
2) "enabled"
3) "authentication"
4) "disabled"
5) "dhcp"
6) "enabled"
7) "server_role"
8) "disabled"
9) "src_intf"
10) "eth0"
11) "vrf"
12) "mgmt"
Additionally, based on our testing, we found that NTP synchronization works as expected when we modify the `/etc/chrony/chrony.conf` file in any of the following ways (with all other configuration unchanged):
- Change `bindacqdevice eth0` to `bindacqaddress <eth0_IP>`
- Change `bindacqdevice eth0` to `binddevice eth0`
- Remove any binding configuration (i.e., delete the line `bindacqdevice eth0`)
Could you please provide us with any suggestions or insights regarding the issue we are experiencing?
Thank you very much for your help! Looking forward to your response.
This replaces ntpd with Chrony, as described in sonic-net/SONiC#1852. Among other advantages, this gives control over enabling or disabling large clock jumps (steps) and guarantees updates of the real-time clock.

This PR also includes a submodule update of sonic-utilities to bring in the changes needed there for chrony to work. The changelog for sonic-utilities is:

Submodule src/sonic-utilities ce51df2..7cbb2f2:
> [sfputil] add support for sfputil debug tx-output/rx-output {port} enable/disable (#3811)
> Switch to using chrony instead of ntpd (#3574)
> Added post commands for enabling fifos (#3801)
> kdump-Remote-SSH-Configurations (#3400)

Signed-off-by: Saikrishna Arcot <[email protected]>
The master branch is now using Chrony; all relevant PRs have been merged.
This HLD describes the migration from ntpd to Chrony, why it is needed, and what the changes would be.