Migrate from ntpd to Chrony #1852

Merged
saiarcot895 merged 5 commits into sonic-net:master from saiarcot895:add-chrony on May 7, 2025

@saiarcot895
Contributor

This HLD describes the migration from ntpd to Chrony, the reasons it is needed, and the changes involved.

Signed-off-by: Saikrishna Arcot <[email protected]>
### Monitoring

For the purpose of making time synchronization issues more visible, a Monit
check will be added to verify that the time is currently synchronized to one or

It would be good if the minimum number of active servers required for the synchronization to be considered correct could be defined. For demanding accuracy requirements, 3 servers should be the minimum.

It would also be good if the details of the time synchronization state (e.g. offset, jitter, delay, count) could be monitored from the network. As far as I know, there is currently no Prometheus exporter included in SONiC, so SNMP would probably be the only option here.

Contributor Author

For the first part, while specifying 3 servers is required for a high level of time accuracy, I'd rather not make it a minimum requirement within SONiC, only because if it is used in an environment that either doesn't need a high level of time accuracy or doesn't have 3 servers available, then that requirement won't be met.

For the second part, I agree on exposing the metrics somehow. I think the current standard we have is to publish the data into STATE_DB, which should make it easier to get exported elsewhere. However, this would involve having (at minimum) another daemon polling chrony frequently, and publishing the data into STATE_DB, and the scope of this effort is already large enough. Can you open an enhancement request for this?
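
As a rough illustration of the polling approach mentioned above, the sketch below parses one line of `chronyc -c tracking` output into a flat dictionary that a daemon could then write to STATE_DB. The field order is an assumption based on chronyc's CSV report layout, the key names are hypothetical rather than an agreed schema, and the STATE_DB write itself is left out.

```python
import subprocess

# Assumed field order of `chronyc -c tracking` (CSV mode). The key names are
# illustrative only and do not correspond to any existing STATE_DB schema.
TRACKING_FIELDS = [
    "ref_id", "ref_name", "stratum", "ref_time", "system_time_offset",
    "last_offset", "rms_offset", "frequency_ppm", "residual_freq_ppm",
    "skew_ppm", "root_delay", "root_dispersion", "update_interval",
    "leap_status",
]


def parse_tracking(csv_line):
    """Map one CSV line from `chronyc -c tracking` onto named fields."""
    values = csv_line.strip().split(",")
    if len(values) != len(TRACKING_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(TRACKING_FIELDS), len(values))
        )
    return dict(zip(TRACKING_FIELDS, values))


def poll_tracking():
    """Run chronyc once and return the parsed tracking report."""
    result = subprocess.run(
        ["chronyc", "-c", "tracking"],
        capture_output=True, text=True, check=True,
    )
    return parse_tracking(result.stdout)
```

A polling daemon would call `poll_tracking()` on a timer and publish the returned dictionary; keeping the values as strings matches how hash fields are stored in Redis.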

@scoopex scoopex Nov 20, 2024

About the minimum number of servers required: I fully agree that there are environments where having accurate time is not that important. My recommendation was to make the minimum number of servers definable, so that it's up to the operator to decide what is needed.

Regarding the metrics topic:
Yes, of course; see #1857
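
One way an operator-definable minimum could look is sketched below. It counts the sources that chronyd currently selects or combines and compares the count to a threshold. The input is assumed to be the CSV output of `chronyc -c sources`, with the mode and state markers as the first two fields; the function names and the `min_sources` parameter are hypothetical and not part of the HLD.

```python
def count_selected_sources(sources_csv):
    """Count sources in state '*' (selected) or '+' (combined).

    Assumes CSV lines as printed by `chronyc -c sources`, where the second
    field is the source state marker.
    """
    count = 0
    for line in sources_csv.splitlines():
        fields = line.split(",")
        if len(fields) >= 2 and fields[1] in ("*", "+"):
            count += 1
    return count


def is_sync_healthy(sources_csv, min_sources=1):
    """Return True when at least `min_sources` sources are in use."""
    return count_selected_sources(sources_csv) >= min_sources
```

With a default of 1 the check keeps the behaviour of "synchronized to at least one server", while an operator with stricter requirements could raise the threshold to 3.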

Contributor Author

@saiarcot895 saiarcot895 left a comment

Add show command for showing hardware clock sync status


Examples:

Collaborator

@venkatmahalingam venkatmahalingam Nov 25, 2024

Can you make sure Chrony works with mgmt VRF as well? Please add examples with mgmt VRF.

Contributor Author

Chrony running in mgmt vrf:

admin@vlab-01:~$ systemctl status chrony
● chrony.service - chrony, an NTP client/server
     Loaded: loaded (/lib/systemd/system/chrony.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/chrony.service.d
             └─override.conf
     Active: active (running) since Fri 2025-01-24 01:25:41 UTC; 18h ago
       Docs: man:chronyd(8)
             man:chronyc(1)
             man:chrony.conf(5)
    Process: 61323 ExecStartPre=/usr/bin/chrony-config.sh (code=exited, status=0/SUCCESS)
    Process: 61327 ExecStart=/usr/local/sbin/chronyd-starter.sh (code=exited, status=0/SUCCESS)
   Main PID: 61332 (chronyd)
      Tasks: 2 (limit: 4570)
     Memory: 1.5M
     CGroup: /system.slice/chrony.service
             └─vrf
               └─mgmt
                 ├─61332 /usr/sbin/chronyd -F 1
                 └─61333 /usr/sbin/chronyd -F 1

With the following config blocks:

    "MGMT_VRF_CONFIG": {
            "vrf_global": {
                    "mgmtVrfEnabled": "true"
            }
    },
    "NTP": {
        "global": {
            "admin_state": "enabled",
            "authentication": "disabled",
            "dhcp": "enabled",
            "server_role": "disabled",
            "src_intf": "eth0",
            "vrf": "mgmt"
        }
    }
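
To make the relationship between these config blocks and the running service easier to follow, here is a minimal sketch of how such settings could map to a launch command and a binding directive. It is not the logic of `chrony-config.sh` or `chronyd-starter.sh`; the function name and the mapping rules are assumptions, drawn only from the `vrf/mgmt` cgroup shown in the status output and the `bindacqdevice eth0` line mentioned later in this thread.

```python
def chrony_launch_plan(config_db):
    """Derive a chronyd command line and chrony.conf lines from config.

    Hypothetical mapping for illustration: a mgmt VRF setting wraps chronyd
    in `ip vrf exec mgmt`, and src_intf becomes a bindacqdevice directive.
    """
    ntp = config_db.get("NTP", {}).get("global", {})
    vrf_global = config_db.get("MGMT_VRF_CONFIG", {}).get("vrf_global", {})
    mgmt_enabled = vrf_global.get("mgmtVrfEnabled") == "true"

    command = ["/usr/sbin/chronyd", "-F", "1"]
    if mgmt_enabled and ntp.get("vrf") == "mgmt":
        # Run the daemon inside the management VRF.
        command = ["ip", "vrf", "exec", "mgmt"] + command

    directives = []
    if ntp.get("src_intf"):
        # Bind the client (acquisition) sockets to the source interface.
        directives.append("bindacqdevice " + ntp["src_intf"])
    return command, directives
```

For the config above, this sketch yields a command prefixed with `ip vrf exec mgmt` and the single directive `bindacqdevice eth0`.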

Collaborator

Thanks!

@ben-twn1 ben-twn1 Oct 27, 2025

Hi @saiarcot895 and @venkatmahalingam,

After configuring the mgmt VRF, we were able to observe the same information as described above. However, we are unable to synchronize time with the NTP server.

admin@sonic:~$ systemctl status chrony
● chrony.service - chrony, an NTP client/server
     Loaded: loaded (/lib/systemd/system/chrony.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/chrony.service.d
             └─override.conf
     Active: active (running) since Mon 2025-10-27 09:23:43 UTC; 12s ago
       Docs: man:chronyd(8)
             man:chronyc(1)
             man:chrony.conf(5)
    Process: 25888 ExecStartPre=/usr/bin/chrony-config.sh (code=exited, status=0/SUCCESS)
    Process: 25944 ExecStart=/usr/local/sbin/chronyd-starter.sh (code=exited, status=0/SUCCESS)
   Main PID: 26001 (chronyd)
      Tasks: 2 (limit: 18567)
     Memory: 1.3M
     CGroup: /system.slice/chrony.service
             └─vrf
               └─mgmt
                 ├─26001 /usr/sbin/chronyd -F 1
                 └─26002 /usr/sbin/chronyd -F 1

We have verified that eth0 is correctly bound to the mgmt VRF using `show mgmt-vrf` and that it is configured in CONFIG_DB. We can also reach the NTP server via `sudo ip vrf exec mgmt ping <NTP_Server_IP>`. However, the system cannot synchronize time with the NTP server, as no NTP packets are being sent.

admin@sonic:~$ show mgmt-vrf

ManagementVRF : Enabled

Management VRF interfaces in Linux:
47: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ba:6d:31:a2:79:7a brd ff:ff:ff:ff:ff:ff promiscuity 0  allmulti 0 minmtu 1280 maxmtu 65575
    vrf table 5000 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt state UP mode DEFAULT group default qlen 1000
    link/ether e8:c7:cf:b1:36:a2 brd ff:ff:ff:ff:ff:ff
48: lo-m: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master mgmt state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether be:26:35:6f:a7:5f brd ff:ff:ff:ff:ff:ff
admin@sonic:~$ redis-cli -n 4 hgetall "NTP|global"
 1) "admin_state"
 2) "enabled"
 3) "authentication"
 4) "disabled"
 5) "dhcp"
 6) "enabled"
 7) "server_role"
 8) "disabled"
 9) "src_intf"
10) "eth0"
11) "vrf"
12) "mgmt"
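
When comparing output like this across several switches, it can help to turn the numbered `redis-cli` listing back into key/value pairs. The helper below is a small sketch for that purpose; the function name is hypothetical, and it assumes the numbered, quoted format shown above, with fields and values alternating.

```python
import re


def parse_hgetall(output):
    """Convert numbered `redis-cli hgetall` output into a dict."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.match(r'^\d+\)\s+"(.*)"$', line)
        if match is None:
            raise ValueError("unexpected line: %r" % line)
        values.append(match.group(1))
    if len(values) % 2 != 0:
        raise ValueError("odd number of entries; expected field/value pairs")
    # Even positions hold field names, odd positions hold their values.
    return dict(zip(values[0::2], values[1::2]))
```

Applied to the listing above, the result maps `src_intf` to `eth0` and `vrf` to `mgmt`, which is easier to diff against the expected `NTP|global` settings.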

Additionally, based on our testing, we found that NTP synchronization works as expected when we modify the `/etc/chrony/chrony.conf` file in any of the following ways (with all other configurations unchanged):

  • Change `bindacqdevice eth0` to `bindacqaddress <eth0_IP>`
  • Change `bindacqdevice eth0` to `binddevice eth0`
  • Remove any binding configuration (i.e., delete the line `bindacqdevice eth0`)
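
For anyone trying to reproduce these three variants, the sketch below rewrites the binding line in a chrony.conf text so each case can be generated from the same base file. The helper name is hypothetical, and it only touches the three directives discussed here; it operates on text and leaves writing the file and restarting chrony to the caller.

```python
BIND_DIRECTIVES = ("bindacqdevice", "bindacqaddress", "binddevice")


def replace_binding(conf_text, new_line=None):
    """Swap or drop the binding directive in chrony.conf content.

    Every existing binding line is removed. If `new_line` is given, it takes
    the place of the first one; if the file has no binding line, nothing is
    added.
    """
    result = []
    replaced = False
    for line in conf_text.splitlines():
        words = line.split()
        if words and words[0] in BIND_DIRECTIVES:
            if new_line is not None and not replaced:
                result.append(new_line)
                replaced = True
            continue
        result.append(line)
    return "\n".join(result) + "\n"
```

Calling `replace_binding(conf, "binddevice eth0")` produces the second variant, and `replace_binding(conf)` produces the third, with no binding at all.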

Could you please provide us with any suggestions or insights regarding the issue we are experiencing?

Thank you very much for your help! Looking forward to your response.

@saiarcot895 saiarcot895 self-assigned this Jan 8, 2025
yxieca pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Mar 28, 2025
This replaces ntpd with Chrony, as described in sonic-net/SONiC#1852. The advantages of this (among others) are control over enabling/disabling long jumps/steps and guaranteed updates of the real time clock.

This PR also includes a submodule update of sonic-utilities, to bring in necessary changes there for chrony to work. The changelog for sonic-utilities is:

Submodule src/sonic-utilities ce51df2..7cbb2f2:
  > [sfputil] add support for sfputil debug tx-output/rx-output {port} enable/disable  (#3811)
  > Switch to using chrony instead of ntpd (#3574)
  > Added post commands for enabling fifos (#3801)
  > kdump-Remote-SSH-Configurations (#3400)

Signed-off-by: Saikrishna Arcot <[email protected]>
@saiarcot895
Contributor Author

The master branch is now using Chrony; all relevant PRs have been merged.

@saiarcot895 saiarcot895 merged commit 84f1cbb into sonic-net:master May 7, 2025
@saiarcot895 saiarcot895 moved this from 🔖 HLD Ready for Review to ✅ Done in SONiC 202505 Release May 7, 2025