Skip to content
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion dockers/docker-orchagent/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ COPY ["files/arp_update", "/usr/bin"]
COPY ["arp_update.conf", "files/arp_update_vars.j2", "/usr/share/sonic/templates/"]
COPY ["ndppd.conf", "/usr/share/sonic/templates/"]
COPY ["enable_counters.py", "/usr/bin"]
COPY ["docker-init.sh", "orchagent.sh", "swssconfig.sh", "buffermgrd.sh", "/usr/bin/"]
COPY ["docker-init.sh", "orchagent.sh", "swssconfig.sh", "buffermgrd.sh", "kernel_arp_restore.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]
Expand Down
44 changes: 44 additions & 0 deletions dockers/docker-orchagent/kernel_arp_restore.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
#!/usr/bin/env bash

function wait_for_intf_up {
INTF_NAME=$1

until [[ `ip link show $INTF_NAME | grep 'state UP'` ]]; do
sleep 0.1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too minute. I would suggest wait 1 sec. Also, if its never up, we need a way to exit the script after say 5mts or so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, will look into a way to timeout

done
}

function restore_arp_to_kernel {
ARP_FILE='/arp.json.1'
NUM_ENTRIES=`jq 'length' $ARP_FILE`

for i in $( seq 0 $(($NUM_ENTRIES - 1)) ); do

# For the ith object, get the first key
# 'jq' sorts the keys by default so this should always be
# the 'NEIGH_TABLE' key, not the 'OP' key
KEY=`jq ".[$i] | keys[0]" $ARP_FILE`

# For all 'jq' commands below, use '-r' for raw output
# to prevent double quoting

# For the object associated with the 'NEIGH_TABLE' key
# store the value of the 'neigh' field (the MAC address)
MAC=`jq -r ".[$i][$KEY][\"neigh\"]" $ARP_FILE`

# Split the 'NEIGH_TABLE' key with delimiter ':' and take the
# second item from the result array which is the device name
DEVICE=`echo $KEY | jq -r ". / \":\" | .[1]"`

# Same as for VLAN, but take the third item which is the IP
IP=`echo $KEY | jq -r ". / \":\" | .[2]"`

wait_for_intf_up $DEVICE
ip neigh replace "$IP" dev "$DEVICE" lladdr "$MAC" nud stale
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are restoring IPv6 entries, this should be "ip -6 neigh ..."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought ip neigh by default could accept both IPv4 and IPv6 entries?

done
}

if [[ -f /restore-kernel ]]; then
restore_arp_to_kernel
rm -f /restore-kernel
fi
10 changes: 10 additions & 0 deletions dockers/docker-orchagent/supervisord.conf
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,16 @@ stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=orchagent:running

[program:kernel_arp_restore]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the different between kernel_arp_restore and restore_neighbors below? it seems the similiar things.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore_neighbors is executed only during warmboot and restoration is by programming and sending arp requests. kernel_arp_restore is only programming to kernel (as dynamic entries) and done for config_reload scenario. Do you prefer we should combine the script?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i recall for fast reboot we also restore the arp entries, which script is that? do we restart kernel arp entries for fast reboot?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, for fastboot, we only restore to APP_DB so that orchangent programs those ARP entry to ASIC and the arp_update script will send out requests to resolve arp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like restore_neighbors is used primarily during warmboot. It looks like it does accomplish the same thing, but I am hesitant to change anything warmboot related in case it has other side effects.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why fast_reboot cannot use the this kernel_arp_restore script? why only use this script in config reload scenario?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is fine not to combine with restore_neighbors, but why this arp_restore cannot be combined into swssconfig.sh and also use it for fast reboot?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with placing this in swssconfig.sh is that these kernel entries can only be programmed after the VLAN interface comes up, which happens after swssconfig.sh has exited

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is ok not be combine. but can this be used for fast-reboot as well?

command=/usr/bin/kernel_arp_restore.sh
priority=6
autostart=false
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=swssconfig:exited

[program:restore_neighbors]
command=/usr/bin/restore_neighbors.py
priority=6
Expand Down
21 changes: 21 additions & 0 deletions dockers/docker-orchagent/swssconfig.sh
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,26 @@ function fast_reboot {
esac
}

function config_reload {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename this function as restator_neighbors, and then rename the fast_reboot script as restore default_routes. basically, i think we should try to reuse the code here and not to create duplication.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can refactor this to re-use more existing code, but not by much since we don't want a lot of overlapping behavior between fast-reboot and config-reload here. E.g. for fast-reboot, we want to restore the ARP table to APP_DB as well as the default routes, but for config reload we do not want to restore the default routes, and we only need the ARP table to be restored to the kernel (if we restored to APP_DB here, there was some concern about having these entries be FAILED if the ToR attempted an ARP refresh).

if [[ -f /config-reload-restore ]];
then
if [[ -f /fdb.json ]];
then
swssconfig /fdb.json
mv -f /fdb.json /fdb.json.1
fi

if [[ -f /arp.json ]];
then
swssconfig /arp.json
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, may be we should not restore the arp.json entries to APP_DB since we are directly adding to kernel. If we add to APP_DB, arp_refresh script kick in and can result in failed entries temporarily.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, will fix

mv -f /arp.json /arp.json.1
# Tell kernel_arp_restore.sh that it needs to act
touch /restore-kernel
fi
rm -f /config-reload-restore
fi
}

# Wait until swss.sh in the host system create file swss:/ready
until [[ -e /ready ]]; do
sleep 0.1;
Expand All @@ -38,6 +58,7 @@ rm -f /ready

# Restore FDB and ARP table ASAP
fast_reboot
config_reload

# read SONiC immutable variables
[ -f /etc/sonic/sonic-environment ] && . /etc/sonic/sonic-environment
Expand Down
7 changes: 7 additions & 0 deletions files/build_templates/docker_image_ctl.j2
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,13 @@ function postStartAction()
test -e /host/fast-reboot/default_routes.json && docker cp /host/fast-reboot/default_routes.json swss$DEV:/
rm -fr /host/fast-reboot
fi

if [[ -d /host/config-reload ]] && [[ -f /host/config-reload/needs-restore ]]; then
test -e /host/config-reload/fdb.json && docker cp /host/config-reload/fdb.json swss$DEV:/
test -e /host/config-reload/arp.json && docker cp /host/config-reload/arp.json swss$DEV:/
docker exec swss$DEV touch /config-reload-restore
rm -fr /host/config-reload
fi
docker exec swss$DEV touch /ready # signal swssconfig.sh to go
{%- elif docker_container_name == "pmon" %}

Expand Down