[fast-reboot] Add a check for warmstart before cleaning up neigh table #1498
qiluo-msft merged 1 commit into sonic-net:master
…s enable. This commit is to address the issue that the NEIGH_TABLE loaded by swssconfig after fast-reboot is cleared by neighsyncd. Signed-off-by: bingwang <[email protected]>
    -    psTable->clear();
    +    if (m_warmStartInProgress)
    +    {
    +        psTable->clear();
Looks like this change will affect both neighsyncd and natsyncd. @lguohan, is it OK to not clear the table for natsyncd when the DUT is warm-rebooting? If not, and to limit the change to neighsyncd, we could also check that the table belongs to neighsyncd.
Wouldn't NAT hit the same issue if this protection is not there?
It is not clear whether the NAT tables need to be cleared unconditionally. This needs a bit of digging into how NAT uses this shared class. NAT uses this method for 4 tables.
This is a bug introduced by #1126, where the refactoring of the library to support multiple tables called psTable->clear() unconditionally. The way the fix is done here should be correct. NAT tables, or any other client, should not use the library to flush the producer state table in non-warm-reboot cases.
By the way, the reason we used psTable->clear() was to make sure the relevant table would not change in some corner cases after we dumped it to memory; this is required only in the warm-reboot/restart case, before we dump the table. Also, since the daemon using the library was the producer itself, it was safe to do so. However, this assumption breaks if swssconfig loads the table at the same time, which is what happens in non-warm-reboot cases for the ARP tables, NAT tables, etc. In those cases, the library incorrectly cleared the requests from swssconfig and caused the reported issues.
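To make the failure mode in this thread concrete, here is a minimal, self-contained sketch. It does not use the real swss-common classes: `FakeProducerTable`, `prepareTable`, and `pendingAfterStartup` are hypothetical names invented for illustration, and the NEIGH_TABLE key is made up. It only models the idea that pending requests queued by another writer survive startup unless a warm start is in progress.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a producer state table. It only tracks
// keys that have been queued but not yet consumed.
struct FakeProducerTable {
    std::vector<std::string> pending;

    void set(const std::string &key) { pending.push_back(key); }
    void clear() { pending.clear(); }
};

// Hypothetical helper mirroring the guarded call in the diff:
// pending requests are flushed only when a warm start is in progress.
void prepareTable(FakeProducerTable &table, bool warmStartInProgress) {
    if (warmStartInProgress) {
        table.clear();
    }
}

// Simulates startup after another writer (swssconfig in the fast-reboot
// case) has already queued one entry, and returns how many pending
// requests are left afterwards.
std::size_t pendingAfterStartup(bool warmStartInProgress) {
    FakeProducerTable table;
    table.set("NEIGH_TABLE:Vlan1000:192.168.0.2");  // made-up example key
    prepareTable(table, warmStartInProgress);
    return table.pending.size();
}
```

Under these assumptions, `pendingAfterStartup(false)` leaves the queued entry in place, which corresponds to the fast-reboot case where the restored neighbors must not be dropped, while `pendingAfterStartup(true)` flushes it, as the warm-reboot path expects.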
…s enable. (#1498)

This commit addresses the issue that the NEIGH_TABLE loaded by swssconfig after fast-reboot is cleared by neighsyncd.

**What I did**

Fix sonic-net/sonic-buildimage#5841 and sonic-net/sonic-buildimage#5580.

We found that the neighbor table loaded by `swssconfig` from `arp.json` after `fast-reboot` is mistakenly cleared by `neighsyncd` at the initial stage. This PR adds a check for `WarmStart` before cleaning up, and only clears the table if `WarmStart` is enabled.

**Why I did it**

This PR fixes the issue that the ARP table is not recovered after fast-reboot.

**How I verified it**

Verified on Arista-7260, running a 201911 image.

1. Run some tests to populate ARP entries on the DUT, such as `test_fast_reboot`.
2. Issue a fast-reboot.
3. Verify that the `arp.json` backed up by `fast-reboot-dump.py` is loaded and NEIGH_TABLE is restored.