
zebra: V6 RA not sent anymore after interface up-down-up #18451

Merged
ton31337 merged 2 commits into FRRouting:master from soumyar-roy:soumya/fastra
May 21, 2025

Conversation

@soumyar-roy
Contributor

@soumyar-roy soumyar-roy commented Mar 21, 2025

zebra: V6 RA not sent anymore after interface up-down-up

Issue:
Once an interface is shut down, it is removed from the
wheel timer. When the interface comes up again, the current code
does not add it back to the wheel timer, so no more RAs are sent
for that interface.

Fix:
Moved wheel_add for the interface inside rtadv_start_interface_events.
This is the more common function, which gets triggered for both
the RA enable and interface up events.

Also, on any kind of interface activation event, we try to send
an RA as soon as possible. This satisfies the requirement for a
quick RA, especially where convergence depends on RA.

Testing:
Brought the interface up -> down -> up.
Added a debug log for RA and checked that it is advertised periodically
once the interface is back in the up state.

show bgp summary for 512 BGP peers with BGP unnumbered works fine.

Signed-off-by: Soumya Roy <souroy@nvidia.com>
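
The issue and fix described above can be illustrated with a toy model. This is only a sketch in Python, not FRR code: `Wheel`, `RtadvModel`, and `scheduled_after_flap` are hypothetical names, and the wheel is reduced to a set of interface names. It assumes the old code added the interface to the wheel only on the RA enable path, while the fix adds it in the common start path.

```python
class Wheel:
    """Toy stand-in for a timer wheel: only tracks which items are scheduled."""

    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def remove(self, item):
        self.items.discard(item)


class RtadvModel:
    """Hypothetical model of the RA start/stop paths (not FRR code)."""

    def __init__(self, add_in_common_start):
        self.wheel = Wheel()
        # True models the fix, False models the behaviour before the fix.
        self.add_in_common_start = add_in_common_start

    def _start_interface_events(self, ifname):
        # Common path, reached from both "RA enable" and "interface up".
        if self.add_in_common_start:
            self.wheel.add(ifname)

    def ra_enable(self, ifname):
        if not self.add_in_common_start:
            # Assumed old behaviour: wheel add happens only on enable.
            self.wheel.add(ifname)
        self._start_interface_events(ifname)

    def interface_down(self, ifname):
        self.wheel.remove(ifname)

    def interface_up(self, ifname):
        self._start_interface_events(ifname)


def scheduled_after_flap(add_in_common_start):
    """Enable RA, flap the interface, and report whether it is still scheduled."""
    model = RtadvModel(add_in_common_start)
    model.ra_enable("eth0")
    model.interface_down("eth0")
    model.interface_up("eth0")
    return "eth0" in model.wheel.items
```

With `add_in_common_start=False` the interface drops out of the wheel after the flap; with `True` it is scheduled again.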

@soumyar-roy soumyar-roy marked this pull request as draft March 21, 2025 20:30
@mjstapp
Contributor

mjstapp commented Mar 21, 2025

so ... please add a meaningful title and description?

@soumyar-roy soumyar-roy changed the title from "Soumya/fastra" to "zebra: send v6 fast RA at faster interval" Mar 21, 2025
@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from f4eedc5 to 7095a84 Compare March 21, 2025 21:51
@soumyar-roy
Contributor Author

> so ... please add a meaningful title and description?

Added now

lib/wheel.c Outdated
list_isempty(wheel->wheel_slot_lists[curr_slot])) {
/* Came to back to same slot and that is empty
* so the wheel is empty, puase it
*/
Member

Can you fix the comment indentation?

Contributor Author

1) This comment is indented w.r.t. if (!wheel->run_forever) {. Before running git clang-format >>
((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&
            list_isempty(wheel->wheel_slot_lists[curr_slot])) { <<< This line is tab indented
            /* Came to back to same slot and that is empty
             * so the wheel is empty, stop it
            */
            if (!wheel->run_forever) {
                    wheel_stop(wheel);
                    if (debug_timer_wheel)
                            zlog_debug("Stopped an empty  wheel %p", wheel);
                    return;
            }
    }

2) After git clang-format >>
if ((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&
            list_isempty(wheel->wheel_slot_lists[curr_slot])) {
        list_isempty(wheel->wheel_slot_lists[curr_slot])) { <<<<< This line gets space indented
            /* Came to back to same slot and that is empty
             * so the wheel is empty, stop it
            */

3) If I don't do step 2), I get the style suggestion error (curl https://gist.githubusercontent.com/polychaeta/13d7c1b3f9c07b87352be22b5f29ad01/raw/55bb8b7724008c107d333a05c8db9a785a2db0f7/style.diff | git apply -), so I am restoring back to 1).

Contributor

no: line 56 is not correct. it looks like it's missing a space to align the comment block

@ton31337
Member

Can we have a bit of the context when fast/regular wheels are used?

@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from c13b1c3 to 3a9feb0 Compare March 23, 2025 18:08
@soumyar-roy
Contributor Author

> Can we have a bit of the context when fast/regular wheels are used?

Added more context

@soumyar-roy soumyar-roy marked this pull request as ready for review March 24, 2025 00:53
@ton31337
Member

Thanks, makes sense now, but please put it inside the commit (not in PR).

if (adv_if != NULL)
if (adv_if != NULL) {
rtadv_send_packet(zvrf->rtadv.sock, zif->ifp, RA_ENABLE);
wheel_add_item(zrouter.ra_wheel, zif->ifp);
Contributor

I'm still concerned about this a bit. I only see the "wheel_remove" in the path where the frr-configured "shut" is processed, but not in other interface "down" processing. are you sure this is correct?

Contributor Author

added wheel_remove in interface_down()

@soumyar-roy
Contributor Author

Test case added.

Contributor

@mjstapp mjstapp left a comment

the interface module is using some rtadv apis. I think the clearest way to fix these problems is to make sure a) that the rtadv module is notified during the relevant interface events, and b) maybe ensure that it's unambiguous what needs to happen within the rtadv code. a flag, for instance, could indicate whether an interface is rtadv-active or not?

@soumyar-roy
Contributor Author

> the interface module is using some rtadv apis. I think the clearest way to fix these problems is to make sure a) that the rtadv module is notified during the relevant interface events, and b) maybe ensure that it's unambiguous what needs to happen within the rtadv code. a flag, for instance, could indicate whether an interface is rtadv-active or not?

Changed code.
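
For readers following the thread, the flag idea suggested in the review could look roughly like the sketch below. It is a Python illustration only, under the assumption that a per-interface "active" flag guards the start and stop paths; `RaInterface`, `RaModule`, `ra_configured`, and `ra_active` are hypothetical names, not identifiers from zebra.

```python
class RaInterface:
    """Hypothetical per-interface RA state; names are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.ra_configured = False  # RA enabled by configuration
        self.ra_active = False      # currently scheduled for periodic RA


class RaModule:
    """Toy RA module that is notified on interface events."""

    def __init__(self):
        self.wheel = set()  # stand-in for the timer wheel

    def start(self, intf):
        # Notified on RA enable and on interface up; safe to call repeatedly.
        if not intf.ra_configured or intf.ra_active:
            return False
        intf.ra_active = True
        self.wheel.add(intf.name)
        return True

    def stop(self, intf):
        # Notified on RA disable, interface down, and shutdown.
        if not intf.ra_active:
            return False
        intf.ra_active = False
        self.wheel.discard(intf.name)
        return True
```

Because both calls check the flag first, the interface code can notify the RA module on every up or down event without tracking whether the interface is already scheduled.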

#include "log.h"
#include "zclient.h"
#include "vrf.h"
#include "wheel.h"
Contributor

don't need this change, right?

Contributor Author

Removed.

zebra/rtadv.c Outdated
wheel_remove_item(zrouter.ra_wheel, ifp);

if (if_down_event) {
/* Nothing to do more, return */
Contributor

I guess I understand that you don't want to try to send a packet, but shouldn't you stop the "join_timer" ?

Contributor Author

Sure, I was trying to limit changes to existing behavior as part of this change. I have accommodated it now; join_timer is turned off.

Contributor

@mjstapp mjstapp left a comment

I think the zebra changes are ok now


# Take two snap shots for RA status, it should not change
# Give enough time, RA adv timer to expire
sleep(2)
Member

This is not really deterministic (with sleeps)...

Contributor Author

I have removed the hardcoded sleep; the test now parses ndRouterAdvertisementsIntervalSecs and uses that as the sleep time. Please check the new code. We still need to wait for some amount of time to get a correct result: at least ndRouterAdvertisementsIntervalSecs plus a buffer (1 sec). Otherwise, if an event or some processing is delayed, the RA state might be updated after we check, and the test could miss a wrong state and report a false PASS.
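
The wait-time calculation described above might be sketched as follows. This is not the test code from the PR: `ra_wait_seconds` is a hypothetical helper, and the sketch assumes the `show interface ... json` output is a JSON object keyed by interface name with `ndRouterAdvertisementsIntervalSecs` directly under it. The sample value is made up.

```python
import json


def ra_wait_seconds(show_interface_json, ifname, buffer_secs=1):
    """Return the RA interval for `ifname` plus a safety buffer, in seconds."""
    data = json.loads(show_interface_json)
    # Assumed layout: {"<ifname>": {"ndRouterAdvertisementsIntervalSecs": N, ...}}
    interval = data[ifname]["ndRouterAdvertisementsIntervalSecs"]
    return interval + buffer_secs


# Made-up sample output, for illustration only.
sample = json.dumps({"r1-eth200": {"ndRouterAdvertisementsIntervalSecs": 1}})
wait = ra_wait_seconds(sample, "r1-eth200")
```

The test would then sleep for `wait` seconds between the two RA status snapshots instead of a fixed value.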


_, result = topotest.run_and_expect(_check_interface_down, True, count=10, wait=1)
if result is not True:
    sys.stderr.write("Interface did not go down after shutdown command\n")
Member

Why do we need this stderr print if we assert below?

Contributor Author

removed.

output = tgen.gears["r1"].vtysh_cmd("show interface r1-eth200 json")
return True if '"administrativeStatus":"down"' in output else False

_, result = topotest.run_and_expect(_check_interface_down, True, count=10, wait=1)
Member

Set at least 15 seconds (this is a minimum)

Contributor Author

done
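
As background for the `count` and `wait` arguments discussed here, a polling loop of this kind can be sketched as below. `poll_until` is a hypothetical stand-in written for illustration; it is not the implementation of `topotest.run_and_expect`, and the injectable `sleep` parameter is an assumption added so the loop can be exercised without real delays.

```python
import time


def poll_until(func, expected, count, wait, sleep=time.sleep):
    """Call `func` up to `count` times, `wait` seconds apart.

    Returns (True, result) as soon as func() == expected,
    otherwise (False, last_result) once the attempts run out.
    """
    result = None
    for attempt in range(count):
        result = func()
        if result == expected:
            return True, result
        if attempt < count - 1:
            sleep(wait)  # no sleep after the final attempt
    return False, result
```

The total time budget is bounded by roughly `count * wait` seconds, so raising `count` while keeping `wait=1` lengthens the window without slowing the passing case.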


# Verify RA state didn't change when interface is down
if rtadv_output1 != rtadv_output2:
    sys.stderr.write(
Member

Shouldn't we assert here?

Contributor Author

changed code.
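
One possible shape for the assert requested above is sketched here. The helper name `assert_ra_state_unchanged` and the snapshot contents are hypothetical; the actual test may compare different data.

```python
def assert_ra_state_unchanged(before, after, ifname):
    """Fail the test if the RA snapshot changed while the interface was down."""
    assert before == after, (
        "RA state changed while {} was down: {!r} -> {!r}".format(
            ifname, before, after
        )
    )
```

Failing through `assert` lets the test framework report the mismatch, so a separate `sys.stderr.write` is not needed.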

@ton31337
Member

Why do you close and reopen all the time?

Issue:
Once an interface is shut down, it is removed from the
wheel timer. When the interface comes up again, the current code
does not add it back to the wheel timer, so no more RAs are sent
for that interface.

Fix:
Moved wheel_add for the interface inside rtadv_start_interface_events.
This is the more common function, which gets triggered for both
the RA enable and interface up events.

Also, on any kind of interface activation event, we try to send
an RA as soon as possible. This satisfies the requirement for a
quick RA, especially where convergence depends on RA.

Testing:
Brought the interface up -> down -> up.
Added a debug log for RA and checked that it is advertised periodically
once the interface is back in the up state.

show bgp summary for 512 BGP peers with BGP unnumbered works fine.

Signed-off-by: Soumya Roy <souroy@nvidia.com>

Added test cases with interface down/up/shutdown
to verify RA state of an interface

Signed-off-by: Soumya Roy <souroy@nvidia.com>
Labels

libfrr, master, rebase (PR needs rebase), size/L, tests (Topotests, make check, etc), zebra

4 participants