Skip to content

[HLD] bmp for monitoring SONiC BGP info#1621

Merged
qiluo-msft merged 6 commits intosonic-net:masterfrom
FengPan-Frank:fenpan_bmp
Dec 7, 2024
Merged

[HLD] bmp for monitoring SONiC BGP info#1621
qiluo-msft merged 6 commits intosonic-net:masterfrom
FengPan-Frank:fenpan_bmp

Conversation

@FengPan-Frank
Copy link
Contributor

@FengPan-Frank FengPan-Frank commented Feb 22, 2024

[HLD] bmp for monitoring SONiC BGP info

Description Repo PR link State
Yang model definition sonic-buildimage sonic-net/sonic-buildimage#18530 GitHub issue/pull request detail
Add BMP_STATE_DB dedicated instance sonic-buildimage sonic-net/sonic-buildimage#19016 GitHub issue/pull request detail
Add BMP_STATE_DB support in swss-common sonic-swss-common sonic-net/sonic-swss-common#879 GitHub issue/pull request detail
bmpcfgd service implementation sonic-buildimage sonic-net/sonic-buildimage#18940 GitHub issue/pull request detail
config cli implementation sonic-utility sonic-net/sonic-utilities#3286 GitHub issue/pull request detail
show cli implementation sonic-utility sonic-net/sonic-utilities#3289 GitHub issue/pull request detail
openbmpd service implementation sonic-bmp sonic-net/sonic-bmp#7 GitHub issue/pull request detail
Add sonic-bmp submodule sonic-buildimage sonic-net/sonic-buildimage#20780 GitHub issue/pull request detail
Build bmp container into sonic-buildimage sonic-buildimage sonic-net/sonic-buildimage#18946 GitHub issue/pull request detail
Add bmp dynamic feature switch in frr side config. sonic-buildimage sonic-net/sonic-buildimage#20895 GitHub issue/pull request detail

@FengPan-Frank FengPan-Frank force-pushed the fenpan_bmp branch 3 times, most recently from 6427fc8 to 23d402e Compare February 27, 2024 12:34
@FengPan-Frank FengPan-Frank force-pushed the fenpan_bmp branch 5 times, most recently from 7edbcda to b81dfcf Compare February 29, 2024 01:26
doc/bmp/bmp.md Outdated
Copy link
Contributor

@qiluo-msft qiluo-msft Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current [BGP] container has been forked from [FRR] -> Current BGP container uses a implementation (sonic-frr) forked from FRR #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@FengPan-Frank FengPan-Frank force-pushed the fenpan_bmp branch 2 times, most recently from 1673a55 to d075afa Compare February 29, 2024 12:33
@FengPan-Frank FengPan-Frank force-pushed the fenpan_bmp branch 5 times, most recently from 6f2ae6c to 68234b1 Compare March 12, 2024 14:41
@zhangyanzhao
Copy link
Collaborator

doc/bmp/bmp.md Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/sonic-net/SONiC/pull/1621/files#diff-0649e404e5d4496d9d6af8f6b9411cb483e3a6d376a927d8660baf476465d7b3R138
"SONiC to stream events occurring in SONiC switch (e.g BGP flap, process not running, ...) in real time via gNMI subscription model.
The events are defined using versioned YANG schema."

We have the above YANG model for streaming the BGP states today and looks like BGP_NEIGHBOR_TABLE supports extensive state information, hope we'll migrate to use one streaming method i.e BMP or events infra.

doc/bmp/bmp.md Outdated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we new docker for running the "openBMPd" daemon?

I understand that the plan is to have,

  1. config "feature" command to enable/disable this.
  2. Limit CPU resource usage of the new docker .

Even if we leverage the existing frr container to run openBMPd, there are alternate ways to achieve the above, for instance #1 we could have a "config bmp <enable|disable>" which could internally start/stop the openBMPd process. And for #2 we could use command such as "nice/cpulimit" to set process usage limits.

From what i have seen adding a new container (base docker, with no overhead app) increases RAM usage by 20 to 30 MB, so just want to ensure if we really need a new container here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's indeed feasible as either way, and some tradeoff just exists as consideration.
The advantage of using dedicated container is that we can avoid packaging bmp feature onto some slim images if DUT resource is small and expensive relatively.

doc/bmp/bmp.md Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since FRR does support bmp enable/disable as a cmd-line arg (/usr/lib/frr/bgpd -A 127.0.0.1 -M bmp), to support per table level enable/disable as shown below, it's better to discuss this with FRR-community and introduce table level enable/disable support via FRR CLI knob, so that zebra/bgpd doesn't have to spend any cycle in sending change notifications for those tables to BMP ; Otherwise zebra/bgpd will always send notifications to BMP for the disabled tables and bmp ignores it later based on this config-db.

Please discuss with FRR community on this, as discussed in the HLD review community meeting (March-12-2024)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, currently FRR just post data to BMP following standard BGP protocol, thus neighbor table will be sent in BGP_MSG_OPEN message (#https://datatracker.ietf.org/doc/html/rfc4271#page-13) and all other tables including rib/ls_state... will be sent in BGP_MSG_UPDATE. A better option is to update in FRR side and controlled by the config/cli together, so that no extra data is transmitted between FRR and BMP. Will follow up with FRR community.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Let's discuss with FRR community and enhance FRR capability to achieve this, as part of bringing in this feature to community-sonic.

doc/bmp/bmp.md Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VTYSH show commands provide extensive dumps today for the debuggability, trying to understand the new information that we support part of openBMP.

On-change subscription for BGP data is clearly an use-case, I would like to see more filtering options (e.g RIB-in or our per neighbor) in the HLD so that we can limit the number entries going into the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking smaller granularity control like per neighbor? instead of per table?.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking per neighbor level filtering at least and if possible, per table level filtering is good as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we will support table level filtering first, later could append neighbor level filtering if specific requirement asks.

doc/bmp/bmp.md Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is to connect to the local collector and other option is to connect to the remote collector, if we have an use-case not to store into the redis DB but to only stream into the remote collector, are we planning to give such an option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, that will require dedicated BMP service being deployed and maintained, for now we're not planning of this.

doc/bmp/bmp.md Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider pushing this state information into STATE_DB or new DB instance if that makes our life easy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

doc/bmp/bmp.md Outdated
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an exercise done to optimize the DB memory usage for storing the entries in DASH WG. Since we may store lot of RIB information into the DB, may be it's ideal to use protobuf format rather than text format for DB entries (refer the below link for more information).
https://github.com/sonic-net/DASH/blob/main/documentation/general/data/AppDBMemoryEstimation.xlsx

Copy link
Contributor Author

@FengPan-Frank FengPan-Frank Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will make some comparison for this, using pb should be feasible just the data will not be readable if we use redis-cli directly, seems now most of sonic database are text? I'm not sure @qiluo-msft may you comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, most database contents are in text format, we may need to check the possibility of memory optimization as we may push lot of RIB information into the DB.

@FengPan-Frank FengPan-Frank force-pushed the fenpan_bmp branch 2 times, most recently from edbe16c to 16f287f Compare March 14, 2024 14:57
lguohan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Apr 23, 2024
Why I did it
Add yang model support for feature sonic-net/SONiC#1621

Microsoft ADO (number only):27332639

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about this architecture design:
1)Why not BMP send route to external OpenBMP servers, which may keep more detailed BGP history based on Kafka.
2) Can BMP module in FRR write route to Redis directly? It is more straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1> That's will require us to maintain dedicated OpenBMP service, which is not in current scope, maybe later we can do that, if so it will be configurable simply from bmp perspective as well.
2> I think it will introduce unnecessary complexities into FRR, if like that FRR can persist Rib info directly instead of BMP, right? We choose BMP path now because BMP was natively supported in FRR in ASYNC mode, and not much extra cost added into FRR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Got the concept behind your design. Thanks~

qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Sep 19, 2024
#### Why I did it
sonic-net/SONiC#1621, need dedicated daemon to monitor config_db and manage openbmpd state.
#### How I did it
Use swss-common lib to monitor config_db change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

7 participants