[action] [PR:3630] Enable FDB learning event after all ports removed from default 1Q bridge #79

Merged: mssonicbld merged 1 commit into Azure:202412 from mssonicbld:cherry/msft-202412/3630 on May 9, 2025

@mssonicbld (Collaborator)

**What I did**

This PR fixes an orchagent crash. The error logs are below:
```
2025 May  1 05:51:07.128331 str4-7060x6-64pe-7 ERR swss#orchagent: :- meta_generic_validation_remove: object 0x3a0000000000d7 reference count is 1, can't remove
2025 May  1 05:51:07.128331 str4-7060x6-64pe-7 ERR swss#orchagent: :- removeDefaultBridgePorts: Failed to remove bridge port, rv:-17
2025 May  1 05:51:07.128566 str4-7060x6-64pe-7 INFO swss#supervisord: orchagent terminate called after throwing an instance of 'std::runtime_error'
2025 May  1 05:51:07.128566 str4-7060x6-64pe-7 INFO swss#supervisord: orchagent   what():  PortsOrch initialization failure
2025 May  1 05:51:07.815330 str4-7060x6-64pe-7 INFO swss#supervisord 2025-05-01 05:51:07,814 WARN exited: orchagent (terminated by SIGABRT (core dumped); not expected)
```
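The crash happens because FDB entries are learned on the default 1Q bridge before orchagent removes its bridge ports. Each learned entry increases the reference count of the bridge port it points to, so removing that port fails (`rv:-17`, i.e. `SAI_STATUS_OBJECT_IN_USE`) and PortsOrch initialization aborts.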
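The reference-count failure can be illustrated with a minimal Python sketch. It assumes a simplified model of the SAI meta layer: `RefCountedObjects` and `remove_after_learning` are hypothetical names, not sairedis or orchagent APIs. The model shows why a bridge port cannot be removed once even one learned FDB entry refers to it, which matches "reference count is 1, can't remove" in the log.

```python
class RefCountedObjects:
    """Hypothetical model of reference-counted SAI objects (not a real API)."""

    def __init__(self):
        self.refs = {}  # object id -> number of objects referencing it

    def create(self, oid):
        self.refs[oid] = 0

    def add_ref(self, oid):
        self.refs[oid] += 1

    def remove(self, oid):
        # Mirrors the meta check in the log: an object that is unknown or
        # still referenced cannot be removed.
        if oid not in self.refs or self.refs[oid] != 0:
            return False
        del self.refs[oid]
        return True


BRIDGE_PORT = 0x3A0000000000D7  # object id taken from the error log


def remove_after_learning(learned_macs):
    """Hypothetical helper: learn `learned_macs` entries, then try removal."""
    objs = RefCountedObjects()
    objs.create(BRIDGE_PORT)
    for _ in range(learned_macs):
        objs.add_ref(BRIDGE_PORT)  # each learned FDB entry references the port
    return objs.remove(BRIDGE_PORT)


print(remove_after_learning(0))  # True: nothing references the port
print(remove_after_learning(1))  # False: one FDB entry blocks removal
```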

The fix is to **not** set `SAI_SWITCH_ATTR_FDB_EVENT_NOTIFY` when creating the switch, and to enable it only after all ports have been removed from the default bridge.
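The ordering change can be sketched with a toy Python model. Everything here (`HypotheticalSwitch`, `init`, the port names and the MAC) is illustrative rather than orchagent or SAI code. The model also assumes, as the fix implies, that learned FDB entries only take a bridge-port reference once FDB event notification is enabled.

```python
class HypotheticalSwitch:
    """Toy model of switch init ordering; names are illustrative, not SAI APIs."""

    def __init__(self, fdb_notify_at_create):
        self.fdb_notify_enabled = fdb_notify_at_create
        # Ports in the default 1Q bridge -> reference count (assumed model).
        self.default_bridge_ports = {"Ethernet0": 0, "Ethernet4": 0}
        self.fdb = []  # learned (mac, port) entries

    def on_traffic(self, mac, port):
        # Assumption: an FDB entry referencing the bridge port is only
        # created when FDB events are enabled.
        if self.fdb_notify_enabled and port in self.default_bridge_ports:
            self.fdb.append((mac, port))
            self.default_bridge_ports[port] += 1

    def remove_default_bridge_ports(self):
        for port, refcount in list(self.default_bridge_ports.items()):
            if refcount:
                raise RuntimeError("PortsOrch initialization failure")
            del self.default_bridge_ports[port]

    def enable_fdb_notify(self):
        self.fdb_notify_enabled = True


def init(fdb_notify_at_create):
    sw = HypotheticalSwitch(fdb_notify_at_create)
    sw.on_traffic("00:11:22:33:44:55", "Ethernet4")  # MAC seen during init
    try:
        sw.remove_default_bridge_ports()
    except RuntimeError as err:
        return str(err)
    sw.enable_fdb_notify()  # fixed ordering: enable FDB events only now
    return "ok"


print(init(fdb_notify_at_create=True))   # PortsOrch initialization failure
print(init(fdb_notify_at_create=False))  # ok
```

Deferring the notification until the default bridge is empty means nothing can pin a default bridge port during initialization, while learning on the real VLAN bridge ports still works afterwards.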

**Why I did it**
To prevent orchagent from crashing (SIGABRT) during PortsOrch initialization.

**How I verified it**
1. Verified on multiple platforms (Broadcom, Mellanox, Cisco); FDB learning works normally after this change.

**Broadcom**
```
admin@str4-7060x6-64pe-fan-4:~$ fdbshow
  No.    Vlan  MacAddress         Port         Type
-----  ------  -----------------  -----------  -------
    1    1234  B6:2C:7E:FC:80:00  Ethernet496  Dynamic
    2    1234  D6:5E:2C:C0:B8:0B  Ethernet496  Dynamic
    3    1235  CE:8F:2A:A1:00:01  Ethernet496  Dynamic
```

**Mellanox**
```
admin@str-msn2700-01:~$ fdbshow
  No.    Vlan  MacAddress         Port       Type
-----  ------  -----------------  ---------  -------
    1    1000  7C:FE:90:5E:60:01  Ethernet4  Dynamic
Total number of entries 1
```

**Cisco**
```
admin@str3-8101-03:~$ fdbshow
  No.    Vlan  MacAddress         Port         Type
-----  ------  -----------------  -----------  -------
    1    1000  9C:09:8B:B6:E6:00  Ethernet240  Dynamic
Total number of entries 1
```

2. The existing VS test `test_fdb.py` covers this change.

**Details if related**

mssonicbld requested a review from prsunny as a code owner on May 9, 2025 07:34
@mssonicbld (Collaborator, Author)

Original PR: sonic-net/sonic-swss#3630

@mssonicbld (Collaborator, Author)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld merged commit 2daf207 into Azure:202412 on May 9, 2025
5 of 8 checks passed
mssonicbld added a commit that referenced this pull request on May 26, 2025
```
* 32da647 - (HEAD -> 202503) Merge branch '202412' of https://github.com/Azure/sonic-swss.msft into 202503 (2025-05-26) [Sonic Automation]
* 06b16c3 - (base/202412) [fpmsyncd]Fixing blackhole route to publish protocol field to APPL_DB (#83) (2025-05-23) [Sudharsan Dhamal Gopalarathnam]
* b801f2d - [202412] [SRv6] add MySID counters support (#82) (2025-05-19) [Yakiv Huryk]
* a999b4d - Merge pull request #81 from r12f/code-sync-202412 (2025-05-17) [Dashuai Zhang]
|\
| * fd87e1f - Merge remote-tracking branch 'base/202411' into code-sync-202412 (2025-05-16) [r12f]
|/|
| * 623b018 - (origin/202411) [202411] Setting default nexthop weight to 1 in fpmsyncd (2025-05-15) [Kumaresh Perumal]
| |\
| | * a99088e - Removed logging code. (2025-05-15) [Mahdi Ramezani]
| | * 5cdc78e - Fixed a compile error. (2025-05-15) [Mahdi Ramezani]
| | * a79b7e0 - Set default nexthop weight to 1. Added unit tests for 'getNextHopWt'. (2025-05-15) [Mahdi Ramezani]
| |/
* | 2a0856b - Merge pull request #78 from nazariig/202412-trim-azure (2025-05-14) [Nazarii Hnydyn]
* | 2daf207 - Enable FDB learning event after all ports removed from default 1Q bridge (#79) (2025-05-09) [mssonicbld]
* | 3b70292 - Move timestamps out of counter table to avoid update too frequently (#75) (2025-04-28) [mssonicbld]
* | 3fa0d72 - Merge pull request #74 from mssonicbld/sonicbld/202412-merge (2025-04-23) [mssonicbld]
* | be436da - Merge branch '202411' of https://github.com/sonic-net/sonic-swss into 202412 (2025-04-23) [Sonic Automation]
|/
* 79f04e3 - Initialize the last fec ber computed values if not found (#3621) (2025-04-22) [mssonicbld]
```