[action] [PR:23375] Moby512 HWSKU cleanup and new FANOUT HWSKU #1457

Merged
r12f merged 1 commit into Azure:202412 from mssonicbld:cherry/msft-202412/23375
Aug 10, 2025

@mssonicbld
Collaborator

#### Why I did it

As requested in https://github.com/aristanetworks/sonic-qual.msft/issues/683, this PR adds a new HWSKU, `Arista-7060X6-16PE-384C-B-O128S2-FANOUT`, for Moby512.

This PR also contains a couple of cleanups in the existing Moby512 HWSKUs:
- `Arista-7060X6-16PE-384C-B-O128S2-COPPER-LAB` is removed, as it is the same as `Arista-7060X6-16PE-384C-B-O128S2-LAB` plus additional dynamic autoneg configs
- Fix the symlinks of `Arista-7060X6-16PE-384C-B-O128S2-LAB` to resolve the current orchagent crash when loading this HWSKU
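
A minimal sketch of how broken symlinks in a HWSKU directory could be spotted before loading it. The helper name `find_dangling_symlinks` and the directory passed to it are hypothetical; this PR does not describe which links were wrong or how they were found.

```python
import os


def find_dangling_symlinks(hwsku_dir):
    """Return sorted paths (relative to hwsku_dir) of symlinks whose targets are missing."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(hwsku_dir):
        # A symlink to a directory shows up in dirnames, everything else in filenames.
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(full) and not os.path.exists(full):
                dangling.append(os.path.relpath(full, hwsku_dir))
    return sorted(dangling)
```

An empty result only shows that every link resolves; it does not show that each link points at the intended file.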

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Use `Arista-7060X6-16PE-384C-B-O128S2` as the baseline for the Moby512 configuration and make the corresponding lanemap and polarity changes for `-LAB` and `-FANOUT`.

#### How to verify it

Tested internally on our Moby512 setup:
- FANOUT: tested with two Moby512 switches, one loaded with `Arista-7060X6-16PE-384C-B-O128S2` and the other with `Arista-7060X6-16PE-384C-B-O128S2-FANOUT`, and confirmed the backplane ports are up
- LAB: tested with a single Moby512 with a snake setup between the top and bottom backplane ports, and confirmed they are up

#### Which release branch to backport (provide reason below if selected)

- [ ] 202205
- [ ] 202211
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [x] 202505
- [x] msft-202412
- [x] msft-202503

#### Tested branch (Please provide the tested image version)

- [x] msft-202412

#### Description for the changelog

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
@mssonicbld
Collaborator Author

Original PR: sonic-net/sonic-buildimage#23375

@mssonicbld
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@r12f

r12f commented Aug 7, 2025

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@r12f merged commit eded563 into Azure:202412 on Aug 10, 2025
14 of 17 checks passed
saiarcot895 pushed a commit to saiarcot895/sonic-buildimage-msft that referenced this pull request Aug 14, 2025
…tically (#22227)

src/sonic-sairedis

* 4048483e - (HEAD -> 202411, origin/202411) Revert "Optimize counter polling interval by making it more accurate (Azure#1457)
ccroy-arista pushed a commit to ccroy-arista/sonic-buildimage-msft that referenced this pull request Aug 20, 2025
…D automatically (Azure#1016)

#### Why I did it
src/sonic-sairedis
```
* 86d1413 - (HEAD -> 202412, origin/HEAD, origin/202412) Merge pull request Azure#45 from r12f/code-sync-202412 (31 minutes ago) [Riff]
* 0fcc968 - Merge remote-tracking branch 'base/202411' into code-sync-202412 (13 hours ago) [r12f]
* 4048483 - Revert "Optimize counter polling interval by making it more accurate (Azure#1457) …" (Azure#1570) (2 weeks ago) [Kumaresh Perumal]
* 420d92f - Update build_and_install_module.sh to match newer Linux kernel version (Azure#1561) (4 weeks ago) [mssonicbld]
* e2d2ca6 - [vslib] SAI_KEY_VS_OPER_SPEED_IS_CONFIGURED_SPEED, SAI_PORT_ATTR_HOST_TX_READY_STATUS support (Azure#1553) (5 weeks ago) [mssonicbld]
* 8c17d4b - Revert "Do not enter vendor SAI critical section for counter polling/clearing operations (Azure#1450)" (Azure#1541) (7 weeks ago) [mssonicbld]
* 3df03e1 - Optimize counter polling interval by making it more accurate (Azure#1457) (Azure#1534) (7 weeks ago) [Stephen Sun]
* d884ff9 - [syncd] Move logSet logGet under mutex to prevent race condition (Azure#1520) (Azure#1538) (8 weeks ago) [Kamil Cudnik]
* ec8b3c3 - Fix pipeline errors related to rsyslogd and libswsscommon installation (Azure#1535) (8 weeks ago) [mssonicbld]
* 6b263b8 - [FC] Support Policer Counter (Azure#1533) (8 weeks ago) [mssonicbld]
* e53489e - [syncd] Update log level for bulk api (Azure#1532) (8 weeks ago) [Jianyue Wu]
* 7ae00e5 - Define bulk chunk size and bulk chunk size per counter ID (Azure#1528) (9 weeks ago) [mssonicbld]
* f35e743 - [nvidia] Skip SAI discovery on ports (Azure#1524) (2 months ago) [mssonicbld]
* bf049ed - Use sonictest pool instead of sonic-common and fix arm64 issue. (Azure#1516) (2 months ago) [mssonicbld]
* ffe371d - [syncd] Support bulk set in INIT_VIEW mode (Azure#1517) (2 months ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this pull request Mar 21, 2026
…#26294)

#### Why I did it

When lldpd starts (or restarts), it sends the first LLDP frames with **MAC addresses as Port IDs** instead of the configured interface aliases (e.g., `Ethernet1/1`). This is because:

1. lldpd starts in paused state and loads its config file (`/etc/lldpd.conf`)
2. The config file only configures the management port (`eth0`) portidsubtype — no front-panel port configs exist
3. After processing all config lines, lldpd internally **auto-resumes** (hardcoded behavior in lldpd's internal lldpcli)
4. The first LLDP frames are sent with default MAC-based Port IDs
5. lldpmgrd starts 2-3 seconds later and reconfigures each port with the correct alias via `lldpcli`
6. This triggers an **MSAP change** (shutdown frame + new frame with correct Port ID) on every port

Peers see transient neighbor flapping: a MAC-based entry appears briefly, then gets replaced by the correct interface name entry. This can trigger monitoring alerts and confuse network management systems (e.g., LLDP-based topology discovery, automated cabling validation).
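
The ordering problem above can be illustrated with a toy model. This is a sketch only: `first_frame_port_ids` and `msap_changes` are hypothetical names, and the model reduces lldpd to "whatever is configured when it resumes is what the first frame carries".

```python
def first_frame_port_ids(startup_config, ports):
    """Return the Port ID each port would advertise in its first frame.

    startup_config: port name -> alias already configured when the daemon resumes.
    ports: port name -> MAC address (the default Port ID in this model).
    """
    first = {}
    for name, mac in ports.items():
        alias = startup_config.get(name)
        # Without a startup entry, the port falls back to its MAC address.
        first[name] = ("local", alias) if alias is not None else ("mac", mac)
    return first


def msap_changes(first_frame, runtime_config):
    """Ports whose Port ID changes once a later agent applies runtime_config."""
    return sorted(
        name
        for name, alias in runtime_config.items()
        if first_frame.get(name) != ("local", alias)
    )


ports = {"Ethernet0": "ec:8a:48:3c:e4:a8"}
aliases = {"Ethernet0": "Ethernet1/1"}

# No front-panel entries at startup: the first frame is MAC-based and must change later.
before = first_frame_port_ids({}, ports)
# Aliases present at startup: the later reconfiguration is a no-op for the Port ID.
after = first_frame_port_ids(aliases, ports)
```

In this model `msap_changes(before, aliases)` lists `Ethernet0`, while `msap_changes(after, aliases)` is empty, which mirrors the neighbor flap and its absence.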

Related issues:
- Fixes Azure#1488
- Fixes Azure#1457

##### Work item tracking
- Microsoft ADO **37084792**:

#### How I did it

Added port ID subtype configuration for **all front-panel ports** directly in the `lldpd.conf.j2` Jinja2 template. The template iterates over the `PORT` table from ConfigDB and generates `configure ports <name> lldp portidsubtype local <alias>` lines for every port that has an alias defined.
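
As a rough illustration of what the template produces, the same iteration can be written in plain Python. This is not the Jinja2 template from the change: the function name `render_portidsubtype_lines` is hypothetical, and the input is assumed to be a dict shaped like ConfigDB's `PORT` table.

```python
def render_portidsubtype_lines(port_table):
    """Build one `configure ports ...` line for every port that has an alias."""
    lines = []
    for name in sorted(port_table):
        alias = port_table[name].get("alias")
        # Ports without an alias are skipped, matching the description above.
        if alias:
            lines.append(f"configure ports {name} lldp portidsubtype local {alias}")
    return lines


sample_port_table = {
    "Ethernet0": {"alias": "Ethernet1/1"},
    "Ethernet4": {"alias": "Ethernet2/1"},
    "Ethernet8": {},  # no alias defined, so no line is generated
}
rendered = render_portidsubtype_lines(sample_port_table)
```

The sample port names and aliases other than `Ethernet0`/`Ethernet1/1` are made up for the example.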

These configuration lines are processed by lldpd during startup config loading, **before** the internal auto-resume fires. This ensures the very first LLDP frame already carries the correct interface alias as Port ID, eliminating the transient MAC-based Port ID window entirely.

The change is additive — lldpmgrd continues to handle dynamic port configuration changes at runtime. When lldpmgrd later processes the same ports, the portidsubtype is already correct, so no MSAP change occurs (only the port description gets added, which is expected).

**Key technical findings from investigation:**
- lldpd starts paused by default (the `pause` directive in config is actually redundant)
- lldpd's internal lldpcli auto-resumes after ALL config file lines are processed — this cannot be prevented via config
- The `resume` call in `waitfor_lldp_ready.sh` (from PR #5493) fires after auto-resume, so it's also redundant
- The `PORT` variable is available in sonic-cfggen template context from ConfigDB's PORT table

#### How to verify it

**Tested on:** Arista-7260CX3-C64 running SONiC.20251110.15 (64 front-panel ports)

**Test procedure:**
1. Applied the template change inside the lldp container
2. Verified template rendering: `sonic-cfggen -d -t /usr/share/sonic/templates/lldpd.conf.j2` — confirmed 67 portidsubtype lines (1 eth0 + 66 Ethernet ports)
3. Started tcpdump on Ethernet0: `tcpdump -i Ethernet0 -e ether proto 0x88cc -v`
4. Restarted lldp service: `systemctl restart lldp`
5. Captured the **very first** LLDP frame from the DUT
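
The check in step 2 could be scripted along these lines. This is a sketch under assumptions: it expects the rendered config as a string (for example, captured from the `sonic-cfggen` command above), and `parse_portidsubtype` is a hypothetical helper name.

```python
import re

# Matches lines of the form shown in "How I did it":
#   configure ports <name> lldp portidsubtype local <alias>
_PORTID_LINE = re.compile(r"^configure ports (\S+) lldp portidsubtype local (\S+)$")


def parse_portidsubtype(rendered_conf):
    """Return a dict of port name -> alias for every portidsubtype line found."""
    mapping = {}
    for line in rendered_conf.splitlines():
        match = _PORTID_LINE.match(line.strip())
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping
```

Comparing `len(parse_portidsubtype(text))` with the expected number of ports gives the same count check as in step 2, and the returned dict also allows checking individual aliases.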

**tcpdump results — first frame after restart:**
```
07:38:41.610005 LLDP, length 233
 Chassis ID TLV (1), length 7
 Subtype MAC address (4): ec:8a:48:3c:e4:a8
 Port ID TLV (2), length 12
 Subtype Local (7): Ethernet1/1 <-- CORRECT from first frame!
 Time to Live TLV (3), length 2: TTL 120s
 System Name TLV (5), length 14: bjw-can-7260-8
```

**Before this fix**, the first frame showed:
```
 Port ID TLV (2), length 8
 Subtype MAC address (3): ec:8a:48:3c:e4:a8 <-- MAC address, WRONG
```

**Before:** MAC address as Port ID for 2-3 seconds → MSAP change → correct alias
**After:** Correct alias (`Ethernet1/1`) from the very first LLDP frame → no MSAP change
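
To automate the before/after comparison, the Port ID TLV can be pulled out of the captured text. This is a sketch that assumes the output layout shown in the captures above; `first_port_id` is a hypothetical helper name.

```python
import re

# Port ID TLV header followed by its "Subtype <name> (<number>): <value>" line.
_PORT_ID = re.compile(
    r"Port ID TLV \(2\), length \d+\s+Subtype (.+?) \((\d+)\): (\S+)"
)


def first_port_id(tcpdump_text):
    """Return (subtype name, subtype number, value) of the first Port ID TLV, or None."""
    match = _PORT_ID.search(tcpdump_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


fixed_capture = """\
 Port ID TLV (2), length 12
 Subtype Local (7): Ethernet1/1
"""
old_capture = """\
 Port ID TLV (2), length 8
 Subtype MAC address (3): ec:8a:48:3c:e4:a8
"""
```

A check such as `first_port_id(text)[1] == 7` then distinguishes a locally assigned Port ID from the MAC-based one (subtype 3).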

#### Which release branch to backport (provide reason below if selected)

- [ ] 202305
- [ ] 202311
- [ ] 202405
- [x] 202411
- [x] 202505
- [x] 202511

**Reason:** This race condition affects all SONiC versions using lldpd with the current startup architecture. The transient MAC-based Port ID causes neighbor flapping visible to peers on every LLDP container restart or device boot, which can trigger false alerts in production monitoring systems.

#### Tested branch (Please provide the tested image version)

- [x] SONiC.20251110.15 (master-based, Arista-7260CX3-C64)

#### Description for the changelog

Fix LLDP Port ID showing MAC address instead of interface name during daemon startup by pre-configuring portidsubtype in lldpd.conf.j2 template.

#### Link to config_db schema for YANG module changes

N/A — no ConfigDB or YANG schema changes.

#### A picture of a cute animal (not mandatory but encouraged)

🦔

Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>