From c07a12a53ee1e959dc755629fc43194c43883926 Mon Sep 17 00:00:00 2001
From: anamehra <54692434+anamehra@users.noreply.github.com>
Date: Mon, 27 Jan 2025 11:52:51 -0800
Subject: [PATCH 1/2] Update cisco-8000.ini to 202405.1.1.3 release (#21445)

Signed-off-by: Anand Mehra anamehra@cisco.com
---
 platform/checkout/cisco-8000.ini | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform/checkout/cisco-8000.ini b/platform/checkout/cisco-8000.ini
index f3e421be4d..8cbdb2491f 100644
--- a/platform/checkout/cisco-8000.ini
+++ b/platform/checkout/cisco-8000.ini
@@ -1,3 +1,3 @@
 [module]
 repo=git@github.com:Cisco-8000-sonic/platform-cisco-8000.git
-ref=202405.1.1.2
+ref=202405.1.1.3

From 29b900f6c3311e094606db3e3ed557b29171089a Mon Sep 17 00:00:00 2001
From: mssonicbld <79238446+mssonicbld@users.noreply.github.com>
Date: Tue, 28 Jan 2025 10:01:52 +0800
Subject: [PATCH 2/2] [RDMA] correct egress buffer size for Arista-7050CX3-32S-D48C8 DualToR (#21347)

#### Why I did it

### Symptom:
[MSFT ADO 28240256] [SONiC_Nightly][Failed_Case][qos.test_qos_sai.TestQosSai][testQosSaiHeadroomPoolSize][20231110][broadcom][Arista-7050CX3-32S-D48C8]

For Arista-7050CX3-32S-D48C8 (BCM56870_A0 / TD3), the headroom pool size test injects lossless traffic into multiple ingress ports. The shared buffer is exhausted first; then, before the headroom pool is fully exhausted, egress drops are observed. The expected behavior is ingress drop, so the test fails.

### RCA:
(See BRCM CSP CS00012358392 "Egress lossless pool size update for Arista-7050CX3-32S-D48C8 DualToR" for details.)

```
Pool: egress_lossless_pool
---- --------
mode static
size 32340992
type egress
---- --------
...
...
Pool: ingress_lossless_pool
---- --------
mode dynamic
size 32689152
type ingress
xoff 2058240
---- --------
```

As the "mmuconfig --list" output above shows, in the buffer configuration for Arista-7050CX3-32S-D48C8 the egress buffer is smaller than the ingress buffer. As a result, the egress buffer limit is reached before the headroom pool is exhausted, which triggers egress drops.

### MMU register dump analysis

**Total Ingress buffer limit for Pool 0:**
Shared: THDI_BUFFER_CELL_LIMIT_SP = **0x1CDC4**
Headroom: THDI_HDRM_BUFFER_CELL_LIMIT_HP = **0x1F68**
Min reserved per PG: 0x12 cells. See THDI_PORT_PG_CONFIG_PIPE0 and THDI_PORT_PG_CONFIG_PIPE1: a total of 80 PGs have the Min limit configured to 0x12, which takes up 80 * 0x12 = 0x5A0 cells.
Total ingress for Pool 0: 0x1CDC4 + 0x1F68 + 0x5A0 = **0x1F2CC (127692 cells)**.

**Total Egress buffer limits for Pool 0:**
Shared: MMU_THDM_DB_POOL_SHARED_LIMIT = **0x1ed7c**
Reserved: Q_MIN for lossless Queue 3,4 : **0**

In your scenario, your total usage stats would be:
Ingress: Total number of Active PGs * PG_MIN + Shared_count + Headroom count = **0x1ED7E**
Egress: Total egress usage count: **0x1ed7d**

From the allocation above, it is clear that if the number of ingress ports is smaller, ingress cell usage decreases because the total Min guarantee across PGs decreases, so Total Ingress is less than Total Egress in that case. If the number of ingress ports increases, ingress usage increases, which makes Total Ingress greater than Total Egress and results in egress queue drops.

##### Work item tracking
- Microsoft ADO **28240256**:

#### How I did it
In BRCM CSP CS00012358392 "Egress lossless pool size update for Arista-7050CX3-32S-D48C8 DualToR", Broadcom updated the mmuconfig.
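
As a cross-check of the cell arithmetic in the register dump analysis above, here is a minimal sketch in Python. The 256-byte cell size is an assumption inferred from the quoted numbers (32340992 bytes / 0x1ed7c cells = 256), not something this description states. The last step also assumes that the headroom and per-PG minimum stay unchanged when the CSP's `LIMIT=117246` is applied, which is likewise not stated.

```python
# Sanity check of the buffer arithmetic quoted in this PR description.
# ASSUMPTION: 256 bytes per cell, inferred from 32340992 / 0x1ED7C == 256.
CELL_BYTES = 256

# Ingress limits for pool 0, in cells (from the register dump analysis).
ingress_shared = 0x1CDC4      # THDI_BUFFER_CELL_LIMIT_SP
ingress_headroom = 0x1F68     # THDI_HDRM_BUFFER_CELL_LIMIT_HP
pg_min_total = 80 * 0x12      # 80 PGs with 0x12 cells reserved each

# Egress limit for pool 0, in cells.
egress_shared = 0x1ED7C       # MMU_THDM_DB_POOL_SHARED_LIMIT


def cells_to_bytes(cells: int) -> int:
    """Convert a cell count to bytes under the assumed cell size."""
    return cells * CELL_BYTES


ingress_total = ingress_shared + ingress_headroom + pg_min_total

print(hex(ingress_total), ingress_total)   # 0x1f2cc 127692
print(cells_to_bytes(ingress_total))       # 32689152 (ingress_lossless_pool size)
print(cells_to_bytes(egress_shared))       # 32340992 (egress_lossless_pool size)
print(ingress_total - egress_shared)       # 1360 cells of ingress excess

# ASSUMPTION: headroom and per-PG minimum are unchanged by the fix.
new_ingress_total = 117246 + ingress_headroom + pg_min_total
new_egress_shared = 126726
print(new_ingress_total == new_egress_shared)   # True
print(cells_to_bytes(new_egress_shared))        # 32441856
```

Under these assumptions the original ingress total exceeds the egress limit by 1360 cells, while the updated limits make the two equal at 126726 cells, which corresponds to the 32441856-byte pool size used for DualToR in the diff.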
Updated settings from the CSP:
```
Platform:  Arista-7050CX3-32S-D48C8
Type:      (none)
Config:    DualTOR
Uplinks:   8
Downlinks: 24
Standby:   24

All Ports Up:
m THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT=117246
m MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT=126726
m MMU_THDM_DB_POOL_RESUME_LIMIT(0) RESUME_LIMIT=15831
m MMU_THDM_DB_POOL_SHARED_LIMIT(1) SHARED_LIMIT=92288
m MMU_THDM_DB_POOL_RESUME_LIMIT(1) RESUME_LIMIT=11527
m MMU_THDR_DB_CONFIG1_PRIQ SPID=1
for x=0,639,10 '\
mod MMU_THDM_DB_QUEUE_CONFIG_PIPE0 $x 10 Q_SPID=1 ;\
mod MMU_THDM_DB_QUEUE_CONFIG_PIPE1 $x 10 Q_SPID=1'

All Ports Down:
m THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT=119694
m MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT=127734
m MMU_THDM_DB_POOL_RESUME_LIMIT(0) RESUME_LIMIT=15957
m MMU_THDM_DB_POOL_SHARED_LIMIT(1) SHARED_LIMIT=95255
m MMU_THDM_DB_POOL_RESUME_LIMIT(1) RESUME_LIMIT=11897
m MMU_THDR_DB_CONFIG1_PRIQ SPID=1
for x=0,639,10 '\
mod MMU_THDM_DB_QUEUE_CONFIG_PIPE0 $x 10 Q_SPID=1 ;\
mod MMU_THDM_DB_QUEUE_CONFIG_PIPE1 $x 10 Q_SPID=1'

Notes:
noe ### When there is a linkdown event on an in-use uplink port:
noe ### THDI_BUFFER_CELL_LIMIT_SP(0).LIMIT += 93
noe ### MMU_THDM_DB_POOL_SHARED_LIMIT(0).SHARED_LIMIT += 93
noe ### MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 11
noe ### MMU_THDM_DB_POOL_SHARED_LIMIT(1).SHARED_LIMIT += 74
noe ### MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 9
noe ### When there is a linkdown event on an in-use downlink port:
noe ### THDI_BUFFER_CELL_LIMIT_SP(0).LIMIT += 71
noe ### MMU_THDM_DB_POOL_SHARED_LIMIT(0).SHARED_LIMIT += 71
noe ### MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 8
noe ### MMU_THDM_DB_POOL_SHARED_LIMIT(1).SHARED_LIMIT += 56
noe ### MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 7
```

The part relevant to the buffer pool sizes was then applied to the image repo, as below:
```
m THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT=117246
m MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT=126726
```

#### How to verify it
- Push the change to private branch "xuchen3/20231110.24/CS00012358392/Arista-7050CX3-32S-D48C8.dualtor" to build a private
image
```
$ git log -
* c363f5b1c8 (2024-10-30 23:12) - bugfix: CS00012358392 change ingerss/egress buffer size for Arista-7050CX3-32S-D48C8 dualtor, static_th
* 9c284f015c (2024-10-29 09:15) - bugfix : CS00012358392 change ingerss/egress buffer size for Arista-7050CX3-32S-D48C8 dualtor
* 7f855c8ae8 (2024-10-28 23:52) - CS00012358392 change ingerss/egress buffer size for Arista-7050CX3-32S-D48C8 dualtor
```
- Then run the qos sai test: **all qos sai tests pass**, including the headroom pool size test.
https://elastictest.org/scheduler/testplan/673e052ad3c216e9a194b719?testcase=qos%2ftest_qos_sai.py&type=console
![image](https://github.com/user-attachments/assets/2c7dfad6-2160-4012-9f4b-3819e316f8f8)
- Run the **full nightly test**; no regression was observed.
https://dev.azure.com/mssonic/internal/_build/results?buildId=718645&view=results
- PS: additional tests were also run to verify that the above changes **only take effect for Arista-7050CX3-32S-D48C8 dualtor** and do not impact other platforms.

#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

#### Tested branch (Please provide the tested image version)
- [ ]
- [ ]

#### Description for the changelog

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
---
 .../BALANCED/buffers_defaults_t0.j2 | 15 +++++++++++----
 ...ffer-arista7050cx3-dualtor-remap-disabled.json | 8 ++++----
 .../py3/buffer-arista7050cx3-dualtor.json | 8 ++++----
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/device/arista/x86_64-arista_7050cx3_32s/Arista-7050CX3-32S-D48C8/BALANCED/buffers_defaults_t0.j2 b/device/arista/x86_64-arista_7050cx3_32s/Arista-7050CX3-32S-D48C8/BALANCED/buffers_defaults_t0.j2
index 57a7fb1314..f62d89115a 100644
--- a/device/arista/x86_64-arista_7050cx3_32s/Arista-7050CX3-32S-D48C8/BALANCED/buffers_defaults_t0.j2
+++ 
b/device/arista/x86_64-arista_7050cx3_32s/Arista-7050CX3-32S-D48C8/BALANCED/buffers_defaults_t0.j2
@@ -8,10 +8,17 @@
 {%- endfor %}
 {%- endmacro %}
 
+{%- set ingress_lossless_pool_size = '32689152' %}
+{%- set egress_lossless_pool_size = '32340992' %}
+{%- if (DEVICE_METADATA is defined) and ('localhost' in DEVICE_METADATA) and ('subtype' in DEVICE_METADATA['localhost']) and (DEVICE_METADATA['localhost']['subtype'] == 'DualToR') %}
+ {%- set ingress_lossless_pool_size = '32441856' %}
+ {%- set egress_lossless_pool_size = '32441856' %}
+{%- endif %}
+
 {%- macro generate_buffer_pool_and_profiles() %}
 "BUFFER_POOL": {
 "ingress_lossless_pool": {
- "size": "32689152",
+ "size": "{{ingress_lossless_pool_size }}",
 "type": "ingress",
 "mode": "dynamic",
 "xoff": "2058240"
@@ -22,7 +29,7 @@
 "mode": "dynamic"
 },
 "egress_lossless_pool": {
- "size": "32340992",
+ "size": "{{egress_lossless_pool_size }}",
 "type": "egress",
 "mode": "static"
 }
@@ -31,12 +38,12 @@
 "ingress_lossy_profile": {
 "pool":"ingress_lossless_pool",
 "size":"0",
- "static_th":"32689152"
+ "static_th":"{{ingress_lossless_pool_size }}"
 },
 "egress_lossless_profile": {
 "pool":"egress_lossless_pool",
 "size":"0",
- "static_th":"32340992"
+ "static_th":"{{egress_lossless_pool_size }}"
 },
 "egress_lossy_profile": {
 "pool":"egress_lossy_pool",
diff --git a/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor-remap-disabled.json b/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor-remap-disabled.json
index 54e7e8167b..9989890914 100644
--- a/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor-remap-disabled.json
+++ b/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor-remap-disabled.json
@@ -62,7 +62,7 @@
 
 "BUFFER_POOL": {
 "ingress_lossless_pool": {
- "size": "32689152",
+ "size": "32441856",
 "type": "ingress",
 "mode": "dynamic",
 "xoff": "2058240"
@@ -73,7 +73,7 @@
 "mode": "dynamic"
 },
 "egress_lossless_pool": {
- "size": "32340992",
+ "size": "32441856",
 "type": "egress",
 "mode": "static"
 }
@@ -82,12 +82,12 @@
 "ingress_lossy_profile": {
 "pool":"ingress_lossless_pool",
 "size":"0",
- "static_th":"32689152"
+ "static_th":"32441856"
 },
 "egress_lossless_profile": {
 "pool":"egress_lossless_pool",
 "size":"0",
- "static_th":"32340992"
+ "static_th":"32441856"
 },
 "egress_lossy_profile": {
 "pool":"egress_lossy_pool",
diff --git a/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor.json b/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor.json
index 4b55f2bbe8..994e276ce1 100644
--- a/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor.json
+++ b/src/sonic-config-engine/tests/sample_output/py3/buffer-arista7050cx3-dualtor.json
@@ -62,7 +62,7 @@
 
 "BUFFER_POOL": {
 "ingress_lossless_pool": {
- "size": "32689152",
+ "size": "32441856",
 "type": "ingress",
 "mode": "dynamic",
 "xoff": "2058240"
@@ -73,7 +73,7 @@
 "mode": "dynamic"
 },
 "egress_lossless_pool": {
- "size": "32340992",
+ "size": "32441856",
 "type": "egress",
 "mode": "static"
 }
@@ -82,12 +82,12 @@
 "ingress_lossy_profile": {
 "pool":"ingress_lossless_pool",
 "size":"0",
- "static_th":"32689152"
+ "static_th":"32441856"
 },
 "egress_lossless_profile": {
 "pool":"egress_lossless_pool",
 "size":"0",
- "static_th":"32340992"
+ "static_th":"32441856"
 },
 "egress_lossy_profile": {
 "pool":"egress_lossy_pool",