
Fix the route check monit error seen when test_route_consistency.py #16586

Closed
abdosi wants to merge 60 commits into sonic-net:master from abdosi:master

Conversation

@abdosi
Contributor

@abdosi abdosi commented Jan 20, 2025

What/Why I did:
Fix the route check monit error seen when running test_route_consistency.py.

How I did:

  1. Run an explicit route_check at the start and end of the test case, and restart the monit routeCheck to remove false-positive monit error logs.
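
The step above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeHost`, `ensure_clean_route_check`, and the exact command strings (`sudo route_check.py`, `sudo monit restart routeCheck`) are assumptions; a real test would operate on the `duthosts` fixture instead.

```python
class FakeHost:
    """Hypothetical stand-in for a DUT host object; it records the commands it receives."""

    def __init__(self, name, supervisor=False, route_check_rc=0):
        self.name = name
        self.supervisor = supervisor
        self.route_check_rc = route_check_rc
        self.commands = []

    def is_supervisor_node(self):
        return self.supervisor

    def shell(self, cmd, module_ignore_errors=False):
        self.commands.append(cmd)
        # Only the route check can fail in this fake; everything else succeeds.
        rc = self.route_check_rc if "route_check" in cmd else 0
        return {"rc": rc}


def ensure_clean_route_check(hosts):
    """Run the route check on every non-supervisor host.

    The monit check is restarted only when the routes are clean, so that a stale
    failure counter does not produce an error log during the test.
    Returns the names of hosts whose route check failed.
    """
    failed = []
    for host in hosts:
        if host.is_supervisor_node():
            continue
        result = host.shell("sudo route_check.py", module_ignore_errors=True)  # assumed command
        if result["rc"] != 0:
            failed.append(host.name)
            continue
        host.shell("sudo monit restart routeCheck")  # assumed command
    return failed


hosts = [FakeHost("sup", supervisor=True), FakeHost("lc1"), FakeHost("lc2", route_check_rc=255)]
failed_hosts = ensure_clean_route_check(hosts)
```

In a test, a non-empty `failed_hosts` list would be a reason to fail or skip before the test body starts, and the same helper could be called again at teardown.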

abdosi and others added 30 commits February 5, 2021 17:00
as we are sending packets > 4k in some cases where HBM is involved,
to fill the buffer faster.

Signed-off-by: Abhishek Dosi <[email protected]>
abdosi and others added 14 commits October 25, 2024 19:34
…ER_ATTR_EGRESS and

SAI_LAG_MEMBER_ATTR_INGRESS value of enable/disable

On disable:
1. Control packets to/from should not be processed (BGP down)
2. Data packets from/to CPU should not be processed (ping to IP interface should fail)
3. Data packets across ports should not be processed (ping to another peer IP should fail)

Signed-off-by: Abhishek Dosi <[email protected]>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@abdosi abdosi requested a review from yejianquan January 20, 2025 04:01
@abdosi
Contributor Author

abdosi commented Jan 20, 2025

@yejianquan / @cyw233 @vperumal for viz.

@cyw233
Contributor

cyw233 commented Jan 20, 2025

Hey @abdosi, it seems the flake8 check failed with the following error:

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/route/test_route_consistency.py:19:1: E302 expected 2 blank lines, found 1
tests/route/test_route_consistency.py:41:1: E302 expected 2 blank lines, found 1

@yejianquan
Collaborator

yejianquan commented Jan 20, 2025

Hi @abdosi, could you take a look at #16390?
One concern from my side:

I notice we may also hit this issue in other test modules.

Like:

E               Match Messages:
E               2025 Jan 18 16:51:51.737497 x-lc3-1 ERR monit[894]: 'routeCheck' status failed (255) -- Failure results: {{#012    "asic0": {#012        "missed_ROUTE_TABLE_routes": [#012            "10.0.0.0/31",#012            "10.0.0.12/31",#012            "10.0.0.16/31",#012            "10.0.0.20/31",#012            "10.0.0.4/31",#012            "10.0.0.8/31",#012            "3.3.3.3/32"#012        ]#012    }#012}}#012Failed. Look at reported mismatches above#012add: {#012    "asic2": [],#012    "asic1": [],#012    "asic0": []#012}#012del: {#012    "asic2": [],#012    "asic1": [],#012    "asic0": []#012}

alive      = []
args       = [{'x-lc1-1': <tests.common.plugins.loganalyzer.loganalyzer.LogAnalyzer object at 0x7fd512e34b20>, 'x-sup-1].2025-01-18-16:50:57', x-sup-1': 'test_bfd_flap[ipv4-x-sup-1].2025-01-18-16:50:58'}]

This is found during

bfd/test_bfd_static_route.py::TestBfdStaticRoute::test_bfd_flap[x-sup-1]

https://elastictest.org/scheduler/testplan/678a4593103fb5950c7b0b1f?testcase=bfd%2Ftest_bfd_static_route.py&type=console&leftSideViewMode=detail&prop=start_time&order=ascending

It's flaky and does not reproduce reliably.
Should we add this route check failure (missed_ROUTE_TABLE_routes) to the common ignore list?
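
As a rough illustration of what such an ignore rule would have to match, here is a sketch of a regular expression tested against a shortened copy of the log line above. The pattern and the `unrelated` line (including the `diskCheck` name) are hypothetical; neither is taken from the repository's ignore list.

```python
import re

# Hypothetical ignore pattern for the monit routeCheck failure shown above.
IGNORE_ROUTE_CHECK = re.compile(
    r".*ERR monit\[\d+\]: 'routeCheck' status failed.*missed_ROUTE_TABLE_routes.*"
)

# Shortened copy of the reported log line.
sample = (
    "2025 Jan 18 16:51:51.737497 x-lc3-1 ERR monit[894]: 'routeCheck' status failed (255) "
    '-- Failure results: {{#012    "asic0": {#012        "missed_ROUTE_TABLE_routes": [#012'
)
# Made-up line for a different monit check, which the pattern should not match.
unrelated = "2025 Jan 18 16:51:51.737497 x-lc3-1 ERR monit[894]: 'diskCheck' status failed (1)"

matches_sample = IGNORE_ROUTE_CHECK.match(sample) is not None
matches_unrelated = IGNORE_ROUTE_CHECK.match(unrelated) is not None
```

A rule this broad would also hide genuine route mismatches, which is the trade-off to weigh before adding it to a shared ignore list.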

@yejianquan
Collaborator

# and another cycle as part of this test can cause failure of this test because of monit ERR.
# To mitigate this, make sure route_check is clean before starting the test and, if so, then restart
# monit route check to reset its error check counter
for duthost in duthosts:

This function could end up being used widely, possibly before every test function in the future.
The monit restart should be fast, but route_check could be slow.
Can we use SafeThreadPoolExecutor to run them in parallel?
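
The idea can be sketched with the standard library's `ThreadPoolExecutor`, used here as a stand-in because this sketch makes no assumptions about the `SafeThreadPoolExecutor` API. `check_host` and the host names are hypothetical placeholders for the per-host route check and monit restart.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_host(name, delay=0.05):
    """Hypothetical per-host work: a slow route check followed by a fast monit restart."""
    time.sleep(delay)  # stands in for the slow route_check call
    return name, "ok"


host_names = ["lc1", "lc2", "lc3", "lc4"]

# Submit one task per host so the slow checks overlap instead of running back to back.
with ThreadPoolExecutor(max_workers=len(host_names)) as executor:
    futures = [executor.submit(check_host, name) for name in host_names]
    # result() re-raises any exception from the worker, so failures are not silently lost.
    results = dict(future.result() for future in futures)
```

With one worker per host, the total wait is close to the slowest single host rather than the sum over all hosts.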

# monit route check to reset its error check counter
for duthost in duthosts:
    if duthost.is_supervisor_node():
        continue

Maybe make it a sanity check?

@yejianquan
Collaborator

Hi @abdosi,
with #16876, can we close #16586?

@cyw233
Contributor

cyw233 commented Apr 7, 2025

This is covered by #16876, so we can close this PR now.

@yejianquan yejianquan closed this Apr 7, 2025