pick memory_utilization related commits #702
Merged
Pterosaur merged 9 commits into Azure:202412 from Sep 15, 2025
What is the motivation for this PR?
Add the memory threshold.
How did you do it?
Add initial memory thresholds. They need to be adjusted based on nightly test results.
How did you verify/test it?
Run the nightly pipeline.
Approach
What is the motivation for this PR?
The memory utilization plugin operates at the per-test-case level. It uses a pytest hook to collect memory usage before and after each test case, calculates the difference, and compares it against a predefined threshold. If the difference exceeds the threshold, a pytest failure is triggered.
Because the hook is declared with "@pytest.hookimpl(tryfirst=True)", it executes before all the teardown fixtures. If the memory check fails, the hook raises an exception immediately and interrupts the teardown process. As a result, the next test case fails with the following error message:
AssertionError: previous item was not torn down properly
How did you do it?
Do not raise the failure in the hook directly. Instead, save all the results and report any failures in the test case's teardown fixture, so that other test cases are not affected.
How did you verify/test it?
Run elastic test
https://elastictest.org/scheduler/testplan/684004dba52da0ec6421c4ad?testcase=ecmp%2Ftest_ecmp_sai_value.py&type=console
Co-authored-by: Liping Xu <108326363+lipxu@users.noreply.github.com>
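The deferral described above can be sketched without pytest as follows. This is a minimal illustration, not the plugin's actual code: the class name `MemoryCheckResults`, its methods, and the alarm wording are all hypothetical. The idea is that the hook only records alarms, and the teardown fixture is the single place that raises.

```python
class MemoryCheckResults:
    """Hypothetical collector: stores alarms instead of failing inside the hook."""

    def __init__(self):
        self.failures = []

    def record(self, name, before_mb, after_mb, increase_threshold_mb):
        # Called where the hook previously failed; it only saves the message.
        diff = after_mb - before_mb
        if diff > increase_threshold_mb:
            self.failures.append(
                "[ALARM]: %s, memory increase %.1f MB exceeds threshold %.1f MB"
                % (name, diff, increase_threshold_mb)
            )

    def report(self):
        # Called from the test case's teardown; raises once with all alarms.
        if self.failures:
            messages, self.failures = self.failures, []
            raise AssertionError("\n".join(messages))


results = MemoryCheckResults()
results.record("frr_bgp:used", 100.0, 110.0, 64.0)   # within threshold, nothing stored
results.record("frr_zebra:used", 20.0, 100.0, 64.0)  # 80 MB increase, stored for later
```

In a real pytest plugin, `record` would be called from the hook and `report` from the teardown part of a fixture (for example via `pytest.fail`), so the remaining teardown code is not interrupted by the hook.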
What is the motivation for this PR?
There are too many "memory above threshold" alarms in the nightly test.
How did you do it?
Update the FRR memory thresholds and make the alarm more readable.
memory_increase_threshold: FRR has its own memory management system and does not return memory to the system immediately, so increase the threshold.
1: top:zebra: update from 64 to 128M
2: frr_bgp: update from 32 to 64M
3: frr_zebra: update from 16 to 64M
memory_high_threshold: FRR BGP memory usage depends on the number of neighbors, so increase the threshold. In the future, the threshold needs to be set according to the number of neighbors.
1: frr_bgp: update from 128 to 256M
How did you verify/test it?
Run nightly test
https://elastictest.org/scheduler/testplan/685ac58d2461750d1f5a11c9
What is the motivation for this PR?
The test failed on teardown with "Failed: [ALARM]: monit:memory_usage, Previous memory usage 74.8 MB exceeds high threshold 70.0 MB (previous: 74.8 MB, current: 74.8 MB)"
How did you do it?
Enhance the plugin by adding a new threshold type, percentage_points.
"type": "value": Absolute values in MB
Example: {"type": "value", "value": 128} means 128 MB
For example: top, free, frr_memory commands that return memory in megabytes
"type": "percentage": Relative percentage of baseline value
Example: {"type": "percentage", "value": "10%"} means 10% of current memory usage
Calculation: If baseline is 100 MB, threshold becomes 10 MB
For example: Dynamic thresholds that scale with current usage
"type": "percentage_points": Absolute percentage values
Example: {"type": "percentage_points", "value": 75} means 75%
For example: monit, docker stats commands that return percentage data
For increases: {"type": "percentage_points", "value": 10} means 10 percentage points (e.g., from 70% to 80%)
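The three threshold types listed above could be resolved along these lines. This is only a sketch under assumptions: the function names `resolve_threshold` and `exceeds_high_threshold` are hypothetical and are not taken from the plugin, and the comparison is assumed to be a strict "greater than".

```python
def resolve_threshold(threshold, baseline):
    """Turn a threshold dict into a number in the same unit as the measurement.

    "value"             -> absolute MB
    "percentage"        -> a fraction of the baseline, e.g. "10%" of 100 MB is 10 MB
    "percentage_points" -> absolute percent, for commands that report percentages
    """
    kind = threshold["type"]
    value = threshold["value"]
    if kind == "value":
        return float(value)
    if kind == "percentage":
        # Accept "10%" as written in the examples above.
        return baseline * float(str(value).rstrip("%")) / 100.0
    if kind == "percentage_points":
        return float(value)
    raise ValueError("unknown threshold type: %r" % kind)


def exceeds_high_threshold(current, threshold, baseline=0.0):
    # Assumption: an alarm is raised only when usage is strictly above the limit.
    return current > resolve_threshold(threshold, baseline)


mb_limit = resolve_threshold({"type": "value", "value": 128}, baseline=0.0)
relative_limit = resolve_threshold({"type": "percentage", "value": "10%"}, baseline=100.0)
monit_alarm = exceeds_high_threshold(50.4, {"type": "percentage_points", "value": 40})
```

With these inputs, `relative_limit` is 10 MB, matching the calculation given for the "percentage" type, and `monit_alarm` is true because 50.4% is above the 40% limit.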
How did you verify/test it?
Temporarily lower the thresholds to trigger the alarms:
> pytest.fail(failure_message)
E Failed: [ALARM]: monit:memory_usage, Previous memory usage 50.4% exceeds high threshold 40% (previous: 50.4%, current: 50.1%)
E [ALARM]: monit:memory_usage, Current memory usage 50.1% exceeds high threshold 40% (previous: 50.4%, current: 50.1%)
E [ALARM]: docker:database, Previous memory usage 1.6% exceeds high threshold 1% (previous: 1.6%, current: 1.6%)
E [ALARM]: docker:database, Current memory usage 1.6% exceeds high threshold 1% (previous: 1.6%, current: 1.6%)
E [ALARM]: frr_bgp:used, Previous memory usage 70.0 MB exceeds high threshold 16.0 MB (previous: 70.0 MB, current: 70.0 MB)
E [ALARM]: frr_bgp:used, Current memory usage 70.0 MB exceeds high threshold 16.0 MB (previous: 70.0 MB, current: 70.0 MB)
E [ALARM]: frr_zebra:used, Previous memory usage 17.0 MB exceeds high threshold 16.0 MB (previous: 17.0 MB, current: 17.0 MB)
E [ALARM]: frr_zebra:used, Current memory usage 17.0 MB exceeds high threshold 16.0 MB (previous: 17.0 MB, current: 17.0 MB)
Run elastic
https://elastictest.org/scheduler/testplan/68804464edf1bbac5171814b
…d (#19786)
In one pytest session, when all test cases are skipped, the teardown is not executed. When at least one test case is not skipped, the teardown still runs for the skipped test cases.
What is the motivation for this PR?
disk/test_disk_exhaustion.py creates a 1.7G file during the test and deletes it at the end of the test. However, "monit status" is configured in /etc/monit/monitrc to check only once every 60 seconds. This yields stale data, which causes the memory high threshold to be breached.
How did you do it?
Use "monit validate" instead of "monit status".
How did you verify/test it?
Verified by running the test.
Description of PR
Summary: Fix the following error in pretest:
> pytest.fail(failure_message)
E Failed: [ALARM]: frr_bgp:used, Previous memory usage 273.0 MB exceeds high threshold 256.0 MB (previous: 273.0 MB, current: 273.0 MB)
E [ALARM]: frr_bgp:used, Current memory usage 273.0 MB exceeds high threshold 256.0 MB (previous: 273.0 MB, current: 273.0 MB)
This is a new memory checking feature in 202505. Therefore, the fix is not applicable to the 202412 branch.
A TH5 testbed may have 32 neighbors, each with 6400 prefixes, and the total memory after deploy-mg is around 260-300MB. Increase the threshold to 384MB.
Signed-off-by: jianquanye@microsoft.com
Description of PR
Summary: On Arista-7060X6-16PE-384C-B-t0-isolated-d96u32s2, bgp/test_bgp_gr_helper.py has a high but expected memory usage. The memory increase threshold is causing the test to fail. Relax the memory increase threshold.
Fixes # (issue)
Type of change
Bug fix
Testbed and Framework(new/improvement)
New Test case
Skipped for non-supported platforms
Test case improvement
Approach
What is the motivation for this PR?
The memory increase threshold is causing the test to fail, although actual usage is below memory_high_threshold.
How did you do it?
Relax the memory increase threshold by value. Before the test, 191.0MB is in use. During the test, the peak usage is 316.0MB.
The increase threshold config fields define two limits on the maximum allowed increase: either X% of current usage, or Y MB on top of current usage. The higher of the two is used. Without the changes made in this review, the allowed increase would be max((50% * 191.0MB), (64MB)).
Before the test, the actual memory usage is not near the memory_high_threshold (384MB), so relaxing the increase threshold by percentage may relax it too much for other tests.
How did you verify/test it?
Test no longer fails on
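The numbers quoted in this commit message can be worked through directly. The snippet below is only an arithmetic sketch: the helper name `allowed_increase_mb` is hypothetical, and it assumes the rule stated above, that the higher of the percentage limit and the by-value limit applies.

```python
def allowed_increase_mb(baseline_mb, percent, value_mb):
    # Hypothetical helper: the larger of "percent of baseline" and "value in MB" wins.
    return max(baseline_mb * percent / 100.0, value_mb)


baseline_mb = 191.0  # usage before the test, from the commit message
peak_mb = 316.0      # peak usage during the test, from the commit message

allowed = allowed_increase_mb(baseline_mb, percent=50, value_mb=64.0)
actual_increase = peak_mb - baseline_mb
test_fails = actual_increase > allowed
```

Under the old settings the allowed increase is max(95.5, 64) = 95.5 MB, while the observed increase is 125 MB, so the check fails. The peak of 316.0MB is still below the 384MB memory_high_threshold, which is why only the by-value increase limit is relaxed.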
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Pterosaur approved these changes Sep 15, 2025
Pterosaur added a commit that referenced this pull request Sep 20, 2025
This reverts commit c7b3b09.
Pterosaur added a commit that referenced this pull request Sep 20, 2025
No description provided.