fix(console): fix intermittent login failures in dut_console tests#23342
Merged
lolyu merged 3 commits into sonic-net:master on Mar 27, 2026
…_timeout When find_prompt() captures a partial prompt, netmiko strip_prompt() may fail to remove the trailing prompt, causing splitlines()[-1] to return the prompt string instead of the numeric value. Using splitlines()[0] always picks the first meaningful output line, which is the actual numeric value. Signed-off-by: Liping Xu <xuliping@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Password: prompt from the DUT is sometimes split across multiple TCP reads (e.g., 'Pa' + 'ssword:'), causing re.search(pwd_pattern, output) to fail on each individual chunk. By checking return_msg (the accumulated read buffer) instead of output, we correctly detect the Password: prompt even when it arrives in fragments. This fixes the intermittent 'Socket is closed' failure in create_duthost_console where 1 in ~5 runs would fail because the password was never sent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Liping Xu <xuliping@microsoft.com>
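The difference between matching each read on its own and matching the accumulated buffer can be illustrated with a minimal sketch. The pattern `PWD_PATTERN` and both helper functions are hypothetical stand-ins for illustration; the real `pwd_pattern` and read loop are not shown in this PR.

```python
import re

# Assumed pattern for illustration only; the real pwd_pattern is not shown in this PR.
PWD_PATTERN = r"assword"


def matches_per_chunk(chunks):
    """Return True if any single chunk, taken alone, contains the prompt."""
    return any(re.search(PWD_PATTERN, chunk, flags=re.I) for chunk in chunks)


def matches_accumulated(chunks):
    """Return True once the concatenation of the chunks read so far contains the prompt."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if re.search(PWD_PATTERN, buffer, flags=re.I):
            return True
    return False


whole = ["Password:"]        # prompt arrives in one read
split = ["Pa", "ssword:"]    # prompt arrives in two reads

print(matches_per_chunk(whole), matches_accumulated(whole))
print(matches_per_chunk(split), matches_accumulated(split))
```

With the split input, neither `Pa` nor `ssword:` contains the assumed pattern by itself, so only the accumulated check detects the prompt.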
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Add missing newline at end of file to fix the pre-commit fix-end-of-files check failure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Liping Xu <xuliping@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
lolyu approved these changes on Mar 27, 2026
lolyu (Collaborator) left a comment
✅ Approved — Clean, Well-Targeted Bug Fix
Two surgical fixes for intermittent dut_console test failures:
- ssh_console_conn.py: Using accumulated return_msg instead of per-chunk output for password prompt detection; correctly handles TCP fragmentation splitting Password: across reads
- test_idle_timeout.py: splitlines()[0] instead of splitlines()[-1] for reliable TMOUT extraction; avoids prompt fragments
Both fixes are minimal, well-documented, and validated across 5 full test iterations. LGTM 🚀
  # Search for password pattern / send password
- if user_sent and not password_sent and re.search(pwd_pattern, output, flags=re.I):
+ # Use return_msg (accumulated) instead of output to handle cases where
+ # 'Password:' prompt is split across multiple TCP reads (e.g. 'Pa' + 'ssword:')
Collaborator
✅ Good fix — TCP fragmentation splitting Password: across reads is a classic race. Using the accumulated return_msg instead of the per-chunk output is the right approach. Well-commented too.
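One consequence of matching against the accumulated buffer is that the pattern keeps matching on every later read, so the `not password_sent` guard is what keeps the password from being sent more than once. A rough sketch of such a loop follows; `simulate_login`, `PWD_PATTERN`, and the list of chunks are hypothetical and only mimic the shape of the condition shown in the diff, not the actual `login_stage_2()` implementation.

```python
import re

# Assumed pattern for illustration only; the real pwd_pattern is not shown here.
PWD_PATTERN = r"assword"


def simulate_login(chunks, password="secret"):
    """Feed simulated read chunks through a login_stage_2-style loop.

    Returns the list of strings that would have been written to the channel.
    """
    return_msg = ""        # accumulated buffer across all reads
    user_sent = True       # assume the username was already sent
    password_sent = False
    written = []

    for output in chunks:  # each item stands in for one read_channel() result
        return_msg += output
        # Match on the accumulated buffer; the password_sent flag prevents
        # re-sending, since return_msg still contains the prompt afterwards.
        if user_sent and not password_sent and re.search(PWD_PATTERN, return_msg, flags=re.I):
            written.append(password)
            password_sent = True
    return written


print(simulate_login(["Pa", "ssword:", " ", "\nadmin@hostname:~$ "]))
```

In this sketch the password is written exactly once, on the read that completes the prompt, even though later reads leave the prompt text in `return_msg`.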
  duthost = duthosts[enum_rand_one_per_hwsku_hostname]
  logger.info("Get default session idle timeout")
- default_tmout = duthost_console.send_command('echo $TMOUT')
+ default_tmout = duthost_console.send_command('echo $TMOUT').strip().splitlines()[0].strip()
Collaborator
💡 Nit: .strip().splitlines()[0].strip() could raise IndexError if send_command returns an empty string (unlikely but possible on connection issues). A defensive guard would be:
    lines = duthost_console.send_command('echo $TMOUT').strip().splitlines()
    default_tmout = lines[0].strip() if lines else ""
Not a blocker: the old code would also fail on empty output.
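The guard suggested above could also be factored into a small helper that is testable without a device. This is only a sketch: `first_output_line` and `parse_tmout` are hypothetical names, not part of the PR, and unlike the reviewer's one-liner this version skips leading blank lines and returns `None` when the first line is not numeric.

```python
def first_output_line(raw):
    """Return the first non-empty, stripped line of raw, or "" if there is none."""
    for line in raw.splitlines():
        line = line.strip()
        if line:
            return line
    return ""


def parse_tmout(raw):
    """Return the TMOUT value as an int, or None if the first line is not numeric."""
    line = first_output_line(raw)
    return int(line) if line.isdigit() else None


# Simulated send_command() results; the real return values depend on the device.
print(parse_tmout("900\nadmin@hostname:"))  # value followed by a prompt remnant
print(parse_tmout(""))                      # empty output, e.g. on a connection issue
```

Returning `None` rather than an empty string lets the caller distinguish "no usable value" from a real timeout, though whether the test wants a string or an integer is an assumption here.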
ravaliyel pushed a commit to ravaliyel/sonic-mgmt that referenced this pull request on Mar 27, 2026
Fix two intermittent failures in dut_console tests: an accumulation-buffer fix for the Password prompt detection race, and a splitlines()[0] fix for reliable TMOUT value extraction in test_idle_timeout.

Description of PR
Summary:
Fix two independent intermittent failures in dut_console tests:
- ssh_console_conn.py: login_stage_2() checked re.search(pwd_pattern, output) where output is only the most recent read_channel() chunk. The DUT's Password: prompt can arrive split across multiple TCP reads (e.g. Pa + ssword:), causing no chunk to match and the password never being sent, resulting in an intermittent "Socket is closed" failure in create_duthost_console (~1 in 5 runs). Fix: check return_msg (accumulated read buffer) instead of output.
- test_idle_timeout.py: splitlines()[-1] could return a partial prompt string (e.g. admin@hostname:) instead of the numeric TMOUT value when the prompt was not fully stripped from the command output. Fix: use splitlines()[0] to always read the first output line, which is always the numeric value.
Both fixes were validated on internal branch dev/xuliping/20260325_202511_console_login_fix across 5 full test runs with no failures.
Related: follows up on #23295 (blank Enter fix in the same login path, already merged).
Type of change
Back port request
Approach
What is the motivation for this PR?
dut_console tests were failing intermittently (~1 in 5 runs) with "Socket is closed" errors during console login. Root cause: the DUT's Password: prompt sometimes arrives split across multiple TCP reads, so per-chunk pattern matching never matches. A second, independent failure in test_idle_timeout was caused by splitlines()[-1] returning a prompt fragment instead of the numeric TMOUT value.
How did you do it?
- ssh_console_conn.py: Changed re.search(pwd_pattern, output) to re.search(pwd_pattern, return_msg) in login_stage_2(), where return_msg is the accumulated read buffer across all chunks.
- test_idle_timeout.py: Changed splitlines()[-1] to splitlines()[0] in the TMOUT value extraction, so we always get the first output line regardless of trailing prompt remnants.
How did you verify/test it?
Ran all dut_console test cases on a physical testbed using internal branch dev/xuliping/20260325_202511_console_login_fix for 5 full iterations. All tests passed with no failures.
Any platform specific information?
None; applies to all platforms using SSH console connections.
Supported testbed topology if it's a new test case?
N/A (bug fix only)
Documentation
N/A