Skip to content

[dualtor_io] Fix the start marker not found issue#18096

Merged
yxieca merged 1 commit intosonic-net:masterfrom
lolyu:loganalyzer_start_marker_miss
Apr 28, 2025
Merged

[dualtor_io] Fix the start marker not found issue#18096
yxieca merged 1 commit intosonic-net:masterfrom
lolyu:loganalyzer_start_marker_miss

Conversation

@lolyu
Copy link
Collaborator

@lolyu lolyu commented Apr 23, 2025

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: #17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?

As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?

dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Syslogs from kernel bootup are collected:

23/04/2025 09:38:52 parallel.on_terminate                    L0085 INFO   | process analyze_logs--<MultiAsicSonicHost str2-7260cx3-acs-13> terminated with exit code None
23/04/2025 09:38:53 base._run                                L0108 DEBUG  | /home/lolv/workspace/sonic-mgmt/tests/common/plugins/loganalyzer/loganalyzer.py::save_extracted_log#444: [str2-7260cx3-acs-12] AnsibleModule::fetch Result => {"changed": true, "md5sum": "77eebb9308f927dbac90e57a9e32a9f4", "dest": "/tmp/syslog.str2-7260cx3-acs-12.2025-04-23-09:38:46", "remote_md5sum": null, "checksum": "28831404fe8b0f9f51d3d722374685f0c72353bd", "remote_checksum": "28831404fe8b0f9f51d3d722374685f0c72353bd", "_ansible_no_log": null, "failed": false}
23/04/2025 09:38:53 loganalyzer.analyze                      L0386 DEBUG  | Analyze files ['/tmp/syslog.str2-7260cx3-acs-12.2025-04-23-09:38:46']
23/04/2025 09:38:53 loganalyzer.analyze                      L0387 DEBUG  |     match_regex=""
23/04/2025 09:38:53 loganalyzer.analyze                      L0388 DEBUG  |     ignore_regex=""
23/04/2025 09:38:53 loganalyzer.analyze                      L0389 DEBUG  |     expect_regex=""
23/04/2025 09:38:53 loganalyzer.analyze                      L0396 DEBUG  | /tmp/syslog.str2-7260cx3-acs-12.2025-04-23-09:38:46 file content:

2025 Apr 23 09:35:44.892281 str2-7260cx3-acs-12 NOTICE kernel: [    0.000000] Linux version 6.1.0-22-2-amd64 ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21)
2025 Apr 23 09:35:44.892284 str2-7260cx3-acs-12 INFO augenrules[664]: pid 635
2025 Apr 23 09:35:44.892288 str2-7260cx3-acs-12 INFO kernel: [    0.000000] Command line: reboot=p console=ttyS0 acpi=on Aboot=Aboot-norcal7-7.2.0-pcie2x4-6128821 block_flash=pci0000:00/0000:00:1f.2/.*host./target0:0:0/.*$ block_drive=pci0000:00/0000:00:1f.2/.*host./target2:0:0/.*$ net_ma2=pci0000:00/0000:00:01.0/.*$ net_ma1=pci0000:00/0000:00:01.1/.*$ block_usb2=pci0000:00/0000:00:14.0/\(usb3/3-2\|usb4/4-2\)/.*$ block_usb1=pci0000:00/0000:00:14.0/\(usb3/3-3\|usb4/4-5\)/.*$ platform=rook scd.lpc_irq=7 scd.lpc_res_addr=0xb0000000 scd.lpc_res_size=0x10000 sid=Gardena cmdline-aboot-end logs_inram=on i2c-i801.disable_features=0x10 processor.max_cstate=1 intel_idle.max_cstate=0 tsc=reliable pcie_ports=native rhash_entries=1 usb-storage.delay_use=0 reassign_prefmem iommu=on intel_iommu=on libata.force=1.00:noncq varlog_size=4096 sonic.mode=fixed security=apparmor apparmor=1 rw net.ifnames=0 systemd.unified_cgroup_hierarchy=0 log_buf_len=1M quiet systemd.show_status=auto hwaddr_ma1=d4:af:f7:1d:59:d0 root=UUID=2916bd62-f01c-4609-841a-96ad37a90476 loop=image
2025 Apr 23 09:35:44.892289 str2-7260cx3-acs-12 INFO kernel: [    0.000000] BIOS-provided physical RAM map:
2025 Apr 23 09:35:44.892290 str2-7260cx3-acs-12 INFO kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
2025 Apr 23 09:35:44.892291 str2-7260cx3-acs-12 INFO kernel: [    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009ffff] usable
2025 Apr 23 09:35:44.892292 str2-7260cx3-acs-12 INFO kernel: [    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Copy link
Collaborator

/azp run

@lolyu lolyu requested review from bingwang-ms and yxieca April 23, 2025 09:57
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Collaborator

yxieca commented Apr 24, 2025

@lolyu would it be more generic to search the original start marker, if missing the boot up marker?

@yxieca yxieca merged commit b580edc into sonic-net:master Apr 28, 2025
12 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Apr 29, 2025
Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: sonic-net#17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?
As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?
dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Signed-off-by: Longxiang Lyu <[email protected]>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202411: #18162

mssonicbld pushed a commit that referenced this pull request Apr 29, 2025
Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: #17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?
As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?
dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Signed-off-by: Longxiang Lyu <[email protected]>
nhe-NV pushed a commit to nhe-NV/sonic-mgmt that referenced this pull request May 12, 2025
Code sync sonic-net/sonic-mgmt:202411 => 202412
```
* | c6a94a0 (pub_upstream/202411) Revert "[dualtor_io] Allow duplications for link down downstream I/O (sonic-net#17909)" (sonic-net#18192)
* | de454d5 [testARPCompleted] Cleanup ptf ip after test failure (sonic-net#18170)
* | 4a3d1d9 [dualtor] Refine `fdb_mac_learning_test.py` (sonic-net#18092)
* | 5964a78 [dualtor_io] Fix the start marker not found issue (sonic-net#18096)
* | ce40816 Extend LACP time multiplier for advanced-reboot tests with cEOS peers (sonic-net#17964)
* | 0e70ba3 adjust port selection in case testQosSaiXonHysteresis for Cisco-8101 (sonic-net#18130)
* | 8bb7203 [202411] Restore disable packet aging fixture 202411 (sonic-net#18103)
* | 8f6d1a3 Filter out Not Applicable values in command line (sonic-net#18006)
* | 9d5de5c Backport t0-118 test configs to 202411 (sonic-net#17983)
```
auspham pushed a commit to auspham/sonic-mgmt that referenced this pull request May 30, 2025
Code sync sonic-net/sonic-mgmt:202411 => 202503
```
*   6b59eaa (HEAD -> sync/202503, origin/sync/202503) Merge remote-tracking branch 'pub_upstream/202411' into sync/202503
|\
| * c6a94a0 (pub_upstream/202411) Revert "[dualtor_io] Allow duplications for link down downstream I/O (sonic-net#17909)" (sonic-net#18192)
| * de454d5 [testARPCompleted] Cleanup ptf ip after test failure (sonic-net#18170)
| * 4a3d1d9 [dualtor] Refine `fdb_mac_learning_test.py` (sonic-net#18092)
| * 5964a78 [dualtor_io] Fix the start marker not found issue (sonic-net#18096)
| * ce40816 Extend LACP time multiplier for advanced-reboot tests with cEOS peers (sonic-net#17964)
| * 0e70ba3 adjust port selection in case testQosSaiXonHysteresis for Cisco-8101 (sonic-net#18130)
| * 8bb7203 [202411] Restore disable packet aging fixture 202411 (sonic-net#18103)
| * 8f6d1a3 Filter out Not Applicable values in command line (sonic-net#18006)
| * 9d5de5c Backport t0-118 test configs to 202411 (sonic-net#17983)
| * e758401 mark xfail on generic hash test for isolated topo (sonic-net#18071)
| * c65ceab [202411][dualtor] Skip pfcwd warm reboot on dualtor (sonic-net#18072)
| * c509006 Improve disabling packet aging to support swap_syncd (sonic-net#17728) (sonic-net#17739)
| * 9dc2244 [202411][dualtor-aa] Fix test_arp_dualtor on active-active dualtor (sonic-net#18073)
| * cf12a33 fixed tacacs duplicate user issue (sonic-net#18068)
| * 330a893 Fix telemetry/test_events.py for dualtor (sonic-net#18025)
| * dc6fee8 Remove admin down ports in BUFFER PG check logic (sonic-net#17505)
| * 805d538 Update generic hash test to support dualtor active active topology (sonic-net#16217)
| * 7c31e46 [dualtor_io] Allow duplications for link down downstream I/O (sonic-net#17909)
| * a7f50c6 Fix vlan vs router mac issue with test_qos_dscp_mapping.py (sonic-net#17846) (sonic-net#18003)
| * 9ab1e7a Skip test_incremental_qos on Mellanox dualtor (sonic-net#17406) (sonic-net#18048)
| * f42afd0 Force eos default creds to be string (sonic-net#18026)
| * be542b0 Restore config after vxlan_crm from vxlan_ecmp. (sonic-net#17767)
| * f0718b9 [Fix for Issue sonic-net#17413] Modified the Tx Rx port id list selection for all to all scenario (sonic-net#17919)
| * 3eb4ed4 [dualtor_io] Collect syslog to debug (sonic-net#17722)
| * d5bd995 Disable PFC-WD during PCBB and some wmk test improvements (sonic-net#17889)
| * 2f512aa Update outer UDP sport range to exclude port 53 (sonic-net#17570) (sonic-net#17798)
| * 980b373 skip test_bgp_slb advanced reboot for isolated topo (sonic-net#17470)
| * 408bf9e Default the inner dscp to outer dscp map to be 1-1. (sonic-net#17860)
| * 37495a1 Add dualtor fixtures to no_traffic test. (sonic-net#17916)
| * a13b599 Only print the matched syslog in loganalzyer teardown check, no traceback info printed (sonic-net#17926)
| * 6127f29 Revert "Skip test_vnet_decap on Cisco-8000 with 202411 (sonic-net#17776)" (sonic-net#17941) (sonic-net#17942)
| * 60274db Increase timeout to 5 in verify_packet_any_port for background traffic (sonic-net#17904)
```
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: sonic-net#17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?
As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?
dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Signed-off-by: Longxiang Lyu <[email protected]>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: sonic-net#17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?
As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?
dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Signed-off-by: Longxiang Lyu <[email protected]>
Signed-off-by: Aharon Malkin <[email protected]>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: sonic-net#17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?
As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?
dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Signed-off-by: Longxiang Lyu <[email protected]>
Signed-off-by: Guy Shemesh <[email protected]>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Fix the following issue:

E               Exception: start-LogAnalyzer-test_active_tor_reboot_downstream_standby[active-standby].2025-04-22-10:20:35 was not found in /var/log

The issue is introduce by PR: sonic-net#17722.
The root cause is, if the dualtor io reboot failure testcases are running over Arista devices, the syslogs doesn't persist through reboot due to /var/log is a tmpfs directory. So loganalyzer fails to find the start marker in this case.

Signed-off-by: Longxiang Lyu [email protected]

How did you do it?
As the primary goal is to collect syslog after reboot, let's change the start marker as the kernel first boot log, so the dualtor io testcase with reboot will be able to collect logs after kernel boot up.

How did you verify/test it?
dualtor_io/test_tor_failure.py::test_active_tor_reboot_upstream[active-standby] PASSED                                                                                                         [100%]

====================================================================== 1 passed, 1 deselected, 2 warnings in 527.94s (0:08:47) =======================================================================

Signed-off-by: Longxiang Lyu <[email protected]>
Signed-off-by: Guy Shemesh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants