
Add hw reset stress test #14770

Open
Nir-Az wants to merge 6 commits into realsenseai:development from Nir-Az:hw-reset-loop

Collaborator

@Nir-Az Nir-Az commented Feb 26, 2026

No description provided.

Copilot AI review requested due to automatic review settings February 26, 2026 16:18
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive hardware reset stress test for RealSense D400 and D500 devices. The test repeatedly resets devices and verifies they reconnect successfully, with configurable iteration counts based on test context (nightly vs weekly) and connection type (USB/GMSL vs DDS).

Changes:

  • Added new stress test file with context-aware iteration counts (10-100 iterations depending on nightly/weekly context and connection type)
  • Updated testing documentation to explain how to run nightly-only tests using --context flag
  • Added copyright header convention to agent instructions emphasizing use of current year

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| unit-tests/live/hw-reset/test-stress.py | New stress test that performs repeated hardware resets with proper timeout handling, device matching via serial numbers, and failure tracking across iterations |
| .github/skills/testing.md | Added documentation section explaining how to run nightly-only tests with --context flag and added --debug option documentation |
| .github/agent-instructions.md | Added explicit copyright header convention requiring current year for new files |


@Nir-Az Nir-Az requested review from AviaAv and removed request for AviaAv February 26, 2026 18:36
Collaborator Author

Nir-Az commented Mar 1, 2026

@AviaAv forgot to add a nightly run to make sure it works.
Will run and update, can review the code for now
@OhadMeir please review the frequency
We need to run all cameras on gating/nightly/weekly with different freq IMO, and it adds time.
We currently have a regression at hw-reset loop on MIPI devices and we missed it


# test:device each(D400*)
# test:device each(D500*)
# test:donotrun:!nightly

Contributor

in nightly run, do we use both 'nightly' and 'weekly' contexts? if not, this test will not run on weekly
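
The concern above can be illustrated with a toy evaluator. This is a sketch under an assumption, not the LibCI implementation: it treats `# test:donotrun:!nightly` as "skip unless `nightly` is among the active contexts", and `should_run` is a hypothetical helper name.

```python
def should_run(donotrun_directives, active_contexts):
    """Return False if any donotrun directive matches the active contexts.

    Assumed semantics (hypothetical, for illustration only):
      "X"  -> skip when context X is active
      "!X" -> skip when context X is NOT active
    """
    active = set(active_contexts)
    for directive in donotrun_directives:
        if directive.startswith("!"):
            if directive[1:] not in active:
                return False
        elif directive in active:
            return False
    return True


directives = ["!nightly"]  # from "# test:donotrun:!nightly"
nightly_only = should_run(directives, ["nightly"])
weekly_only = should_run(directives, ["weekly"])
both = should_run(directives, ["nightly", "weekly"])
print(nightly_only, weekly_only, both)
```

Under that assumption, a run that passes only the weekly context skips the test, which is why the question of whether both contexts are active matters.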

Collaborator Author

Good question, checking

Collaborator Author

Changed so that it will run nightly too now, will monitor it

Collaborator Author

Also removed D585S as it fails and we are running an old version, will reopen after we update the LibCI FW

# test:device each(D500*)
# test:donotrun:!nightly
# test:timeout 360
# test:timeout:weekly 3600

Contributor

locally this syntax # test:timeout:weekly 3600 didn't work for me, let's please also test weekly CI run to make sure this test passes

Collaborator Author

Will test

Collaborator Author

Works
[image]

if pl == "D400":
    return MAX_ENUM_TIME_D400
if pl == "D500":
    is_dds = ( d.supports( rs.camera_info.connection_type )

Contributor

this is already done before this function call, let's remove this and add global is_dds

Collaborator Author

Correct, used the global one.
According to Copilot, reading a global is permitted without declaring it global
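
The Python scoping rule referred to above can be checked with a short, self-contained sketch. The flag name `is_dds` mirrors the discussion; the three functions are hypothetical and exist only to show the difference between reading and rebinding a module-level name.

```python
is_dds = True  # module-level flag, standing in for the test's global


def read_flag():
    # Reading a module-level name needs no `global` declaration.
    return is_dds


def shadow_flag():
    # Assigning without `global` creates a local; the module-level value is untouched.
    is_dds = False
    return is_dds


def write_flag():
    # Rebinding the module-level name requires `global`.
    global is_dds
    is_dds = False


read_result = read_flag()
shadow_result = shadow_flag()
after_shadow = is_dds
write_flag()
after_write = is_dds
print(read_result, shadow_result, after_shadow, after_write)
```

So a function that only reads the flag is fine as-is; the declaration matters only if the function assigns to it.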

Contributor

AviaAv left a comment

Some comments

new_dev_handle = None

log.d( f"[{i}/{iterations}] Sending HW-reset" )
dev_for_reset.hardware_reset()

Contributor

at the end of the loop we do dev_for_reset = new_dev_handle, but if added_sn == tested_sn is false, we will call hardware_reset on None
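
One way to avoid the `None` dereference described above is to stop the loop as soon as no matching handle arrives. The sketch below uses a stand-in `FakeDevice` class rather than pyrealsense2, and `wait_for_reconnect` and `run_reset_loop` are hypothetical helpers; only `hardware_reset`, `dev_for_reset`, `new_dev_handle` and the serial-number comparison come from the discussion.

```python
class FakeDevice:
    """Stand-in for a camera handle; not the pyrealsense2 API."""

    def __init__(self, serial):
        self.serial = serial
        self.resets = 0

    def hardware_reset(self):
        self.resets += 1


def wait_for_reconnect(tested_sn, reconnect_serials, i):
    """Hypothetical helper: return a fresh handle only if the serial matches."""
    added_sn = reconnect_serials[i]
    if added_sn == tested_sn:
        return FakeDevice(added_sn)
    return None  # a different device showed up


def run_reset_loop(tested_sn, reconnect_serials):
    dev_for_reset = FakeDevice(tested_sn)
    completed = 0
    for i in range(len(reconnect_serials)):
        dev_for_reset.hardware_reset()
        new_dev_handle = wait_for_reconnect(tested_sn, reconnect_serials, i)
        if new_dev_handle is None:
            # Guard: never carry None into the next iteration's hardware_reset().
            break
        dev_for_reset = new_dev_handle
        completed += 1
    return completed


all_ok = run_reset_loop("123", ["123", "123", "123"])
stops_early = run_reset_loop("123", ["123", "999", "123"])
print(all_ok, stops_early)
```

Breaking out (or recording a failure) before the reassignment keeps `dev_for_reset` pointing at a real handle for as long as the loop runs.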

Collaborator Author

Please review new code

time.sleep( 1 ) # let the device settle before the first reset

failed_removal = []
failed_reconnect = []

Contributor

Since we break on the first failure in the test loop, we have two mutually exclusive arrays, each holding at most one entry

This is fine if intended for readability, but it could be simplified
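
A sketch of the simplification suggested above, assuming the loop stops at the first failure: a single optional record replaces the two lists. The `Failure` type, its fields and `describe` are hypothetical names, not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Failure:
    kind: str       # "removal" or "reconnect"
    iteration: int  # 1-based iteration at which the loop stopped


def describe(failure: Optional[Failure], iterations: int) -> str:
    """Build the summary message from at most one recorded failure."""
    if failure is None:
        return f"all {iterations} resets passed"
    return f"{failure.kind} failed at iteration {failure.iteration}/{iterations}"


ok_msg = describe(None, 10)
fail_msg = describe(Failure("reconnect", 4), 10)
print(ok_msg)
print(fail_msg)
```

The `kind` field still distinguishes removal failures from reconnect failures, so the message stays as specific as with two lists.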

Collaborator Author

Intentional — separate arrays give a clearer failure message distinguishing removal failures from reconnect failures.

dev_for_reset = None # always the latest live handle — used for hardware_reset() calls
device_removed = False
device_added = False
new_dev_handle = None # updated by callback so each iteration gets the fresh handle

Contributor

do we really need 3 device handles for the same device? Using only dev should be enough IMO

Collaborator Author

We have 2 and we need both: the old and the new.
With 1 it sometimes works and sometimes doesn't, when the new device gets a different address

Collaborator Author

Code was refactored

Contributor

OhadMeir commented Mar 1, 2026

> @AviaAv forgot to add a nightly run to make sure it works.
> Will run and update, can review the code for now
> @OhadMeir please review the frequency
> We need to run all cameras on gating/nightly/weekly with different freq IMO, and it adds time.
> We currently have a regression at hw-reset loop on MIPI devices and we missed it

I would normally only run at weekly context, but since it checks a bug we have missed, it seems OK to run a shorter version on nightly and a long one on weekly.

@Nir-Az Nir-Az requested a review from AviaAv March 1, 2026 15:15
# Fails on D585S and D555

# test:device each(D400*)
# test:device each(D500*) !D585S !D555S

Contributor

!D555
