added new HA fixtures to programme dash ha scope and ha set tables via…#22390

yxieca merged 10 commits into sonic-net:master from
Conversation
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
@yue-fred-gao |
4f12997 to 83cd529
Please add 202511 and 202506 as backport requests!
b0500ec to de2efd2
4d3a1ea to 9894f36
yxieca left a comment
Review: DASH HA Scope & HA Set fixtures
Thanks for the PR @nnelluri-cisco! I've reviewed the changes and have a few concerns:
🔴 Bugs / Blockers
1. Missing expected_op_type argument in activate_dash_ha() (ha_utils.py ~line 225):

```python
pending_id = get_pending_operation_id(duthost, scope_key, timeout=60)
```

get_pending_operation_id() requires (duthost, scope_key, expected_op_type, timeout); this call omits expected_op_type and will raise a TypeError at runtime.
2. Duplicate & inconsistent state-check functions — verify_ha_state() parses pipe-delimited table output (ha_state | active), while wait_for_ha_state() → extract_ha_state() parses JSON ("ha_role": "active"). These check different fields (ha_state vs ha_role) from the same command output. One of them is likely wrong, or they're intended for different output formats that aren't distinguished in the code.
3. Bare module import will fail in CI (conftest.py line 14):

```python
from ha_utils import ...
```

This bare import only works if the test runner's cwd happens to be tests/ha/. It should be from tests.ha.ha_utils import ... or a relative import (e.g., from .ha_utils import ...).
4. setup_dash_ha_from_json reads the local filesystem (conftest.py ~line 430):

```python
with open(ha_set_file) as f:  # path: /data/tests/common/ha/...
```

This reads from the test-runner host, not the DUT, and the hardcoded path /data/tests/common/ha/ only works if the mgmt container mounts the tests directory there. Consider Path(__file__).resolve().parent / ... or os.path.join(os.path.dirname(__file__), ...) for portability.
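For item 4, one portable shape would resolve the file against the test module itself; this is only a sketch — the helper name and the filename below are illustrative, not taken from the PR:

```python
import json
from pathlib import Path


def load_ha_set_config(filename):
    """Load an HA config JSON that ships alongside this test module.

    Resolving against __file__ keeps the path valid regardless of the
    test runner's cwd or any container mount layout.
    """
    config_path = Path(__file__).resolve().parent / filename
    with open(config_path) as f:
        return json.load(f)
```

The same pattern would apply to every JSON the fixtures read, so the hardcoded /data/tests/common/ha/ prefix can go away entirely.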
⚠️ Design Concerns
5. build_dash_ha_set_args() hardcodes values — ignores fields["vdpu_ids"] and always outputs ["vdpu0_0","vdpu1_0"]. Same for preferred_standalone_vdpu_index (hardcoded to 0, ignoring the field value). This makes the JSON config data misleading since the function doesn't actually use it.
6. Double-nested polling — get_pending_operation_id() already has its own wait_until loop internally, but wait_for_pending_operation_id() wraps it in another wait_until loop. This means you have nested retry logic, which makes timeout behavior unpredictable. Should pick one pattern.
7. No fixture teardown — activate_dash_ha_from_json has yield but no cleanup code after it. setup_dash_ha_from_json doesn't yield or return at all. Neither fixture cleans up HA tables after the test module finishes, which could leave state for subsequent tests.
8. Inconsistent key separator — JSON config files use | as separator (vdpu0_0|haset0_0), but fixture code and ha_utils use : (vdpu0_0:haset0_0). Which separator does proto_utils.py actually expect? This could cause key mismatches.
9. disabled field type inconsistency — In setup_dash_ha_from_json it's "true" (string), in activate_dash_ha_from_json it's False (Python bool). build_dash_ha_scope_args handles both via str().lower(), but the inconsistency is confusing and fragile.
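For item 9, normalizing the disabled field once at the fixture boundary would make the str().lower() trick unnecessary; a minimal sketch (the function name is mine, not the PR's):

```python
def normalize_bool_field(value):
    """Coerce a bool-like config value (Python bool, or a "true"/"false"
    string in any case) to the lowercase string form the CONFIG-style
    tables expect, and reject anything else loudly."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower()
    raise ValueError(f"not a boolean field value: {value!r}")
```

With this in one place, both setup_dash_ha_from_json and activate_dash_ha_from_json could pass whatever type their JSON happens to carry without the callers needing to agree on a representation.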
📝 Minor
- PR title typo: "fixures" → "fixtures"
- Extra blank line in conftest.py import block (after from ha_utils import ()
- ast.literal_eval(f"'{...}'") in extract_pending_operations is overly complex — a simple .strip() would suffice
- dash_ha_scope_config_table.json schema was completely replaced (was DASH_HA_SET_CONFIG_TABLE, now DASH_HA_SCOPE_CONFIG_TABLE) — could break existing consumers of this file
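On the ast.literal_eval point, a plain strip of whitespace and surrounding quote characters covers the same case with far less machinery (a sketch only; the surrounding parsing code is assumed):

```python
def clean_operation_token(raw):
    """Trim whitespace and any surrounding quotes from a parsed token,
    replacing the ast.literal_eval(f"'{...}'") round-trip."""
    return raw.strip().strip("'\"")
```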
@yxieca
7d09f41 to be4b9ae
… proto utils

Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
Signed-off-by: nnelluri <nnelluri@cisco.com>
- Convert disabled boolean to lowercase string without quotes (False → "false")
- Fix field type handling: strings quoted, booleans unquoted
- Implement proper two-phase setup: disabled=true → false
- Add pending_operation_id activation workflow
- Fix build_dash_ha_scope_args() and build_dash_ha_scope_activate_args()

Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
Signed-off-by: nnelluri <nnelluri@cisco.com>
get_pending_operation_id() was refactored to no longer accept a timeout parameter. Switch to wait_for_pending_operation_id(), which supports the timeout argument, to avoid a TypeError at runtime.

Signed-off-by: nnelluri <nnelluri@cisco.com>
- Change scope config keys from '|' to ':' separator to match fixture usage
- Change preferred_standalone_vdpu_index from string "0" to int 0

Signed-off-by: nnelluri <nnelluri@cisco.com>
be4b9ae to cfb49ee
@yxieca Key separator mismatch: the JSON config uses keys like vdpu0_0|haset0_0, but the fixtures use vdpu0_0:haset0_0 when programming the HA scope table. Should the JSON use ':' as well? Type mismatch: in the JSON, preferred_standalone_vdpu_index is a string ("0"), but build_dash_ha_scope_args treats this as numeric; it should probably be an int in the JSON for consistency. Can you check and merge the PR?
yxieca left a comment
Re-reviewed: key separator and type issues resolved.
…a… (sonic-net#22390)

What is the motivation for this PR? Not provided in PR description.
How did you do it? Added new pytest fixtures.
How did you verify/test it? Ran the tests and verified the DPU tables HA Scope and HA Set.

Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
Signed-off-by: nnelluri <nnelluri@cisco.com>
Signed-off-by: Mihut Aronovici <aronovic@cisco.com>
```python
)
try:
    res = duthost.shell(cmd)
```
@zjswhhh, this would be queried from GNMI Server i assume. Please correct me if i'm wrong, Is this supported today?
… proto utils
Description of PR
Summary: The HA scope configuration table is programmed by the SDN controller and contains the HA config for each HA scope DPU.
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
How did you do it?
Added new pytest fixtures.
How did you verify/test it?
Ran the tests and verified the DPU tables HA Scope and HA Set.
Any platform specific information?
Platform: x86_64-8102_28fh_dpu_o-r0
HwSKU: Cisco-8102-28FH-DPU-O
Supported testbed topology if it's a new test case?
t1-smartswitch-ha
Documentation
A new PR was raised for the HA config alone, and the old PR #20820 was closed.
A separate PR will be raised for the proto utils.