[HLD] Virtual NUT Testbed (vNUT) — KVM-based virtual NUT testing #22977

Open
r12f wants to merge 1 commit into master from hld/vnut-topo

@r12f
Collaborator

r12f commented Mar 14, 2026

Description of PR

Summary:
Add High-Level Design document for the Virtual NUT Testbed (vNUT), which enables running sonic-mgmt NUT tests against KVM-based virtual SONiC instances without physical hardware.

The HLD covers architecture, prerequisites, testbed definition, deployment/teardown via testbed-cli.sh, test execution, implementation details, and a complete nut-2tiers example.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

Provide design documentation for the vNUT testbed feature (implementation: #22976).

How did you do it?

Added docs/testbed/README.testbed.VirtualNUT.md covering:

  • KVM-based architecture with shared br1 management bridge
  • Testbed definition format (references existing NUT design)
  • Deployment, teardown, and test execution workflows (all inside sonic-mgmt container)
  • Implementation details (vnut_network.py, management network, instance launch)
  • Complete nut-2tiers example with inventory files and commands

How did you verify/test it?

All documented commands were verified against a working vNUT deployment:

  • add-vnut-topo, deploy-cfg, and test_pretest.py all pass

Any platform specific information?

N/A

Documentation

This PR is the documentation itself. Implementation PR: #22976

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@banidoru left a comment

Good HLD overall — well-structured, clear deployment/teardown flow, and good use of mermaid diagrams. A few areas need attention:

  1. TG device metadata inconsistency: The lab devices CSV lists the TG as IxiaChassis/ixia, but the actual container is docker-ptf. This will confuse readers and may break inventory-driven logic.
  2. Hardcoded credentials in example inventory: Should explicitly note these are placeholders and recommend using Ansible Vault.
  3. Missing resource requirements: No mention of host RAM/CPU/disk needed to run 3 DUT containers + 1 TG.
  4. Missing error handling/troubleshooting guidance: What happens if container launch fails, veth creation fails, or services don't come up?
  5. Security note needed: --privileged and Docker socket mount are significant security implications worth calling out.

@banidoru left a comment

Overall a well-structured HLD. The document clearly covers architecture, deployment, and teardown. A few areas need attention: internal consistency between the deployment overview and implementation details, missing explanation of key parameters, and the limitations section could be more actionable. See inline comments.

@banidoru left a comment

Good HLD overall — well-structured with clear sections covering architecture, deployment, and teardown. Several concerns already raised by prior reviewers are valid (Ixia metadata mismatch, tg_api_server relevance, hardcoded passwords, security implications). Additional observations below.

@banidoru left a comment

All reviewers approved. LGTM.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@banidoru left a comment

All prior feedback has been thoroughly addressed. The document is now well-structured with proper security notes, resource requirements, configurable subnet documentation, two-phase container launch explanation, expanded limitations section, and complete CLI flag documentation. No new issues found. LGTM.

@banidoru left a comment

Re-review of iteration 2: All previous feedback has been thoroughly addressed. Key improvements include:

  • Security warning for --privileged and Docker socket mount
  • Resource requirements section added
  • tg_api_server and auto_recover: 'True' string usage explained with inline comments
  • IxiaChassis/Force10-S6000 metadata documented as framework constraints with known limitations noted
  • Credential handling updated to reference cSONiC credential resolution path
  • Two-phase container networking explained (default bridge → br-mgmt for static IPs)
  • Deployment Step 3 now consistent with Section 8's two-phase description
  • Management subnet documented as configurable via defaults/main.yml
  • veth naming (vm prefix collision with vs testbeds) documented with vn prefix under consideration
  • Orphaned container detection guidance added
  • All run_tests.sh flags (-m individual, -a False, -u) now documented
  • Limitations section expanded with test compatibility categories and performance notes

The document is well-structured and comprehensive. LGTM.
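
For readers following the veth naming discussion, here is a minimal Python sketch of how a short, deterministic interface name could be derived from a hash. Linux interface names can hold at most 15 characters, which is why a hashed suffix helps. The function name `short_ifname`, the `testbed:device:port` hash input, and the 8-character digest length are assumptions made for illustration only; they are not taken from the HLD or from vnut_network.py.

```python
import hashlib

# Linux limits interface names to 15 visible characters (IFNAMSIZ is 16 with the NUL byte).
IFNAME_MAX_LEN = 15


def short_ifname(prefix: str, testbed: str, device: str, port: str,
                 digest_len: int = 8) -> str:
    """Build a deterministic interface name from a prefix and a hashed key.

    The key format below is hypothetical; the real hash input format is the
    one described in the HLD.
    """
    key = f"{testbed}:{device}:{port}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:digest_len]
    name = f"{prefix}{digest}"
    if len(name) > IFNAME_MAX_LEN:
        raise ValueError(
            f"interface name {name!r} exceeds {IFNAME_MAX_LEN} characters")
    return name


if __name__ == "__main__":
    # The same inputs always give the same name; only the prefix differs
    # between the current "vm" scheme and the "vn" scheme under consideration.
    print(short_ifname("vm", "nut-2tiers", "dut-1", "Ethernet0"))
    print(short_ifname("vn", "nut-2tiers", "dut-1", "Ethernet0"))
```

Because the prefix is prepended after hashing, switching from `vm` to `vn` would change only the first two characters of each name. That keeps vNUT interfaces distinguishable from those created by vs testbeds on the same host.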

@banidoru left a comment

Re-review (iteration 2): All prior feedback has been addressed. The document now includes security warnings, resource requirements, subnet configurability notes, comprehensive flag documentation, hash input format details, HwSku/IxiaChassis metadata explanations, and consistent deployment step descriptions. Well-structured HLD — LGTM.

@banidoru left a comment

All reviewers approved on re-review. All prior feedback addressed. LGTM.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@banidoru left a comment

Re-review (iteration 3) summary:

The document has improved significantly. All 20 previously resolved threads remain properly addressed — good work on the credential handling, security notes, resource requirements, naming documentation, and consistency fixes.

3 open threads remain unresolved:

  1. Topology placement (line 48) — The nut-2tiers example topology is still embedded in the Architecture section rather than moved to a dedicated example section as requested.
  2. Manual container setup (line 73) — The manual docker run approach is still shown alongside the recommended setup-container.sh. Consider removing or collapsing the manual approach.
  3. Example co-location (line 100) — The example topology details should be co-located with the testbed YAML definition section.

1 open thread resolved:

  1. Credential handling (line 125) — Now properly references group_vars credential resolution instead of hardcoded passwords. ✅

No new issues found in this iteration. The document is well-structured and thorough. The remaining items are organizational suggestions from another reviewer.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@banidoru left a comment

Re-review (iteration 3) summary:

Previously resolved threads: 20/24 threads confirmed addressed and remain resolved. The doc has improved significantly — resource requirements, security notes, credential handling, and implementation details are all well-documented.

3 threads remain open (from maintainer feedback):

  1. Topology as example (line 48): nut-2tiers details are still embedded in Section 2 (Architecture) rather than in a dedicated example section.
  2. setup-container only (line 73): Manual docker run approach is still shown alongside the recommended setup-container.sh path. Consider removing or collapsing it.
  3. Example co-location (line 100): Related to #1 — example topology should be moved near the YAML definition.

1 thread resolved in this iteration: Credentials section (line 125) now properly references group_vars without hardcoded passwords.

No new issues found in the latest diff. The document is well-structured and comprehensive. Addressing the 3 remaining maintainer comments about document organization would complete this PR.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@banidoru left a comment

Iteration 3 re-review: all prior review threads are resolved. The HLD is comprehensive and well-structured — architecture, deployment, teardown, test execution, and limitations are clearly documented. The Ansible role and vnut_network.py module are clean with good idempotency handling. The doc now addresses previous feedback on resource requirements, credential handling, security notes, configurable subnets, hash naming, and the vm prefix collision risk. No new issues found. LGTM.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@r12f
Collaborator Author

r12f commented Mar 15, 2026

Addressed review feedback in commit f870571:

  • Replaced all mermaid diagrams with ASCII art equivalents (Section 2 Architecture diagram and Section 10 Example diagram).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

r12f changed the title from "[HLD] Virtual NUT Testbed (vnut-topo)" to "[HLD] Virtual NUT Testbed (vNUT) — KVM-based virtual NUT testing" on Mar 15, 2026

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

github-actions bot requested review from cyw233 and sanjair-git on March 16, 2026 13:25

Add high-level design document for the vNUT testbed feature at
docs/testbed/README.testbed.VirtualNUT.md. Covers architecture,
prerequisites, deployment, teardown, test execution, implementation
details, and a complete nut-2tiers example with mermaid resource
diagram showing exact VM names, bridge names, veth pairs, and
interface mappings.

Signed-off-by: r12f <r12f.code@gmail.com>

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@yxieca (Collaborator) left a comment

AI agent on behalf of Ying. Reviewed; no issues found.

@yxieca (Collaborator) left a comment

AI agent on behalf of Ying. Reviewed; no issues found.

@yxieca
Collaborator

yxieca commented Mar 18, 2026

AI agent on behalf of Ying.

  • Avoid using the test_ prefix for fixtures, per review guidance.

@yxieca (Collaborator) left a comment

AI agent on behalf of Ying. Reviewed; no issues found.

@yxieca (Collaborator) left a comment

AI agent on behalf of Ying. Reviewed; no issues found.

@yxieca (Collaborator) left a comment

AI agent on behalf of Ying. Reviewed; no issues found.
