Add proposal for multi-DUT and multi-ASIC testing support#2347
Add proposal for multi-DUT and multi-ASIC testing support#2347wangxin wants to merge 3 commits intosonic-net:masterfrom wangxin:multi-dut-asic
Conversation
|
@sanmalho-git Could you please help review as well? |
|
@wangxin, sonic-mgmt pr not running now? |
|
I am assuming that the 'duthost' fixture going to be updated to return 'MultiAsicSonicHost'. If so, can we add the changed code for that fixture. Assuming 'duthost' is a 'MultiAsicSonicHost', then with the approach above, we are giving the flexibility to enhance a test for multi-asic by using the 'duthost' fixture. If the test needs to be backward compatible to both single-asic DUT and multi-asic DUT, the call in the test would be Now for single-asic DUT the return is a dictionary. But for multi-asic DUT, this is a list of dictionaries. We now have to deal with the two different data structures depending on whether we are multi-asic or single asic DUT. |
|
retest this please |
|
Yes, the The new Without If we call With If we call With So, the key design here is that if |
|
Let's assume we have this call in the test case to ansible module 'foo' that is multi-asic aware. The key is that the same call should work for a single-asic, and multi-asic duthost. If duthost represents a single asic pizza box, then this will eventually hit MultiAsicSonicHost._run_on_asics. With this, we will hit SonicAsic.foo (which has self.asic_index=0) with asic_index='all'. So, asic_index=0 is passed into 'foo' ansible module. At this time, the ansible module doesn't have the knowledge whether this asic_index 0 is to represent the 'global' namespace, or actually 'asic0'. When prototyping this for bgp_facts ansible module, I see it trying to get bgp_facts on 'bgp0' instance - which doesn't exist for single asic pizza box, and thus we fail. One way to solve this it to add a check for num_asic in the asic_index='all' case in MultiAsicSonicHost._run_on_asics method. If num_asic == 1, then call the module on self.sonichost, else call on SonicAsic: |
|
@sanmalho-git Good catch! Thank you! I'll update the document. |
| * all ASICs | ||
| * specific ASIC | ||
| * combinations of the above scenarios | ||
|
|
There was a problem hiding this comment.
it would be great if you have have some diagram to show the relation between all classes you plan to add to represent the multi-asic, multi-dut.
There was a problem hiding this comment.
Sure, I'll update.
| @@ -0,0 +1,499 @@ | |||
| # Improve the test infrastructure to support multi-DUT and multi-ASIC systems | |||
There was a problem hiding this comment.
this seems only addressing pytest infrastructure. I think somewhere we need to add a doc to describe the sonic testbed infrastructure support for that. for example, ansible deployment script, topology file.
How do we represent the multi-dut topology? minigraph generation? maybe this is not covered by this doc. do we have doc to cover that?
There was a problem hiding this comment.
Yes, we'll need more documents to cover these areas. I will be working on them.
| duthosts.supervisor_nodes[0].bar() | ||
| duthosts.supervisor_nodes["supervisor0"].bar() | ||
| ``` | ||
|
|
There was a problem hiding this comment.
For each scenario should there be a list suggested APIs to use? If everyone uses same API it's easier to understand the code. Also becomes easier to refactor if there's such a need later.
There was a problem hiding this comment.
Thanks for the suggestion! This document proposed various ways of using APIs. For some scenarios, multiple APIs can do the same thing. It is a little bit hard to finalize such a list for each scenario before we have some real code. But sure, we'll try to improve. It would be easier if there is always single best way to do something.
| ``` | ||
|
|
||
| In the above examples: | ||
| * `duthost.asics` should be a list of objects. Each object represents an ASIC in the DUT. `duthost.asics` represents all the ASIC instances in the DUT. |
There was a problem hiding this comment.
Some tests are applicable only for the frontend asics, and there could be some test cases that apply only for backend. We should also have:
duthost.frontend_asics
duthost.foo(asic_index="frontend_asics")
There was a problem hiding this comment.
Sure, we can enhance it this way.
| [node.foo() for node in duthosts.nodes] | ||
| ``` | ||
| For multi-ASIC related operations on DUT host, the function or ansible module call should accept optional `asic_index` argument. | ||
| * No `asic_index`, perform operation just on the host (or default namespace). For example, just run command `"show ip bgp summary"`. |
There was a problem hiding this comment.
This is a bit misleading. It really depends on the CLIs. In this example "show ip bgp summary" CLI walks through all the ASIC namespaces and does not return the 'bgp summary' from default namespace.
There was a problem hiding this comment.
Sure, I'll update it to make it more clear.
|
@sanmalho-git @wangxin @arlakshm what do we need to do to get this PR either merged or closed? |
|
Some documentations for multi-ASIC and multi-DUT has been added. I'll integrate some contents of this PR to the existing documents. Closing this PR for now. |
swss: * 665bbbb 2022-06-27 | Add support for IP interface loopback action (sonic-net#2307) (HEAD -> 202205, github/202205) [Lior Avramov] * c7f5743 2022-06-28 | [asan] suppress the static variable leaks (sonic-net#2354) [Yakiv Huryk] * 37e2a31 2022-06-28 | [tests] [asan] add graceful stop flag (sonic-net#2347) [Yakiv Huryk] * 5ab84cf 2022-06-28 | [202205][cherry-pick] Fix mux_acl_rule adding issue (sonic-net#2358) [bingwang-ms] linkmgrd: * a836ef7 2022-06-28 | Use Vlan MAC as src MAC for link prober by default (sonic-net#93) (HEAD -> 202205) [Jing Zhang] * a828e86 2022-06-28 | Fix inconsistent mux state (sonic-net#92) (github/202205) [Longxiang Lyu] Signed-off-by: Ying Xie <ying.xie@microsoft.com>
…x-kernel] advance submodule head (sonic-net#12025) linkmgrd: * ab5b2c1 2022-09-02 | Fix mux config (sonic-net#128) (HEAD -> 202205, github/202205) [Longxiang Lyu] utilities: * 7de9305 2022-09-07 | [generate dump]Added error message when saisdkdump fails (sonic-net#2356) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam] * c5b0a6d 2022-09-07 | [counterpoll]Fixing counterpoll show for tunnel and acl stats (sonic-net#2355) [Sudharsan Dhamal Gopalarathnam] * 1452b44 2022-09-05 | [GCU] Fix missing backend in dry run (sonic-net#2347) [jingwenxie] * bc7b845 2022-09-04 | Add Password Hardening CLI support (sonic-net#2338) [davidpil2002] * 55e8948 2022-09-06 | [fast-reboot]Avoid stopping masked services during fast-reboot (sonic-net#2335) [Sudharsan Dhamal Gopalarathnam] * f7d69d4 2022-08-30 | Replace cmp in acl_loader with operator.eq (sonic-net#2328) [Zhaohui Sun] * 4054ebb 2022-09-05 | Add verification for override (sonic-net#2305) [jingwenxie] * 729d811 2022-05-30 | Fix sonic-installer and 'show version' command crash when database docker not running issue. (sonic-net#2183) [Hua Liu] platform-daemons: * 36ba7c0 2022-09-07 | [ycable] cleanup logic for creating grpc future ready (sonic-net#289) (HEAD -> 202205) [vdahiya12] * 2a9db73 2022-09-01 | [ycabled] fix insert events from xcvrd;cleanup some mux toggle logic (sonic-net#287) [vdahiya12] platform-common: * d7c990d 2022-09-03 | [CMIS] 'get_transceiver_info' should return 'None' when CMIS cable EEPROM is not ready (sonic-net#305) (HEAD -> 202205) [Kebo Liu] linux-kernel: * 25ea052 2022-08-31 | [patch]: Add accpt_untracked_na kernel param (sonic-net#292) (HEAD -> 202205) [Lawrence Lee] Signed-off-by: Ying Xie <ying.xie@microsoft.com> Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Description of PR
Summary:
Fixes # (issue)
This is a documentation proposing changes to test infrastructure for multi-DUT and multi-ASIC testing support.
Type of change
Approach
What is the motivation for this PR?
SONiC is evolving to support multiple DUT and multiple ASIC systems. The test infrastructure also needs to be updated to support that new architecture.
How did you do it?
Added a document proposing changes to test infrastructure. This document is mainly based on work from Arvindsrinivasan Lakshminarasimhan and Sandeep Malhotra.
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation