Skip to content

Add proposal for multi-DUT and multi-ASIC testing support#2347

Closed
wangxin wants to merge 3 commits intosonic-net:masterfrom
wangxin:multi-dut-asic
Closed

Add proposal for multi-DUT and multi-ASIC testing support#2347
wangxin wants to merge 3 commits intosonic-net:masterfrom
wangxin:multi-dut-asic

Conversation

@wangxin
Copy link
Collaborator

@wangxin wangxin commented Oct 15, 2020

Description of PR

Summary:
Fixes # (issue)

This is a documentation proposing changes to test infrastructure for multi-DUT and multi-ASIC testing support.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

SONiC is evolving to support multiple DUT and multiple ASIC systems. The test infrastructure also needs to be updated to support that new architecture.

How did you do it?

Added a document proposing changes to test infrastructure. This document is mainly based on work from Arvindsrinivasan Lakshminarasimhan and Sandeep Malhotra.

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@wangxin wangxin requested review from a team, arlakshm and lguohan October 15, 2020 09:54
@wangxin
Copy link
Collaborator Author

wangxin commented Oct 15, 2020

@sanmalho-git Could you please help review as well?

@lguohan
Copy link
Contributor

lguohan commented Oct 15, 2020

@wangxin, sonic-mgmt pr not running now?

@sanmalho-git
Copy link
Contributor

I am assuming that the 'duthost' fixture going to be updated to return 'MultiAsicSonicHost'. If so, can we add the changed code for that fixture.

Assuming 'duthost' is a 'MultiAsicSonicHost', then with the approach above, we are giving the flexibility to enhance a test for multi-asic by using the 'duthost' fixture. If the test needs to be backward compatible to both single-asic DUT and multi-asic DUT, the call in the test would be

duthost.foo()

Now for single-asic DUT the return is a dictionary. But for multi-asic DUT, this is a list of dictionaries. We now have to deal with the two different data structures depending on whether we are multi-asic or single asic DUT.

@daall
Copy link
Contributor

daall commented Oct 15, 2020

retest this please

@wangxin
Copy link
Collaborator Author

wangxin commented Oct 16, 2020

Yes, the duthost fixture will automatically be a MultiAsicSonicHost instance after the duthosts fixture is update to a list of MultiAsicSonicHost. It's because the duthost fixture is implemented as returning an item from the duthosts list according to dut_index in pytest request.session object. If dut_index is not available, always return the first item from duthosts as duthost.

The new duthost fixture can be used for both single and multi-ASIC DUT. For multi-ASIC DUT, for example, if we call

duthost.bgp_facts()

Without asic_index argument, the call will be delegated to SonicHost. Eventually, self.sonichost.bgp_facts() will be called. self.sonichost is an attribute of MultiAsicSonicHost. It refers to instance of the SonicHost class. The returned result will be just calling bgp_facts module without instance_id argument. It will be a dictionary same as calling bgp_facts without instance_id on single-ASIC DUT.

If we call

duthost.bgp_facts(asic_index=1)

With asic_index=1, method bgp_facts of SonicAsic class will be called for object self.asics[1]. self.asics is an attribute of MultiAsicSonicHost. It's a list of SonicAsic instances. The method bgp_facts of SonicAsic will eventually call self.sonichost.bgp_facts(instance_id=1). Here the original asic_index=1 is translated to instance_id=1 in SonicAsic.bgp_facts method. This self.sonichost is an attribute of SonicAsic. It is a reference to instance of the SonicHost class. The returned result will be calling bgp_facts module with instance_id=1 argument. It will be a dictionary same as calling bgp_facts with instance_id=1 on single-ASIC DUT.

If we call

duthost.bgp_facts(asic_index="all")

With asic_index="all", the bgp_facts method of object in self.asics list will be called. The returned result will be a list.

So, the key design here is that if asic_index argument is "all", then call the module for all ASICs and return the result in a list.
Without asic_index, the module is just called for the host (or global/default namespace).

@sanmalho-git
Copy link
Contributor

Let's assume we have this call in the test case to ansible module 'foo' that is multi-asic aware.

duthost.foo(asic_index='all')

The key is that the same call should work for a single-asic, and multi-asic duthost.

If duthost represents a single asic pizza box, then this will eventually hit MultiAsicSonicHost._run_on_asics. With this, we will hit SonicAsic.foo (which has self.asic_index=0) with asic_index='all'. So, asic_index=0 is passed into 'foo' ansible module. At this time, the ansible module doesn't have the knowledge whether this asic_index 0 is to represent the 'global' namespace, or actually 'asic0'.

When prototyping this for bgp_facts ansible module, I see it trying to get bgp_facts on 'bgp0' instance - which doesn't exist for single asic pizza box, and thus we fail.

One way to solve this it to add a check for num_asic in the asic_index='all' case in MultiAsicSonicHost._run_on_asics method. If num_asic == 1, then call the module on self.sonichost, else call on SonicAsic:

    def _run_on_asics(self, *module_args, **complex_args):
        if "asic_index" not in complex_args:
            # Default ASIC/namespace
            return getattr(self.sonichost, self.attr)(*module_args, **complex_args)
        else:
            asic_index = complex_args.pop("asic_index")
            if type(asic_index) == int:
                # Specific ASIC/namespace
                return getattr(self.asics[asic_index], self.attr)(*module_args, **complex_args)
            elif type(asic_index) == str and asic_index.lower() == "all":
                # All ASICs/namespaces
                if self.sonichost.facts['num_asic'] == 1:
                    return [getattr(self.sonichost, self.attr)(*module_args, **complex_args)]
                return [getattr(asic, self.attr)(*module_args, **complex_args) for asic in self.asics]
            else:
                raise ValueError("Argument 'asic_index' must be an int or string 'all'.")

@wangxin
Copy link
Collaborator Author

wangxin commented Oct 19, 2020

@sanmalho-git Good catch! Thank you! I'll update the document.

* all ASICs
* specific ASIC
* combinations of the above scenarios

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be great if you have have some diagram to show the relation between all classes you plan to add to represent the multi-asic, multi-dut.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll update.

@@ -0,0 +1,499 @@
# Improve the test infrastructure to support multi-DUT and multi-ASIC systems
Copy link
Contributor

@lguohan lguohan Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems only addressing pytest infrastructure. I think somewhere we need to add a doc to describe the sonic testbed infrastructure support for that. for example, ansible deployment script, topology file.

How do we represent the multi-dut topology? minigraph generation? maybe this is not covered by this doc. do we have doc to cover that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we'll need more documents to cover these areas. I will be working on them.

duthosts.supervisor_nodes[0].bar()
duthosts.supervisor_nodes["supervisor0"].bar()
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each scenario should there be a list suggested APIs to use? If everyone uses same API it's easier to understand the code. Also becomes easier to refactor if there's such a need later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! This document proposed various ways of using APIs. For some scenarios, multiple APIs can do the same thing. It is a little bit hard to finalize such a list for each scenario before we have some real code. But sure, we'll try to improve. It would be easier if there is always single best way to do something.

```

In the above examples:
* `duthost.asics` should be a list of objects. Each object represents an ASIC in the DUT. `duthost.asics` represents all the ASIC instances in the DUT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests are applicable only for the frontend asics, and there could be some test cases that apply only for backend. We should also have:

duthost.frontend_asics
duthost.foo(asic_index="frontend_asics")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can enhance it this way.

[node.foo() for node in duthosts.nodes]
```
For multi-ASIC related operations on DUT host, the function or ansible module call should accept optional `asic_index` argument.
* No `asic_index`, perform operation just on the host (or default namespace). For example, just run command `"show ip bgp summary"`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit misleading. It really depends on the CLIs. In this example "show ip bgp summary" CLI walks through all the ASIC namespaces and does not return the 'bgp summary' from default namespace.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll update it to make it more clear.

@yxieca
Copy link
Collaborator

yxieca commented Sep 9, 2021

@sanmalho-git @wangxin @arlakshm what do we need to do to get this PR either merged or closed?

@wangxin
Copy link
Collaborator Author

wangxin commented Dec 1, 2021

Some documentations for multi-ASIC and multi-DUT has been added. I'll integrate some contents of this PR to the existing documents. Closing this PR for now.

@wangxin wangxin closed this Dec 1, 2021
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
swss:
* 665bbbb 2022-06-27 | Add support for IP interface loopback action (sonic-net#2307) (HEAD -> 202205, github/202205) [Lior Avramov]
* c7f5743 2022-06-28 | [asan] suppress the static variable leaks (sonic-net#2354) [Yakiv Huryk]
* 37e2a31 2022-06-28 | [tests] [asan] add graceful stop flag (sonic-net#2347) [Yakiv Huryk]
* 5ab84cf 2022-06-28 | [202205][cherry-pick] Fix mux_acl_rule adding issue (sonic-net#2358) [bingwang-ms]

linkmgrd:
* a836ef7 2022-06-28 | Use Vlan MAC as src MAC for link prober by default  (sonic-net#93) (HEAD -> 202205) [Jing Zhang]
* a828e86 2022-06-28 | Fix inconsistent mux state (sonic-net#92) (github/202205) [Longxiang Lyu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…x-kernel] advance submodule head (sonic-net#12025)

linkmgrd:
* ab5b2c1 2022-09-02 | Fix mux config (sonic-net#128) (HEAD -> 202205, github/202205) [Longxiang Lyu]

utilities:
* 7de9305 2022-09-07 | [generate dump]Added error message when saisdkdump fails (sonic-net#2356) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* c5b0a6d 2022-09-07 | [counterpoll]Fixing counterpoll show for tunnel and acl stats (sonic-net#2355) [Sudharsan Dhamal Gopalarathnam]
* 1452b44 2022-09-05 | [GCU] Fix missing backend in dry run (sonic-net#2347) [jingwenxie]
* bc7b845 2022-09-04 | Add Password Hardening CLI support (sonic-net#2338) [davidpil2002]
* 55e8948 2022-09-06 | [fast-reboot]Avoid stopping masked services during fast-reboot (sonic-net#2335) [Sudharsan Dhamal Gopalarathnam]
* f7d69d4 2022-08-30 | Replace cmp in acl_loader with operator.eq (sonic-net#2328) [Zhaohui Sun]
* 4054ebb 2022-09-05 | Add verification for override (sonic-net#2305) [jingwenxie]
* 729d811 2022-05-30 | Fix sonic-installer and 'show version' command crash when database docker not running issue. (sonic-net#2183) [Hua Liu]

platform-daemons:
* 36ba7c0 2022-09-07 | [ycable] cleanup logic for creating grpc future ready (sonic-net#289) (HEAD -> 202205) [vdahiya12]
* 2a9db73 2022-09-01 | [ycabled] fix insert events from xcvrd;cleanup some mux toggle logic (sonic-net#287) [vdahiya12]

platform-common:
* d7c990d 2022-09-03 | [CMIS] 'get_transceiver_info' should return 'None' when CMIS cable EEPROM is not ready  (sonic-net#305) (HEAD -> 202205) [Kebo Liu]

linux-kernel:
* 25ea052 2022-08-31 | [patch]: Add accpt_untracked_na kernel param (sonic-net#292) (HEAD -> 202205) [Lawrence Lee]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants