swss: flush g_asicState after each event is done #570
qiluo-msft merged 3 commits into sonic-net:master
Conversation
* add flush() after each event is handled, in case some entries are still in the buffer; don't wait
* with the changes in sairedis and swss-common, route performance improved by 200~300 routes/sec

Signed-off-by: Dong Zhang [email protected]
You have a few PRs submitted in sairedis and swss-common. Can you describe the dependencies between these PRs? Are they all independent, or do some PRs require others to be merged first?
swss-common#218 should be merged first; sairedis#335 depends on swss-common#218, and together they are one feature. Of the other three, swss#570 and sairedis#336 depend on swss-common#219, although they are independent from a compilation point of view. So swss-common#219 should be merged first for this feature.
orchagent/orchdaemon.cpp (Outdated)

```cpp
for (Orch *o : m_orchList)
    o->doTask();

flush(); //flush after each event is handled, don't wait
```
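The effect of this change can be sketched with a minimal event loop. All names below (`Pipeline`, `run_events`) are illustrative stand-ins for `Orch::doTask` and the `g_asicState` Redis pipeline, not the actual swss API:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Sketch of the orchdaemon main loop after this PR: the pipeline
// (g_asicState in the real code) is flushed after *every* handled
// event, instead of waiting for the select() timeout.
struct Pipeline {
    int flushes = 0;            // how many times flush() ran
    std::vector<int> buffer;    // operations waiting to be sent
    void push(int op) { buffer.push_back(op); }
    void flush() { buffer.clear(); ++flushes; }
};

void run_events(Pipeline& pipe, std::queue<std::function<void()>>& events) {
    while (!events.empty()) {
        events.front()();   // equivalent of o->doTask() for each Orch
        events.pop();
        pipe.flush();       // flush after each event is handled, don't wait
    }
}
```

With this shape, no operation ever sits in the buffer longer than one event's handling time, at the cost of more (smaller) round trips to syncd.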
> //flush after e

Please follow the coding style of the other comments. #Closed
Sure. Will do
> flush
About the performance claim "route performance improved by 200~300 routes/sec" — could you provide more details?
- What is the test environment and test steps?
- What is the performance before this PR and after?
- If it is a small scope test, could you add a unit test or vs test case to automate it?
#Closed
- The environment and test steps are listed in the slides we discussed about two weeks ago. I already sent the slides internally at that time; if you didn't get them, I can resend them to you.
- The performance is listed in the slides as well. One PR alone didn't make sense; the optimization involved many PRs, so we need to look at them together.
- We tested the routing performance on a physical switch, and it matched what I listed in the slides.
OK. I just want to know the performance before this PR and after; you only mentioned it improved by 200~300 routes/sec. Rough numbers are ok. #Closed
Before the changes it is 1300 routes/sec; the platform is Broadcom, SAI is 3.1, and the CPU is an Intel(R) Atom(TM) CPU C2558 @ 2.40GHz. After enabling only the pipeline changes, it is about 1500-1600 routes/sec; adding the syncd buffer changes from the other PR, it is about 1700-1800 routes/sec.
Also, the flush() is needed. Currently it will flush() only when there is nothing to select, and the SELECT_WAIT time is 1000 ms, which is too long for routes; we need to flush them right away rather than waiting for the timeout.
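The select loop under discussion has roughly this shape. This is a simplified sketch with illustrative names (`Loop`, `SelectResult`, `iterate` are not the swss API); the TIMEOUT branch models the pre-existing flush on select timeout, and the boolean models this PR's per-event flush:

```cpp
#include <cassert>

// With flush() only in the TIMEOUT branch, queued ASIC operations can
// sit in the pipeline for up to the select wait (1000 ms) before
// reaching syncd. Flushing after each event delivers them immediately.
enum class SelectResult { OBJECT, TIMEOUT };

struct Loop {
    int pending = 0;    // ops buffered in the pipeline
    int flushed = 0;    // ops delivered to syncd
    void handleEvent() { ++pending; }              // o->doTask() side effect
    void flush() { flushed += pending; pending = 0; }

    void iterate(SelectResult r, bool flushAfterEachEvent) {
        if (r == SelectResult::TIMEOUT) {
            flush();    // pre-existing flush when nothing is selectable
            return;
        }
        handleEvent();
        if (flushAfterEachEvent)
            flush();    // this PR: flush immediately, don't wait
    }
};
```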
I mean the above flush() in the timeout path, since you already flush() after execute(). #Closed
Got it; then that flush is not necessary, and we can remove it.
*remove unnecessary flush() in timeout case and update comment Signed-off-by: Dong Zhang [email protected]
**What I did** Revert #570. We should only `flush` the orchagent/syncd communication channel when there are no active tasks in orchagent. This will not influence end-to-end performance in the long run; it may introduce SELECT_TIMEOUT (1 s) of latency if data remains in the orchagent/syncd communication channel after the previous `flush`, which is not a big deal. Fix sonic-net/sonic-buildimage#5570
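The post-revert batching strategy can be sketched as flushing only once the active tasks drain. The names below (`BatchedPipeline`, `drain`) are illustrative, not the actual orchagent code:

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Sketch of the post-revert strategy: keep batching ops while tasks
// are active, and flush once the queue is drained. This trades at most
// SELECT_TIMEOUT of latency for fewer round trips to syncd.
struct BatchedPipeline {
    int pending = 0;    // ops buffered since the last flush
    int flushes = 0;    // number of round trips to syncd
    void push() { ++pending; }
    void flush() { pending = 0; ++flushes; }
};

void drain(BatchedPipeline& pipe, std::queue<std::function<void()>>& tasks) {
    while (!tasks.empty()) {   // active tasks: keep batching, no flush
        tasks.front()();
        tasks.pop();
    }
    pipe.flush();              // flush only when no tasks remain
}
```

Compared with flushing after every event, this sends one larger batch per drain instead of one small batch per task.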
* [syncd] Fix RPC compilation issues * Add missing rpc link for vssyncd
* swss: flush g_asicState after each event is done * add flush() after event is handled in case some entries are still in buffer, don't wait * with the changes in sairedis and swss-common, route performance improved by 200~300 routes/sec * swss-common: remove unnecessary flush() in timeout case and update comment * remove unnecessary flush() in timeout case and update comment