Skip to content

[common] add PerformanceTimer#893

Merged
lguohan merged 2 commits intosonic-net:masterfrom
eddieruan-alibaba:master
Nov 1, 2024
Merged

[common] add PerformanceTimer#893
lguohan merged 2 commits intosonic-net:masterfrom
eddieruan-alibaba:master

Conversation

@a114j0y
Copy link
Contributor

@a114j0y a114j0y commented Jul 16, 2024

What I did

Extend the PerformanceIntervalTimer utility in sonic-sairedis repo to measure API performance.

To optimize performance of a periodically invoked function, we not only pay attention to how long it takes to finish execution (busy time), but also care about how long it waits to get executed (idle time). The total time determines the actual throughput in the real world system.

Hence we want to see data in 3 ways:

  1. Idle time between two calls (a gap between this start and the last end, during which the thread just waits)
  2. Busy time of each API call (interval between this start and this end, it's the true execution time)
  3. Total time elapsed (idle + busy)

| <- Initial Gap -> | <- - 1st **Execution** - -> | <- 2nd Gap -> | <- - 2nd **Execution** - -> |

| <---------------- 1st **Total** --------------> | <------------ 2nd **Total**--------------> |

Other features

  • calculate the number of tasks finished per call (Routes Per Second in the context of routes loading)
  • batch the logging output
  • adjust the SWSSLOGLEVEL using a file indicator, default level is "INFO", hence we could enable perf logs in a dynamic way.

Why I did it

  • original utility has limited functionality

    • we need gap data to obtain more insight on the mutual influence of upstream and downstream modules
  • original utility is in libsairedis, to use it in sonic-swss, there would be much Makefile.am change, not necessary.

  • a utility tool for swss should be included in swss-common

  • when enabled on-demand, it could help measure the API performance under scaled traffic

@a114j0y a114j0y changed the title add performancetimer into libswsscommon swss-common/common: add PerformanceTimer Jul 22, 2024
@a114j0y a114j0y changed the title swss-common/common: add PerformanceTimer [common] add PerformanceTimer Jul 22, 2024
@a114j0y a114j0y force-pushed the master branch 8 times, most recently from 50ac121 to f9bc905 Compare July 29, 2024 20:59
@qiluo-msft qiluo-msft requested a review from liuh-80 August 14, 2024 22:00
@eddieruan-alibaba
Copy link

eddieruan-alibaba commented Oct 20, 2024

@qiluo-msft @liuh-80 @lguohan can you help to review and approve it? We need this util PR in for other loading time PRs.

@lguohan lguohan merged commit c6364d1 into sonic-net:master Nov 1, 2024
@liuh-80
Copy link
Contributor

liuh-80 commented Jul 31, 2025

Hi @a114j0y , the performance timer test is unstable, almost every PR need trigger validation 5-10 times to pass this test. can you improve it?

Expected equality of these values:
output
Which is: "{"API":"basic_set_name","RPS[k]":5.0,"Tasks":3000,"Total[ms]":603,"busy[ms]":400,"idle[ms]":203,"m_gaps":[3,200],"m_incs":[1000,2000],"m_intervals":[100,300]}"
expected
Which is: "{"API":"basic_set_name","RPS[k]":5.0,"Tasks":3000,"Total[ms]":600,"busy[ms]":400,"idle[ms]":200,"m_gaps":[0,200],"m_incs":[1000,2000],"m_intervals":[100,300]}"
[ FAILED ] PerformancetimerTest.basic (704 ms)

@arista-hpandya
Copy link
Contributor

I was looking at the recent PR failures, and it seems like the test is running into issues because it's checking for exact millisecond timings.

This is a common issue, things like system load and OS scheduler delays mean that sleep_for is never perfectly exact, so the test can become a bit flaky. It doesn't look like there's any actual bug in your timer logic, it's just the test being a little too strict.

I'm working on a small patch to address this. The idea is to have the test check for a range of acceptable values instead of one exact number. This should make it much more stable and prevent false negatives on the CI. I'll have that ready for you to review soon!

Let me know what you think @a114j0y

cc: @vmittal-msft

@arista-hpandya
Copy link
Contributor

Looks like someone beat me to it: #961

Will verify if this solves the issue on my PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants