Update hsFlowd to close the pipe immediately #9

Closed
vivekrnv wants to merge 1 commit into tirupatihemanth:trixie from vivekrnv:trixie

vivekrnv (Collaborator) commented Oct 21, 2025

Why I did it

On Debian 13, sflowmgrd takes 10-20 minutes to finish the command service restart hsflowd.

As the hsflowd backtrace shows, getdtablesize is returning a very large number (about 1 billion, judging by the loop counter i in the backtrace), and the current logic loops from that value down to 0, closing each fd in turn, which takes a long time.

Alternatively, close_range can close all the fds in a given range with a single call, which is much faster.
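
For illustration, the idea can be sketched roughly as follows. This is not the actual hsflowd patch: the helper name close_fd_range is hypothetical, and the per-fd fallback loop is an assumption about how to handle kernels older than Linux 5.9, where the close_range syscall is not available.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not taken from hsflowd): close every fd in [first, last]. */
void close_fd_range(unsigned int first, unsigned int last)
{
#ifdef SYS_close_range
    /* Linux >= 5.9: a single syscall closes the whole range, however large it is. */
    if (syscall(SYS_close_range, first, last, 0U) == 0)
        return;
#endif
    /* Assumed fallback for older kernels: close fds one by one,
     * capped at the descriptor table size so the loop stays bounded. */
    long limit = sysconf(_SC_OPEN_MAX);
    if (limit > 0 && (unsigned long)limit - 1 < last)
        last = (unsigned int)(limit - 1);
    for (unsigned int fd = first; fd <= last; fd++) {
        close((int)fd);
        if (fd == last)
            break; /* avoid unsigned wraparound when last is UINT_MAX */
    }
}
```

Under these assumptions, a daemon that wants to drop every inherited descriptor at startup could call close_fd_range(0, ~0U) instead of counting down from getdtablesize(). On kernels that support close_range, the cost no longer depends on how large the fd limit is.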

Work item tracking
  • Microsoft ADO (number only):

How I did it

sflowmgrd trace

0  read () from /lib/x86_64-linux-gnu/libc.so.6
1  _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6 
2  _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6 
3  _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6 
4  fgets () from /lib/x86_64-linux-gnu/libc.so.6
5  swss::exec(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) () from /lib/x86_64-linux-gnu/libswsscommon.so.0
6  swss::SflowMgr::sflowHandleService (this=this@entry=0x7ffdc4ae6dc0,
    enable=enable@entry=true) at ./cfgmgr/sflowmgr.cpp:67
7  swss::SflowMgr::doTask (this=<optimized out>, consumer=...)
    at ./cfgmgr/sflowmgr.cpp:459
8  Consumer::execute (this=0x556f0715b280) at ../orchagent/orch.cpp:338
9  main (argc=<optimized out>, argv=<optimized out>)
    at ./cfgmgr/sflowmgrd.cpp:74

hsflowd trace:

(gdb) bt
close () from /lib/x86_64-linux-gnu/libc.so.6
main (argc=<optimized out>, argv=<optimized out>) at hsflowd.c:1927 
(gdb) f 1
1927	hsflowd.c: No such file or directory.
(gdb) p i
$1 = 1035943704
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007fa33f48d9e0 in close () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 1
1927	in hsflowd.c
(gdb) p i
$2 = 1024299507

During this time, hsflowd keeps a CPU core almost fully busy (92.4% in the top output below):

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     48 root      20   0    2652    928    820 R  92.4   0.0   5:24.66 hsflowd

How to verify it

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Without this fix, sflowmgrd is taking 10-20 mins to finish command service restart hsflowd in Debian 13

sflowmgrd trace
0  read () from /lib/x86_64-linux-gnu/libc.so.6
1  _IO_file_underflow () from /lib/x86_64-linux-gnu/libc.so.6
2  _IO_default_uflow () from /lib/x86_64-linux-gnu/libc.so.6
3  _IO_getline_info () from /lib/x86_64-linux-gnu/libc.so.6
4  fgets () from /lib/x86_64-linux-gnu/libc.so.6
5  swss::exec(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) () from /lib/x86_64-linux-gnu/libswsscommon.so.0
6  swss::SflowMgr::sflowHandleService (this=this@entry=0x7ffdc4ae6dc0,
    enable=enable@entry=true) at ./cfgmgr/sflowmgr.cpp:67
7  swss::SflowMgr::doTask (this=<optimized out>, consumer=...)
    at ./cfgmgr/sflowmgr.cpp:459
8  Consumer::execute (this=0x556f0715b280) at ../orchagent/orch.cpp:338
9  main (argc=<optimized out>, argv=<optimized out>)
    at ./cfgmgr/sflowmgrd.cpp:74

hsflowd trace:
(gdb) bt
close () from /lib/x86_64-linux-gnu/libc.so.6
main (argc=<optimized out>, argv=<optimized out>) at hsflowd.c:1927
(gdb) f 1
1927	hsflowd.c: No such file or directory.
(gdb) p i
$1 = 1035943704
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007fa33f48d9e0 in close () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 1
1927	in hsflowd.c
(gdb) p i
$2 = 1024299507

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     48 root      20   0    2652    928    820 R  92.4   0.0   5:24.66 hsflowd

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>

vivekrnv closed this Oct 22, 2025
tirupatihemanth pushed a commit that referenced this pull request Mar 13, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:

- scripts/build-timing-report.sh: Parse per-package timing from build
  logs (HEADER/FOOTER timestamps), generate sorted duration table,
  phase breakdown, parallelism timeline, and CSV export.

- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph,
  compute critical path, fan-out/fan-in bottleneck analysis, and
  generate DOT/JSON output for visualization.

- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O,
  and Docker container count during builds for resource utilization
  analysis.

Add "make build-report" target to slave.mk that runs the timing
report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

---------

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>