[ledd] Use select() with timeout for AppDB notifications by serhepopovych · Pull Request #16 · sonic-net/sonic-platform-daemons

serhepopovych · 2018-05-26T14:25:28Z

Otherwise ledd ignores signals other than SIGKILL, which makes it
impossible to use __del__() destructors in LedClass implementations and
delays pmon docker container shutdown by up to 10s.

Here is output from /var/log/supervisor/supervisord.log after
"systemctl stop pmon":

2018-05-26 10:40:36,323 WARN received SIGTERM indicating exit request
2018-05-26 10:40:36,323 INFO waiting for rsyslogd, ledd to die
2018-05-26 10:40:39,327 INFO waiting for rsyslogd, ledd to die
2018-05-26 10:40:42,330 INFO waiting for rsyslogd, ledd to die
2018-05-26 10:40:45,335 INFO waiting for rsyslogd, ledd to die

Note that, according to docker-stop(1), the default time to wait before
retrying with the KILL signal is 10s.

Steps to reproduce:

# docker exec -ti pmon bash

# kill -TERM $(pgrep ledd)
# kill -INT $(pgrep ledd)

# kill -0 $(pgrep ledd) && echo 'alive'
alive

# kill -KILL $(pgrep ledd)

The process survives the TERM and INT signals and is killed only by KILL.

Other C++ code already uses SELECT_TIMEOUT = 1000 to return control
to the main loop and check the state.

Signed-off-by: Sergey Popovich [email protected]
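
The loop shape the description argues for can be sketched with Python's standard library alone. This is a minimal illustration, not the ledd code: it uses `select.select` on a pipe rather than `swsscommon.Select`, and `StopFlag`, `serve`, `install_handlers`, and `SELECT_TIMEOUT_S` are hypothetical names. Note that the stdlib `select.select` already lets Python signal handlers run when it is interrupted, so the sketch shows only the "wake up periodically and re-check a stop flag" pattern, not the original bug.

```python
import os
import select
import signal

# Assumption: 1 second, mirroring the SELECT_TIMEOUT = 1000 (ms) quoted above.
SELECT_TIMEOUT_S = 1.0


class StopFlag:
    """Set from a signal handler, polled by the main loop."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Keep the handler trivial: only record that a stop was requested.
        self.requested = True

    def __call__(self):
        return self.requested


def serve(read_fd, should_stop, handle, timeout_s=SELECT_TIMEOUT_S):
    """Wait for data on read_fd until should_stop() returns true.

    Returns the number of timeouts seen, which is only useful for testing.
    """
    timeouts = 0
    while not should_stop():
        readable, _, _ = select.select([read_fd], [], [], timeout_s)
        if not readable:
            # Timeout: nothing to process, just re-check should_stop().
            timeouts += 1
            continue
        data = os.read(read_fd, 4096)
        if not data:
            break  # the writer closed the pipe
        handle(data)
    return timeouts


def install_handlers(stop):
    """Route SIGTERM and SIGINT to the stop flag (main thread only)."""
    for signum in (signal.SIGTERM, signal.SIGINT):
        signal.signal(signum, stop.handler)
```

With this shape, the worst-case delay between a stop request and loop exit is bounded by the timeout, which is the property the 10s docker-stop window needs.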

pavel-shirshov · 2018-05-29T18:26:24Z

sonic-ledd/scripts/ledd

        if state != swsscommon.Select.OBJECT:
-            log_warning("sel.select() did not return swsscommon.Select.OBJECT")
+            if state != swsscommon.Select.TIMEOUT:
+                log_warning("sel.select() did not return swsscommon.Select.OBJECT")


Can you please add a comment describing why you need the changes? Otherwise it's hard to understand why we need them.

Do you mean the entire change or just this selected part?

If we didn't check for swsscommon.Select.TIMEOUT, the log would be flooded with useless "did not return swsscommon.Select.OBJECT" messages in the timeout case.

Select will time out frequently, whenever there is no port event.

The previous behavior was to return from select only when a port event occurred. That blocks signals, preventing ledd from exiting gracefully on SIGTERM.

My main concern here is that it's not clear from the code why you needed to introduce the timeout.
What was the reason for it? I think it's better to have a comment explaining that we introduced the timeout because select was blocking UNIX signals.

Then why doesn't the C++ code that uses a timeout say so in a comment?

Isn't the commit message description enough? Anyone working on the code can use the standard tools (e.g. git-log(1), git-blame(1), git-describe(1)) to find the commit and a detailed description of the problem.

If we really want to add a comment, it might be better to put it near the function call, because checking the return value serves a completely different purpose: skipping unwanted log messages.

I think a simple comment above the select() call on line 209, mentioning that we call select() with a timeout value to prevent indefinite blocking and enable graceful shutdown via SIGTERM, should suffice.

@pavel-shirshov: Do you agree?

I checked the sonic-swss repo and found that all select TIMEOUTs there are used to perform some action.
In the code you introduced, it's not clear why we need TIMEOUTs. I would not be surprised if someone removed the TIMEOUT code as redundant.
I'm sorry I put my comment on the wrong line in the code.

@jleveque yes, I agree.

Accepted. Thanks.
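
To make the outcome of this thread concrete, here is one way the agreed comment could sit above the call, together with the timeout check from the diff. This is a sketch under assumptions: `FakeSelect` is a hypothetical stand-in that replays scripted states, `select()` is assumed to return a `(state, selectable)` pair, and the comment wording is illustrative rather than the text that was merged.

```python
from collections import deque

# Same value the description quotes for the C++ code (milliseconds).
SELECT_TIMEOUT_MS = 1000


class FakeSelect:
    """Hypothetical stand-in for swsscommon.Select; replays scripted states."""

    # Stand-in constants; the real values in swsscommon are not assumed here.
    OBJECT, ERROR, TIMEOUT = range(3)

    def __init__(self, states):
        self._states = deque(states)

    def select(self, timeout_ms):
        # Assumed return shape: a (state, selectable) pair.
        return self._states.popleft(), None


def run(sel, iterations, log_warning):
    """Run a fixed number of loop iterations; return how many events were handled."""
    handled = 0
    for _ in range(iterations):
        # Call select() with a timeout so control returns to this loop
        # regularly. A call that blocks indefinitely keeps signal handlers
        # (e.g. for SIGTERM) from running and prevents graceful shutdown.
        state, _selectable = sel.select(SELECT_TIMEOUT_MS)

        if state != FakeSelect.OBJECT:
            # Do not flood the log when select() merely times out.
            if state != FakeSelect.TIMEOUT:
                log_warning("sel.select() did not return Select.OBJECT")
            continue
        handled += 1
    return handled
```

Feeding the loop two timeouts, one object, and one error yields a single warning, which is the log behavior the timeout check is meant to guarantee.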

qiluo-msft · 2018-05-30T21:34:44Z

sonic-ledd/scripts/ledd

+            # Do not flood log when select times out
+            if state != swsscommon.Select.TIMEOUT:
+                log_warning("sel.select() did not return swsscommon.Select.OBJECT")
            continue


Suggest refactoring:

    if state == swsscommon.Select.TIMEOUT:
        continue
    elif state != swsscommon.Select.OBJECT:
        log_warning("sel.select() did not return swsscommon.Select.OBJECT")
        continue

Accepted. Thanks.
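
The suggested refactoring changes only the shape of the branches, not the behavior. A small check of that claim, using a hypothetical `State` enum in place of the `swsscommon.Select` constants and reducing each branch to a `(skip_iteration, log_warning)` pair:

```python
from enum import Enum


class State(Enum):
    """Hypothetical stand-in for the swsscommon.Select state constants."""
    OBJECT = 0
    ERROR = 1
    TIMEOUT = 2


def nested(state):
    """Shape of the original patch. Returns (skip_iteration, log_warning)."""
    if state != State.OBJECT:
        warn = False
        if state != State.TIMEOUT:
            warn = True
        return True, warn
    return False, False


def flat(state):
    """Shape of the suggested refactoring. Same return convention."""
    if state == State.TIMEOUT:
        return True, False
    elif state != State.OBJECT:
        return True, True
    return False, False


def equivalent():
    """True when both shapes agree on every state."""
    return all(nested(s) == flat(s) for s in State)
```

The flat form puts the common case (timeout) first and drops one level of nesting, which is presumably why it reads better in review.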

qiluo-msft

As comment.

jleveque approved these changes May 26, 2018

pavel-shirshov suggested changes May 29, 2018

qiluo-msft reviewed May 30, 2018

qiluo-msft requested changes May 30, 2018

pavel-shirshov approved these changes May 31, 2018

qiluo-msft approved these changes May 31, 2018

jleveque merged commit ce83d58 into sonic-net:master May 31, 2018

jleveque mentioned this pull request May 31, 2018

[sonic-platform-daemons] Update submodule sonic-net/sonic-buildimage#1754

Merged

jleveque added Enhancement ledd labels Jul 10, 2020

vdahiya12 pushed a commit to vdahiya12/sonic-platform-daemons that referenced this pull request Apr 4, 2022

Revert "Pep 8 compliance, code cleanup (sonic-net#15)" (sonic-net#16)

b8470c5

This reverts commit 3b1f0ef.

vvolam pushed a commit to vvolam/sonic-platform-daemons that referenced this pull request Jun 16, 2025

[xcvrd] Optimize module initialization performance (sonic-net#611) (s…

5016ded

…onic-net#16)

jleveque merged 1 commit into sonic-net:master from OrdnanceNetworks:fix-ledd-signal-handling
