Skip to content

[202205] Fix error handling when failing to install a deb package (#11846)#12857

Merged
saiarcot895 merged 1 commit intosonic-net:202205from
saiarcot895:202205-fix-deb-install-exit-on-error
Nov 30, 2022
Merged

[202205] Fix error handling when failing to install a deb package (#11846)#12857
saiarcot895 merged 1 commit intosonic-net:202205from
saiarcot895:202205-fix-deb-install-exit-on-error

Conversation

@saiarcot895
Copy link
Contributor

Signed-off-by: Saikrishna Arcot [email protected]

Cherry pick of #11846 to 202205 branch.

Why I did it

The current error handling code for when a deb package fails to be
installed currently has a chain of commands linked together by && and
ends with exit 1. The assumption is that the commands would succeed,
and the last exit 1 would end it with a non-zero return code, thus
fully failing the target and causing the build to stop because of bash's
-e flag.

However, if one of the commands prior to exit 1 returns a non-zero
return code, then bash won't actually treat it as a terminating error.
From bash's man page:

-e      Exit immediately if a pipeline (which may consist of a single simple
	command), a list, or a compound command (see SHELL GRAMMAR above),
        exits with a non-zero status.  The shell does not exit if the
        command that fails is part of the  command  list  immediately
        following a while or until keyword, part of the test following the
        if or elif reserved words, part of any command executed in a && or
        || list except the command following the final && or ||, any
        command in a pipeline but the last, or if the command's return
        value is being inverted with !.  If a compound command other than a
        subshell returns a non-zero status because a command failed while
        -e was being ignored, the shell does not exit.

The part part of any command executed in a && or || list except the command following the final && or || says that if the failing command
is not the exit 1 that we have at the end, then bash doesn't treat it
as an error and exit immediately. Additionally, since this is a compound
command, but isn't in a subshell (subshell are marked by ( and ),
whereas { and } just tells bash to run the commands in the current
environment), bash doesn't exist. The result of this is that in the
deb-install target, if a package installation fails, it may be
infinitely stuck in that while-loop.

This was seen when the snmpd package upgrade happened, and
builds were failing to install the mismatching libsnmp-dev package,
the builds did not immediately terminate; instead, the installation
was retried again and again, suggesting it was stuck in some infinite
loop. The build jobs finally terminated only because of the timeout
specified for the jobs.

How I did it

There are two fixes for this: change to using a subshell, or use ;
instead of &&. Using a subshell would, I think, require exporting any
shell variables used in the subshell, so I chose to change the && to
;. In addition, at the start of the subshell, set +e is added in,
which removes the exit-on-error handling of bash. This makes sure that
all commands are run (the output of which may help for debugging) and
that it still exits with 1, which will then fully fail the target.

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

…1846)

Signed-off-by: Saikrishna Arcot [email protected]

Why I did it
The current error handling code for when a deb package fails to be
installed currently has a chain of commands linked together by && and
ends with exit 1. The assumption is that the commands would succeed,
and the last exit 1 would end it with a non-zero return code, thus
fully failing the target and causing the build to stop because of bash's
-e flag.

However, if one of the commands prior to exit 1 returns a non-zero
return code, then bash won't actually treat it as a terminating error.
From bash's man page:

-e      Exit immediately if a pipeline (which may consist of a single simple
	command), a list, or a compound command (see SHELL GRAMMAR above),
        exits with a non-zero status.  The shell does not exit if the
        command that fails is part of the  command  list  immediately
        following a while or until keyword, part of the test following the
        if or elif reserved words, part of any command executed in a && or
        || list except the command following the final && or ||, any
        command in a pipeline but the last, or if the command's return
        value is being inverted with !.  If a compound command other than a
        subshell returns a non-zero status because a command failed while
        -e was being ignored, the shell does not exit.
The part part of any command executed in a && or || list except the command following the final && or || says that if the failing command
is not the exit 1 that we have at the end, then bash doesn't treat it
as an error and exit immediately. Additionally, since this is a compound
command, but isn't in a subshell (subshell are marked by ( and ),
whereas { and } just tells bash to run the commands in the current
environment), bash doesn't exist. The result of this is that in the
deb-install target, if a package installation fails, it may be
infinitely stuck in that while-loop.

This was seen when the snmpd package upgrade happened, and
builds were failing to install the mismatching libsnmp-dev package,
the builds did not immediately terminate; instead, the installation
was retried again and again, suggesting it was stuck in some infinite
loop. The build jobs finally terminated only because of the timeout
specified for the jobs.

How I did it
There are two fixes for this: change to using a subshell, or use ;
instead of &&. Using a subshell would, I think, require exporting any
shell variables used in the subshell, so I chose to change the && to
;. In addition, at the start of the subshell, set +e is added in,
which removes the exit-on-error handling of bash. This makes sure that
all commands are run (the output of which may help for debugging) and
that it still exits with 1, which will then fully fail the target.

How to verify it
@saiarcot895 saiarcot895 merged commit 5c7c789 into sonic-net:202205 Nov 30, 2022
@saiarcot895 saiarcot895 deleted the 202205-fix-deb-install-exit-on-error branch November 30, 2022 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants