Dev sysready extension #13

Closed
fastiuk wants to merge 12 commits into master-sysready-extension from dev-sysready-extension

@fastiuk fastiuk commented Feb 5, 2024

Why I did it

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
{%- if feature in ["bgp"] %}
"check_up_status" : "false",
{%- endif %}
{%- if feature in ["ib-utils", "snmp"] %}

Remove ib-utils

Please remove ib-utils as you commented

sysmon = Sysmonitor()
sysmon.publish_system_status('UP')
sysmon.monitor_timeout = MagicMock()
# sysmon.publish_system_status('UP')

Remove

assert call_args in expected_calls

@patch('health_checker.sysmonitor.Sysmonitor.print_console_message', MagicMock())
# @patch('health_checker.sysmonitor.Sysmonitor.post_system_status', MagicMock())

remove

# from DB. Read timeout from config file and add two extra minutes on top of
# it.
CONFFILE=system_health_monitoring_config.json
PLATFORM=$(sonic-cfggen -d -v "DEVICE_METADATA['localhost']['platform']")

Try to use the cache rather than sonic-cfggen, as sonic-cfggen is a costly operation.

Please refer to https://github.com/sonic-net/sonic-buildimage/pull/17343
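
One way to read this suggestion, sketched in Python rather than the shell of the snippet under review: look the platform up in a small cache file first, and run the expensive command only on a miss. The helper name `get_platform`, the cache file, and the injected `slow_lookup` callable are all hypothetical; the mechanism used in the referenced PR may differ.

```python
from pathlib import Path
from typing import Callable


def get_platform(cache_file: Path, slow_lookup: Callable[[], str]) -> str:
    """Return the platform string, preferring a cached copy.

    Both arguments are hypothetical: slow_lookup stands in for an expensive
    call such as running sonic-cfggen.
    """
    if cache_file.is_file():
        cached = cache_file.read_text().strip()
        if cached:
            return cached
    # Cache miss: pay for the slow lookup once and remember the result.
    platform = slow_lookup().strip()
    cache_file.write_text(platform + "\n")
    return platform
```

With this shape, repeated calls during boot hit the file rather than re-running the slow command.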

TIMEOUT=10
fi

# Add two extra minutes and convert to seconds and

What is the rationale for adding two extra minutes?
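
For reference, the arithmetic in question can be sketched as follows. This is a Python rendering of what the shell snippet appears to do, assuming the configured value is in minutes (as the code comment implies). The JSON key name `timeout` is a placeholder, since the real field is not visible in the excerpt.

```python
import json
from pathlib import Path

DEFAULT_TIMEOUT_MIN = 10  # mirrors the TIMEOUT=10 fallback in the snippet
EXTRA_MARGIN_MIN = 2      # the "two extra minutes" under discussion


def sysready_timeout_seconds(conf_path: Path, key: str = "timeout") -> int:
    """Return the wait time in seconds: configured minutes plus the margin.

    The key name is a placeholder, not the real config field.
    """
    try:
        minutes = int(json.loads(conf_path.read_text())[key])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, bad JSON, or missing/invalid key: use the default.
        minutes = DEFAULT_TIMEOUT_MIN
    return (minutes + EXTRA_MARGIN_MIN) * 60
```

Keeping the margin as a named constant makes its purpose visible and gives one place to document why it exists.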

fi
{%- elif docker_container_name == "snmp" %}
$SONIC_DB_CLI STATE_DB HSET 'DEVICE_METADATA|localhost' chassis_serial_number $(decode-syseeprom -s)

Are you enabling this feature by default for snmp? Shouldn't it be based on configuration?

/usr/bin/mlnx-fw-upgrade.sh -v
if [[ "$?" -ne "${EXIT_SUCCESS}" ]]; then
debug "Failed to upgrade fw. " "$?" "Restart syncd"
sonic-db-cli STATE_DB HSET "FEATURE|$DEV_SRV" fail_reason \

Why are we considering only the ASIC firmware update as the sysready indication for syncd? Shouldn't it be the create switch success?

REDIS_TIMEOUT_MS = 0
system_allsrv_state = "DOWN"
-spl_srv_list = ['database-chassis', 'gbsyncd']
+spl_srv_list = ['database-chassis', 'gbsyncd', 'e2scrub_reap']

Can you clarify what e2scrub_reap is?


# Subprocess to monitor system ready timeout. If timeout will be exceeded,
# send a message to queue and exit
class MonitorTimeout(ProcessTaskBase):

This thread should be spawned only when the feature is enabled.
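
A minimal sketch of the gating being asked for. It uses a plain `threading.Thread` and `queue.Queue` as stand-ins for the PR's `ProcessTaskBase` subprocess and its message queue, which are not shown here. The function name, the `enabled` flag, and the `"TIMEOUT"` message are illustrative assumptions.

```python
import queue
import threading
from typing import Optional


def start_timeout_monitor(enabled: bool, timeout_s: float,
                          msg_queue: "queue.Queue[str]",
                          stop: threading.Event) -> Optional[threading.Thread]:
    """Start the watchdog only when the feature is enabled.

    Posts "TIMEOUT" to msg_queue if stop is not set within timeout_s seconds.
    Returns None (and starts nothing) when the feature is disabled.
    """
    if not enabled:
        return None

    def run() -> None:
        # Event.wait returns False when the timeout elapses without stop set.
        if not stop.wait(timeout_s):
            msg_queue.put("TIMEOUT")

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The caller sets `stop` once the system reports ready, so the monitor exits without posting anything.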

@fastiuk fastiuk closed this Apr 28, 2024
fastiuk pushed a commit that referenced this pull request Dec 23, 2024
…et#21095)

Adding the below fix from FRR FRRouting/frr#17297

This is to fix the following crash which is a statistical issue

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478