@Nerivec
Collaborator

@Nerivec Nerivec commented Apr 19, 2025

bridge/info

Add the following data:

  • os.version => formatted os.version(), os.release(), os.arch()
  • os.node_version => process.version
  • os.cpus => formatted os.cpus()
  • os.memory_mb => formatted os.totalmem()
  • mqtt.version => the effective protocol version of the MQTT server
  • mqtt.server => the effective URL of the MQTT server
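A minimal sketch of how the os.* fields listed above could be gathered with Node's built-in os module. The interface name, the function name, and the exact string formatting are assumptions made for illustration; the list only says the values are "formatted".

```typescript
import * as os from "node:os";

// Hypothetical shape: field names follow the list above, formatting is assumed.
interface BridgeInfoOs {
    version: string;
    node_version: string;
    cpus: string;
    memory_mb: number;
}

function collectOsInfo(): BridgeInfoOs {
    const cpus = os.cpus();
    const first = cpus[0];
    // Assumed format "<model> (x<count>)"; os.cpus() may be empty on some platforms.
    const cpuSummary = first ? `${first.model.trim()} (x${cpus.length})` : "unknown";

    return {
        // Assumed separator; the source only names the three calls.
        version: `${os.version()} - ${os.release()} - ${os.arch()}`,
        node_version: process.version,
        cpus: cpuSummary,
        // os.totalmem() returns bytes; convert to whole megabytes.
        memory_mb: Math.round(os.totalmem() / 1024 / 1024),
    };
}

console.log(collectOsInfo());
```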

bridge/health

New API endpoint published at a configured interval (10 minutes by default).
The reset_on_check config option resets all resettable stats after every check (instead of keeping a running record since start). Currently this covers: all device stats, mqtt.received, mqtt.published.

system stats:

  • response_time => time of the message, in milliseconds since the Unix epoch (UTC)
  • os.load_average => os.loadavg() (not supported on Windows)
  • os.memory_used_mb => formatted os.totalmem() - os.freemem()
  • os.memory_percent => formatted os.freemem() / os.totalmem()
  • process.uptime_sec => formatted process.uptime()
  • process.memory_used_mb => formatted process.memoryUsage().rss
  • process.memory_percent => formatted process.memoryUsage().rss / os.totalmem()
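The system stats above can be sketched as a pure function over one raw sample. The names RawSystemSample, buildSystemStats and round are hypothetical, and the rounding precision and percent scaling are assumptions; the formulas follow the list as written (including os.memory_percent as os.freemem() / os.totalmem()).

```typescript
// Hypothetical input: raw values as returned by Node (bytes, seconds, ms).
interface RawSystemSample {
    nowMs: number;      // e.g. Date.now()
    loadavg: number[];  // e.g. os.loadavg()
    totalmem: number;   // e.g. os.totalmem()
    freemem: number;    // e.g. os.freemem()
    rss: number;        // e.g. process.memoryUsage().rss
    uptimeSec: number;  // e.g. process.uptime()
}

// Round to a fixed number of decimals (assumed "formatting").
function round(value: number, digits: number): number {
    const factor = 10 ** digits;
    return Math.round(value * factor) / factor;
}

function buildSystemStats(raw: RawSystemSample) {
    const bytesPerMb = 1024 * 1024;
    return {
        response_time: raw.nowMs,
        os: {
            load_average: raw.loadavg,
            memory_used_mb: round((raw.totalmem - raw.freemem) / bytesPerMb, 2),
            // As listed above: freemem / totalmem, scaled to percent (assumed).
            memory_percent: round((raw.freemem / raw.totalmem) * 100, 4),
        },
        process: {
            uptime_sec: Math.floor(raw.uptimeSec),
            memory_used_mb: round(raw.rss / bytesPerMb, 2),
            memory_percent: round((raw.rss / raw.totalmem) * 100, 4),
        },
    };
}
```

In Node.js the sample would be filled from os.loadavg(), os.totalmem(), os.freemem(), process.memoryUsage().rss and process.uptime(); keeping the function pure makes the arithmetic easy to check with fixed numbers.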

mqtt stats:

  • mqtt.connected => whether the MQTT server connection is currently active
  • mqtt.queued => number of messages currently queued
  • mqtt.received => number of received MQTT messages
  • mqtt.published => number of published MQTT messages

device stats:

  • messages => number of messages received
  • messages_per_sec => number of messages received per second
  • leave_count => number of times the device has left the network
  • network_address_changes => number of times the device has changed its network address
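A rough sketch of how per-device counters and reset_on_check could interact. The class DeviceCounters, its method names, the injectable clock, and the choice to compute messages_per_sec as messages divided by the seconds elapsed since start (or since the last reset) are all assumptions for illustration, not the PR's actual implementation.

```typescript
interface DeviceStats {
    messages: number;
    messages_per_sec: number;
    leave_count: number;
    network_address_changes: number;
}

class DeviceCounters {
    private messages = 0;
    private leaveCount = 0;
    private networkAddressChanges = 0;
    private sinceMs: number;
    private readonly now: () => number;

    // The clock is injectable so the behaviour can be checked deterministically.
    constructor(now: () => number = () => Date.now()) {
        this.now = now;
        this.sinceMs = now();
    }

    onMessage(): void { this.messages++; }
    onLeave(): void { this.leaveCount++; }
    onNetworkAddressChange(): void { this.networkAddressChanges++; }

    // Snapshot the stats; optionally start a fresh window afterwards.
    check(resetOnCheck: boolean): DeviceStats {
        const nowMs = this.now();
        const elapsedSec = (nowMs - this.sinceMs) / 1000;
        const rate = elapsedSec > 0 ? this.messages / elapsedSec : 0;
        const snapshot: DeviceStats = {
            messages: this.messages,
            messages_per_sec: Math.round(rate * 10000) / 10000,
            leave_count: this.leaveCount,
            network_address_changes: this.networkAddressChanges,
        };
        if (resetOnCheck) {
            this.messages = 0;
            this.leaveCount = 0;
            this.networkAddressChanges = 0;
            this.sinceMs = nowMs;
        }
        return snapshot;
    }
}
```

With resetOnCheck enabled, each check reports only the activity of the last interval; with it disabled, the counters keep accumulating since start.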

@Nerivec Nerivec marked this pull request as ready for review June 12, 2025 18:18
@Koenkk
Owner

Koenkk commented Jun 15, 2025

Looks good, can be merged once #27732 is merged, I will prep the docs pr

@Nerivec
Collaborator Author

Nerivec commented Jun 15, 2025

The types package is not necessary for this PR to be merged.
I'll wait for the types package to add WindFront support though.

@Koenkk Koenkk merged commit cd9b752 into Koenkk:dev Jun 16, 2025
11 checks passed
@Koenkk
Owner

Koenkk commented Jun 16, 2025

Thanks!

@Nerivec Nerivec deleted the feat/health branch June 22, 2025 23:26
@Great-Chart

One for @Nerivec I guess?

I've been struggling for an extended period to get my Z2M setup reliable. I think I have addressed all the issues that seemed related to interference, so I've since moved on to removing devices that are potentially flooding the network.

This has of course led me to the recently introduced Health tab under Settings, where I can see that some devices have a message count many times greater than others.
Some examples include Sonoff Basic, Ikea Vallhorn PIR and Samotech corded dimmer, none of which I'd imagine should be excessively "spammy" devices.

https://www.zigbee2mqtt.io/devices/BASICZBR3.html#sonoff-basiczbr3
https://www.zigbee2mqtt.io/devices/E2134.html#ikea-e2134
https://www.zigbee2mqtt.io/devices/SM325-ZG.html#samotech-sm325-zg

Is it correct to infer that message counts as high as 10,000 in 2-3 days (where battery devices such as temp / humidity sensors have a message count of circa 1,500) show that these devices are spamming my network?
Or is this a result of the mains devices acting as routers for others and thus passing on messages, which is perfectly normal behaviour?

I've removed one or two devices that had higher message counts and have been monitoring things, pending the inevitable devices going offline. I've recently changed some settings (i.e. backoff) but had previously set timeouts for mains devices to 30 minutes.
https://www.zigbee2mqtt.io/guide/configuration/device-availability.html#advanced-configuration

I'm just not sure of the significance of message counts etc. (nor aware of any measures to reduce the volume of messages coming from such devices) with regard to achieving a more stable network.

In the short term, by using the new Health tab to gauge the quantity of messages and the Devices tab filtered on last seen, I can determine which devices are consistently the most "chatty". But other than removing them and trying alternatives, I'm not sure what clues are being presented here.

Any pointers as to what is normal / acceptable and what needs to be acted upon?

@Nerivec
Collaborator Author

Nerivec commented Aug 4, 2025

Note: please create a discussion next time, instead of asking in a pull request.

You can see very large differences in Messages simply because the device type needs to report more often (e.g. a power meter that reports every 2 seconds can easily have 500 times the number of messages of a temperature sensor during the same period).

To determine spammy devices with WindFront, you should rely on the Messages per sec column. The coloring should give you an approximate idea:

  • green < 0.2
  • regular otherwise (roughly 0.2 to 1)
  • orange > 1 (may be fine depending on how the rest of the network behaves)
  • red > 3 (should definitely be checked)
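To make the thresholds above concrete, here is a small sketch that converts a raw message count over a period into a rate and maps it to a color band. The function and type names are hypothetical, and treating everything between 0.2 and 1 as "regular" is an assumption read off the list; this is not WindFront's actual code.

```typescript
type RateLevel = "green" | "regular" | "orange" | "red";

// Average rate over a period; a zero-length period yields 0.
function messagesPerSec(messages: number, periodSec: number): number {
    return periodSec > 0 ? messages / periodSec : 0;
}

// Thresholds taken from the list above; boundary handling is assumed.
function classifyRate(rate: number): RateLevel {
    if (rate > 3) return "red";
    if (rate > 1) return "orange";
    if (rate < 0.2) return "green";
    return "regular";
}

const secondsPerDay = 24 * 60 * 60;

// 10,000 messages over 3 days averages well under 0.2 messages per second.
const routerRate = messagesPerSec(10000, 3 * secondsPerDay);
console.log(routerRate.toFixed(3), classifyRate(routerRate));

// A power meter reporting every 2 seconds averages 0.5 messages per second.
const meterRate = messagesPerSec(1, 2);
console.log(meterRate.toFixed(3), classifyRate(meterRate));
```

By this measure, a total of 10,000 messages in 2-3 days corresponds to roughly 0.04 to 0.06 messages per second on average, which is why the rate column is a better indicator than the raw count.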

For devices that support it, you can adjust the reporting (see the Parameters paragraph) to lower the number of messages (this can have a massive impact, especially if you don't need some reporting, but unfortunately a lot of lower quality devices don't support this well, or at all).
The processing capabilities of the coordinator also play a part in how well it handles "stress". If you are using hardware that is not recommended, you may want to upgrade.
