@Nerivec
Collaborator

@Nerivec Nerivec commented Jan 8, 2026

Design changes

  • [ZH/ZHC/Z2M/WindFront] OTA refactor #1608
  • Use separate util class for OTA session
    • Use AsyncGenerator to handle block/page/end requests
    • Rework progress updates: the initial update sends an estimate based on chunk stats; after 30 seconds, progress is calculated from the current offset
  • Use check function inside update for consistency (no duplicate logic)
  • check and update return types have changed to adapt to new features
  • If check is called without a queryNextImageRequest payload (undefined), automatically notify the device to trigger it, then reply NO_IMAGE_AVAILABLE (fallback to prevent the device from requesting again)
  • If update is called without a queryNextImageRequest payload (undefined), automatically notify the device to trigger it, then reply as appropriate based on image availability (which makes the device start the OTA update or not)
  • Prevent running an update while one is already running
  • Data settings handled per update trigger instead of globally
  • Remove public Endpoint.waitForCommand API that was solely intended for OTA use, now handled internally
  • Logging changed quite a bit to match rest of ZH logging / new features
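
To illustrate the AsyncGenerator idea from the list above, here is a minimal sketch of a session that answers block and page requests until an end request arrives. Every name in it (`OtaRequest`, `OtaResponse`, `otaSession`, `nextRequest`) is hypothetical and chosen for the example; it is not the actual ZH implementation, only one way such a flow could be shaped.

```typescript
// Hypothetical request/response shapes, simplified for the example.
type OtaRequest =
    | {kind: "block"; offset: number; maxSize: number}
    | {kind: "page"; offset: number; maxSize: number; pageSize: number}
    | {kind: "end"; status: number};

type OtaResponse = {offset: number; data: Uint8Array};

// Yields one response per block to send and returns the status of the end request.
async function* otaSession(
    image: Uint8Array,
    nextRequest: () => Promise<OtaRequest>,
): AsyncGenerator<OtaResponse, number, void> {
    while (true) {
        const request = await nextRequest();

        if (request.kind === "end") {
            return request.status;
        }

        // Guard against a zero data size so the loop below always advances.
        const size = Math.max(1, request.maxSize);
        // A page request is served as consecutive blocks, a block request as a single block.
        const span = request.kind === "page" ? request.pageSize : size;
        const end = Math.min(request.offset + span, image.length);

        for (let offset = request.offset; offset < end; offset += size) {
            yield {offset, data: image.subarray(offset, Math.min(offset + size, end))};
        }
    }
}
```

The caller would iterate the generator, send each yielded block to the device, and treat the generator's return value as the outcome of the session.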

Logic changes and the like can be derived from the list of tests below.

CI Tests

  • Checks
    • returns no image when index is empty
    • performs notify flow when current payload is missing
    • matches OTA images using extra metadata overrides
    • matches OTA images using manufacturer name and hardware ranges
    • matches OTA images using extra manufacturer metadata when device manufacturer name mismatches
    • handles failures when waiting for OTA notify response
    • fails when remote index fetching fails
    • merges override index (prio) with URL index
    • auto-adds meta for override index local firmware when unspecified
    • allows use of remote override index even if URL index fetching fails
    • allows use of local override index even if default URL index fetching fails
    • allows use of local override index even if custom URL index fetching fails
    • overrides lumi file version with meta and reports upgrade availability
    • writes scenes group before checking OTA for PP-WHT-US
  • Updates
    • applies an upgrade image end-to-end (using several sample images)
    • applies a downgrade image end-to-end
    • considers an upgrade successful even if no device announce
    • handles receiving non-value for data size
    • handles out-of-order block offsets
    • handles negative misaligned block offset - resends portion already sent
    • handles positive misaligned block offset - skips portion
    • handles page requests
    • applies manufacturer-specific timeout
    • applies manufacturer-specific and imageType-specific timeout
    • applies manufacturer-specific data size
    • handles URL as source URL
    • handles filesystem as source URL
    • applies fallback timeouts when manufacturer is unknown
    • calls onProgress with round blocks
    • calls onProgress with response delay (changes estimate)
    • calls onProgress with non-round blocks
    • throttles when response delay is configured
    • cancels scheduled OTA when completed
    • keeps scheduled OTA on failure
    • uses custom firmware in dataDir
    • recovers from image block response failures
    • prevents running two OTAs on same device
    • fails when check fails
    • fails when device stops sending data requests
    • fails when upgrade end never arrives
    • fails when device sends upgrade end request with INVALID_IMAGE after image fully sent
    • fails when device sends upgrade end request with ABORT after a certain number of blocks
    • fails when image notify fails
    • fails when upgrade end response fails
    • fails when query next image response fails
    • does not throw on failed default response after non-success upgrade end request
    • allows bypassing version check with force meta
    • returns NO_IMAGE_AVAILABLE when device version matches available in repo
    • returns NO_IMAGE_AVAILABLE when device version is above available in repo (upgrade)
    • returns NO_IMAGE_AVAILABLE when device version is below available in repo (downgrade)
    • returns NO_IMAGE_AVAILABLE when device version matches available at given URL
    • returns NO_IMAGE_AVAILABLE when parsing repo firmware fails
    • returns NO_IMAGE_AVAILABLE when parsing custom firmware fails
    • returns NO_IMAGE_AVAILABLE when parsing custom firmware in dataDir fails
    • returns NO_IMAGE_AVAILABLE when firmware fetching fails
    • returns NO_IMAGE_AVAILABLE when custom file fails checksum
    • returns NO_IMAGE_AVAILABLE when index has no entries
  • Schedules / Unschedules
    • schedules OTA request
    • replaces previously-scheduled OTA request
    • unschedules OTA when present
    • handles unscheduling when scheduled not present
  • Finds image
    • finds match by spec (using several sample images)
    • within file version range
    • by minFileVersion
    • by maxFileVersion
    • by modelId
    • by extra meta modelId
    • by manufacturerName
    • by extra meta manufacturerName
    • by extra meta otaHeaderString
    • by hardwareVersionMin
    • by extra meta hardwareVersionMin
    • by hardwareVersionMax
    • by extra meta hardwareVersionMax
  • Utils
    • parses firmware with unusual header fields
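
The "Finds image" cases above can be read as a set of independent filters. The sketch below shows one possible reading of them. The field names are taken from the test names; the interfaces, the `matchesImage` function, and the choice to skip hardware checks when the device reports no hardware version are assumptions made for the example. Matching "by spec" is assumed to have happened before this step.

```typescript
// Assumed shape of an index entry, using the field names from the test list.
interface ImageMeta {
    fileVersion: number;
    modelId?: string;
    manufacturerName?: string[];
    minFileVersion?: number;
    maxFileVersion?: number;
    hardwareVersionMin?: number;
    hardwareVersionMax?: number;
}

// Assumed shape of what is known about the device.
interface DeviceInfo {
    fileVersion: number;
    modelId?: string;
    manufacturerName?: string;
    hardwareVersion?: number;
}

function matchesImage(image: ImageMeta, device: DeviceInfo): boolean {
    // The device's current file version must fall within the allowed range, if one is given.
    if (image.minFileVersion !== undefined && device.fileVersion < image.minFileVersion) return false;
    if (image.maxFileVersion !== undefined && device.fileVersion > image.maxFileVersion) return false;
    // Model and manufacturer restrictions only apply when the image declares them.
    if (image.modelId !== undefined && image.modelId !== device.modelId) return false;
    if (image.manufacturerName !== undefined) {
        if (device.manufacturerName === undefined) return false;
        if (!image.manufacturerName.includes(device.manufacturerName)) return false;
    }
    // Assumption: hardware bounds are only enforced when the device reports a hardware version.
    if (device.hardwareVersion !== undefined) {
        if (image.hardwareVersionMin !== undefined && device.hardwareVersion < image.hardwareVersionMin) return false;
        if (image.hardwareVersionMax !== undefined && device.hardwareVersion > image.hardwareVersionMax) return false;
    }
    return true;
}
```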

CC: if you want to review this @andrei-lazarov @sjorge @burmistrzak 😉

@Nerivec Nerivec mentioned this pull request Jan 8, 2026
5 tasks
@Koenkk Koenkk changed the title from "feat: OTA refactor" to "feat!: OTA refactor" Jan 12, 2026
@Nerivec
Collaborator Author

Nerivec commented Jan 12, 2026

I'm not able to currently test this with real devices (no OTA bench setup)...

@andrei-lazarov might you be able to test a few OTA scenarios, and confirm that the logic works and also covers what's needed from the point of view of custom firmware providers?

Frontend support for the new stuff isn't done yet, will have to use MQTT requests for now.
You need to use the appropriate Z2M PR branch, coupled with this ZH branch.

git clone --depth 1 -b ota-refactor https://github.com/Koenkk/zigbee2mqtt
git clone --depth 1 -b ota-refactor https://github.com/Koenkk/zigbee-herdsman
cd zigbee-herdsman
pnpm i --frozen-lockfile
pnpm run prepack
cd ../zigbee2mqtt
pnpm i --frozen-lockfile
pnpm link ../zigbee-herdsman
pnpm run prepack

Note: make sure you see the link in the console logs at the pnpm link step; sometimes it doesn't work on the first try (just redo it until it does).


If you see any behavior that current CI tests are not covering, let me know.

@andrei-lazarov

andrei-lazarov commented Jan 13, 2026

I quickly tested an OTA without index: ota_update/update: {url: path_to_firmware}.
CC2531 and Telink TLSR8258 custom fw
It went well! Very exciting 😁

The initial time estimate was wrong. At the end, the device was automatically interviewed, but it wasn't reconfigured.

Progress
[2026-01-13 02:09:12] info: 	z2m: OTA updating 'dev' to latest firmware
[2026-01-13 02:09:12] info: 	zh:controller:ota: OTA update of '0xa4c1381463c32dff' estimated at 56400 seconds (2820 chunks, 0.05 per second)
[2026-01-13 02:09:45] info: 	zh:controller:ota: OTA update of '0xa4c1381463c32dff' at 18.52%, 145 seconds remaining
[2026-01-13 02:10:15] info: 	zh:controller:ota: OTA update of '0xa4c1381463c32dff' at 37.11%, 107 seconds remaining
[2026-01-13 02:10:45] info: 	zh:controller:ota: OTA update of '0xa4c1381463c32dff' at 55.7%, 74 seconds remaining
[2026-01-13 02:11:15] info: 	zh:controller:ota: OTA update of '0xa4c1381463c32dff' at 74.29%, 43 seconds remaining
[2026-01-13 02:11:45] info: 	zh:controller:ota: OTA update of '0xa4c1381463c32dff' at 92.78%, 12 seconds remaining
[2026-01-13 02:11:57] info: 	zh:controller:device: Update of 0xa4c1381463c32dff successful (165 seconds). Waiting for device announce...
[2026-01-13 02:12:52] info: 	z2m: Finished update of 'dev'
[2026-01-13 02:12:52] info: 	z2m: Device 'dev' was OTA updated from '285356032' to '4294967295'
[2026-01-13 02:12:52] info: 	z2m: Interviewing 'dev'
[2026-01-13 02:12:53] info: 	zh:controller:device: Device '0xa4c1381463c32dff' is only compliant to revision '21' of the Zigbee specification (current revision: 23).
[2026-01-13 02:13:03] info: 	z2m: Successfully interviewed 'dev'
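
The numbers in this log are consistent with two simple formulas: the first line with chunks divided by chunks per second, and the later lines with elapsed time scaled by the remaining fraction. The sketch below reproduces the logged values under that reading. The function names are made up for the example, and this is an interpretation of the log, not a copy of the ZH calculation.

```typescript
// Initial estimate: total chunks divided by the assumed chunk rate.
function initialEstimateSeconds(chunks: number, chunksPerSecond: number): number {
    return Math.round(chunks / chunksPerSecond);
}

// Later estimate: extrapolate from the time spent so far and the percentage transferred.
function remainingSeconds(elapsedSeconds: number, percent: number): number {
    const fraction = percent / 100;
    return Math.round((elapsedSeconds * (1 - fraction)) / fraction);
}

console.log(initialEstimateSeconds(2820, 0.05)); // 56400, as in the first log line
console.log(remainingSeconds(33, 18.52)); // 145, as logged at 02:09:45 (33 s after the start)
console.log(remainingSeconds(153, 92.78)); // 12, as logged at 02:11:45 (153 s after the start)
```

With the 165 seconds the update actually took, the effective rate works out to roughly 17 chunks per second (2820 / 165), far from the 0.05 used in the first line, which explains why the initial estimate was so far off.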

This is quite a lot for me to digest, but I can test more scenarios

@Nerivec
Collaborator Author

Nerivec commented Jan 13, 2026

It worked! That's always a good start 😁
The estimate should be fixed now (though it's unlikely to be very precise, too many factors involved).
Reconfigure is a problem at the Z2M layer, will look into it.

cd zigbee-herdsman
git pull
pnpm run prepack

You should be able to use the url as any of:

  • URL of an index JSON (with the usual format); detection then proceeds the same way as with the official index URL
  • URL of a firmware file
  • absolute path on the file system
  • path relative to the data dir
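
As a rough illustration of how a single url value could cover these cases, the sketch below sorts a value into "remote" or "file" and resolves relative paths against a data directory. The `OtaSource` type, the `resolveSource` function, and the string-based path checks are all assumptions made for the example. Telling an index JSON apart from a firmware file would need a look at the fetched content and is not shown.

```typescript
// Hypothetical result type: either something to fetch or something to read from disk.
type OtaSource = {kind: "remote"; url: string} | {kind: "file"; path: string};

function resolveSource(url: string, dataDir: string): OtaSource {
    // http(s) URLs cover both the index JSON and the firmware file cases.
    if (/^https?:\/\//i.test(url)) {
        return {kind: "remote", url};
    }

    // POSIX absolute path or Windows drive path.
    const isAbsolute = url.startsWith("/") || /^[A-Za-z]:[\\/]/.test(url);
    if (isAbsolute) {
        return {kind: "file", path: url};
    }

    // Anything else is treated as relative to the data dir (trailing separators trimmed).
    return {kind: "file", path: `${dataDir.replace(/[\\/]+$/, "")}/${url}`};
}
```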

For hex option, you can provide the format "hex": {"data": "FFFF...", "file_name": "dev-firmware-1.ota"} where FFFF... is the full firmware file in hex form (file_name is optional). Of course, this is intended for frontend use, where this conversion will be abstracted away after a file is "uploaded". What Z2M will do upon reception is write that hex as an actual file in the data dir, then use it.
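
For anyone who wants to try the hex option before the frontend handles it, here is a small sketch that turns firmware bytes into the payload format described above. The `toHexPayload` helper and the `OtaHexPayload` type are made up for the example, the uppercase output simply mirrors the "FFFF..." sample (whether case matters is not stated here), and any other fields the request needs are left out.

```typescript
// Mirrors the "hex" format described above; file_name stays optional.
interface OtaHexPayload {
    hex: {data: string; file_name?: string};
}

function toHexPayload(firmware: Uint8Array, fileName?: string): OtaHexPayload {
    // Two hex digits per byte, concatenated into a single string.
    const data = Array.from(firmware, (byte) => byte.toString(16).padStart(2, "0"))
        .join("")
        .toUpperCase();

    return {hex: fileName === undefined ? {data} : {data, file_name: fileName}};
}

console.log(JSON.stringify(toHexPayload(new Uint8Array([0xff, 0x0a]), "dev-firmware-1.ota")));
// {"hex":{"data":"FF0A","file_name":"dev-firmware-1.ota"}}
```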

You should also be able to override the various configs per OTA update call:

image_block_request_timeout?: number;
image_block_response_delay?: number;
default_maximum_data_size?: number;

They will take precedence over the Z2M defaults, and your configuration.yaml (if any).
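
The precedence described here (per-call value, then configuration.yaml, then Z2M default) can be sketched as a per-key fallback. The three keys come from the list above; the `resolveSettings` function and the numbers used in the usage lines are placeholders for illustration, not necessarily the real Z2M defaults.

```typescript
interface OtaDataSettings {
    image_block_request_timeout: number;
    image_block_response_delay: number;
    default_maximum_data_size: number;
}

function resolveSettings(
    defaults: OtaDataSettings,
    config: Partial<OtaDataSettings> = {},
    perCall: Partial<OtaDataSettings> = {},
): OtaDataSettings {
    // Per key: the per-call value wins, then configuration.yaml, then the default.
    const pick = (key: keyof OtaDataSettings): number => perCall[key] ?? config[key] ?? defaults[key];

    return {
        image_block_request_timeout: pick("image_block_request_timeout"),
        image_block_response_delay: pick("image_block_response_delay"),
        default_maximum_data_size: pick("default_maximum_data_size"),
    };
}

// Placeholder numbers, for illustration only.
const placeholderDefaults: OtaDataSettings = {
    image_block_request_timeout: 150000,
    image_block_response_delay: 250,
    default_maximum_data_size: 50,
};
const resolved = resolveSettings(placeholderDefaults, {image_block_response_delay: 100}, {default_maximum_data_size: 64});
console.log(resolved);
// { image_block_request_timeout: 150000, image_block_response_delay: 100, default_maximum_data_size: 64 }
```

Using a per-key fallback rather than an object spread keeps an explicitly undefined per-call value from masking a configured one.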

As for schedule, it should now be remembered across Z2M restarts (saved in database), and also supports url/hex per above.

@Nerivec
Collaborator Author

Nerivec commented Jan 22, 2026

@Koenkk do you want to do a final review of all 4 PRs, and start merging this? I think I've covered pretty much all, just docs remaining.
We're closer to Feb now, not sure if we wait. It's not a critical aspect of the code (networks are fine without it), but still, some bugs could have slipped in 😅
We're likely to need wide feedback in any case to find bugs. Too many scenarios, too many devices, as always 😁
Plus the frontend part is rather critical for proper testing on this one (for the new features anyway).

Sidenote: we should put a small notice in whatever release this gets into, that the logging regarding OTA has changed substantially (just in case anyone used it for some kind of integration, or had filtered some stuff which could make it appear/disappear now).

@Koenkk
Owner

Koenkk commented Jan 23, 2026

> We're closer to Feb now, not sure if we wait. It's not a critical aspect of the code (networks are fine without it), but still, some bugs could have slipped in 😅

I would propose merging it; it's not critical, and merging now lets us get feedback one month earlier. So ready to go, really nice work! ❤️

Feel free to take out of draft and slam merge buttons.

@Nerivec Nerivec marked this pull request as ready for review January 23, 2026 22:38
@Nerivec Nerivec merged commit 1ec439c into master Jan 23, 2026
4 checks passed
@Nerivec Nerivec deleted the ota-refactor branch January 23, 2026 22:48