-
Notifications
You must be signed in to change notification settings - Fork 16
Keepalive fixes #321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Keepalive fixes #321
+472
−247
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
46b4470 to
1f31bbb
Compare
|
Code coverage summary for 705bfa1: ✅ Region coverage 67% passes |
and use for server's key_update_interval and client's tracer timeout
and add continuous_keepalive as config parameter. Before the commit 3997775, we implicitly use keepalive_interval to enable/disable keepalive. If keepalive_interval is 0s, we do not start the keepalive task. But in 3997775, we have enhanced keepalive module to support continuous and non-continuous keepalive mechanism. Non continuous keepalive was needed when we want to check connection liveliness during network transition. There were two problems in this commit: - noncontinuous keepalive used the interval and timeout value of normal keepalive. So if the interval is 0, non-continuous keepalive does not work since there is no task. - Hardcode continuous_keepalive as true in binary. - In non-continuous mode, if there is a network change, it moved the state to Waiting. Hence keepalive will be sent only after the interval period instead of sending immediately. This commit fixes the first two issues, by using nonzero duration for keepalive_interval and keepalive_timeout and also adding a new variable for continuous_keepalive. The third problem was fixed in e129a70
When try to send Keepalive immediately in Needed state, commit e129a70 introduces a crictial bug where the timeout is reset after sending the Keepalive. This is serious because the timeout value will never be triggered if it is reset continuously. This makes keepalive timeout mechanism fail to work during Tracer timeout or NetworkChange. Commit 44efa8a tried to fix the bug, but instead of fixing the timeout, it introduced another method using failed_attempts counter to fix it. This is redundant since the existing timeout mechanism is supposed to do this precisely. Also it increases the timeout value to timeout * 3(MAX_FAILED_ATTEMPTS). This commit does the following: - reverts the changes related to failed keepalive counter changes - Fixes resetting the timer using FusedFuture implementation FusedFuture enables the use of is_terminated() method on OptionFuture
The tx field in Keepalive struct was always Some and never set to None. Removed the Option wrapper to simplify the code and eliminate unnecessary checks in all methods.
Consider the following case: - Client send keepalive request to server and the reply is lost. Timeout would have been started when client send request. - Now client/server starts exchanging packets continuously triggering OutsideActivity message - While handling OutsideActivity message, we reset the interval and let timeout run. So when there is continuous data exchange, we will not send any keepalive request, but we will wait for Reply which is sent at step 1. After the timeout expires, it will break the connection since it has not received reply. This commit resets the timeout, which does not have this faulty behavior.
This commit does two changes: - Remove the suspended check before setting the status to suspended - In the block where state could be waiting or pending, we move the state to pending if it is Waiting. We could effectively move the state change outside of the if loop. Also, Instead of checking for Waiting state, which implicitly mean no timeout, check the timeout status before setting the timeout.
When there is a network interruption, (mobile) clients will send suspend message to prevent keepalive from disconnecting lightway session. When the network comes back up, it send NetworkChange message to trigger the UDP floating connection. So we need to reset the timeout to start fresh
jsonwebtoken has a breaking change: https://github.com/Keats/jsonwebtoken/blob/master/CHANGELOG.md#1000-2025-09-29 User has to choose now between aws_lc_rs/rust_crypto instead of default ring. `aws_lc_rs` pulls in OpenSsl license which seems to not compatible with AGPL3. So went with rust_crypto.
14e777a to
705bfa1
Compare
xv-thomas-leong
approved these changes
Dec 12, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Refactor keepalive to be consistent/clear and fix tracer implementation for linux client
There are multiple commits, each solving a specific usecase/problem. It is explained in the commit messages.
Motivation and Context
Tracer implementation was not working with Linux clients.
How Has This Been Tested?
Verified connection drops out, when it detects there is a network error and not able to receive packets
Types of changes
Checklist:
main