-
-
Notifications
You must be signed in to change notification settings - Fork 71
fix: RTCDataChannel readyState race condition and improve typings #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
src/polyfill/RTCPeerConnection.ts
Outdated
| close(): void { | ||
| for (const dc of this.#dataChannels) { | ||
| if (dc.readyState !== 'closed') { | ||
| dc.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.w3.org/TR/webrtc/#dom-rtcpeerconnection-close
- Set the [[ReadyState]] slot of each of connection's RTCDataChannels to "closed".
NOTE
The RTCDataChannels will be closed abruptly and the closing procedure will not be invoked.
Do we actually need a separate method like forceCloseAbruptly for this case?
In the current PR DataChannel.close() implementation, the .readyState is set to "closing" immediately to prevent readyState race conditions, and only transitions to "closed" once the native layer confirms that the data channel has closed. However, according to the W3C WebRTC Data Channel specification, when a data channel is closed abruptly, the closing procedure is not invoked and .readyState must transition directly to "closed".
We should ensure that our implementation matches the expected behavior defined by the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@murat-dogan I need your insights here.
Also, could you clarify why this implementation was removed in the first place?
b3000f1#diff-701dfe9764f8864403783f0e275aeac5f8f104601ac6b189f7deda2f9eaaa1f1L339-L340
|
Hello @mertushka, |
| if (!this.#closeRequested) { | ||
| this.#readyState = 'closing'; | ||
| this.dispatchEvent(new Event('closing')); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic was incorrect. Why would we simulate a "closing" event when the data channel is already closed? Instead, we start the "closing" state at the beginning of the <RTCDataChannel>.close() method, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a gap in the libdatachannel API. Step 4 of 6.2.4 says:
Unless the procedure was initiated by channel.close, set channel.[[ReadyState]] to "closing" and fire an event named closing at channel.
We don't get a "closing" notification from libdatachannel before the channel is actually closed so this is the best we can do in order to emit the events required by the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how are we gonna handle both cases?
Step 5 of https://www.w3.org/TR/webrtc/#dom-rtcpeerconnection-close
NOTE
The RTCDataChannels will be closed abruptly and the closing procedure will not be invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked case is when the RTCPeerConnection is explicitly closed by the user so set a flag or a status and use that to decide whether or not to emit the event(s)?
| setImmediate(() => { | ||
| this.#readyState = 'closed'; | ||
| this.dispatchEvent(new Event('close')); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to use setImmediate here. This can be done synchronously since the data channel already closed.
b83de7a to
2168c78
Compare
|
Hello @mertushka, The close operation will be in another thread. What we can do here to catch the error from the wrapper and re-send it.
|
I actually implemented that in the beginning. Please check: But i eventually removed it because it will be hell of crashes for user side and its not really good. Also, checking for readyState before sending messages in native browser WebRTC just works, but in our case we can't keep up with the readyState without being caught in a race condition window. In this PR, i am trying to minimize it with keeping up with spec instead of simulating. You can run the Test Case in both browser and node to see the difference, also before and after fix. |
|
Actually, I don't think this is a race condition.
|
|
@murat-dogan Just to clarify, the test case in the issue is specifically about the RTCDataChannel state behavior during RTCPeerConnection.close() so it’s focused solely on the closing procedure triggered by peerConnection.close(), not cases like network failures or unexpected disconnects. Also i still don’t think that test will fail on Chrome. |
|
As I said, if there are some points we can improve, of course, we can. But this part makes compilations. So we need to delete it. If you are not closing explicitly the data channel (I am not talking about peer-connection), then remote or local does not matter, the close event will come from libdatachannel. If you delete this part, what will be the new situation for your use case? |
|
Why do we need to delete that part? The WebRTC spec clearly states that all RTCDataChannels should be closed before the RTCPeerConnection is closed (in our case; just set their readyState to “closed” native layer will do the rest). If we skip this and rely only on the native layer, we introduce a timing issue. The data channels are already closed internally, but readyState may still report "open" briefly. That results in this situation again: If the reasoning is that the native layer handles closure automatically, I get that. But removing this code causes a mismatch in state reporting and introduces latency in reflecting the correct readyState. This logic is not about duplicating what the native layer does, it is about keeping our internal state consistent and protecting developers from sending on a channel that appears to be open but is already closed underneath. If you have another approach to keep the states in sync without manually calling dc.close(), I am open to that. But i don’t get why it’s needed to removed and why you removed it in the first place. |
|
paullouisageneau/libdatachannel#1384 (comment) Because the close operation is asynchronous. If we try to close the data channels and call peer.close it creates race conditions. You think, this case only happens for the local close operation? So not remote close. Rİght? |
Then we need to think a way to set data channels readyState’s to “closed” on peerConnection.close(). The native layer will do the rest?
Yes, since we get the remote close via libdatachannel, we can’t do really much about it. |
|
Then instead of closing datachannels you can have a function that will set ready states to closing before calling peer close function. But for me it does not make too much sense the diff between local and remote close. |
|
@murat-dogan What do you think about this..? |
|
I appreciate your effort. While testing, I see this failure message as much as a ready state failure. Which is completely normal, I mean, can occur. The test is suddenly closing a peer connection. Again, I think for all cases best solution is to guard the send function with a try-catch. My suggestion will be;
|
80a3413 to
3ad0385
Compare
Apologies, had to close PR #358 because I mistakenly committed to
masterin my fork instead of creating a separate branch for the PR.As identified here: #326 (comment) and mertushka/haxball.js#64 (comment)
This fixes a race condition where
<RTCDataChannel>.readyStatecould remain stuck in 'open' even after the underlying native DataChannel was closed. It also improves typings a bit.Test Case
Before Fix
After Fix