@mertushka
Contributor

@mertushka mertushka commented Jun 16, 2025

Apologies, had to close PR #358 because I mistakenly committed to master in my fork instead of creating a separate branch for the PR.

As identified here: #326 (comment) and mertushka/haxball.js#64 (comment)

This fixes a race condition where <RTCDataChannel>.readyState could remain stuck in 'open' even after the underlying native DataChannel was closed. It also improves typings a bit.

Test Case

  • Tested on both Node.js 18.x and 22.x LTS
import { RTCPeerConnection } from "node-datachannel/polyfill"; // "@mertushka/node-datachannel/polyfill" includes fixes

const peer1 = new RTCPeerConnection();
const peer2 = new RTCPeerConnection();

peer2.addEventListener("datachannel", (e) => {
  const channel = e.channel;
  channel.binaryType = "arraybuffer";
  channel.addEventListener("message", (evt) => {
    // echo back
    if (channel.readyState === "open") channel.send(evt.data);
  });
});

const channel = peer1.createDataChannel("race");
await connectPeers(peer1, peer2);
await waitForOpen(channel);

let tick = 0;
let sent = 0;
let interval;

channel.binaryType = "arraybuffer";

// Start sending every tick
interval = setInterval(() => {
  console.log("readyState: ", channel.readyState);
  if (channel.readyState === "open") {
    try {
      channel.send(new Uint8Array(128));
      sent++;
    } catch (err) {
      console.error(`Tick ${tick}: Send failed despite readyState === "open"`);
      console.error("Error:", err.message);
      clearInterval(interval);
      cleanup();
    }
  } else {
    console.warn(`Tick ${tick}: Not open, readyState is`, channel.readyState);
    clearInterval(interval);
    cleanup();
  }

  tick++;

  // Simulate sudden closure of peer2 after a few ticks
  if (tick === 200) {
    console.warn(">> Closing peer2 abruptly");
    peer2.close();
  }
}, 1000 / 60);

function cleanup() {
  console.log(`Sent ${sent} messages before failure`);
  peer1.close();
}

async function connectPeers(peer1, peer2) {
  const offer = await peer1.createOffer();
  await peer2.setRemoteDescription(offer);

  const answer = await peer2.createAnswer();
  await peer1.setRemoteDescription(answer);

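  // Forward ICE candidates in both directions, skipping the null candidate
  // that signals the end of gathering.
  peer1.addEventListener("icecandidate", (e) => {
    if (e.candidate) peer2.addIceCandidate(e.candidate);
  });

  peer2.addEventListener("icecandidate", (e) => {
    if (e.candidate) peer1.addIceCandidate(e.candidate);
  });

  await Promise.all([waitForConnection(peer1), waitForConnection(peer2)]);
}

function waitForConnection(peer) {
  return new Promise((resolve, reject) => {
    // The connection may already be up by the time we start waiting.
    if (peer.connectionState === "connected") return resolve();
    peer.addEventListener("connectionstatechange", () => {
      if (peer.connectionState === "connected") resolve();
      if (peer.connectionState === "failed") reject(new Error("ICE failed"));
    });
  });
}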

function waitForOpen(dc) {
  return new Promise((resolve, reject) => {
    if (dc.readyState === "open") return resolve();
    dc.addEventListener("open", () => resolve());
    dc.addEventListener("error", () => reject(new Error("DC failed")));
  });
}

Before Fix

readyState: open
>> Closing peer2 abruptly
readyState: open
❌ Error: libdatachannel error while sending data channel message: DataChannel is closed

After Fix

readyState: open
>> Closing peer2 abruptly
readyState: closed
✅ Tick 200: Not open, readyState is closed

@mertushka mertushka marked this pull request as draft June 17, 2025 14:17

  close(): void {
    for (const dc of this.#dataChannels) {
      if (dc.readyState !== 'closed') {
        dc.close();
Contributor Author

https://www.w3.org/TR/webrtc/#dom-rtcpeerconnection-close

  1. Set the [[ReadyState]] slot of each of connection's RTCDataChannels to "closed".

NOTE

The RTCDataChannels will be closed abruptly and the closing procedure will not be invoked.

Do we actually need a separate method like forceCloseAbruptly for this case?

In the current PR DataChannel.close() implementation, the .readyState is set to "closing" immediately to prevent readyState race conditions, and only transitions to "closed" once the native layer confirms that the data channel has closed. However, according to the W3C WebRTC Data Channel specification, when a data channel is closed abruptly, the closing procedure is not invoked and .readyState must transition directly to "closed".

We should ensure that our implementation matches the expected behavior defined by the spec.
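To make the two paths concrete, here is a minimal sketch, not the polyfill's actual code. `SketchDataChannel`, `SketchPeerConnection`, `nativeClosed()` and `forceCloseAbruptly()` are hypothetical names used only for illustration, and there is no native layer behind them. It assumes that an explicit `channel.close()` enters "closing" and waits for confirmation, while the peer connection's `close()` moves every channel straight to "closed" without firing events.

```javascript
// Hypothetical stand-ins for the polyfill classes; no native layer is involved.
class SketchDataChannel extends EventTarget {
  #readyState = "open";

  get readyState() {
    return this.#readyState;
  }

  // Graceful path: enter "closing" right away, finish when the native layer confirms.
  close() {
    if (this.#readyState === "closing" || this.#readyState === "closed") return;
    this.#readyState = "closing";
  }

  // Called by the (simulated) native layer once the channel is really closed.
  nativeClosed() {
    if (this.#readyState === "closed") return;
    this.#readyState = "closed";
    this.dispatchEvent(new Event("close"));
  }

  // Abrupt path used by the peer connection's close(): skip "closing" entirely.
  forceCloseAbruptly() {
    this.#readyState = "closed";
  }
}

class SketchPeerConnection {
  #dataChannels = new Set();

  createDataChannel() {
    const dc = new SketchDataChannel();
    this.#dataChannels.add(dc);
    return dc;
  }

  close() {
    for (const dc of this.#dataChannels) dc.forceCloseAbruptly();
    // A real implementation would close the native peer connection here.
  }
}

const sketchPc = new SketchPeerConnection();
const gracefulDc = sketchPc.createDataChannel();
const abruptDc = sketchPc.createDataChannel();

gracefulDc.close();
console.log(gracefulDc.readyState); // closing

sketchPc.close();
console.log(abruptDc.readyState); // closed
```

With this shape, a late native close callback on an abruptly closed channel is a no-op, so no "close" event is dispatched twice.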

Contributor Author

@mertushka mertushka Jun 17, 2025

@murat-dogan I need your insights here.

Also, could you clarify why this implementation was removed in the first place?
b3000f1#diff-701dfe9764f8864403783f0e275aeac5f8f104601ac6b189f7deda2f9eaaa1f1L339-L340

@murat-dogan
Owner

Hello @mertushka,
Thank you for the PR
I will check all of this at the weekend.

    if (!this.#closeRequested) {
      this.#readyState = 'closing';
      this.dispatchEvent(new Event('closing'));
    }
Contributor Author

This logic was incorrect. Why would we simulate a "closing" event when the data channel is already closed? Instead, we start the "closing" state at the beginning of the <RTCDataChannel>.close() method, not here.

Contributor

There's a gap in the libdatachannel API. Step 4 of 6.2.4 says:

Unless the procedure was initiated by channel.close, set channel.[[ReadyState]] to "closing" and fire an event named closing at channel.

We don't get a "closing" notification from libdatachannel before the channel is actually closed so this is the best we can do in order to emit the events required by the spec.

Contributor Author

So how are we gonna handle both cases?

Step 5 of https://www.w3.org/TR/webrtc/#dom-rtcpeerconnection-close

NOTE

The RTCDataChannels will be closed abruptly and the closing procedure will not be invoked.

Contributor

The linked case is when the RTCPeerConnection is explicitly closed by the user, so could we set a flag or a status and use that to decide whether or not to emit the event(s)?
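A rough sketch of that flag idea, under stated assumptions: `FlaggedChannel`, `markAbrupt()` and `onNativeClosed()` are hypothetical names, with `onNativeClosed()` standing in for the close callback from libdatachannel. Only `#closeRequested` comes from the diff above. The native callback consults the flags to decide which events to emit.

```javascript
// Hypothetical sketch; onNativeClosed() stands in for the native close callback.
class FlaggedChannel extends EventTarget {
  #readyState = "open";
  #closeRequested = false; // set by channel.close()
  #abrupt = false; // set when the owning peer connection is closed explicitly

  get readyState() {
    return this.#readyState;
  }

  close() {
    if (this.#readyState !== "open") return;
    this.#closeRequested = true;
    this.#readyState = "closing";
  }

  // Invoked by the peer connection's close(): no closing procedure, no events.
  markAbrupt() {
    this.#abrupt = true;
    this.#readyState = "closed";
  }

  onNativeClosed() {
    if (this.#abrupt || this.#readyState === "closed") return;
    if (!this.#closeRequested) {
      // Remote-initiated close: emit "closing" late, since the native layer
      // gave no earlier notification.
      this.#readyState = "closing";
      this.dispatchEvent(new Event("closing"));
    }
    this.#readyState = "closed";
    this.dispatchEvent(new Event("close"));
  }
}

// Collects the names of the close-related events a channel dispatches.
function recordEvents(channel) {
  const seen = [];
  for (const type of ["closing", "close"]) {
    channel.addEventListener(type, () => seen.push(type));
  }
  return seen;
}

const remoteClosed = new FlaggedChannel();
const remoteEvents = recordEvents(remoteClosed);
remoteClosed.onNativeClosed();
console.log(remoteEvents); // [ 'closing', 'close' ]

const locallyClosed = new FlaggedChannel();
const localEvents = recordEvents(locallyClosed);
locallyClosed.close();
locallyClosed.onNativeClosed();
console.log(localEvents); // [ 'close' ]

const abruptlyClosed = new FlaggedChannel();
const abruptEvents = recordEvents(abruptlyClosed);
abruptlyClosed.markAbrupt();
abruptlyClosed.onNativeClosed();
console.log(abruptEvents); // []
```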

    setImmediate(() => {
      this.#readyState = 'closed';
      this.dispatchEvent(new Event('close'));
    });
Contributor Author

There is no need to use setImmediate here. This can be done synchronously since the data channel is already closed.

@murat-dogan
Owner

Hello @mertushka,
Thank you for the PR.
But I am not sure I understand the point correctly.
The close event is asynchronous, so we cannot guarantee that this check will always work:

if (channel.readyState === "open") channel.send(evt.data);

The close operation runs in another thread, so the send function should always be wrapped in a try-catch block.

What we can do here is catch the error from the wrapper and re-send it.
Please check here:
dc0ecbc

The closing event is another story. If we have any problems there, we can talk about it.

@mertushka
Contributor Author

@murat-dogan

What we can do here is catch the error from the wrapper and re-send it.

I actually implemented that in the beginning.

Please check:
13dd568#diff-48547965b743e7779b7d00b6598c290d3460ec4078725fae146e87dfe0d937ffR190

But I eventually removed it because it would cause a flood of crashes on the user side, which is not good. Also, checking readyState before sending messages just works in native browser WebRTC, but in our case we can't keep readyState current without being caught in a race-condition window. In this PR, I am trying to minimize that window by following the spec instead of simulating it. You can run the test case in both a browser and Node.js to see the difference, and also before and after the fix.

@murat-dogan
Owner

Actually, I don't think this is a race condition.

if (channel.readyState === "open") channel.send(evt.data);

readyState could change right as the if statement is entered.
Yes, you are right, there could be some improvements that make it rare, but it will still happen.
Additionally, I think that if you try it enough times in Chrome, it will happen there too.
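To illustrate the window being described, here is a small model and not the real binding. `LaggingChannel` and `nativeClose()` are hypothetical. The assumption is that the native side closes first and the JS-visible `readyState` only catches up in a later event-loop task, so a check-then-send can still throw.

```javascript
// Hypothetical model: the native channel closes first, and the JS-visible
// readyState is only updated in a later event-loop task.
class LaggingChannel {
  readyState = "open";
  #nativeOpen = true;

  // Simulates the native layer closing the channel on its own thread.
  nativeClose() {
    this.#nativeOpen = false;
    setImmediate(() => {
      this.readyState = "closed";
    });
  }

  send(_data) {
    if (!this.#nativeOpen) throw new Error("DataChannel is closed");
  }
}

const laggingChannel = new LaggingChannel();
laggingChannel.nativeClose();

let sendFailed = false;
if (laggingChannel.readyState === "open") {
  // The check passed, yet the native side is already gone.
  try {
    laggingChannel.send(new Uint8Array(128));
  } catch (err) {
    sendFailed = true;
  }
}
console.log(laggingChannel.readyState, sendFailed); // open true
```

Shrinking the delay before `readyState` is updated makes the failure rarer, but in this model only the try-catch makes it harmless.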

@mertushka
Contributor Author

@murat-dogan Just to clarify, the test case in the issue is specifically about the RTCDataChannel state behavior during RTCPeerConnection.close(), so it’s focused solely on the closing procedure triggered by peerConnection.close(), not on cases like network failures or unexpected disconnects. Also, I still don’t think that test will fail on Chrome.

@murat-dogan
Owner

As I said, if there are some points we can improve, of course, we can.

But this part creates complications, so we need to delete it.

 close(): void {
    for (const dc of this.#dataChannels) {
      if (dc.readyState !== 'closed' && dc.readyState !== 'closing') {
        dc.close();

If you are not explicitly closing the data channel (I am not talking about the peer connection), then it does not matter whether the close is remote or local; the close event will come from libdatachannel.

If you delete this part, what will be the new situation for your use case?

@mertushka
Contributor Author

mertushka commented Jun 21, 2025

@murat-dogan

Why do we need to delete that part?

The WebRTC spec clearly states that all RTCDataChannels should be closed before the RTCPeerConnection is closed (in our case, we just set their readyState to “closed” and the native layer will do the rest). If we skip this and rely only on the native layer, we introduce a timing issue: the data channels are already closed internally, but readyState may still report "open" briefly.

That results in this situation again:

readyState: open
>> Closing peer2 abruptly
readyState: open
❌ Error: libdatachannel error while sending data channel message: DataChannel is closed

If the reasoning is that the native layer handles closure automatically, I get that. But removing this code causes a mismatch in state reporting and introduces latency in reflecting the correct readyState.

This logic is not about duplicating what the native layer does, it is about keeping our internal state consistent and protecting developers from sending on a channel that appears to be open but is already closed underneath.

If you have another approach to keep the states in sync without manually calling dc.close(), I am open to that. But I don’t get why it needs to be removed, or why you removed it in the first place.

@murat-dogan
Owner

paullouisageneau/libdatachannel#1384 (comment)

Because the close operation is asynchronous. If we try to close the data channels and then call peer.close, it creates race conditions.

Do you think this case only happens for the local close operation, and not for a remote close? Right?

@mertushka
Contributor Author

Because the close operation is asynchronous. If we try to close the data channels and then call peer.close, it creates race conditions.

Then we need to think of a way to set the data channels' readyState to “closed” on peerConnection.close(). Will the native layer do the rest?

Do you think this case only happens for the local close operation, and not for a remote close? Right?

Yes, since we get the remote close via libdatachannel, we can’t really do much about it.

@murat-dogan
Owner

Then, instead of closing the data channels, you can have a function that sets their ready states to closing before calling the peer close function.

But for me, the difference between local and remote close does not make much sense, because in both cases the data channel close comes from the lib.

@mertushka
Contributor Author

@murat-dogan What do you think about this..?

0081313

@murat-dogan
Owner

I appreciate your effort.
However, I'm sorry, but it's too dirty like that.

While testing, I see this failure message as often as a ready state failure.

Error: libdatachannel error while sending data channel message: Sending failed, errno=2

Which is completely normal; I mean, it can occur.

The test suddenly closes a peer connection.
Even if you improve the logic for programmatically closing the peer connection (like closing the DCs before closing the peer), how will this help with real problems, like a network failure or a peer close?

Again, I think the best solution for all cases is to guard the send function with a try-catch.
And I don't understand why this is not a solution for you.
If send fails somehow, you will get an error, just like this.

My suggestion would be:

  • For your use case, guard the send function. If you don't want to surface send failures to the user, then have a wrapper for send.
  • If we have a logic error in the closing or closed events, let's talk and fix it.
  • Create another PR for the type improvements.
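A guarded send along the lines of the first suggestion might look like the sketch below. `safeSend` and its `onError` callback are hypothetical names and not part of node-datachannel. The helper only assumes the channel exposes `readyState` and `send()`.

```javascript
// Hypothetical helper; `channel` is anything with readyState and send().
// Returns true if the message was handed to the channel, false otherwise.
function safeSend(channel, data, onError = () => {}) {
  if (channel.readyState !== "open") return false;
  try {
    channel.send(data);
    return true;
  } catch (err) {
    // The channel closed underneath us; report instead of crashing the caller.
    onError(err);
    return false;
  }
}

// Stand-in channel whose send() always fails, to exercise the error path.
const brokenChannel = {
  readyState: "open",
  send() {
    throw new Error("DataChannel is closed");
  },
};

const sendErrors = [];
const delivered = safeSend(brokenChannel, new Uint8Array(128), (err) =>
  sendErrors.push(err.message),
);
console.log(delivered, sendErrors); // false [ 'DataChannel is closed' ]
```

The boolean return lets the caller stop a send loop on the first failure, whether the cause is a local close, a remote close, or a network failure.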

@murat-dogan murat-dogan force-pushed the master branch 2 times, most recently from 80a3413 to 3ad0385 Compare August 1, 2025 13:37

3 participants