Skip to content

Conversation

@murat-dogan
Copy link
Contributor

Sometimes resetting close callback before calling it.
Related issue:
murat-dogan/node-datachannel#349

I have modified this code part like this, in order to catch the race condition:

void DataChannel::close() {
	PLOG_ERROR << "Closing DataChannel";

	shared_ptr<SctpTransport> transport;
	{
		std::shared_lock lock(mMutex);
		transport = mSctpTransport.lock();
	}

	PLOG_ERROR << "mIsClosed: " << mIsClosed << " closedCallback: " << (bool)closedCallback;
	if (!mIsClosed.exchange(true)) {
		if (transport && mStream.has_value())
			transport->closeStream(mStream.value());

		triggerClosed();
	}

	PLOG_ERROR << "calling resetCallbacks";
	resetCallbacks();
}

This is a normal call, without race condition:

2025-05-03 17:15:12.552 ERROR [1053375] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.552 ERROR [1053375] [rtc::impl::DataChannel::close@106] mIsClosed: 0 closedCallback: 1
2025-05-03 17:15:12.552 ERROR [1053375] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-03 17:15:12.552 ERROR [1053375] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.552 ERROR [1053375] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.552 ERROR [1053375] [rtc::impl::DataChannel::close@106] mIsClosed: 1 closedCallback: 0
2025-05-03 17:15:12.552 ERROR [1053375] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.552 ERROR [1053393] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.553 ERROR [1053393] [rtc::impl::DataChannel::close@106] mIsClosed: 0 closedCallback: 1
2025-05-03 17:15:12.553 ERROR [1053393] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-03 17:15:12.553 ERROR [1053393] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.553 ERROR [1053375] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.553 ERROR [1053375] [rtc::impl::DataChannel::close@106] mIsClosed: 1 closedCallback: 0
2025-05-03 17:15:12.553 ERROR [1053375] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.553 ERROR [1053375] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.553 ERROR [1053375] [rtc::impl::DataChannel::close@106] mIsClosed: 1 closedCallback: 0
2025-05-03 17:15:12.553 ERROR [1053375] [rtc::impl::DataChannel::close@114] calling resetCallbacks

This is the output when a race condition occurs:

2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@106] mIsClosed: 0 closedCallback: 1
2025-05-03 17:15:12.729 ERROR [1053398] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@106] mIsClosed: 1 closedCallback: 0
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@114] calling resetCallbacks
// Thread 1053412 entering close function
2025-05-03 17:15:12.729 ERROR [1053412] [rtc::impl::DataChannel::close@98] Closing DataChannel
// mIsClosed is 0 and closedCallback set
2025-05-03 17:15:12.729 ERROR [1053412] [rtc::impl::DataChannel::close@106] mIsClosed: 0 closedCallback: 1
// Thread 1053398 entering close function
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@98] Closing DataChannel
// mIsClosed is 1 (set by 1053412) and closedCallback is still set
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@106] mIsClosed: 1 closedCallback: 1
// Thread 1053398 resetting callbacks before 1053412 call
2025-05-03 17:15:12.729 ERROR [1053398] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.729 ERROR [1053412] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-03 17:15:12.729 ERROR [1053412] [rtc::impl::DataChannel::close@98] Closing DataChannel
2025-05-03 17:15:12.729 ERROR [1053412] [rtc::impl::DataChannel::close@106] mIsClosed: 1 closedCallback: 0
2025-05-03 17:15:12.729 ERROR [1053412] [rtc::impl::DataChannel::close@114] calling resetCallbacks

-- channels
{
  "incoming-c-0": "open"
}

There are 4 handle(s) keeping the process running.

# TTYWRAP
index.js:7 - process.stdout.write('.....\n.....\n');

# ThreadSafeCallback callback
/home/murat/js/node-datachannel/dist/esm/polyfill/RTCDataChannel.mjs:45    - __privateGet(this, _dataChannel).onClosed(() => {
/home/murat/js/node-datachannel/dist/esm/polyfill/RTCPeerConnection.mjs:92 - const dc = new RTCDataChannel(channel);

This solution is working.
Another solution could be that, we can guard this part of the code .

@murat-dogan
Copy link
Contributor Author

As I see same thing could happen to Track also.

@murat-dogan
Copy link
Contributor Author

Oops...

Very rare but still there is a race condition.

  • While Thread1 in if (!mIsClosed.exchange(true)) {
  • Another thread calls the destructor.
  • Destructor calls close and then resetCallbacks before Thread1 triggers the close function.

The solution seems to guard the close function totally.
I tried it and seems stable.

If you also confirm that, I can create a PR for it.

@paullouisageneau
Copy link
Owner

Very rare but still there is a race condition.

  • While Thread1 in if (!mIsClosed.exchange(true)) {

  • Another thread calls the destructor.

  • Destructor calls close and then resetCallbacks before Thread1 triggers the close function.

This scenario is not possible because the destructor can only be called if all shared pointers have been deleted, whereas the thread calling close must hold a shared pointer.

Copy link
Owner

@paullouisageneau paullouisageneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks fine to me, thank you for the investigation!

@paullouisageneau paullouisageneau merged commit 0e40aeb into paullouisageneau:master May 6, 2025
12 checks passed
@murat-dogan
Copy link
Contributor Author

As I said it is scarce but I see it still.
Probably while one thread is within the function, another thread deletes the object.
Please see this log;

2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@100] Closing DataChannel
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@108] mIsClosed: 0 closedCallback: 1
2025-05-07 20:33:17.723 ERROR [1676086] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@114] calling resetCallbacks
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@117] close() end mIsClosed: 1 closedCallback: 0
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@100] Closing DataChannel
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@108] mIsClosed: 1 closedCallback: 0
2025-05-07 20:33:17.723 ERROR [1676086] [rtc::impl::DataChannel::close@117] close() end mIsClosed: 1 closedCallback: 0
// Thread1
2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@100] Closing DataChannel
2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@108] mIsClosed: 0 closedCallback: 1
// Thread2 coming, Thread1 still in function
2025-05-07 20:33:17.724 ERROR [1676086] [rtc::impl::DataChannel::close@100] Closing DataChannel
2025-05-07 20:33:17.724 ERROR [1676086] [rtc::impl::DataChannel::close@108] mIsClosed: 1 closedCallback: 1
// Thread2 function end
2025-05-07 20:33:17.724 ERROR [1676086] [rtc::impl::DataChannel::close@117] close() end mIsClosed: 1 closedCallback: 1
// Thread2 Destructor
2025-05-07 20:33:17.724 ERROR [1676086] [rtc::Channel::~Channel@18] Channel::~Channel

2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@114] calling resetCallbacks
// Thread1 function end
2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@117] close() end mIsClosed: 1 closedCallback: 0
2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@100] Closing DataChannel
2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@108] mIsClosed: 1 closedCallback: 0
2025-05-07 20:33:17.724 ERROR [1676104] [rtc::impl::DataChannel::close@117] close() end mIsClosed: 1 closedCallback: 0

With these changes;

void DataChannel::close() {
	PLOG_ERROR << "Closing DataChannel";

	shared_ptr<SctpTransport> transport;
	{
		std::shared_lock lock(mMutex);
		transport = mSctpTransport.lock();
	}

	PLOG_ERROR << "mIsClosed: " << mIsClosed << " closedCallback: " << (bool)closedCallback;
	if (!mIsClosed.exchange(true)) {
		if (transport && mStream.has_value())
			transport->closeStream(mStream.value());

		triggerClosed();
		PLOG_ERROR << "calling resetCallbacks";
		resetCallbacks(true);		
	}	
	PLOG_ERROR << "close() end " << "mIsClosed: " << mIsClosed << " closedCallback: " << (bool)closedCallback;
}
Channel::~Channel() {	
	impl()->resetCallbacks();
	PLOG_ERROR << "Channel::~Channel";
}

If I guard the close function, the problem is solved.

@paullouisageneau
Copy link
Owner

paullouisageneau commented May 8, 2025

A thread calling the destructor while another thread is in a method should never happen. Are you sure the two calls are for the same datachannel?

If it happens, this is a memory issue related to smart pointers. Guarding the destructor would not be a proper workaround because it means a thread might destroy the mutex while another thread waits on it.

@murat-dogan
Copy link
Contributor Author

murat-dogan commented May 8, 2025

I was running the test program only for 1 data channel.
Of course the logs from both peer, we can see.

As I meant/wrote, guard for data channel close function not destructor.

@paullouisageneau
Copy link
Owner

I was running the test program only for 1 data channel. Of course the logs from both peer, we can see.

So are you sure the log for one peer is not interleaved with the log for the other?

As I meant/wrote, guard for data channel close function not destructor.

Sorry my bad, I meant guarding the close function. It wouldn't mitigate the core issue that one thread destroys an object while the other is in a method.

@murat-dogan
Copy link
Contributor Author

So are you sure the log for one peer is not interleaved with the with the log for the other?

I will add peer ids to log, when I have time

Sorry my bad, I meant guarding the close function. It wouldn't mitigate the core issue that one thread destroys an object while the other is in a method.

That’s true, will not solve root problem

@murat-dogan
Copy link
Contributor Author

If the problem is really related to a shared_ptr bug maybe we have to find guard any other workaround.

let’s run the test and have logs with peer ids
I will update here

@murat-dogan
Copy link
Contributor Author

Logs with peer IDs.
Same Problem. The problem occurred for peer:1

2025-05-08 17:35:54.271 ERROR [1776113] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-08 17:35:54.271 ERROR [1776113] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 0 closedCallback: 1
2025-05-08 17:35:54.271 ERROR [1776113] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-08 17:35:54.271 ERROR [1776110] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-08 17:35:54.271 ERROR [1776113] [rtc::impl::DataChannel::close@115] peer:0 calling resetCallbacks
2025-05-08 17:35:54.271 ERROR [1776110] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 0 closedCallback: 1
2025-05-08 17:35:54.271 ERROR [1776113] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 1 closedCallback: 1
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 1
// Thread 1776099 calls destructor and Thread 1776110 is still in function
2025-05-08 17:35:54.272 ERROR [1776099] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-08 17:35:54.273 ERROR [1776110] [rtc::impl::DataChannel::close@115] peer:1 calling resetCallbacks
2025-05-08 17:35:54.273 ERROR [1776110] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.273 ERROR [1776110] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-08 17:35:54.273 ERROR [1776110] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.273 ERROR [1776110] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.273 ERROR [1776113] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-08 17:35:54.273 ERROR [1776113] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.273 ERROR [1776113] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0

@murat-dogan
Copy link
Contributor Author

And logs for a normal run, no error

2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 0 closedCallback: 1
2025-05-08 17:35:54.094 ERROR [1776091] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@115] peer:0 calling resetCallbacks
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 0 closedCallback: 1
2025-05-08 17:35:54.094 ERROR [1776091] [DataChannelWrapper::onClosed@364] onClosed cb received from rtc: c-0
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@115] peer:1 calling resetCallbacks
2025-05-08 17:35:54.094 ERROR [1776091] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.094 ERROR [1776076] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 1 closedCallback: 0
2025-05-08 17:35:54.095 ERROR [1776076] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0

@murat-dogan
Copy link
Contributor Author

Please check this PR
murat-dogan/node-datachannel#353

Since you are also closing the channels on peer.close I have removed this part from my code.
And for this scenario, I need to reset callbacks etc.

With this PR also no problem.
Is it a fix or workaround, God knows for now.

@murat-dogan
Copy link
Contributor Author

And now log is more clean (for 1 datachannel)

2025-05-09 13:12:35.477 ERROR [2063565] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-09 13:12:35.477 ERROR [2063565] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 0 closedCallback: 1
2025-05-09 13:12:35.477 ERROR [2063565] [DataChannelWrapper::onClosed@371] onClosed cb received from rtc: c-0
2025-05-09 13:12:35.478 ERROR [2063565] [rtc::impl::DataChannel::close@115] peer:0 calling resetCallbacks
2025-05-09 13:12:35.478 ERROR [2063565] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0
2025-05-09 13:12:35.478 ERROR [2063561] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-09 13:12:35.478 ERROR [2063561] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 0 closedCallback: 1
2025-05-09 13:12:35.478 ERROR [2063561] [DataChannelWrapper::onClosed@371] onClosed cb received from rtc: c-0
2025-05-09 13:12:35.478 ERROR [2063561] [rtc::impl::DataChannel::close@115] peer:1 calling resetCallbacks
2025-05-09 13:12:35.478 ERROR [2063561] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::impl::DataChannel::close@100] peer:1 Closing DataChannel
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::impl::DataChannel::close@108] peer:1 mIsClosed: 1 closedCallback: 0
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::impl::DataChannel::close@118] peer:1 close() end mIsClosed: 1 closedCallback: 0
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::Channel::~Channel@18] Channel::~Channel
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::impl::DataChannel::close@100] peer:0 Closing DataChannel
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::impl::DataChannel::close@108] peer:0 mIsClosed: 1 closedCallback: 0
2025-05-09 13:12:35.479 ERROR [2063547] [rtc::impl::DataChannel::close@118] peer:0 close() end mIsClosed: 1 closedCallback: 0

@paullouisageneau
Copy link
Owner

OK, it makes sense, even if I'm not sure I understand what was the underlying issue.

@murat-dogan
Copy link
Contributor Author

Could you please create a tag or release for this?

@mertushka
Copy link

@paullouisageneau Could you please create a release for this fix?

@paullouisageneau
Copy link
Owner

@murat-dogan @mertushka I've released v0.23.0 with the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants