
fix: provide remote servers a way to find out about an event created during the remote join handshake #19390

Open
FrenchGithubUser wants to merge 2 commits into element-hq:develop from famedly:join-race-condition


FrenchGithubUser commented Jan 19, 2026

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

TLDR

Use a "dummy" event to tie together the room's forward extremities, and proactively send it to all servers in the room. This lets recently joined servers learn about recent events that would otherwise have "slipped through the cracks" and thus not been retrieved.
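A minimal sketch of what "tying together forward extremities" means, using a toy in-memory DAG (a dict from event ID to its prev_events) rather than Synapse's storage layer. The helper names and event IDs are made up for illustration. A forward extremity is an event that no other event lists in its prev_events; a dummy event that cites every extremity leaves exactly one.

```python
from typing import Dict, List, Set

# Toy DAG: event_id -> prev_events. This is not Synapse's data model.
Dag = Dict[str, List[str]]


def forward_extremities(dag: Dag) -> Set[str]:
    """Events that no other event references as a prev_event."""
    referenced: Set[str] = set()
    for prev_events in dag.values():
        referenced.update(prev_events)
    return set(dag) - referenced


def add_dummy_event(dag: Dag, dummy_id: str) -> List[str]:
    """Add a dummy event citing every current extremity; return its prev_events."""
    prev_events = sorted(forward_extremities(dag))
    dag[dummy_id] = prev_events
    return prev_events


# Message A and the join both point at $create, so the DAG has forked
# into two forward extremities: $A and $join.
dag: Dag = {"$create": [], "$A": ["$create"], "$join": ["$create"]}
add_dummy_event(dag, "$dummy")  # $dummy cites both $A and $join
```

After the dummy event is added, $dummy is the only forward extremity, so the next event in the room will reference it (and, through it, message A).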

NOTE: This does send the "dummy" event to all servers in the room, whether or not they need it. However, at some point a new event would reference this dummy event and require its retrieval; since it was proactively sent, that retrieval is no longer necessary. This helps prevent forks in the DAG.

Alternatives

Unlike famedly/synapse#51, which 'pushes' the missing event directly, this approach causes the event to be 'pulled' by referencing it as a prev_event of a dummy event. Since the dummy event is not passed to clients, it is effectively invisible.

Drawbacks of famedly/synapse#51 meant it was not always certain whether the 'pushed' event would show up in /sync or in /messages, though it usually appeared in /sync. This method always has the 'missing' event show up in /messages, which I feel is more technically correct, as that event was created (albeit just barely) before the join event was persisted.
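The 'pull' half can be sketched as a set difference: the joining server compares the dummy event's prev_events with the events it already holds and requests whatever is missing. In the Matrix federation API that request is what /get_missing_events is for; the function below (a hypothetical helper, with made-up event IDs) only computes which IDs would be requested.

```python
from typing import Any, Iterable, List, Mapping


def missing_prev_events(event: Mapping[str, Any], known_event_ids: Iterable[str]) -> List[str]:
    """Return the prev_events of `event` that this server has not seen yet.

    A real server would typically fetch these from the sending server
    (for example via /get_missing_events) before accepting `event`.
    """
    known = set(known_event_ids)
    return [event_id for event_id in event["prev_events"] if event_id not in known]


# Roughly what the newly joined server holds after /send_join (toy IDs).
known_after_join = ["$create", "$join"]
dummy = {"type": "org.matrix.dummy_event", "prev_events": ["$A", "$join"]}
to_fetch = missing_prev_events(dummy, known_after_join)  # message A must be pulled
```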

The Process

The order of events:

  1. make_join from remote server, response sent
  2. Message A sent from local server
  3. send_join from the remote server; the local server responds. Message A is not in the response, as it is not state and is not referenced by any of the included events. The join event is persisted on the local server.
  4. Just after persisting the join event, the local server notices that there are two forward extremities.
    A. It creates an org.matrix.dummy_event whose prev_events contain both the join and message A.
    B. It sends this dummy event to all servers in the room.
  5. The remote server receives the dummy event via its /send endpoint and saves it in a queue until the partial state join begins syncing additional room state.
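The five steps above can be modelled with two toy classes. ResidentServer, JoiningServer, and every event ID are invented for illustration, and the queue in JoiningServer only mirrors the description in step 5, not Synapse's actual partial-state machinery. The key point is that the join's prev_events come from the make_join template, which predates message A, so the DAG forks.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Event:
    event_id: str
    type: str
    prev_events: List[str]


@dataclass
class ResidentServer:
    """Toy model of the server answering make_join/send_join."""

    events: Dict[str, Event] = field(default_factory=dict)

    def extremities(self) -> List[str]:
        referenced = {p for e in self.events.values() for p in e.prev_events}
        return sorted(set(self.events) - referenced)

    def persist(self, event: Event) -> None:
        self.events[event.event_id] = event

    def on_send_join(self, join: Event) -> Optional[Event]:
        """Steps 3 and 4: persist the join, then tie extremities together if needed."""
        self.persist(join)
        extremities = self.extremities()
        if len(extremities) <= 1:
            return None
        dummy = Event("$dummy", "org.matrix.dummy_event", extremities)
        self.persist(dummy)
        return dummy


@dataclass
class JoiningServer:
    """Step 5: holds /send events until the partial-state join syncs more state."""

    queue: Deque[Event] = field(default_factory=deque)

    def on_send(self, event: Event) -> None:
        self.queue.append(event)

    def start_state_sync(self) -> List[Event]:
        drained = list(self.queue)
        self.queue.clear()
        return drained


resident = ResidentServer()
resident.persist(Event("$create", "m.room.create", []))
template_prev_events = resident.extremities()  # 1. make_join builds the template
resident.persist(Event("$A", "m.room.message", resident.extremities()))  # 2. message A
join = Event("$join", "m.room.member", template_prev_events)  # join still cites $create
dummy = resident.on_send_join(join)  # 3 and 4A
joiner = JoiningServer()
if dummy is not None:
    joiner.on_send(dummy)  # 4B: pushed to the joining server's /send
```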


CLAassistant commented Jan 19, 2026

CLA assistant check
All committers have signed the CLA.

@FrenchGithubUser FrenchGithubUser marked this pull request as ready for review January 19, 2026 11:36
@FrenchGithubUser FrenchGithubUser requested a review from a team as a code owner January 19, 2026 11:36
@FrenchGithubUser
Author

I am submitting this PR as an employee of Famedly, who has signed the corporate CLA, and used my company email in the commit.

Member

anoadragon453 left a comment


Hi @FrenchGithubUser. As far as the Element Backend team is aware, Famedly has not yet signed the CCLA. However, this is apparently in progress.

Holding off on review until that's resolved. Regardless, thank you for submitting your work upstream!

@ara4n
Member

ara4n commented Feb 12, 2026

famedly has now signed the ccla :)

@MadLittleMods MadLittleMods requested a review from a team February 12, 2026 00:48
@sandhose
Member

@FrenchGithubUser I think to allow the CLA bot to let you through, your membership to the famedly organisation must be public

If you don't want that to be the case, I can add you specifically to the list of allowed users, but making the org membership public is easier for us :)

@FrenchGithubUser
Author

@sandhose I just updated the membership, should be public now! I didn't know this visibility could be changed :)

@anoadragon453
Member

@FrenchGithubUser could you update the branch (just pull from develop)? It seems the CI is not running, and I don't have permission to push to your branch to do it for you.

@FrenchGithubUser
Author

@anoadragon453 done

Member

anoadragon453 left a comment


Interesting solution - thanks for sending it upstream.

I'd really like to see a Complement test for this if possible, so we can verify that this fixes the problem. I think you'd just need to send an event into the room between the /make_join and /send_join requests.

Existing federated room join tests: https://github.com/matrix-org/complement/blob/main/tests/federation_room_join_test.go

event, context = await self._on_send_membership_event(
    origin, content, Membership.JOIN, room_id
)
# Collect this now, the internal metadata of event(which should have it) doesn't
Member


This comment is quite difficult to read.

And why not query this closer to where it's used, below?

Comment on lines +2236 to +2248
if not dummy_event_sent:
    # Did not find a valid user in the room, so remove from future attempts
    # Exclusion is time limited, so the room will be rechecked in the future
    # dependent on _DUMMY_EVENT_ROOM_EXCLUSION_EXPIRY
    logger.info(
        "Failed to send dummy event into room %s. Will exclude it from "
        "future attempts until cache expires",
        room_id,
    )
    # This mapping is room_id -> time of last attempt(in ms)
    self._rooms_to_exclude_from_dummy_event_insertion[room_id] = (
        self.clock.time_msec()
    )
Member


Is it worth waiting and trying again if the room will move on in the meantime and new events will likely be sent?
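For context on this question: the snippet above records room_id -> time of last attempt and skips the room until that entry expires. A minimal sketch of such a time-limited exclusion, assuming a plain dict and caller-supplied millisecond timestamps; DummyEventExclusions is a hypothetical name, and the expiry is a constructor parameter rather than Synapse's actual _DUMMY_EVENT_ROOM_EXCLUSION_EXPIRY value.

```python
from typing import Dict


class DummyEventExclusions:
    """Hypothetical: room_id -> time (ms) of the last failed dummy-event attempt."""

    def __init__(self, expiry_ms: int) -> None:
        self.expiry_ms = expiry_ms
        self._last_failed: Dict[str, int] = {}

    def record_failure(self, room_id: str, now_ms: int) -> None:
        self._last_failed[room_id] = now_ms

    def is_excluded(self, room_id: str, now_ms: int) -> bool:
        last = self._last_failed.get(room_id)
        if last is None:
            return False
        if now_ms - last >= self.expiry_ms:
            # The exclusion has expired: forget it so the room is retried.
            del self._last_failed[room_id]
            return False
        return True
```

Under this model, a room that fails once is skipped for the whole expiry window. That is the cost being weighed against the chance that a later event in the room, which would cite all current forward extremities, merges the fork anyway.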

@FrenchGithubUser
Author

@anoadragon453 where would you like the Complement test to be: in the Complement repo or the Synapse repo?

@anoadragon453
Member

@FrenchGithubUser as I believe this would be applicable to any homeserver implementation, the Complement repo itself would be best. Thanks!

Contributor

Copilot AI left a comment


Pull request overview

This PR aims to prevent events created during the /make_join and /send_join handshake from being “missed” by the joining remote server, by injecting a dummy event that references the current forward extremities and is proactively federated.

Changes:

  • Add a new post-remote-join path to create/send a dummy event with internal_metadata.proactively_send=True.
  • On /send_join, detect multiple forward extremities and trigger dummy event injection.
  • Add a bugfix changelog entry describing the behavior change.
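The first change says the dummy event is created with internal_metadata.proactively_send=True. The sketch below models what that flag is assumed to control: whether an event is pushed to the other servers in the room or kept local. InternalMetadata, Event, make_dummy_event and destinations_for are toy stand-ins, not Synapse's classes, and the host names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class InternalMetadata:
    # Toy stand-in for the proactively_send flag on an event's internal metadata.
    proactively_send: bool = True


@dataclass
class Event:
    event_id: str
    type: str
    internal_metadata: InternalMetadata = field(default_factory=InternalMetadata)


def make_dummy_event(event_id: str, proactively_send: bool) -> Event:
    """Hypothetical helper with the flag as a parameter, as the PR describes."""
    return Event(event_id, "org.matrix.dummy_event", InternalMetadata(proactively_send))


def destinations_for(event: Event, room_hosts: Set[str], local_host: str) -> List[str]:
    """Servers an (assumed) federation sender would push this event to."""
    if not event.internal_metadata.proactively_send:
        return []  # kept local; other servers only learn of it when it is referenced
    return sorted(room_hosts - {local_host})


hosts = {"a.example", "b.example", "c.example"}
after_join = make_dummy_event("$dummy", proactively_send=True)
```

With the flag set, the post-join dummy event reaches the newly joined server immediately instead of waiting for a later event to reference it.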

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File | Description
synapse/handlers/message.py | Adds _send_dummy_event_after_room_join and makes proactive-send configurable on dummy events.
synapse/federation/federation_server.py | After handling /send_join, checks forward extremities and triggers dummy event injection.
changelog.d/19390.bugfix | Documents the bugfix for handshake-created events being missed by newly joined servers.


Comment on lines +817 to +825
forward_extremities = await self.store._get_forward_extremeties_for_room(
    room_id, stream_ordering_of_join.get_max_stream_pos()
)

if len(forward_extremities) > 1:
    # The likelihood of this being used is extremely low, thus only build the handler
    # when necessary.
    _creation_handler = self.hs.get_event_creation_handler()
    await _creation_handler._send_dummy_event_after_room_join(room_id)
Comment on lines +821 to +826
if len(forward_extremities) > 1:
    # The likelihood of this being used is extremely low, thus only build the handler
    # when necessary.
    _creation_handler = self.hs.get_event_creation_handler()
    await _creation_handler._send_dummy_event_after_room_join(room_id)

event, context = await self._on_send_membership_event(
    origin, content, Membership.JOIN, room_id
)
# Collect this now, the internal metadata of event(which should have it) doesn't
Comment on lines +2227 to +2230
This should only be triggered when handling a remote join while there were
events sent during the make_join/send_join handshake. The joining
homeserver would otherwise not immediately know to backfill this event,
and would "miss it".
Comment on lines +805 to +826
# Check the forward extremities for the room here. If there is more than one, it
# is likely that another event was created in the room during the
# make_join/send_join handshake. The joining server is thus likely to miss this event
# until a second event is created that references it, which could take some time.
# In that case, we proactively send a dummy extensible event that ties these
# forward extremities together. The remote server will then attempt to backfill
# the missing event on its own.
#
# By not sending the 'missing event' directly, but instead having the joining
# homeserver backfill it, the stream ordering for the missing event will be
# "before" the join (which is what we expect).

forward_extremities = await self.store._get_forward_extremeties_for_room(
    room_id, stream_ordering_of_join.get_max_stream_pos()
)

if len(forward_extremities) > 1:
    # The likelihood of this being used is extremely low, thus only build the handler
    # when necessary.
    _creation_handler = self.hs.get_event_creation_handler()
    await _creation_handler._send_dummy_event_after_room_join(room_id)

@@ -0,0 +1 @@
Provide remote servers a way to find out about an event created during the remote join handshake. Contributed by @FrenchGithubUser and @jason-famedly @ Famedly.
Comment on lines +761 to +763
# Collect this now, the internal metadata of event(which should have it) doesn't
stream_ordering_of_join = (
    await self.store.get_current_room_stream_token_for_room_id(room_id)

7 participants