
MSC4243: User ID localparts as Account Keys #4243

Open

kegsay wants to merge 34 commits into main from kegan/placeholder-2

Conversation

@kegsay (Member) commented Dec 17, 2024

@kegsay kegsay changed the title [WIP] Add stub placeholder for MSC number to [WIP] Placeholder stub Dec 17, 2024
@kegsay kegsay marked this pull request as draft December 17, 2024 15:49
@turt2live turt2live added proposal A matrix spec change proposal needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Dec 17, 2024
@turt2live

This comment was marked as resolved.

@turt2live turt2live added proposal-placeholder This label is removed and replaced with `proposal` once the placeholder status is cleared. action-required Just a bright label to differentiate arbitrary proposals. and removed proposal A matrix spec change proposal labels Jul 3, 2025
@github-project-automation github-project-automation bot moved this to Tracking for review in Spec Core Team Workflow Jul 8, 2025
@kegsay kegsay changed the title [WIP] Placeholder stub to MSC4243: User ID localparts as Account Keys Sep 3, 2025
@kegsay kegsay marked this pull request as ready for review September 3, 2025 15:18
@tulir tulir added requires-room-version An idea which will require a bump in room version proposal A matrix spec change proposal room-spec Something to do with the room version specifications unassigned-room-version Remove this label when things get versioned. kind:core MSC which is critical to the protocol's success and removed proposal-placeholder This label is removed and replaced with `proposal` once the placeholder status is cleared. action-required Just a bright label to differentiate arbitrary proposals. labels Sep 3, 2025
@tulir (Member), Sep 3, 2025

Implementation requirements:

  • Server (preferably multiple)
  • Client with account key awareness (preferably multiple)
  • Complement tests

@Gnuxie (Contributor), Sep 3, 2025

These implementation requirements are a bit light. Given that there is no security disclosure driving this MSC, could we please be a little more careful in rolling this out than we were with v12?

(Contributor)

Tbh, why are implementation requirements this weak for a mainline room version?

(Member)

We do prefer multiple implementations for changes like this - I've clarified the comment.

Discussed over chat: this MSC does not define a (stable) room version like v12 - it describes a component of a possible future room version. The testing is more important for the future version's MSC when all the bundled changes are included in a "real" room version.

Comment on lines +7 to +8
- As user IDs are user controlled, spammers set their localpart to abusive messages in order to harass and intimidate others. Redactions do not remove the user ID so these messages persist in the room.
(Contributor)

(Member Author)

The linked comment is:

This does not work for single user instances, since their domain is still part of all mxids in this proposal.

(Member Author)

Resolved with the change to using the key as the principal. For clarity, the unsigned section now looks like:

{
    // .. event fields
    "unsigned": {
        "sender_account": {
            "key": "l8Hft5qXKn1vfHrg3p4-W8gELQVo8N13JkluMfmn2sQ",
            "user_id": "@kegan:matrix.org"
        }
    }
}

Where user_id is optional, and set only for verified domains/users. Then:

Clients can render the unsigned.sender_account.user_id field (if it exists) as a human-readable displayable identifier for the user. When the user is deleted, the key can be rendered instead.

This is specifically referring to GDPR erasure but the same can apply for redacting the membership event.
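As a rough illustration of the client-side fallback described above (render `unsigned.sender_account.user_id` when present, otherwise the account key), here is a minimal Python sketch. It assumes events arrive as parsed JSON dicts in the shape shown in the snippet; the helper name `display_identifier` is hypothetical and not part of any Matrix SDK, and raising an error when no key is present is an arbitrary choice made for the sketch.

```python
from typing import Any, Mapping


def display_identifier(event: Mapping[str, Any]) -> str:
    """Return a human-readable sender identifier for an event.

    Hypothetical helper: prefers unsigned.sender_account.user_id, which is
    only set for verified domains/users, and falls back to the account key
    (e.g. after GDPR erasure or a redacted membership event).
    """
    account = (event.get("unsigned") or {}).get("sender_account") or {}
    user_id = account.get("user_id")
    if isinstance(user_id, str) and user_id:
        return user_id
    key = account.get("key")
    if isinstance(key, str) and key:
        return key
    # Assumption for this sketch: an event without an account key is malformed.
    raise ValueError("event has no unsigned.sender_account.key")


verified = {"unsigned": {"sender_account": {
    "key": "l8Hft5qXKn1vfHrg3p4-W8gELQVo8N13JkluMfmn2sQ",
    "user_id": "@kegan:matrix.org",
}}}
erased = {"unsigned": {"sender_account": {
    "key": "l8Hft5qXKn1vfHrg3p4-W8gELQVo8N13JkluMfmn2sQ",
}}}
print(display_identifier(verified))  # @kegan:matrix.org
print(display_identifier(erased))    # l8Hft5qXKn1vfHrg3p4-W8gELQVo8N13JkluMfmn2sQ
```

Falling back to the key mirrors the change described above, where the key rather than the user ID is the principal and `user_id` is only an optional, displayable label.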

may want to send the account name user ID as the user ID may be displayed on the notification, but some clients may want to map the name to an account key user ID.

### Security Considerations
@Gnuxie (Contributor), Jan 16, 2026

How do we stop a server from generating an unbounded number of account keys for server names whose ownership cannot be verified (i.e. entirely made up), and then leaking them to other servers to DoS the room with memberships for those account keys or other events? See https://github.com/matrix-org/matrix-spec-proposals/pull/4345/files#diff-20e93dc19ad1924029f07297e00cdc8cd11288f8ad65c77b6a71df8f05b3c1d0R48-R73

The reason an attacker would do this is to evade detection: the spam-generating servers are unaffected by m.room.server_acl, provided the leaker can avoid DAG forensics or coordinated traffic analysis (both of which would take time AND would require targeted servers to synchronise with the room admin's servers). Unlike the current room model, the spam-generating servers under this MSC would not need any distinct infrastructure, deployments, or domain names, so the attack would be cheaper to conduct.

It can also be used in private rooms where the invite permission can fan out to new members.

(Contributor)

MSC4345 handles this by requiring room members to publish a server key AND requiring a room admin to acknowledge each new key. And the power level to do that is distinct to invite.

(Member Author)

Just to make sure I understand: your concern is that this MSC makes the barrier to entry for spammers to spam at the server-level too low, because you don't need any DNS registration. In comparison, MSC4345 has this admin handshake where new servers need to get sign-off from an existing admin prior to being able to join. Is that right?

Assuming it is right, then it's a tradeoff between safety and availability. My primary concern with an admin handshake is that it globally decreases availability in the protocol because if you cannot talk to an admin server then you cannot join the room. This may feel like some distant edge case but in practice it isn't. Last year matrix.org was down for an extended time and we were still able to communicate in existing rooms with only matrix.org admins because we could invite our backup accounts into the room. Had we been operating with MSC4345, we would not be able to do this and would have to create new rooms for the duration of the outage. In addition, whilst I can see the argument for an admin handshake on the public network, Matrix is used in a lot more than just the public federation. Many private, closed federations exist where a requirement to talk to the admin may not be appropriate or practical. These reasons are why I tend to push for more availability in the protocol, with the ability to centralise via policy servers (who can apply much more sophisticated rules) should the network need it.

That being said, I do think there is a place for traceable memberships in the protocol, but done in a less intrusive way. Specifically, I think we should force all public rooms to require invites, but allow the server to issue the invite (and thus not require any client interaction to join public rooms). This means the join-helper server in the make/send-join dance would be identifiable in the DAG, which can be critical to identify colluding servers.

It's worth noting that the attack here wouldn't affect clients as events from unverified domains are not sent to clients, so it would be purely a DoS attack, which can be mitigated via existing rate limiting tooling.
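The comment does not say which rate limiting tooling is meant. As one generic illustration, here is a minimal Python token bucket keyed by the origin server of incoming membership events. `OriginRateLimiter`, its parameters, and the choice of keying by origin are all hypothetical; and, as the reply below argues, limiting the rate alone does not stop accepted events from accumulating in room state.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class _Bucket:
    tokens: float
    updated: float


@dataclass
class OriginRateLimiter:
    """Hypothetical per-origin token bucket for incoming membership events."""
    rate: float       # tokens refilled per second
    capacity: float   # maximum burst size
    _buckets: Dict[str, _Bucket] = field(default_factory=dict, repr=False)

    def allow(self, origin: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(origin)
        if bucket is None:
            # The first event from an origin starts with a full bucket.
            bucket = self._buckets[origin] = _Bucket(self.capacity, now)
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.updated)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False


limiter = OriginRateLimiter(rate=0.5, capacity=2)
print([limiter.allow("leaky.example", now=0.0) for _ in range(3)])  # [True, True, False]
print(limiter.allow("leaky.example", now=2.0))  # True (one token refilled)
```

A token bucket allows short bursts up to `capacity` while bounding the sustained rate per origin; an attacker leaking through several origins would of course get a bucket per origin.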

@Gnuxie (Contributor), Jan 20, 2026

Just to make sure I understand: your concern is that this MSC makes the barrier to entry for spammers to spam at the server-level too low, because you don't need any DNS registration.

Yep

Last year matrix.org was down for an extended time and we were still able to communicate in existing rooms with only matrix.org admins because we could invite our backup accounts into the room. Had we been operating with MSC4345, we would not be able to do this and would have to create new rooms for the duration of the outage.

So you are saying you had no redundant admin accounts and have lost control of all powered user accounts? In this scenario the integrity of your room is seriously compromised. Being able to join or even interact with the room as a joined user is a serious security issue.

My primary concern with an admin handshake is that it globally decreases availability in the protocol because if you cannot talk to an admin server then you cannot join the room.

This is desirable: if no one can moderate new joins, then joining the room should be unavailable.

Many private, closed federations exist where a requirement to talk to the admin may not be appropriate or practical.

Then the power level, or any mechanism which allows a server to handle joins, can be more distributed in these environments, much like the invite power level. Are we confident closed federations are not also vulnerable to the same attack vector?

These reasons are why I tend to push for more availability in the protocol, with the ability to centralise via policy servers (who can apply much more sophisticated rules) should the network need it.

This would make any room without a policy server insecure, and private rooms will also have this problem.

That being said, I do think there is a place for traceable memberships in the protocol, but done in a less intrusive way. Specifically, I think we should force all public rooms to require invites, but allow the server to issue the invite (and thus not require any client interaction to join public rooms). This means the join-helper server in the make/send-join dance would be identifiable in the DAG, which can be critical to identify colluding servers.

Would it not be simpler and equivalent to require a policy server to sign each join?

It's worth noting that the attack here wouldn't affect clients as events from unverified domains are not sent to clients, so it would be purely a DoS attack, which can be mitigated via existing rate limiting tooling.

While the events aren't sent to clients, the memberships will surely add to the state complexity of the room? And so rate limiting will not be sufficient, and the attack couldn't be detected by room admins?

@kegsay (Member Author), Jan 21, 2026

if no one can moderate new joins, then joining the room should be unavailable.

And that's the key difference in our views. It's ultimately going to be up to the SCT to decide if this risk is worth the gain in availability. It's really about the default behaviour: both options can add this moderation gate or be more available, and the question is which one happens by default. It's a bit late to change the PLs when your admin server is knocked offline. Similarly, it's a bit late to add a policy server when your room is full of CSAM. Taken to the extreme, the loss in availability damages the ability to do P2P Matrix, as the chances of admin users being online are much lower than those of admin servers. On the other extreme, a focus on policy servers everywhere would indeed make "any room without a policy server insecure", damaging the decentralisation efforts of the public federation.

Would it not be simpler and equivalent to require a policy server to sign each join?

No, because any server can act as a join-helper currently, whereas a policy server is centralised.

While the events aren't sent to clients, the memberships will surely add to the state complexity of the room? And so rate limiting will not be sufficient, and the attack couldn't be detected by room admins?

It adds more state events to the DAG, which is what "state complexity" tries to measure. Unfortunately this is just the way append-only data structures work. This problem exists irrespective of either MSC, meaning servers need to protect themselves (e.g. via rate limits) regardless. This isn't the room admin's responsibility as it only consumes server resources. I have some ideas on compacting BFT CRDTs, but it's very much at the academic research level; there aren't any good off-the-shelf options that we can use.

@Gnuxie (Contributor), Jan 21, 2026

It adds more state events to the DAG, which is what "state complexity" tries to measure. Unfortunately this is just the way append-only data structures work. This problem exists irrespective of either MSC, meaning servers need to protect themselves (e.g. via rate limits) regardless. This isn't the room admin's responsibility as it only consumes server resources. I have some ideas on compacting BFT CRDTs, but it's very much at the academic research level; there aren't any good off-the-shelf options that we can use.

A BFT CRDT only helps in situations where a leaky server is used to evade detection. But given that these events from invalid domains would not be shown to clients, it seems a Byzantine node may not be required: all that is happening is that nodes are generating a lot of noise. As for compaction? Maybe?

In MSC4345 these events are traceable and can be seen by the room admin, so the leaking servers can be found and added to m.room.server_acl. It was designed with this attack in mind. So the idea that we can't do anything about this now without reaching the cutting edge of research isn't actually true.

@Gnuxie (Contributor), Jan 21, 2026

Unfortunately this is just the way append-only data structures work. This problem exists irrespective of either MSC

Specifically, we already have a solution available outside of the current room model in m.room.server_acl, and this proposal (MSC4243) does reduce the set of situations in which m.room.server_acl is effective.
To be precise, m.room.server_acl is an example of thinking outside the box: Byzantine servers cannot deliver their messages, so their events cannot be considered, and the rest of the room converges on the history of the Byzantine nodes up to the point they were cut off[1]. So it isn't fair or appropriate to appeal to the nature of append-only data structures as a reason to continue forward regardless.

Separate to that point being unfair, we can always do more and solve this in a better way. MSC4345 opens the door for a lot of exploration here. For example we can change the bar from "the rest of the room converges on what the history of an excluded server is" to canonicalising that history through the revoke participation auth event (which is coordinated with power level).

Footnotes

  1. Though you might consider m.room.server_acl to be a product of the "ingenuity of fools"?

(Member)

It's ultimately going to be up to the SCT to decide if this risk is worth the gain in availability.

I haven't run this past the SCT, but my quick suggestion here is:

  • The protocol shouldn't force joins to go via an admin server in general, as otherwise it undermines availability during a partition for environments which care about that (i.e. any non-public-chatroom use case).
  • However, if you're in a public chatroom, you probably do want your joins to go via a central point of control to avoid abuse, whether that's a join gate, a policy server or an admin server. So you can and should add them to protect your public rooms.
  • This chokepoint should be layered on top for public rooms rather than something baked into the protocol for all rooms however.

@ara4n (Member), Jan 23, 2026

An alternative could be that public joins always need to be signed off by an admin server (or join gate, or similar) to prevent abuse, but invites/knocks don't?

@Gnuxie (Contributor), Feb 5, 2026

(See also #4345 (review))

@tulir already has join-gates deployed in all his public rooms via restricted join and a synapse patch. We could probably just change the join rule for public rooms to be more like restricted join and mandate that the homeserver implement some basic checks here like checking that the server name is valid on new keys before signing any join. And then we can later pass-through the capability to policy server or moderation bots so that they can do arbitrary checks on joining users (or this can continue to happen in an implementation dependent way).

https://github.com/maunium/meowlnir/blob/285d810e56e0586a9d0bcb61a3b3bc989fb409f4/policyeval/antispam.go#L141-L169
https://mau.dev/maunium/synapse/-/blob/master/synapse/module_api/callbacks/spamchecker_callbacks.py?ref_type=heads#L491-522
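The suggestion above leaves open what "valid" means. One narrow reading is syntactic validity against the server-name grammar in the Matrix specification's appendices, and the minimal Python sketch below checks only that, under that assumption. `is_valid_server_name` is a hypothetical helper: it does not verify who actually controls the domain, which is the harder part of the attack discussed in this thread. A join gate could refuse to sign a join when the check fails.

```python
import re

# Server-name grammar from the Matrix spec appendices:
#   server_name = hostname [ ":" port ]
#   port        = 1*5DIGIT
#   hostname    = IPv4address / "[" IPv6address "]" / dns-name
#   IPv6address = 2*45IPv6char   (DIGIT / A-F / a-f / ":" / ".")
#   dns-name    = 1*255dns-char  (DIGIT / ALPHA / "-" / ".")
_PORT = re.compile(r"[0-9]{1,5}")
_IPV6 = re.compile(r"[0-9A-Fa-f:.]{2,45}")
_DNS_NAME = re.compile(r"[0-9A-Za-z.-]{1,255}")  # also matches IPv4 literals


def is_valid_server_name(server_name: str) -> bool:
    """Syntactic check only: says nothing about who controls the domain."""
    if server_name.startswith("["):
        end = server_name.find("]")
        if end == -1 or not _IPV6.fullmatch(server_name[1:end]):
            return False
        rest = server_name[end + 1:]
    else:
        host, sep, port = server_name.partition(":")
        if not _DNS_NAME.fullmatch(host):
            return False
        rest = sep + port
    if not rest:
        return True
    # Anything after the hostname must be ":" followed by 1 to 5 digits.
    return rest.startswith(":") and _PORT.fullmatch(rest[1:]) is not None


print(is_valid_server_name("matrix.org:8448"))   # True
print(is_valid_server_name("bad_name.example"))  # False
```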


Labels

kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal requires-room-version An idea which will require a bump in room version room-spec Something to do with the room version specifications unassigned-room-version Remove this label when things get versioned.

Projects

Status: Tracking for review


9 participants