authority-discovery: Make changing of peer-id while active a bit more robust #3786
Merged
Conversation
…e robust

In the case when nodes don't persist their node-key, or they want to generate a new one while being in the active set, things go wrong because both the old addresses and the new ones will still be present in the DHT. Because of the distributed nature of the DHT, both will survive in the network until the old ones expire, which takes 36 hours. Nodes in the network will randomly resolve the authority-id to either the old address or the new one. More details in: #3673

This PR proposes we mitigate this problem by:
1. Letting the query for a DHT key retrieve all the results, which is usually bounded by the replication factor of 20; currently we interrupt the query on the first result.
2. Modifying the authority-discovery service to keep all the discovered addresses around for 24h after an address was last seen (a rough sketch of this follows below).
3. Plumbing this through the other subsystems where the assumption was that an authority-id resolves to only one PeerId. Currently, authority-discovery keeps just the last record it received from the DHT and queries the DHT every 10 minutes, but nodes could always receive only the old address, only the new address, or flip-flop between them depending on which node wins the race to provide the record.
4. Updating gossip-support to try to resolve authorities more often than once every session.

This would give us a lot more chances for the nodes in the network to discover not only the old address of a node but also the new one, and should improve the time it takes for a node to be properly connected in the network. The behaviour won't be deterministic because there is no guarantee that all nodes will see the new record at least once, since they could query only nodes that have the old one.

Signed-off-by: Alexandru Gheorghe <[email protected]>
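For illustration only (not code from this PR): a minimal sketch of item 2 above, keeping every discovered address together with a last-seen timestamp and pruning entries older than 24h. The `DiscoveredAddresses` type, the string-based addresses and the `ADDRESS_TTL` name are made up for this example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// How long a discovered address is kept after it was last seen (item 2 above).
const ADDRESS_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical per-authority cache: every address seen in DHT results,
/// together with the moment it was last observed.
#[derive(Default)]
struct DiscoveredAddresses {
    last_seen: HashMap<String, Instant>,
}

impl DiscoveredAddresses {
    /// Record (or refresh) an address returned by a DHT query.
    fn observe(&mut self, addr: String, now: Instant) {
        self.last_seen.insert(addr, now);
    }

    /// Drop addresses not seen for longer than ADDRESS_TTL and return the rest.
    fn live_addresses(&mut self, now: Instant) -> Vec<String> {
        self.last_seen
            .retain(|_, seen| now.duration_since(*seen) < ADDRESS_TTL);
        self.last_seen.keys().cloned().collect()
    }
}

fn main() {
    let mut cache = DiscoveredAddresses::default();
    let now = Instant::now();
    cache.observe("/ip4/10.0.0.1/tcp/30333".to_string(), now);
    cache.observe("/ip4/10.0.0.2/tcp/30333".to_string(), now);
    println!("{:?}", cache.live_addresses(now));
}
```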
lexnv
reviewed
Mar 22, 2024
lexnv
reviewed
Mar 25, 2024
lexnv
reviewed
Jul 18, 2024
lexnv
approved these changes
Jul 18, 2024
Contributor
lexnv
left a comment
LGTM! Maybe we could bound the total number of peers in the record info using an LRU with a reasonable capacity? In theory, so that we don't try to update every outdated peer (the whole discoverable network) from this node, but only a limited number of peers.
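As a rough illustration of the LRU idea above (not the PR's actual data structure), the sketch below bounds the set of peers known to hold an outdated record; the `lru` crate dependency, the `PeerId`/`AuthorityId` stand-ins and the 512 cap are all assumptions made for the example.

```rust
// Cargo.toml (assumed): lru = "0.12"
use std::num::NonZeroUsize;

use lru::LruCache;

/// Hypothetical cap on how many peers with outdated records we remember;
/// the exact number would need tuning.
const MAX_TRACKED_PEERS: usize = 512;

type PeerId = String; // stand-in for the real network PeerId type
type AuthorityId = String; // stand-in for the real AuthorityId type

fn main() {
    // Peers that served us an old record for a given authority. Bounding the
    // map with an LRU means we only re-publish the fresh record to a limited
    // set of peers instead of the whole discoverable network.
    let mut outdated_peers: LruCache<PeerId, AuthorityId> =
        LruCache::new(NonZeroUsize::new(MAX_TRACKED_PEERS).unwrap());

    outdated_peers.put("peer-a".to_string(), "authority-1".to_string());
    outdated_peers.put("peer-b".to_string(), "authority-1".to_string());

    // When a newer record for "authority-1" shows up, only these peers get updated.
    for (peer, authority) in outdated_peers.iter() {
        println!("would push fresh record for {authority} to {peer}");
    }
}
```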
lexnv
reviewed
Jul 23, 2024
RFC submitted and passed polkadot-fellows/RFCs#91
Contributor
Author
Validated this PR in versi for the past 2 days with both litep2p and libp2p backends; things seem to work as expected, merging it now.
TarekkMA
pushed a commit
to moonbeam-foundation/polkadot-sdk
that referenced
this pull request
Aug 2, 2024
This PR updates the litep2p crate to the latest version. This fixes the build for developers that want to perform `cargo update` on all their dependencies (paritytech#4343) by porting the latest changes.

Peer records were introduced to litep2p to be able to distinguish and update peers with outdated records. They are going to be properly used in substrate via paritytech#3786; however, that is pending the commit being merged on litep2p master: paritytech/litep2p#96.

Closes: paritytech#4343

---------
Signed-off-by: Alexandru Vasile <[email protected]>
TarekkMA
pushed a commit
to moonbeam-foundation/polkadot-sdk
that referenced
this pull request
Aug 2, 2024
…e robust (paritytech#3786)

In the case when nodes don't persist their node-key, or they want to generate a new one while being in the active set, things go wrong because both the old addresses and the new ones will still be present in the DHT. Because of the distributed nature of the DHT, both will survive in the network until the old ones expire, which takes 36 hours. Nodes in the network will randomly resolve the authority-id to either the old address or the new one. More details in: paritytech#3673

This PR proposes we mitigate this problem by:
1. Letting the query for a DHT key retrieve more than one result (4), still bounded by the replication factor of 20; currently we interrupt the query on the first result.
~2. Modify the authority-discovery service to keep all the discovered addresses around for 24h after an address was last seen.~
~3. Plumb this through the other subsystems where the assumption was that an authority-id resolves to only one PeerId. Currently, authority-discovery keeps just the last record it received from the DHT and queries the DHT every 10 minutes, but nodes could always receive only the old address, only the new address, or flip-flop between them depending on which node wins the race to provide the record.~
2. Extending the `SignedAuthorityRecord` with a signed creation_time (a rough sketch of this follows below).
3. Modifying authority discovery to keep track of nodes that sent us an old record, and once we are made aware of a new record, updating those nodes with it.
4. Updating gossip-support to try to resolve authorities more often than once every session.

~This would give us a lot more chances for the nodes in the network to discover not only the old address of a node but also the new one, and should improve the time it takes for a node to be properly connected in the network. The behaviour won't be deterministic because there is no guarantee that all nodes will see the new record at least once, since they could query only nodes that have the old one.~

## TODO
- [x] Add unit tests for the new paths.
- [x] Make sure the implementation is backwards compatible.
- [x] Evaluate whether there are any bad consequences of letting the query continue rather than finishing it at the first record found.
- [x] Bake the new changes in versi.

---------
Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
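For illustration (the real types in the PR differ): a minimal sketch of points 2 and 3 above, a record carrying a creation time and a helper that keeps whichever record is newer. The field names and the plain millisecond timestamp are simplified stand-ins for the actual signed data.

```rust
/// Simplified stand-in for the extended record described in point 2: the
/// addresses plus a signed creation time (signature handling is omitted).
#[derive(Clone, Debug)]
struct SignedAuthorityRecord {
    addresses: Vec<String>,
    creation_time_ms: u128,
}

/// Keep whichever record is newer; on a tie, keep the one we already have.
fn pick_newer(
    current: Option<SignedAuthorityRecord>,
    incoming: SignedAuthorityRecord,
) -> SignedAuthorityRecord {
    match current {
        Some(cur) if cur.creation_time_ms >= incoming.creation_time_ms => cur,
        _ => incoming,
    }
}

fn main() {
    let old = SignedAuthorityRecord {
        addresses: vec!["/ip4/10.0.0.1/tcp/30333".into()],
        creation_time_ms: 1_000,
    };
    let new = SignedAuthorityRecord {
        addresses: vec!["/ip4/10.0.0.2/tcp/30333".into()],
        creation_time_ms: 2_000,
    };
    // The newer record wins, so nodes converge on the fresh addresses.
    let kept = pick_newer(Some(old), new);
    println!("{kept:?}");
}
```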
Intel-driver
added a commit
to Intel-driver/litep2p
that referenced
this pull request
Dec 24, 2025
This PR implements `put_record_to` and `try_put_record_to` to selectively pick the peers whose records should be updated. The main use-case from substrate would be the following:
- A peer is discovered to have an outdated authority record (needs paritytech/litep2p#76)
- Update the record with the latest authority record available (part of this PR)

This PR provides peers to the engine if the peers are part of the kBucket. The first step of the discovery in substrate motivates this assumption. We can probably do things a bit more optimally, since we know the peers that are part of the kBucket were discovered previously (or are currently connected):
- The query starts with a [FindNodeContext](https://github.com/paritytech/litep2p/blob/96e827b54f9f937c6d0489bef6a438b48cf50e58/src/protocol/libp2p/kademlia/query/find_node.rs#L37), which in this case will do a peer discovery as well
- We could implement a `PutNodeContext` which circumvents the need to discover the peers and just forwards a kad `PUT_VALUE` to those peers

We'd have to double-check that with libp2p as well (my brief look over the code points in this direction).

To unblock paritytech/polkadot-sdk#3786 we can merge this and then come back with a better / more optimal solution. Builds on top of: paritytech/litep2p#76

cc @paritytech/networking

---------
Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
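A hedged sketch of the intended flow rather than litep2p's real API: once the peers serving an outdated record are known, the fresh record is pushed only to them. The trait, types and signature below are stand-ins invented for this example; the actual `put_record_to` in litep2p may look different.

```rust
/// Stand-ins for the real litep2p types.
type PeerId = String;

#[derive(Clone)]
struct Record {
    key: Vec<u8>,
    value: Vec<u8>,
}

/// Hypothetical handle exposing the selective publish described above.
trait KademliaHandleLike {
    fn put_record_to(&mut self, record: Record, peers: Vec<PeerId>);
}

struct LoggingHandle;

impl KademliaHandleLike for LoggingHandle {
    fn put_record_to(&mut self, record: Record, peers: Vec<PeerId>) {
        // A real implementation would send a Kademlia PUT_VALUE to each peer
        // instead of printing.
        for peer in peers {
            println!(
                "PUT_VALUE key={:?} ({} bytes) -> {peer}",
                record.key,
                record.value.len()
            );
        }
    }
}

fn main() {
    let fresh_record = Record {
        key: b"authority-1".to_vec(),
        value: b"new-addresses".to_vec(),
    };
    // Peers previously observed serving the outdated record.
    let outdated_peers = vec!["peer-a".to_string(), "peer-b".to_string()];

    let mut handle = LoggingHandle;
    handle.put_record_to(fresh_record, outdated_peers);
}
```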
In the case when nodes don't persist their node-key, or they want to generate a new one while being in the active set, things go wrong because both the old addresses and the new ones will still be present in the DHT. Because of the distributed nature of the DHT, both will survive in the network until the old ones expire, which takes 36 hours. Nodes in the network will randomly resolve the authority-id to either the old address or the new one.

More details in: #3673

This PR proposes we mitigate this problem by:
1. Letting the query for a DHT key retrieve more than one result (4), still bounded by the replication factor of 20; currently we interrupt the query on the first result (see the sketch after the TODO list).
~2. Modify the authority-discovery service to keep all the discovered addresses around for 24h after an address was last seen.~
~3. Plumb this through the other subsystems where the assumption was that an authority-id resolves to only one PeerId. Currently, authority-discovery keeps just the last record it received from the DHT and queries the DHT every 10 minutes, but nodes could always receive only the old address, only the new address, or flip-flop between them depending on which node wins the race to provide the record.~
2. Extending the `SignedAuthorityRecord` with a signed creation_time.
3. Modifying authority discovery to keep track of nodes that sent us an old record, and once we are made aware of a new record, updating those nodes with it.
4. Updating gossip-support to try to resolve authorities more often than once every session.

~This would give us a lot more chances for the nodes in the network to discover not only the old address of a node but also the new one, and should improve the time it takes for a node to be properly connected in the network. The behaviour won't be deterministic because there is no guarantee that all nodes will see the new record at least once, since they could query only nodes that have the old one.~

## TODO
- [x] Add unit tests for the new paths.
- [x] Make sure the implementation is backwards compatible.
- [x] Evaluate whether there are any bad consequences of letting the query continue rather than finishing it at the first record found.
- [x] Bake the new changes in versi.
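To make point 1 concrete, here is a small stand-alone sketch (not the actual sc-network or authority-discovery code) of collecting up to a fixed number of DHT responses instead of stopping at the first one, so both the old and the new record have a chance to be observed; the `PeerRecord` type and the responses are invented for the example.

```rust
/// A single value returned for a DHT key, attributed to the peer that served it.
#[derive(Debug)]
struct PeerRecord {
    peer: String,
    value: Vec<u8>,
}

/// Point 1 above: instead of stopping at the first response, collect up to
/// `max_results` responses (here 4, still bounded by the replication factor).
fn collect_results(
    responses: impl Iterator<Item = PeerRecord>,
    max_results: usize,
) -> Vec<PeerRecord> {
    responses.take(max_results).collect()
}

fn main() {
    let responses = vec![
        PeerRecord { peer: "peer-a".into(), value: b"old-record".to_vec() },
        PeerRecord { peer: "peer-b".into(), value: b"new-record".to_vec() },
        PeerRecord { peer: "peer-c".into(), value: b"old-record".to_vec() },
    ];
    // With more than one result we get a chance to see both the old and the
    // new record and decide which one is fresher.
    let seen = collect_results(responses.into_iter(), 4);
    println!("{seen:?}");
}
```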