Refactor Connection to a synchronous state machine #142
thomaseizinger merged 41 commits into libp2p:master
Conversation
Changed the title from "poll-style code" to "Connection to a synchronous state machine".
I am sorry for the delay here. I won't get to reviewing this until after libp2p day.
In my eyes, that would be a great simplification.
No worries.
Cool! I am thinking of making a separate PR that adds more tests so I can refactor with a bit more confidence. Once that is merged, we could perhaps also remove the control API in this PR? It would make some of the internals vastly simpler. Plus, anyone can build a control-style API on top of one with multiple poll functions at any point.
We can do this by having a centralised place to send messages and shoving them into this buffer in all other places.
To reduce the duplication we split out a test harness.
This allows us to have a proper place for our test-harness.
I've rebased on top of master to allow for an easier patch-by-patch review!

Let me know if you disagree with the workspace structure.
mxinden left a comment
Direction looks good to me. Couple of comments, nothing big.
In my eyes this pull request combines many unrelated changes, so ideally it would be split into several pull requests. That said, I think it is fine to proceed here, especially as this repository is not very active and thus conflicts are unlikely.
```rust
match self.socket.poll_flush_unpin(cx)? {
    Poll::Ready(()) => {}
    Poll::Pending => {}
}
```
I am wondering whether we should flush this early in the poll method, or whether it should instead be one of the last actions. The rationale being that frequent flushing hurts performance, especially when one could increase the batch size instead.
Just a thought. Needs more thought and potentially data to back it up.
I couldn't find any consistent performance improvement in my benchmarks when moving this block of code up or down.
However, this got me thinking: why do we communicate via channels between the connection and the stream for writing, but use shared buffers when reading? We could just as easily have a buffer of frames in `Shared` and wake the `Connection` whenever we write any frames to it. This would allow us to drain the buffers of all streams in one go, without having to receive individual frames over a channel.
> I couldn't find any consistent performance improvement in my benchmarks when moving this block of code up or down.

Thanks for testing. Let's keep it as is.

> However, this got me thinking: why do we communicate via channels between the connection and the stream for writing, but use shared buffers when reading? We could just as easily have a buffer of frames in `Shared` and wake the `Connection` whenever we write any frames to it. This would allow us to drain the buffers of all streams in one go, without having to receive individual frames over a channel.
I am undecided whether the connection should communicate with the stream via a channel or via a plain Mutex and Waker. Whatever change we want to make, I think it should not happen within this pull request.
```rust
/// A [`Future`] that gracefully closes the yamux connection.
#[must_use]
pub struct Closing<T> {
    state: State,
    control_receiver: mpsc::Receiver<ControlCommand>,
    stream_receiver: mpsc::Receiver<StreamCommand>,
    pending_frames: VecDeque<Frame<()>>,
    socket: Fuse<frame::Io<T>>,
}
```
Question, not suggestion: why deliberately implement this as a state machine instead of using procedural async/await?
So that it can be named without boxing it up.
What would be the drawback of boxing it?
> What would be the drawback of boxing it?

- Performance
- We have to decide whether we add the `Send` bound. In the current design, we get to delete the `YamuxLocal` stuff in `rust-libp2p` because the `Send` bound is inferred.
I don't feel strongly about either but it felt like a nice improvement as I went along. Once we get "impl Trait in type-alias" in Rust at least the boxing would go away.
> What would be the drawback of boxing it?
>
> - Performance

Is there any benchmark proving this? Is performance relevant when closing a connection?

> We have to decide whether we add the `Send` bound. In the current design, we get to delete the `YamuxLocal` stuff in `rust-libp2p` because the `Send` bound is inferred.
The infectious-send-when-boxing problem is reason enough to not box in my eyes 👍
I agree that the size is not ideal. I did however find it quite difficult to refactor this piece-wise into something that can be merged independently without leaving master in a weird state from a design perspective. Best I could do was to make small commits, but I don't think a particular subset of those is worth merging independently :)
mxinden left a comment
> It retains the `Control` API but layers it completely on top of the existing `Connection`. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new poll-based API and potentially remove the `Control` API.
I suggest we deprecate or remove the Control API in a follow-up pull request. What do you think @thomaseizinger?
This is a large change potentially resulting in subtle changes in behavior breaking upper layers. What is the best strategy to test this patch in the wild? I suggest we ask community members to run this in their production environments. Should we cut an alpha release for it, or rather have them test based on a hash?
Yes, the tests need refactoring before we can remove it.
I'd suggest:
I've bumped the version and changelog. I am intending to merge this in the upcoming days.
Sounds good to me.
👍 I can cut a release right after.

This PR refactors `Connection` to a synchronous state machine.

It retains the `Control` API but layers it completely on top of the existing `Connection`. This allows us to do this refactoring without touching any of the tests. In a future step, we can port all the tests to the new poll-based API and potentially remove the `Control` API.

All commits:
It should be possible to review this PR patch-by-patch. You may find it easier though to first have a look at the end-result by checking out the code and navigating around to see how things work now.
It would be great if we could merge #145 first. That would allow us to share a few bits between the two test suites.