Fix memory leaks by altonen · Pull Request #55 · paritytech/litep2p

altonen · 2024-03-10T10:40:00Z

The big memory leak was related to outbound backpressure. Notification connection handler would block on Substream::send_framed() and accumulated a lot of memory in the notification channel which apparently didn't get released appropriately, possibly due to connection handler not getting closed. The connection handler is rewritten in a way where it processes both inbound and outbound notifications without blocking on either one of them so if the node is trying to send a large notification/many notifications, it won't block the entire connection event loop and instead gradually writes the notification(s) to the substream. Size of the channel that's used to send notifications synchronously is adjusted to match the one in Polkadot SDK.

NotificationHandle::send_sync_notification() is modified to match the implementation currently found in Polkadot SDK whereby it closes the connection if the channel is clogged. I don't think this is a particularly good idea in the first place and once the protocols in Polkadot SDK have been written in such a way that they can deal with backpressure, e.g., by prioritizing certain notifications over others, this behavior can (and probably should) be removed.

Third change is the largest and that is to (temporarily) move yamux into litep2p. The flamegraphs indicated that yamux was consuming a lot of memory and while bumping it to 0.11.1, where the patch released fixed an issue with GC, helped, it was still leaking memory. The follow-up for the GC was fixed in 0.12.1 but 0.12.0 introduced significant API breakage, probably requiring a rewrite of the TCP and WebSocket connection handlers, which I'm not interested in doing at this time. I backported the memory leak fix from 0.12.1 to 0.11.1 and transferred the source code of yamux to litep2p and it improved things to a point where litep2p doesn't leak memory anymore. Moving yamux into litep2p is not optimal but considering that 0.12.0 removed the APIs that litep2p is relying on, thus requiring non-trivial rewrites of the connection handlers, and that litep2p is leaking too much memory on 0.11.1, it seems like the best approach for now. Yamux source code doesn't need a review and can be skipped.

Lastly, add timeouts for IPFS Identify, IPFS Ping, and request-response protocols. It may be possible for the read/write to block indefinitely which could cause the future to stay active which in turn would keep the connection open as each substream holds a permit, preventing the connection from getting closed. Now that each substream operation has a timeout, the connection won't be kept open erroneously, allowing resources to be released.

This is big PR but I've run a burn-in for these changes in Versi so it should be safe to merge this.

yamux `0.11.1` and `0.12.1` contains fixes for how its GC works and while upgrading to `0.11.1` would be trivial, the fix was not enough not enough to reduce memory leakage. `0.12.1`, on the other hand, contians significant API breakage and likely requires a rewrite of TCP/WebSocket connection handlers as `ControlledConnection` and `Control` were removed In the intersest of saving time, switch to `0.11.1`, backport the GC fix from `0.12.1` to `0.11.1`, and move source code of yamux to litep2p. This is by no means an ideal solution but considering that fixing the memory leaks are of highest importance and rewriting the connection handlers likely requires significant rewrite and extensive testing, it's easiest to just use yamux directly, like was done with multistream-select.

Rewrite connection handler to handle backpressure better by not blocking when sending notifications and instead allowing the connection handler to process inbound notifications while the outbound substream is blocked

Use the same size as in Polkadot SDK

To follow how Polkadot SDK's notification protocol implementation, if the outbound channel is clogged, close the connection.

Wait 10 seconds for reads/writes to complete after which the operation is canceled. Keeping the substream open without any time limit can keep the connection erroneously open because the substream holds a permit which keeps the connection open, preventing it from closing and releasing consumed memory.

src/protocol/libp2p/ping/mod.rs

src/protocol/notification/connection.rs

Co-authored-by: Dmitry Markin <[email protected]>

altonen added 7 commits March 9, 2024 13:12

protocol/notification: Rewrite notification connection handler

67c5a3c

Rewrite connection handler to handle backpressure better by not blocking when sending notifications and instead allowing the connection handler to process inbound notifications while the outbound substream is blocked

protocol/notification: Adjust SYNC_CHANNEL_SIZE

6b8f4a8

Use the same size as in Polkadot SDK

protocol/notification: Close connection if channel is clogged

ce8f1ba

To follow how Polkadot SDK's notification protocol implementation, if the outbound channel is clogged, close the connection.

protocol/libp2p/ping: Add timeouts for read/write operations

6a9e4d8

protocol/request-response: Add timeouts for read/write operations

c7082d8

altonen requested a review from dmitry-markin March 10, 2024 10:40

altonen force-pushed the altonen-fix-memory-leaks branch from 1108ce0 to c7082d8 Compare March 10, 2024 10:50

dmitry-markin approved these changes Mar 11, 2024

View reviewed changes

src/protocol/libp2p/ping/mod.rs Show resolved Hide resolved

src/protocol/notification/connection.rs Outdated Show resolved Hide resolved

Update src/protocol/notification/connection.rs

ea483f9

Co-authored-by: Dmitry Markin <[email protected]>

altonen merged commit cc043b6 into master Mar 11, 2024

altonen deleted the altonen-fix-memory-leaks branch March 11, 2024 16:29

altonen mentioned this pull request Nov 11, 2024

yamux: Update yamux with window tuning and backport fixes #256

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix memory leaks#55

Fix memory leaks#55
altonen merged 8 commits intomasterfrom
altonen-fix-memory-leaks

altonen commented Mar 10, 2024

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

altonen commented Mar 10, 2024

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants