Catching panics #1389

@Stebalien

(moving a discussion from a private conversation to somewhere more public)

Libp2p performs quite a bit of complex parsing, which has occasionally led to panics at runtime. When uncaught, these panics crash the entire node.

Proposal: Catch panics at "failure boundaries". E.g.:

  • If we have some form of "connection" worker, catch panics in the worker and kill the entire connection if the worker panics. Same for streams.
  • Catch panics in per-peer stream handlers, cleaning up all state related to the peer.
  • Catch panics in low-level parsing logic. Parsing tends to be self-contained, but it is also error-prone.
  • Stretch: Where possible, catch service-level panics, cancel all current requests, close all resources, and restart. But we do need to be a bit careful to not continue running in a corrupted state.

Metadata


Labels

P3Low: Not priority right now


Projects

Status

🥞 Todo
