[BUG] - CSJ is always on, regardless of the GSM state #1490

@nfrisby

Description

The Genesis specification requires that all Genesis components, including ChainSync Jumping (CSJ), be disabled once the node finishes syncing.

The CSJ optimization currently ignores the Genesis State Machine (GSM) state entirely, which is a bug.

It means that a Genesis-enabled node that finishes syncing will still only allow a single peer to run normal ChainSync at a time. If none of its peers are trying to exploit this bug, then CSJ will activate them one by one (each in turn as its "dynamo"), each will send MsgAwaitReply soon enough, and each will thereby disengage from CSJ. In that case, the bug merely prolongs the transition from syncing to caught-up slightly, and it introduces a similar brief delay for any upstream peer that the node connects to afterwards.

However, if an upstream peer behaves adversarially by putting off sending MsgAwaitReply for as long as it can, then the bug becomes much more severe. An adversary can already choose that behavior while the victim is syncing, but Genesis's Limit on Patience (LoP) strictly bounds how long it can keep that up. The LoP, however, is (correctly) disabled when the GSM transitions to CaughtUp. In the absence of the LoP, the adversarial peer can avoid sending MsgAwaitReply for a long time, up to several hours. During that time, CSJ prevents any new upstream peers from syncing any headers. Any upstream peers that existed before the victim connected to the adversary are unaffected, because they have already disengaged from CSJ. Because the Diffusion Layer churns every 30 minutes or so, a clever adversary might be able to effectively eclipse the victim. They won't be able to feed it an alternative chain, though; they can only ensure that it sees the fresh honest chain's blocks much later than it should have.
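
The difference between the benign and the adversarial case can be seen in a toy model. This is only an illustrative sketch, not the actual CSJ implementation: it assumes CSJ promotes engaged peers to dynamo strictly one at a time, measures time in abstract "ticks", and the names `Peer`, `ticks_before_await_reply`, and `activation_ticks` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    # Hypothetical knob: how many ticks the peer spends as dynamo
    # before it sends MsgAwaitReply and disengages from CSJ.
    ticks_before_await_reply: int


def activation_ticks(peers):
    """Return {peer name: tick at which CSJ first lets it run normal ChainSync}.

    Toy assumption: engaged peers become the dynamo in list order, and the
    next peer is only promoted once the current dynamo has disengaged.
    """
    tick = 0
    activated = {}
    for peer in peers:
        activated[peer.name] = tick
        # Everyone behind this peer waits until it sends MsgAwaitReply.
        tick += peer.ticks_before_await_reply
    return activated


# Honest peers send MsgAwaitReply promptly, so the delay is small.
honest = [Peer("a", 1), Peer("b", 1), Peer("c", 1)]
print(activation_ticks(honest))   # {'a': 0, 'b': 1, 'c': 2}

# Without the LoP, one stalling dynamo blocks every peer queued behind it.
stalled = [Peer("a", 1), Peer("adversary", 10_000), Peer("c", 1)]
print(activation_ticks(stalled))  # {'a': 0, 'adversary': 1, 'c': 10001}
```

Peer "a" is unaffected in both runs because it was activated before the adversary, which mirrors the observation that peers which disengaged earlier keep working normally.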

The fix is simply to disable CSJ when the GSM transitions to CaughtUp, and to re-enable CSJ when the GSM transitions back out of CaughtUp. The forward transition is trivial, since the GSM only transitions to CaughtUp once all peers have sent MsgAwaitReply (and so are already disengaged). The backward transition, though, will require a small amount of new logic so that CSJ can re-engage a peer that it had previously disengaged.
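
The shape of the proposed fix can be sketched as a small state model. This is a hedged illustration under stated assumptions, not the real code: `GsmState` is collapsed to two states, `CsjModel` and its methods are hypothetical names, and re-engaging every previously disengaged peer on the backward transition is just one possible policy.

```python
from enum import Enum, auto


class GsmState(Enum):
    # Simplified: only the distinction "CaughtUp or not" matters here.
    SYNCING = auto()
    CAUGHT_UP = auto()


class CsjModel:
    """Toy model of CSJ peer bookkeeping driven by the GSM state."""

    def __init__(self, peers):
        self.gsm_state = GsmState.SYNCING
        self.engaged = set(peers)   # peers still controlled by CSJ
        self.disengaged = set()     # peers running normal ChainSync

    @property
    def csj_enabled(self):
        return self.gsm_state is not GsmState.CAUGHT_UP

    def on_await_reply(self, peer):
        # A peer that sends MsgAwaitReply disengages from CSJ.
        self.engaged.discard(peer)
        self.disengaged.add(peer)

    def add_peer(self, peer):
        # The fix: new peers skip CSJ entirely while CSJ is disabled.
        (self.engaged if self.csj_enabled else self.disengaged).add(peer)

    def on_gsm_transition(self, new_state):
        was_enabled = self.csj_enabled
        self.gsm_state = new_state
        if was_enabled and not self.csj_enabled:
            # Forward transition: normally a no-op, since all peers
            # have already disengaged by the time the GSM is CaughtUp.
            self.disengaged |= self.engaged
            self.engaged.clear()
        elif not was_enabled and self.csj_enabled:
            # Backward transition: the new logic, re-engaging peers
            # (assumed policy: re-engage all of them).
            self.engaged |= self.disengaged
            self.disengaged.clear()


model = CsjModel(["a", "b"])
model.on_await_reply("a")
model.on_await_reply("b")
model.on_gsm_transition(GsmState.CAUGHT_UP)
model.add_peer("c")                        # not held back by CSJ
model.on_gsm_transition(GsmState.SYNCING)  # CSJ takes over again
```

In this model a peer added while the GSM is CaughtUp is never queued behind a dynamo, which is the behavior whose absence makes the stalling attack possible.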

    Projects

    Status

    ✅ Done
