Skip to content

[Manta] Auto-evict low-performing collators each session#358

Merged
stechu merged 3 commits intomantafrom
garandor/collator-ejection
Mar 19, 2022
Merged

[Manta] Auto-evict low-performing collators each session#358
stechu merged 3 commits intomantafrom
garandor/collator-ejection

Conversation

@Garandor
Copy link
Copy Markdown
Contributor

@Garandor Garandor commented Jan 20, 2022

Description

This PR modifies the manta-collator-selection pallet by adding three new storage items and corresponding setter extrinsics, that - if set to anything other than their default - automatically remove collators at the end of a session that significantly ( x% ) underperformed the y-th percentile of the total collator set over the current session.

A good initial value (and we intend to set this on Dolphin) is

  • x = 10 ( default value on genesis: 100)
  • y = 80 ( default value on genesis: 0)

Main changes in: pallets/collator-selection/src/lib.rs

  • Substitutes the kick_threshold and last_block_authored storage items with two new storage items, defining the percentile and the percentage above
  • Adds a setter extrinsic for each of the new items
  • Adds a storage map that records the number of blocks produced by each collator in the current session

closes: #335


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (manta or dolphin) with right title (start with [Manta] or [Dolphin]),
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests.
  • Updated relevant documentation in the code.
  • Re-reviewed Files changed in the Github PR explorer.
  • If runtime changes, need to update the version numbers properly:
    • authoring_version: The version of the authorship interface. An authoring node will not attempt to author blocks unless this is equal to its native runtime.
    • spec_version: The version of the runtime specification. A full node will not attempt to use its native runtime in substitute for the on-chain Wasm runtime unless all of spec_name, spec_version, and authoring_version are the same between Wasm and native.
    • impl_version: The version of the implementation of the specification. Nodes are free to ignore this; it serves only as an indication that the code is different; as long as the other two versions are the same then while the actual code may be different, it is nonetheless required to do the same thing. Non-consensus-breaking optimizations are about the only changes that could be made which would result in only the impl_version changing.
    • transaction_version: The version of the extrinsics interface. This number must be updated in the following circumstances: extrinsic parameters (number, order, or types) have been changed; extrinsics or pallets have been removed; or the pallet order in the construct_runtime! macro or extrinsic order in a pallet has been changed. If this number is updated, then the spec_version must also be updated
  • [ ] If needed, notify the committer this is a draft-release and a tag is needed after merging the PR.
  • Verify benchmarks & weights have been updated for any modified runtime logics
  • [ ] If needed, bump version for every crate.
  • If import a new pallet, choose a proper module index for it, and allow it in BaseFilter. Ensure every extrinsic works from front-end. If there's corresponding tool, ensure both work for each other.
  • [ ] If needed, update our Javascript/Typescript APIs. These APIs are offcially used by exchanges or community developers.
  • [ ] If we're going to issue a new release, freeze the code one week early(it depends, but usually it's one week), ensure we have enough time for related testing.

@Garandor Garandor changed the title Garandor/collator ejection [Manta] Auto-evict low-performing collators each session Jan 20, 2022
@Garandor Garandor force-pushed the garandor/collator-ejection branch from 53ba36c to 72f89bb Compare February 1, 2022 17:15
@Garandor
Copy link
Copy Markdown
Contributor Author

Garandor commented Feb 7, 2022

This PR is functional but not fully ready - i have added comments where I still have open questions myself - please have a look and advise how to best resolve them

@Garandor Garandor requested a review from stechu February 7, 2022 09:53
@Garandor Garandor self-assigned this Feb 7, 2022
@Garandor Garandor added this to the v3.1.5 milestone Feb 7, 2022
@Garandor Garandor marked this pull request as ready for review February 11, 2022 02:37
@Garandor
Copy link
Copy Markdown
Contributor Author

Garandor commented Feb 11, 2022

Only one unresolved open question left and that one is strictly optional to implement.

This is now ready for final review

P.S. I'll see on the side if i can get rid of the pulled in ValidatorSet trait as an optional refactor

Copy link
Copy Markdown
Contributor

@ghzlatarev ghzlatarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Garandor The CI is failing.

@Garandor Garandor marked this pull request as draft February 15, 2022 15:49
@Garandor
Copy link
Copy Markdown
Contributor Author

Reworking this to add extrinsics to change the percentile and threshold.
Ended the day yesterday with benchmarking problems

@Garandor
Copy link
Copy Markdown
Contributor Author

Kicking config is now updateable and disabled by default on genesis.

The verify step of the benchmark is still failing (https://github.com/Manta-Network/Manta/runs/5211310990?check_suite_focus=true#step:9:1687), looking into it, but the code should now be fairly stable.

@ghzlatarev @bhgomes

@Garandor Garandor marked this pull request as ready for review February 16, 2022 09:12
@Garandor Garandor marked this pull request as draft February 25, 2022 19:46
@Garandor
Copy link
Copy Markdown
Contributor Author

Garandor commented Feb 25, 2022

Do Not Merge until resolution of

https://substrate.stackexchange.com/questions/504/reclaim-used-space-of-a-removed-storageitem-on-a-live-chain

Resolution came quick - A migration to kill the old storage item is needed

@Garandor
Copy link
Copy Markdown
Contributor Author

Garandor commented Feb 27, 2022

@stechu Worked in your comments.
As for adding #transactional, I think the two set_* extrinsics really don't have the complexity to warrant adding it.
Parity also hasnt marked set_candidacy_bond transactional: https://paritytech.github.io/cumulus/src/pallet_collator_selection/lib.rs.html#335

This stil is not ready to merge as it needs a storage migration to clear the removed storage item

@Garandor
Copy link
Copy Markdown
Contributor Author

Tentatively final version of this PR:

  • Calamari is set in Cargo.toml to use the last pre-merge version of manta_collator_selection on manta branch.
    No migration is needed as it continues to use the old KickThreshold-based eviction mechamism.

NOTE: This will go well for a while, until we upgrade substrate dependencies to v0.9.17, as this hard-set rev of manta_collator_selection can't easily have its deps updated along with the rest of the code.
When we do decide to upgrade calamari to the new mechanism, a migration is needed to remove the following storage items from the chain.

  1. LastAuthoredBlock
  2. KickThreshold
  • Dolphin is set to use the pallet from local filesystem
    No migration is needed as the chain/testnet has not yet been launched

@Garandor Garandor marked this pull request as ready for review February 28, 2022 22:42
@Garandor Garandor marked this pull request as draft February 28, 2022 22:54
@Garandor
Copy link
Copy Markdown
Contributor Author

Garandor commented Feb 28, 2022

updated weights from CI benchmark missing, don't merge yet

@Garandor
Copy link
Copy Markdown
Contributor Author

Garandor commented Mar 2, 2022

Do not merge until resoution of #437

After #437 is on manta:

@Garandor Garandor marked this pull request as draft March 2, 2022 22:05
commit e25feeb
Merge: 331e6bd 52812c6
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Mar 3 15:20:51 2022 -0600

    Merge branch 'manta' into garandor/collator-ejection

commit 331e6bd
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 17:57:24 2022 -0600

    Add CI-benchmark weights for calamari runtime on the old manta_collator_selection pallet

commit d286579
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 15:18:38 2022 -0600

    fixup

commit 44fdf9e
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 15:08:13 2022 -0600

    Readd KickThreshold to mari config

commit 8369032
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 15:01:49 2022 -0600

    fixup: remove AccountIdOf from mari

commit d8b4c52
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 14:59:44 2022 -0600

    fixup: remove extrinsics from calamari weights file too

commit fbf4061
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 14:49:26 2022 -0600

    readd old r parameter to dummy weights ( to be recomputed from CI )

commit 072b55a
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 14:47:24 2022 -0600

    Mari: The old pallet version ofcourse doesn't have the two new extrinsics

commit 0482632
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 14:29:26 2022 -0600

    fixup: same for dolphin

commit 142951f
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 14:26:16 2022 -0600

    cumulusXcm has no callables to filter

commit 744adbf
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 14:04:45 2022 -0600

    fixup: add pallet to dolphin weights

commit 1658421
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 13:57:05 2022 -0600

    Staple Calamari to pre-rework revision of the manta_collator_selection pallet

commit 3a9fe9b
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 13:19:15 2022 -0600

    Align Dolphin with modified Calamari RT  wrt. to collator_selection

commit b4355af
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 28 12:24:22 2022 -0600

    Add set_eviction_baseline&tolerance to BaseFilter on Calamari

commit 5bba63f
Author: Adam Reif <Garandor@manta.network>
Date:   Sat Feb 26 18:27:07 2022 -0600

    Rename kick_stale_candidates -> evict_bad_collators

commit 156efc6
Author: Adam Reif <Garandor@manta.network>
Date:   Sat Feb 26 18:12:16 2022 -0600

    minor cleanup and format

commit 6ad7329
Author: Adam Reif <Garandor@manta.network>
Date:   Sat Feb 26 18:11:10 2022 -0600

    Rename Threshold -> Tolerance & Percentile -> Baseline

commit dcd1a6a
Author: Adam Reif <Garandor@manta.network>
Date:   Sat Feb 26 17:44:08 2022 -0600

    kick_stale_signatures() returns Vec<> instead of Option<Vec<>>

commit 84844cb
Merge: 51f5462 bdfb868
Author: Adam Reif <Garandor@manta.network>
Date:   Fri Feb 25 12:20:10 2022 -0600

    Merge remote-tracking branch 'origin/manta' into garandor/collator-ejection

commit 51f5462
Merge: 516842a 13dfd01
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 24 09:12:54 2022 -0600

    Merge remote-tracking branch 'origin/manta' into garandor/collator-ejection

commit 516842a
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 24 09:12:05 2022 -0600

    Add CI-benchmarked weights for collator selection

commit 53b8d4a
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 23 22:20:38 2022 -0600

    fixup

commit 871af21
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 23 22:20:17 2022 -0600

    Set benchmark worst-case as removing all but one candidate

commit 1c5a9e1
Merge: 272be77 2c317cd
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 22 21:32:45 2022 -0600

    Merge remote-tracking branch 'origin/manta' into garandor/collator-ejection

commit 272be77
Author: Adam Reif <Garandor@manta.network>
Date:   Fri Feb 18 18:27:36 2022 -0600

    Changelog

commit 5c2189e
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 19:27:22 2022 -0600

    Reduce session performance trace loglevel to prevent irrelevant logspam

commit fe7f947
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 14:11:53 2022 -0600

    Minor Syntax refactor

commit 53186a5
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 13:54:52 2022 -0600

    Use EvictionPercentile/EvictionThreshold for brevity, better doccomments

commit 8b2d03e
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 13:28:38 2022 -0600

    Typo in trait >_>

commit ea658d5
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 03:29:46 2022 -0600

    remove mistakenly committed println!() invocations

commit b516007
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 02:51:10 2022 -0600

    fix bug to not kick when perf exactly at threshold. Add boundary condition unit tests

commit ea20c57
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 16 00:40:06 2022 -0600

    undo formatting-only changes in benchmark to minimize the diff

commit 610ef19
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 15 23:42:37 2022 -0600

    Add sudo/gov-protected extrinsics to mutate percentile/threshold (weighting TODO)

commit f43dc3f
Merge: 726606e c9d3bdf
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 15 22:45:13 2022 -0600

    Merge v9.16 changes from manta, adapt parity unit test

commit 726606e
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 14 15:47:08 2022 -0600

    retype u8 -> Percent for PerformancePercentileToConsiderForKick and UnderperformPercentileByPercentToKick

commit 2a3043d
Author: Adam Reif <Garandor@manta.network>
Date:   Fri Feb 11 10:22:46 2022 -0600

    minor comment

commit 1c65485
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 21:05:18 2022 -0600

    whitespace cargo fmt

commit 18f2384
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 21:00:41 2022 -0600

    minor: trailing whitespace

commit 7653b7f
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 20:32:23 2022 -0600

    pull left_from_one in scope

commit 2edca0d
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 20:13:18 2022 -0600

    Minor: Remove code comments

commit 3c86881
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 19:52:05 2022 -0600

    align Dolphin RT with Calamari regarding modified collator_selection pallet

commit b719eca
Merge: 3d56561 5f63e7b
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 19:34:30 2022 -0600

    Merge dolphin RT

commit 3d56561
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 19:21:25 2022 -0600

    Don't use libm, use Percent fixed point type as per @jamie 's recommendation

commit 90797cf
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 18:55:36 2022 -0600

    Refactor kick_candidates with fewer local vars

commit 1b6949e
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 12:02:31 2022 -0600

    resolve comment

commit 341b9e9
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 10 11:32:45 2022 -0600

    kick_stale_collators now returns Option<kicked_candidates>

commit 551884f
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 9 14:30:01 2022 -0600

    cleanup last_authored_block remnants

commit 35f86bf
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 9 14:24:21 2022 -0600

    mutate instead of get/insert

commit 7f86bad
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 9 12:33:35 2022 -0600

    fixup: debug AcccountId instead of display

commit 4e51b30
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Feb 9 12:00:48 2022 -0600

    Use f64 and libm::ceil(), log perf of kicked collator

commit e98ade0
Merge: c0af847 9fa2d46
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 8 12:36:33 2022 -0600

    Merge v3.1.3 manta

commit c0af847
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 8 11:11:37 2022 -0600

    minor rename

commit ad35c2e
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 8 10:55:07 2022 -0600

    fix testing

    Setting block performance to 10 on session start led to node 4 having produced 20 blocks at session end, which would kick everyone if there were 4 or fewer total validators active this session ( due to how the percentile works )

commit 225cfde
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 7 14:13:10 2022 -0600

    don't include FloatCore, call it directly

commit 6f8beb3
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Feb 7 03:30:46 2022 -0600

    Fix ordinal to round up the calculated float, some formatting

commit 52b2da6
Merge: 33fc948 3ba51e0
Author: Adam Reif <Garandor@manta.network>
Date:   Fri Feb 4 09:08:02 2022 -0600

    Merge branch 'manta' into garandor/collator-ejection

commit 33fc948
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 13:33:11 2022 -0600

    incorporate first review comments

commit e698a9e
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 11:37:10 2022 -0600

    convert block num

commit 0939168
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 11:03:47 2022 -0600

    fix syntax

commit 02fac39
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 10:56:05 2022 -0600

    fix type issues

commit f6c10cc
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 10:05:09 2022 -0600

    first attempt at benchmarking

commit c64d4a0
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 10:00:03 2022 -0600

    dont use new format string syntax pt2

commit 0bf1275
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Feb 3 09:16:36 2022 -0600

    dont use new format string syntax

commit f4c3b8d
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 1 11:27:32 2022 -0600

    nvm, manta RT doesnt use the forked CS yet

commit 2cbfb07
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 1 11:20:34 2022 -0600

    add AccountIdOf to runtimes

commit 72f89bb
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 1 11:06:50 2022 -0600

    Cleanup code comments

commit e62b90e
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Feb 1 10:30:45 2022 -0600

    Add testcases, resign to the fact that collator_selection tests are written expecting node 4 to author everything

commit 1afaa76
Author: Adam Reif <Garandor@manta.network>
Date:   Sun Jan 30 15:26:11 2022 -0600

    fix validatorID conversion

commit 36d1de7
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Jan 25 23:36:23 2022 -0600

    Attempt to implement ValidatorId -> AccountId conversion

commit 5b3875b
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Jan 25 23:02:43 2022 -0600

    Implement ValidatorSet for lookup of active validators

commit 3ce8bb9
Author: Adam Reif <Garandor@manta.network>
Date:   Tue Jan 25 11:26:45 2022 -0600

    readd kick_mechanism check with session counts

commit 5e6400c
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Jan 24 21:46:37 2022 -0600

    Refactor kick_stale_candidates to report removed collators instead of active ones

commit a6c9038
Author: Adam Reif <Garandor@manta.network>
Date:   Mon Jan 24 09:56:00 2022 -0600

    testing

commit ceab4c3
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Jan 20 00:13:38 2022 -0600

    remove last_authored_block from tests

commit e6753bd
Author: Adam Reif <Garandor@manta.network>
Date:   Thu Jan 20 00:13:11 2022 -0600

    flip assert logic

commit f5b39f2
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Jan 19 23:10:33 2022 -0600

    Flip safe/kick slice order, calculate on floats, not u32s

commit 5c30e5b
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Jan 19 21:58:19 2022 -0600

    Remove dependency on libmath, align percentile value to the number in #335

commit 60e9ccc
Author: Adam Reif <Garandor@manta.network>
Date:   Wed Jan 19 19:19:45 2022 -0600

    First compiling draft

Signed-off-by: Adam Reif <Garandor@manta.network>
@Garandor Garandor force-pushed the garandor/collator-ejection branch from e25feeb to d545e48 Compare March 7, 2022 21:01
@Garandor Garandor marked this pull request as ready for review March 10, 2022 16:41
Adam Reif added 2 commits March 14, 2022 13:26
Signed-off-by: Adam Reif <Garandor@manta.network>
Signed-off-by: Adam Reif <Garandor@manta.network>
@Garandor Garandor force-pushed the garandor/collator-ejection branch from ec0aaed to 7e52633 Compare March 14, 2022 18:28
Copy link
Copy Markdown
Contributor Author

@Garandor Garandor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished refactoring the test code @stechu
It should be easier to follow along now.

LGTM

@stechu stechu merged commit 4a3ea4f into manta Mar 19, 2022
Dengjianping added a commit that referenced this pull request Mar 30, 2022
Calamari: Integrate new collator eviction from #358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evicting Underperforming Collators

5 participants