Reactive syncing metrics #5410
Force-pushed (`…task of chain sync and make them reactive`) from e8fde12 to 9135bb1.
dmitry-markin
left a comment
Looks good, thank you! With metrics updated on every operation it's more difficult to verify that every update is handled. I hope you have searched through the code and made sure everything is covered.
```rust
            metrics.pending.inc();
        }
    } else {
        trace!(target: LOG_TARGET,
```
This is out of scope of this PR, but it looks like we can legitimately hit this if the response was delivered after the peer disconnected event arrived. This can happen depending on the order different protocols/streams are polled as these events are emitted by different objects (notifications protocol for sync peers and request-response protocol for responses).
Created an issue for this: #5414.
nazar-pc
left a comment
> Hope you have searched through the code and made sure everything is covered.
I checked all occurrences and invariants and I don't think I missed anything.
```rust
if let Some(request) = self.active_requests.remove(&who) {
    if let Some(metrics) = &self.metrics {
        metrics.active.dec();
    }
```
Would grouping the request hashmaps (or vec) with the metrics help here?
Some time in the future we might do an `active_requests.remove()` and forget to decrement the appropriate metric.
One suggestion might be to group them into a wrapper:

```rust
struct ActiveRequestsMetered {
    inner: HashMap<..>, // Current self.active_requests
    metrics: Option<Metrics>,
}
```

Maybe an even simpler approach would be to introduce a `fn active_requests_remove() { ...; metrics.active.dec(); }`.
Could also be a followup if it would take too long to implement :D
There are several of these in the code and I'm not sure how much it would help with ergonomics to be completely honest. Metrics are one of those things that are just inherently invasive unfortunately.
lexnv
left a comment
LGTM! Thanks for contributing!
I resolved a minor merge conflict, but also had to restore the useless tick. Without this tick, tests like the one at substrate/client/network/test/src/sync.rs lines 1020 to 1024 (in 9fecd89) break. If someone with more knowledge in this area can take a look at it that'd be great, but it shouldn't block this PR anymore.

CI seems to be good now.
This PR untangles syncing metrics and makes them reactive, the way metrics are supposed to be in general.
Syncing metrics were bundled in a way that caused coupling across multiple layers: justifications metrics were defined and managed by `ChainSync`, but only updated periodically on tick in `SyncingEngine`, while actual values were queried from `ExtraRequests`. This convoluted architecture was hard to follow when I was looking into #5333.

Now the metrics that correspond to each component are owned by that component and updated as changes are made, instead of on a tick every 1100ms.
This does add some annoying boilerplate that is a bit harder to maintain, but it separates metrics more nicely, and if someone queries them more frequently they will get arbitrary resolution. Since metric updates are just atomic operations, I do not expect any performance impact from these changes.
Will add prdoc if changes look good otherwise.
P.S. I noticed that importing requests (and corresponding metrics) were never cleared since the corresponding code was introduced in dc41558#r145518721. I left it as is to not change the behavior, but it might be something worth fixing.
cc @dmitry-markin