Skip to content

Conversation

@adamhass
Copy link
Collaborator

@adamhass adamhass commented Feb 2, 2021

  • Added a NetworkStatusPort which components may connect to to receive NetworkStatusUpdate indications or send NetworkStatusRequests to the Dispatcher.
  • Protocol to distinguish between gracefully closed channels and lost connections implemented.
  • Docs page added for the NetworkStatusPort.
  • Added public method SystemPath::socket_address() -> SocketAddr
  • Added public method System::kill_system() which kills a system without doing graceful shutdown of network channels.
  • Components can send two requests on the NetworkStatusPort: ConnectSystem(SystemPath) which allows systems to initiate connections without sending messages to actors on the system, for set-up phases and network-overlay-management. And DisconnectSystem(SystemPath) which will initiate graceful disconnection between the systems, no messages are discarded and new messages sent between the systems will automatically reconnect the systems.

Please make sure these boxes are checked, before submitting a new PR.

  • [✅ ] You ran rustfmt on the code base before submitting (on latest nightly with rustfmt support)
  • [✅] You reference which issue is being closed in the PR text (if applicable)

Issues

Closes #82

(old discussion points)

This is a draft

I'd like to have a discussion on the API and naming here.

  1. I'm not sure at all about using SocketAddrin the public interface.
  2. The CloseConnection() request immediately initiates a graceful shutdown of a connection and I'm not sure what should happen if there are enqueued messages, or future messages sent by either side.
  3. There are two indications which are undocumented and unused at the moment, I added them when I created the struct but I'm not sure if they'll be needed or if it's just bloat that probably won't be used . Maybe there's an event which should be added instead?
  4. MaxChannelsReached is one indication that is currently unused but this will be used soon. I'll make a separate PR for issue Open Channel Limit #83 soon.
  5. I drew the graceful shutdown sequence here if anyone is interested in understanding it, I may have overengineered this part somewhat but now its there and is used to reliably distinguish between closed and lost connections.

@adamhass adamhass added this to the 1.0.0 milestone Feb 2, 2021
@adamhass
Copy link
Collaborator Author

adamhass commented Feb 2, 2021

Most of the failing tests here (except for tests(nightly) fail because I use system.shutdown() to test for lost-connection behavior, which worked on my machine but the CI manages to go through with the shutdown gracefully and so the lost-connection behavior isn't triggered.

Should we expose a method like shutdown_now() that doesn't attempt graceful shutdowns and instead just closes everything immediately?

@adamhass
Copy link
Collaborator Author

adamhass commented Feb 3, 2021

After a discussion with Harald the following changes have been made to this draft:

  1. Changed SocketAddr to SystemPath in the Status messages.
  2. CloseChannel: All already enqueued messages will be sent over the connection before it is finally closed, only new messages will be discarded.
  3. Added an OpenChannel request message, will open a connection, even if it was previously closed by CloseConnection. It is the only way to re-estalish connections that were closed by such requests.

The new docs-page has been updated to reflect these changes.


There is an issue failing our current nightly tests caused by this change phil-opp/blog_os#921

The nightly which it is included in is missing fmt right now so I'm thinking I'll just wait a few days and see if things converge a bit before I look into it further.

@Bathtor
Copy link
Contributor

Bathtor commented Feb 7, 2021

After a discussion with Harald the following changes have been made to this draft:

  1. Changed SocketAddr to SystemPath in the Status messages.

I think this is good, yes.

  1. CloseChannel: All already enqueued messages will be sent over the connection before it is finally closed, only new messages will be discarded.
  2. Added an OpenChannel request message, will open a connection, even if it was previously closed by CloseConnection. It is the only way to re-estalish connections that were closed by such requests.

Hmm, I'm not so sure I'm into discarding messages here. CloseChannel, at least in my mind, was meant for P2P overlays where you are constantly switching your partners and you don't want to keep tons of channels open until you reach the limit. But in these cases the channels we are closing would be dead and we don't need to force them dead. Nothing should be sent on them anyway (modulo some weird timing, perhaps).
But with the current approach, if the overlay component closes a channel that some other component is still using, the other component is going to lose all its messages and it's probably never going to send an OpenChannel, since it doesn't know that the channel was closed.
I'd prefer it if, at least by default, no messages were discarded at all. We could add something like a keep_closed_for: Duration flag to the CloseChannel request, that allows users to customise the behaviour a bit. For example, if keep_closed_for = 500ms Then it will gracefully close the channel and prevent it from being reopened for 500ms, dropping messages during that time frame (preferably while logging them, so this stuff can be debugged later^^). But the default value for keep_closed_for should be 0 I think.
Another thing to consider is that even with a 0 duration, if are still closing the TCP session and almost immediately re-establishing it, we need to send affected a components a "session lost" indication or something, so they know what's up.

There is an issue failing our current nightly tests caused by this change phil-opp/blog_os#921

The nightly which it is included in is missing fmt right now so I'm thinking I'll just wait a few days and see if things converge a bit before I look into it further.

Just try to replace my usage of the constant initialisers with <array> = Default::default(), creating Default impls where needed. I think it should still work and it's better style anyway.

@Bathtor
Copy link
Contributor

Bathtor commented Feb 7, 2021

Should we expose a method like shutdown_now() that doesn't attempt graceful shutdowns and instead just closes everything immediately?

Sounds reasonable. Might even be valuable as a public API, in case someone is really short on time and can't wait for graceful behaviours.

@adamhass adamhass force-pushed the network_status branch 3 times, most recently from e452be1 to ce002a2 Compare February 15, 2021 13:28
connect_network_status_port method to connect components to it added to KompactSystem.
Protocol to distinguish between gracefully closed channels and lost connections implemented.
Docs page added for the NetworkStatusPort.
Public method added SystemPath::socket_address()
@adamhass adamhass force-pushed the network_status branch 2 times, most recently from 2f8cb86 to 9662805 Compare February 15, 2021 14:03
@adamhass adamhass marked this pull request as ready for review February 15, 2021 16:15
@codecov
Copy link

codecov bot commented Feb 16, 2021

Codecov Report

Merging #133 (ea269a2) into master (11c463a) will increase coverage by 0.88%.
The diff coverage is 83.15%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
+ Coverage   70.68%   71.57%   +0.88%     
==========================================
  Files          90       90              
  Lines       15844    16336     +492     
==========================================
+ Hits        11200    11693     +493     
+ Misses       4644     4643       -1     
Impacted Files Coverage Δ
core/src/actors/mod.rs 90.00% <ø> (ø)
core/src/lib.rs 91.34% <ø> (ø)
docs/examples/src/bin/dns_resolver.rs 0.00% <0.00%> (ø)
docs/examples/src/bin/serialisation.rs 0.00% <ø> (ø)
core/src/default_components.rs 87.69% <55.55%> (-1.97%) ⬇️
core/src/net/network_channel.rs 70.38% <62.85%> (-0.72%) ⬇️
core/src/dispatch/mod.rs 79.38% <65.38%> (+1.15%) ⬆️
core/src/net/mod.rs 75.34% <74.54%> (-0.08%) ⬇️
core/src/net/network_thread.rs 82.46% <77.33%> (+3.44%) ⬆️
core/src/runtime/system.rs 85.44% <94.59%> (+1.05%) ⬆️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 11c463a...ea269a2. Read the comment docs.

Copy link
Contributor

@Bathtor Bathtor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. Just a bunch of small stuff to fix.

@adamhass
Copy link
Collaborator Author

adamhass commented Feb 23, 2021

Thanks for the review. I took care of the small stuff and managed to do away with the downcasts. The signature and annotated example in core/src/runtime/system.rs:656 is the most relevant change here. I believe this is as clean as it gets.

(checked the docs links and updated the docs to reflect the modified API)

Copy link
Contributor

@Bathtor Bathtor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@adamhass adamhass merged commit 1c6651c into kompics:master Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Network Meta Port

2 participants