Adding a new Deallocator component
#2759
Conversation
Please sign your commits following these rules:

$ git clone -b "wk8/network_rm" [email protected]:wk8/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354530096
# editor opens
# change each 'pick' to 'edit'
# save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.
manager/deallocator/deallocator.go
Outdated
eventsChan, eventsChanCancel, err := store.ViewAndWatch(deallocator.store,
	func(readTx store.ReadTx) error {
		// look for services that are marked for deletion
		// there's no index on the `PendingDelete` field,
there's no index on the `PendingDelete` field

Can you please clarify what this means?
I mean that there's no DB index on that field; so listing those which are marked for deletion requires iterating over them all.
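Concretely, the initial pass then has to look something like this minimal sketch (not the PR's exact code; it assumes the PendingDelete field this PR adds, and runs inside the store.ViewAndWatch read callback quoted above):

// no index on PendingDelete: list every service and filter in memory
allServices, err := store.FindServices(readTx, store.All)
if err != nil {
	return err
}
for _, service := range allServices {
	if !service.PendingDelete {
		continue
	}
	// remember this service so it can be deallocated once its tasks are gone
}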
cool. maybe clarify then:
there's no index on the PendingDelete field in the store
manager/deallocator/deallocator.go
Outdated
// if in doubt, let's proceed to clean up the service anyway
// better to clean up resources that shouldn't be cleaned up yet
// than ending up with a service and some resources lost in limbo forever
return deallocator.deallocateService(ctx, service)
This may cause problems. Maybe it's better to be safe and retry?
I thought about it. Sure, I'm not opposed to retrying, but that only delays the real problem: what happens after we've retried n times and keep failing? Do we never stop trying (in which case this could be dangerous at another level, namely resource consumption & performance)? Or do we still give up after a while?
manager/deallocator/deallocator.go
Outdated
	ignoreServiceID = &service.ID
}

// then all of its service-level resources, provided no other service uses them
provided no other service uses them
How do we check that?
That's what the `resourceDeallocator.servicesLocator` callback is for.
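For illustration, a hypothetical sketch of such a callback (the name servicesLocator comes from the hunk above, but the signature and the network implementation are assumptions, not the PR's actual code):

// servicesLocator answers "which services still use this resource?", so the
// deallocator can tell whether a service-level resource may be deleted
type servicesLocator func(tx store.ReadTx, resourceID string) ([]*api.Service, error)

// a possible implementation for networks, leaning on an existing store index
var networkServicesLocator servicesLocator = func(tx store.ReadTx, networkID string) ([]*api.Service, error) {
	return store.FindServices(tx, store.ByReferencedNetworkID(networkID))
}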
manager/deallocator/deallocator.go
Outdated
events[1] = api.EventUpdateService{Service: &api.Service{PendingDelete: true}}
i := 2
for _, resourceDeallocator := range resourceDeallocators {
	events[i] = resourceDeallocator.event
Can you please clarify why we need to do this?
I'm not 100% sure I understand the question correctly, but basically this also boils down to separating the resource-specific code from the core of the deallocator: each resource-specific file defines the API event it cares about, and this is just aggregating all that to feed it to store.ViewAndWatch below.
Please let me know if I misunderstood the question!
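In other words, roughly (a simplified sketch; the fixed entries other than the service-update event, and the initFunc name, are assumptions here):

events := make([]api.Event, 2+len(resourceDeallocators))
events[0] = api.EventUpdateTask{} // assumed: task updates drive the per-service task counts
events[1] = api.EventUpdateService{Service: &api.Service{PendingDelete: true}}
i := 2
for _, resourceDeallocator := range resourceDeallocators {
	events[i] = resourceDeallocator.event
	i++
}
// initFunc is the read callback that does the initial pass over the store
eventsChan, eventsChanCancel, err := store.ViewAndWatch(deallocator.store, initFunc, events...)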
manager/deallocator/deallocator.go
Outdated
)

func registerServiceLevelResource(resourceDeallocator resourceDeallocator) {
	resourceDeallocators[reflect.TypeOf(resourceDeallocator.event)] = resourceDeallocator
I don't think there's any major performance overhead, but I don't know if this is a good pattern. cc @dperny
I also tend to shy away from using reflection when I can, but I couldn't find any other way to achieve what I wanted here, namely: have different callbacks for different event types without hard-coding in this file all the event types and their callback - for the sake of cleanly separating the deallocator itself from the deallocation code for each type of resource.
But then I'd love to be hinted at a better way of doing this :) !
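To make the intent concrete, a rough sketch of the registration pattern, pieced together from the hunks above (the struct's exact field set is an assumption):

// each resource-specific file registers its deallocation logic, keyed by the
// API event type it cares about, so this file never names concrete resources
type resourceDeallocator struct {
	event api.Event // e.g. api.EventUpdateNetwork{}
	// resource-specific callbacks (e.g. a services locator) would live here
}

var resourceDeallocators = make(map[reflect.Type]resourceDeallocator)

func registerServiceLevelResource(resourceDeallocator resourceDeallocator) {
	resourceDeallocators[reflect.TypeOf(resourceDeallocator.event)] = resourceDeallocator
}

// at event-processing time, the matching deallocator is looked up by the
// event's dynamic type
func deallocatorFor(event events.Event) (resourceDeallocator, bool) {
	d, ok := resourceDeallocators[reflect.TypeOf(event)]
	return d, ok
}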
manager/deallocator/deallocator.go
Outdated
)

var (
	resourceDeallocators = make(map[reflect.Type]resourceDeallocator)
can we do map[interface{}]resourceDeallocator ? @wk8
I'm not sure I understand: why not make it a bit more specific (reflect.Type rather than just an empty interface) if we can?
Is reflect.Type more specific than interface{} ?
Well yes, it's a well-defined type, as opposed to interface{} which matches anything, akin to C/C++'s void*?
Fair, but do any of those properties of a reflect.Type apply to this use-case?
Here's something I found about generics in golang. https://appliedgo.net/generics/
TLDR the recommendation from this article is to avoid reflection. Maybe use interface{} and type assertions?
Thanks, interesting read.
The main argument against reflection here is performance (type safety is not a concern here, given the very controlled fashion in which it's used). And sure enough, it's slower than a type switch (see below). But at least it allows us not to hardcode the different resource types in the package's main file, and to keep each resource cleanly separated from the deallocator's code.
If we wanted to keep that separation of concerns in the code, we could indeed use a chain of type assertions over the resource-specific implementations, but that might grow slow if we add a lot of resources (10+, according to the benchmark below).
A rough benchmark:
package foo

import (
	"reflect"
	"testing"
)

type Foo struct{}
type Bar struct{}

var a = []interface{}{Foo{}, Bar{}}

var foo = reflect.TypeOf(Foo{})
var bar = reflect.TypeOf(Bar{})

func BenchmarkTypeOf(b *testing.B) {
	x, y := 0, 0
	for n := 0; n < b.N; n++ {
		for _, i := range a {
			switch reflect.TypeOf(i) {
			case foo:
				x++
			case bar:
				y++
			}
		}
	}
	assertSuccess(b, x, y)
}

func BenchmarkSwitch(b *testing.B) {
	x, y := 0, 0
	for n := 0; n < b.N; n++ {
		for _, i := range a {
			switch i.(type) {
			case Foo:
				x++
			case Bar:
				y++
			}
		}
	}
	assertSuccess(b, x, y)
}

func BenchmarkTypeAssertion(b *testing.B) {
	x, y := 0, 0
	for n := 0; n < b.N; n++ {
		for _, i := range a {
			if _, isFoo := i.(Foo); isFoo {
				x++
			} else if _, isBar := i.(Bar); isBar {
				y++
			}
		}
	}
	assertSuccess(b, x, y)
}

func assertSuccess(b *testing.B, x, y int) {
	if x != y || x != b.N {
		b.Error("shouldn't happen")
	}
}
yields
BenchmarkTypeOf-8 100000000 20.9 ns/op
BenchmarkSwitch-8 1000000000 2.89 ns/op
BenchmarkTypeAssertion-8 2000000000 1.89 ns/op
so, happy to re-write as a list of type assertions if you want?
(also, we only do a call to reflect.TypeOf when there's an update to a network (or another service-level resource down the line), so the 10 ns price might be okay?)
oh boy
oh lord this is over-engineered.
	type: TYPE_STRING
	json_name: "phpNamespace"
}
field {
This is from the recent protoc upgrade from #2750
OK, let's rebase?
There's nothing to rebase against. The protoc upgrade change did not change this file, but it should have: regenerating the proto files after the upgrade will include these lines.
(and sure, I could make it a separate PR, but is it really worth it?)
Codecov Report
@@ Coverage Diff @@
## master #2759 +/- ##
==========================================
+ Coverage 61.87% 61.95% +0.08%
==========================================
Files 136 137 +1
Lines 21926 22047 +121
==========================================
+ Hits 13566 13660 +94
- Misses 6896 6912 +16
- Partials 1464 1475 +11
instead of using reflection and writing a more general API, can we write a less-general API that requires some objects to be hard-coded? we have absolute control over what components perform resource deallocation, so we can just bake them in.
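Presumably something along these lines (a sketch of the hard-coded dispatch being suggested, reusing the processService/processNetwork helpers from this PR):

func (deallocator *Deallocator) processNewEvent(ctx context.Context, event events.Event) (bool, error) {
	switch typedEvent := event.(type) {
	case api.EventUpdateService:
		return deallocator.processService(ctx, typedEvent.Service)
	case api.EventUpdateNetwork:
		return deallocator.processNetwork(ctx, nil, typedEvent.Network, nil)
	// when a new service-level resource type (e.g. volumes) shows up, a new
	// case simply gets added here
	default:
		return false, nil
	}
}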
dperny
left a comment
I think it's over-engineered. This is what we'd build if we were providing this as a framework or something against arbitrary types, but the types we're actually acting on are constrained to a pretty small set.
// for services that are shutting down, we keep track of how many
// tasks still exist for them
services map[string]*serviceWithTaskCounts
why map service IDs to both the service object AND the count? what does keeping a pointer to the service object get us, as opposed to having a map[string]int?
We do need the service itself at deallocation time. Saves a lookup. I guess we could do away with that, but seems like a pretty minor choice either way?
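For context, the struct under discussion presumably looks something like this (field names assumed); keeping the service pointer next to the count is what saves the extra store lookup when the count drops to zero:

type serviceWithTaskCounts struct {
	service   *api.Service
	taskCount int
}

// e.g. deallocator.services[service.ID] = &serviceWithTaskCounts{service: service, taskCount: len(tasks)}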
@anshulpundir : isn't saving a lookup a strong enough reason?
One extra lookup for each service delete is not that big an overhead. But I guess you could argue it either way. I'll let you make the call.
manager/deallocator/deallocator.go
Outdated
// now we also need to look at all existing service-level resources
// that are marked for deletion
for key, resourceDeallocator := range resourceDeallocators {
yeah the more that i think about this... we ought to just do

allNetworks, err = store.FindNetworks(readTx, store.All)

and then later on if we add new resources (i.e. volumes), we'll just add more code:

allVolumes, err := store.FindVolumes(readTx, store.All)

This does lead to duplication, but it's much more clear. If we were ever going to have more than a handful of object types that needed to be managed by this, it would be different, but we almost certainly won't.
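Sketched against the init callback quoted earlier (func(readTx store.ReadTx) error), that suggestion would look roughly like this; FindVolumes is hypothetical, only there to show where a new resource type would slot in:

// allServices / allNetworks are assumed to be declared in the enclosing
// function; each resource type is listed explicitly, accepting a little
// duplication per type
allServices, err = store.FindServices(readTx, store.All)
if err != nil {
	return err
}
allNetworks, err = store.FindNetworks(readTx, store.All)
if err != nil {
	return err
}
// later, if volumes become a service-level resource too, just add:
// allVolumes, err = store.FindVolumes(readTx, store.All)
return nil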
// FIXME (jrouge): see the comment on TaskReaper.Stop() and see when to properly stop this
// plus unit test on this!
func (deallocator *Deallocator) Stop() {
	close(deallocator.stopChan)
you want to wrap this in a sync.Once to make calls to Stop idempotent.
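i.e. something like this (sketch, reusing the stopChan/doneChan fields from this file):

type Deallocator struct {
	// other fields elided
	stopChan chan struct{}
	doneChan chan struct{}
	stopOnce sync.Once
}

// Stop can now safely be called more than once
func (deallocator *Deallocator) Stop() {
	deallocator.stopOnce.Do(func() {
		close(deallocator.stopChan)
	})
	<-deallocator.doneChan
}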
I'll do that when I get around to actually using it, right now I'm not clear on this. That's what the comment right before it hints at. But good catch, keeping it in mind :)
}

func (deallocator *Deallocator) processNewEvent(ctx context.Context, event events.Event) (bool, error) {
	switch typedEvent := event.(type) {
lol, typedEvent is a way better name than switch ev := event.(type) { (which is what we have in other places iirc)
	ran <- returnValue
	close(ran)
}()
waitForDeallocatorEvent(t, deallocator, expectedUpdates)
i was wondering how you fixed the bug here. i believe you issue an event on notifyEventChan after the setup phase?
a better pattern than doing that would be to add a variable to the deallocator:

type Deallocator struct {
	// other fields elided
	started chan struct{}
}

Then, after the deallocator has started, you do:

close(started)

Finally, instead of passing expectedUpdates, you should just check if the db state is in the state you want.
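On the test side, the suggested pattern would be consumed roughly like this (everything except Run is an assumption about the test harness):

deallocator := New(s) // assuming a New(store) constructor
go deallocator.Run(ctx)

// block until the initial cleanup pass is done, then assert on the store
// state directly instead of counting expectedUpdates
select {
case <-deallocator.started:
case <-time.After(10 * time.Second):
	t.Fatal("deallocator did not start in time")
}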
Honestly not sold on this one. Adds yet another field for no great reason, and checking whether the deallocator actually performed any DB operations is an additional check on top of just checking that the DB is in the state we expect (which we also do).
8a5e572 to
b02a280
Compare
LGTM now.
api/objects.proto
Outdated
// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
// when a user requests a deletion, we just flip this flag
// the deallocator will take it from there
It'll probably be helpful to briefly mention how the deallocator uses this flag.
👍
api/objects.proto
Outdated
// PendingDelete indicates that this service's deletion has been requested
// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
nit: all the => all of the
👍
api/objects.proto
Outdated
// PendingDelete indicates that this service's deletion has been requested
// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
// when a user requests a deletion, we just flip this flag
Is there a period before this sentence?
👍
// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
// when a user requests a deletion, we just flip this flag
// the deallocator will take it from there
same as above.
👍
manager/deallocator/deallocator.go
Outdated
| "github.com/docker/swarmkit/manager/state/store" | ||
| ) | ||
|
|
||
| // A Deallocator waits for services to fully shutdown (ie no containers left) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove 'A' at the beginning?
👍
manager/deallocator/deallocator.go
Outdated
// user.
//
// NOTE: since networks are the only service-level resources as of now,
// it has been deemed over-engineered to have a clean generic way to
it has been deemed over-engineered to have a clean generic way
yikes!
Removed "clean" ;) didn't mean this comment that way
manager/deallocator/deallocator.go
Outdated
// handle other types of service-level resources; if we ever start
// having more of those and thus want to reconsider this choice, it
// might be worth having a look at this archived branch, that does
// implements a way of separating the code for the deallocator itself
implements => implement
👍
	doneChan chan struct{}
}

type serviceWithTaskCounts struct {
Please add a comment.
👍
// Run starts the deallocator, which then starts cleaning up services
// and their resources when relevant (ie when no tasks still exist
// for a given service)
// This is a blocking function
thx for calling this out!
manager/deallocator/deallocator.go
Outdated
eventsChan, eventsChanCancel, err := store.ViewAndWatch(deallocator.store,
	func(readTx store.ReadTx) error {
		// look for services that are marked for deletion
		// there's no index on the `PendingDelete` field,
cool. maybe clarify then:
there's no index on the PendingDelete field in the store
manager/deallocator/deallocator.go
Outdated
	api.EventUpdateNetwork{})

if err != nil {
	// if we have an error here, we can't proceed any further, let the world burn
roflol. Perhaps say something less dramatic?
👍
anyUpdated := false
// now let's populate our internal taskCounts
for _, service := range allServices {
	if updated, _ := deallocator.processService(ctx, service); updated {
please add a comment on what processService() does
👍
	}
}

// now we just need to wait for events
what events are we waiting for?
The ones that were listed as args to store.ViewAndWatch just a few lines above? I'm not sure how to say that without being redundant?
	<-deallocator.doneChan
}

func (deallocator *Deallocator) notifyEventChan(updated bool) {
please add a comment.
👍
for {
	select {
	case event := <-eventsChan:
		if updated, err := deallocator.processNewEvent(ctx, event); err == nil {
add a comment on what processNewEvent() returns and how it's used.
👍
}

func (deallocator *Deallocator) notifyEventChan(updated bool) {
	if deallocator.eventChan != nil {
should this ever be called when deallocator.eventChan is nil?
Yes. Added comments.
// if in doubt, let's proceed to clean up the service anyway
// better to clean up resources that shouldn't be cleaned up yet
// than ending up with a service and some resources lost in limbo forever
return true, deallocator.deallocateService(ctx, service)
I don't think this is a good idea, especially because deletion can't be reversed. Also, I think it's better to have a leaked service object than corruption (tasks running without a service object). What do you think?
The whole point of this patch is to start asynchronously deleting services and their resources. That means that if errors occur when actually deleting, we can't just surface those to the user who requested the deletion in the first place, since their request has long since returned.
That leaves us with just a few options in my opinion:
- not actually deleting anything when we encounter an error (in which case it would be nice to give the user say a "request ID" they can use to check on the status of a delete they've requested, and also the ability to force a delete after the normal flow has errored out)
- always do a best-effort deletion; in which case, sure, we can end up aggressively deleting services and resources that might actually still be in use; but then that's the current behaviour anyway, so it's not worse - and the hope then (as now) is that the system eventually self-recovers when tasks eventually do shut down anyway
- a refinement on 2 would be adding exponential retries instead, but that does strike me as over-engineered, given the low impact of a premature deletion, together with the very low likelihood of that happening in the 1st place (how often do store operations realistically fail, really? At leadership change?)
I see your point with 2 that this behavior is the same as the previous behavior. Looks good as you have it currently.
👍
	return false, nil
}

func (deallocator *Deallocator) deallocateService(ctx context.Context, service *api.Service) (err error) {
add a comment.
👍
}
for _, networkConfig := range networkConfigs {
	if network := store.GetNetwork(tx, networkConfig.Target); network != nil {
		deallocator.processNetwork(ctx, tx, network, ignoreServiceID)
How do you handle the return value from this function, failures etc?
See #2759 (comment)
Should errors be at least logged?
Both processNetwork and processService log errors
// and deallocate networks that may be marked for deletion and aren't used any more
for _, network := range allNetworks {
	if updated, _ := deallocator.processNetwork(ctx, nil, network, nil); updated {
do errors need to be handled or just ignored?
See #2759 (comment)
Should errors be at least logged?
See #2759 (comment)
	return
}

updateFunc := func(t store.Tx) error {
add comments for updateFunc. Also do we really need a function object?
We do. This method can be called either with or without an open transaction, depending on whether it's done as part of a service cleanup or not. See how it's used in the if tx == nil ... else just below.
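For the record, the pattern being described is roughly this (simplified; the deallocation body itself is an assumption):

updateFunc := func(t store.Tx) error {
	// actual network deallocation, e.g.:
	return store.DeleteNetwork(t, network.ID)
}
if tx == nil {
	// called standalone: open our own update transaction
	err = deallocator.store.Update(updateFunc)
} else {
	// called as part of a service cleanup: reuse the caller's open transaction
	err = updateFunc(tx)
}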
manager/deallocator/deallocator.go
Outdated
	return
}

// proceeds to deallocating a network if it's pending deletion and there no
nit: deallocating => deallocate
👍
// proceeds to deallocating a network if it's pending deletion and there no
// longer are any services using it
func (deallocator *Deallocator) processNetwork(ctx context.Context, tx store.Tx, network *api.Network, ignoreServiceID *string) (updated bool, err error) {
Please add comments for each of the arguments accepted by the function, e.g. it's not obvious what ignoreServiceID does.
👍
	return
}

func (deallocator *Deallocator) processNewEvent(ctx context.Context, event events.Event) (bool, error) {
Please add a comment. All functions should have comments unless it's blatantly obvious.
👍
api/objects.proto
Outdated
// PendingDelete indicates that this network's deletion has been requested.
// Services, as well as all service-level resources, can only be deleted
// after all of the service's containers have properly shut down.
// When a user requests a deletion, we just flip this flag
nit: "When a user requests a deletion" => "When a user requests a deletion of this network"
// for services that are shutting down, we keep track of how many
// tasks still exist for them
services map[string]*serviceWithTaskCounts
anshulpundir
left a comment
A few minor comments. Looks good otherwise!
// if in doubt, let's proceed to clean up the service anyway
// better to clean up resources that shouldn't be cleaned up yet
// than ending up with a service and some resources lost in limbo forever
return true, deallocator.deallocateService(ctx, service)
I see your point with 2 that this behavior is the same as the previous behavior. Looks good as you have it currently.
// and deallocate networks that may be marked for deletion and aren't used any more
for _, network := range allNetworks {
	if updated, _ := deallocator.processNetwork(ctx, nil, network, nil); updated {
Should errors be at least logged?
}
for _, networkConfig := range networkConfigs {
	if network := store.GetNetwork(tx, networkConfig.Target); network != nil {
		deallocator.processNetwork(ctx, tx, network, ignoreServiceID)
Should errors be at least logged?
// proceeds to deallocate a network if it's pending deletion and there no
// longer are any services using it
// actually deletes the network if it's marked for deletion and no services are
// using it any more (or the only one using it has ID `ignoreServiceID`, if not
Thanks for adding this, but I don't think it's obvious to a first-time reader. Why is it OK to deallocate the network when the only service using it has ID ignoreServiceID?
f731bf7 to
b828d81
Compare
This patch adds a new `manager/deallocator` package.

The deallocator's job is to take over deletion of services and service-level resources (as of right now, only networks, but e.g. volumes would also fall in the same category).

The deallocator relies on the reaper correctly deleting tasks for services that are shutting down, and only when no tasks are left within a service does it then proceed to delete that service and its resources.

This patch does not yet actually put that new component to use; that will be done in a future, separate patch, for the sake of easier reviews.

Special care has been taken to separate the deallocator itself from each of the service-level resource implementations.

Includes unit tests.

Signed-off-by: Jean Rouge <[email protected]>
Changes included:
- moby/swarmkit#2735 Assign secrets individually to each task
- moby/swarmkit#2759 Adding a new `Deallocator` component
- moby/swarmkit#2738 Add additional info for secret drivers
- moby/swarmkit#2775 Increase grpc max recv message size
- addresses moby#37941
- addresses moby#37997
- follow-up to moby#38103

Signed-off-by: Sebastiaan van Stijn <[email protected]>
**- What I did**
This patch actually starts to use the deallocator to clean up services, as well as service-level resources (as of now, only networks).

**- How I did it**
A previous patch (moby#2759) introduced a new component, the deallocator, responsible for cleaning up services and service-level resources. This patch is actually starting to make use of that component.
Since the deallocator used to rely on the reaper deleting the tasks belonging to services that had been marked for removal, a previous version of this patch was modifying the task reaper quite heavily to also keep track of such services (needed since otherwise the reaper would fail to clean up all of them, instead keeping some for history tracking purposes).
However, it soon appeared that this was not the best approach:
* this creates a hidden coupling between the reaper and the deallocator
* it's also not the best user experience to suddenly remove all of a service's task history while shutting down, for no apparent reason to the user
For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many alive tasks remain for a service.

**- How to test it**
Updated tests.

**- Description for the changelog**
Services & networks are no longer deleted immediately when a user requests their deletion; instead, they are deleted when all their tasks are actually shut down.

Signed-off-by: Jean Rouge <[email protected]>
Namely the `PendingDelete` fields added to both `Services` and `NetworkResources` in moby/swarmkit#2759 Also amended relevant tests to now wait for services to be fully shut down after removing them. Signed-off-by: Jean Rouge <[email protected]>
**- What I did**
This patch allows users to choose to make removals of services and networks asynchronous, which means that they only get deleted once all the containers belonging to them or using them are safely shut down.

**- How I did it**
A previous patch (moby#2759) introduced a new component, the deallocator, responsible for cleaning up services and service-level resources. This patch is actually starting to make use of that component.
Since the deallocator used to rely on the reaper deleting the tasks belonging to services that had been marked for removal, a previous version of this patch was modifying the task reaper quite heavily to also keep track of such services (needed since otherwise the reaper would fail to clean up all of them, instead keeping some for history tracking purposes).
However, it soon appeared that this was not the best approach:
* this creates a hidden coupling between the reaper and the deallocator
* it's also not the best user experience to suddenly remove all of a service's task history while shutting down, for no apparent reason to the user
For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many alive tasks remain for a service.
Part of this patch was already reviewed in moby#2778, which subsequently got reverted following the discussion in moby/moby#38525. The main difference between this PR and moby#2778 is that asynchronous removal is an opt-in behaviour here instead of changing the existing behaviour.

**- How to test it**
Updated tests.

**- Description for the changelog**
Services & networks can be deleted asynchronously, in which case the request for their removal simply marks them for removal, and they are then actually deleted only when all their tasks are actually shut down.

Signed-off-by: Jean Rouge <[email protected]>