Conversation


@wk8 wk8 commented Oct 4, 2018

This patch adds a new manager/deallocator package.

The deallocator's job is to take over deletion of services and service-level
resources (as of right now, only networks, but e.g. volumes would also fall into
the same category).

The deallocator relies on the reaper correctly deleting tasks for services that
are shutting down, and only when no tasks are left within a service does it
then proceed to delete that service and its resources.

This patch does not yet actually put that new component to use; that will be
done in a future, separate patch, for the sake of easier reviews.

Includes unit tests.

@GordonTheTurtle

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "wk8/network_rm" [email protected]:wk8/swarmkit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354530096
# editor opens
# change each 'pick' to 'edit'
# save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

eventsChan, eventsChanCancel, err := store.ViewAndWatch(deallocator.store,
func(readTx store.ReadTx) error {
// look for services that are marked for deletion
// there's no index on the `PendingDelete` field,
Contributor

there's no index on the PendingDelete field

can you plz clarify what this means?

Contributor Author

I mean that there's no DB index on that field; so listing those which are marked for deletion requires iterating over them all.
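
For illustration, a minimal sketch of what that full scan could look like inside the ViewAndWatch callback shown above, assuming the usual store helpers (store.FindServices with store.All) and the new PendingDelete field; this is a sketch, not the exact code from the patch:

// sketch only: with no store index on PendingDelete, list every service
// and keep the ones marked for deletion
allServices, err := store.FindServices(readTx, store.All)
if err != nil {
	return err
}
for _, service := range allServices {
	if service.PendingDelete {
		// the deallocator would start tracking this service here
	}
}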

Contributor

cool. maybe clarify then:
there's no index on the PendingDelete field in the store

// if in doubt, let's proceed to clean up the service anyway
// better to clean up resources that shouldn't be cleaned up yet
// than ending up with a service and some resources lost in limbo forever
return deallocator.deallocateService(ctx, service)
Contributor

@anshulpundir anshulpundir Oct 5, 2018

this may cause problems. Maybe it's better to be safe and retry?

Contributor Author

I thought about it. Sure, I'm not opposed to re-trying, but that only delays the real problem: what happens after we've retried n times and keep failing? Do we never stop trying (in which case this could be dangerous at another level, namely resource consumption & performance)? Or do we just give up after a while?

ignoreServiceID = &service.ID
}

// then all of its service-level resources, provided no other service uses them
Contributor

provided no other service uses them

How do we check that?

Contributor Author

That's what the resourceDeallocator.servicesLocator callback is for

events[1] = api.EventUpdateService{Service: &api.Service{PendingDelete: true}}
i := 2
for _, resourceDeallocator := range resourceDeallocators {
events[i] = resourceDeallocator.event
Contributor

Can you please clarify why we need to do this?

Contributor Author

I'm not 100% sure I understand the question correctly, but basically this also boils down to separating the resource-specific code from the core of the deallocator: each resource-specific file defines the API event it cares about, and this is just aggregating all that to feed it to store.ViewAndWatch below.

Please let me know if I misunderstood the question!
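
For illustration, a hypothetical sketch of what such a resource-specific registration might look like; only the event field and the registerServiceLevelResource call are taken from this diff, everything else is assumed:

// hypothetical: a resource-specific file (networks here) declares which store
// event it cares about, and the deallocator aggregates these for ViewAndWatch
func init() {
	registerServiceLevelResource(resourceDeallocator{
		event: api.EventUpdateNetwork{},
		// resource-specific callbacks (e.g. the servicesLocator mentioned
		// above) would also be set here
	})
}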

)

func registerServiceLevelResource(resourceDeallocator resourceDeallocator) {
resourceDeallocators[reflect.TypeOf(resourceDeallocator.event)] = resourceDeallocator
Contributor

I don't think there's any major performance overhead, but I don't know if this is a good pattern. cc @dperny

Contributor Author

I also tend to shy away from using reflection when I can, but I couldn't find any other way to achieve what I wanted here, namely: have different callbacks for different event types without hard-coding in this file all the event types and their callbacks - for the sake of cleanly separating the deallocator itself from the deallocation code for each type of resource.

But then I'd love to be hinted at a better way of doing this :) !

Contributor

Based on the conversation above, interface{} might be the easiest and simplest way to do this for now. However, I'm not a golang expert, so I would love @dperny's and @stevvooe's opinions here

)

var (
resourceDeallocators = make(map[reflect.Type]resourceDeallocator)
Contributor

can we do map[interface{}]resourceDeallocator ? @wk8

Contributor Author

I'm not sure I understand the idea of not making it a bit more specific (reflect.Type rather than just an empty interface) if we can?

Contributor

Is reflect.Type more specific than interface{} ?

Contributor Author

Well yes, it's a well-defined type, as opposed to interface{} which matches anything, akin to C/C++'s void*?

Contributor

Fair, but do any of those properties of a reflect.Type apply to this use-case?

Here's something I found about generics in golang. https://appliedgo.net/generics/
TLDR the recommendation from this article is to avoid reflection. Maybe use interface{} and type assertions?

Contributor Author

@wk8 wk8 Oct 9, 2018

Thanks, interesting read.

The main argument against reflection here is performance (type safety is not a concern here, given the very controlled fashion in which it's used). And sure enough, it's slower than a case switch (see below). But at least that allows us to not hardcode in the package's main file the different resource types, and keep each resource cleanly separated from the deallocator's code.

If we wanted to keep that separation of concerns in the code, we could indeed use type assertion in a linked list of resource-specific implementations, but that might grow slow if we add a lot of resources (10+, according to the benchmark below).

A rough benchmark:

package foo

import (
	"reflect"
	"testing"
)

type Foo struct{}
type Bar struct{}

var a = []interface{}{Foo{}, Bar{}}

var foo = reflect.TypeOf(Foo{})
var bar = reflect.TypeOf(Bar{})

func BenchmarkTypeOf(b *testing.B) {
	x, y := 0, 0

	for n := 0; n < b.N; n++ {
		for _, i := range a {
			switch reflect.TypeOf(i) {
			case foo:
				x++
			case bar:
				y++
			}
		}
	}

	assertSuccess(b, x, y)
}

func BenchmarkSwitch(b *testing.B) {
	x, y := 0, 0

	for n := 0; n < b.N; n++ {
		for _, i := range a {
			switch i.(type) {
			case Foo:
				x++
			case Bar:
				y++
			}
		}
	}

	assertSuccess(b, x, y)
}

func BenchmarkTypeAssertion(b *testing.B) {
	x, y := 0, 0

	for n := 0; n < b.N; n++ {
		for _, i := range a {
			if _, isFoo := i.(Foo); isFoo {
				x++
			} else if _, isBar := i.(Bar); isBar {
				y++
			}
		}
	}

	assertSuccess(b, x, y)
}

func assertSuccess(b *testing.B, x, y int) {
	if x != y || x != b.N {
		b.Error("shouldn't happen")
	}
}

yields

BenchmarkTypeOf-8          	100000000	        20.9 ns/op
BenchmarkSwitch-8          	1000000000	         2.89 ns/op
BenchmarkTypeAssertion-8   	2000000000	         1.89 ns/op

so, happy to re-write as a list of type assertions if you want?

Contributor Author

(also, we only make a call to reflect.TypeOf when there's an update to a network (or another service-level resource down the line), so the 10 ns price might be okay?)

Collaborator

oh boy

Collaborator

oh lord this is over-engineered.

@wk8 wk8 changed the title from Wk8/network rm to Adding a new Deallocator component Oct 9, 2018
type: TYPE_STRING
json_name: "phpNamespace"
}
field {
Contributor Author

This is from the recent protoc upgrade from #2750

Contributor

OK, let's rebase?

Contributor Author

There's nothing to rebase against. The protoc upgrade change did not change this file, but it should have: regenerating the proto files after the upgrade will include these lines.

Contributor Author

(and sure, I could make it a separate PR, but is it really worth it?)

@codecov

codecov bot commented Oct 10, 2018

Codecov Report

Merging #2759 into master will increase coverage by 0.08%.
The diff coverage is 77.16%.

@@            Coverage Diff             @@
##           master    #2759      +/-   ##
==========================================
+ Coverage   61.87%   61.95%   +0.08%     
==========================================
  Files         136      137       +1     
  Lines       21926    22047     +121     
==========================================
+ Hits        13566    13660      +94     
- Misses       6896     6912      +16     
- Partials     1464     1475      +11

@dperny
Collaborator

dperny commented Oct 11, 2018

instead of using reflection and writing a more general API, can we write a less-general API that requires some objects be hard-coded? we have absolute control over what components perform resource deallocation, so we can just bake them in.
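
For context, a rough sketch of the kind of hard-coded dispatch being suggested here: a plain type switch over the known event types instead of a reflect.Type-keyed registry. The case bodies and the processDeletedTask helper name are placeholders, not the actual code:

func (deallocator *Deallocator) processNewEvent(ctx context.Context, event events.Event) (bool, error) {
	switch typedEvent := event.(type) {
	case api.EventDeleteTask:
		// hypothetical helper: decrement the task count for this task's
		// service, and deallocate the service once no tasks are left
		return deallocator.processDeletedTask(ctx, typedEvent.Task)
	case api.EventUpdateService:
		return deallocator.processService(ctx, typedEvent.Service)
	case api.EventUpdateNetwork:
		// networks are baked in; volumes etc. would get their own case later
		return deallocator.processNetwork(ctx, nil, typedEvent.Network, nil)
	default:
		return false, nil
	}
}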

Collaborator

@dperny dperny left a comment

I think it's over-engineered. This is what we'd build if we were providing this as a framework or something against arbitrary types, but the types we're actually acting on are constrained to a pretty small set.


// for services that are shutting down, we keep track of how many
// tasks still exist for them
services map[string]*serviceWithTaskCounts
Collaborator

why map service IDs to both the service object AND the count? what does keeping a pointer to the service object get us, as opposed to having a map[string]int?

Contributor Author

We do need the service itself at deallocation time. Saves a lookup. I guess we could do away with that, but seems like a pretty minor choice either way?

Contributor

Sorry I didn't pay attention to this comment before, but I agree with @dperny that it is cleaner to just store service id => count.

My suggestion: when designing data structures, it's better to have a reasonably strong reason to do it a certain way rather than it being fine either way. @wk8

Contributor Author

@anshulpundir : isn't saving a lookup a strong enough reason?

Contributor

@anshulpundir anshulpundir Oct 30, 2018

One extra lookup for each service delete is not that big an overhead. But I guess you could argue either way. I'll let you make the call.


// now we also need to look at all existing service-level resources
// that are marked for deletion
for key, resourceDeallocator := range resourceDeallocators {
Collaborator

yeah the more that i think about this... we ought to just do

allNetworks, err = store.FindNetworks(readTx, store.All)

and then later on if we add new resources (i.e. volumes), we'll just add more code:

allVolumes, err := store.FindVolumes(readTx, store.All)

This does lead to duplication, but it's much more clear. If we were ever going to have more than a handful of object types that needed to be managed by this, it would be different, but we almost certainly won't.

// FIXME (jrouge): see the comment on TaskReaper.Stop() and see when to properly stop this
// plus unit test on this!
func (deallocator *Deallocator) Stop() {
close(deallocator.stopChan)
Collaborator

you want to wrap this in a sync.Once to make calls to Stop idempotent.
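
A minimal sketch of that suggestion, assuming a stopOnce field (and a "sync" import) gets added alongside stopChan:

// sketch: make Stop idempotent so calling it twice doesn't panic on a
// double close of stopChan
type Deallocator struct {
	stopChan chan struct{}
	stopOnce sync.Once
	// other fields elided
}

func (deallocator *Deallocator) Stop() {
	deallocator.stopOnce.Do(func() {
		close(deallocator.stopChan)
	})
}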

Contributor Author

I'll do that when I get around to actually using it; right now I'm not clear on when it should be stopped, which is what the comment right before it hints at. But good catch, keeping it in mind :)

}

func (deallocator *Deallocator) processNewEvent(ctx context.Context, event events.Event) (bool, error) {
switch typedEvent := event.(type) {
Collaborator

lol typedEvent is a way better name than switch ev := event.(type) { (which is what we have in other places iirc)

ran <- returnValue
close(ran)
}()
waitForDeallocatorEvent(t, deallocator, expectedUpdates)
Collaborator

i was wondering how you fixed the bug here. i believe you issue an event on notifyEventChan after the setup phase?

a better pattern than doing that would be to add a variable to the deallocator:

type Deallocator struct {
    // other fields elided
    started chan struct{}
}

Then, after the deallocator has started, you do:

close(started)

Finally, instead of passing expectedUpdates, you should just check if the db state is in the state you want.

Contributor Author

Honestly not sold on this one. Adds yet another field for no great reason, and checking whether the deallocator actually performed any DB operations is an additional check on top of just checking that the DB is in the state we expect (which we also do).

@wk8 wk8 force-pushed the wk8/network_rm branch 2 times, most recently from 8a5e572 to b02a280 on October 17, 2018 19:44
@dperny
Collaborator

dperny commented Oct 22, 2018

LGTM now.

// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
// when a user requests a deletion, we just flip this flag
// the deallocator will take it from there
Contributor

It'll probably be helpful to briefly mention how the deallocator uses this flag.

Contributor Author

👍


// PendingDelete indicates that this service's deletion has been requested
// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
Contributor

nit: all the => all of the

Contributor Author

👍

// PendingDelete indicates that this service's deletion has been requested
// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
// when a user requests a deletion, we just flip this flag
Contributor

Is there a period before this sentence?

Contributor Author

👍

// services, as well as all service-level resources, can only be deleted
// after all the service's containers have properly shut down
// when a user requests a deletion, we just flip this flag
// the deallocator will take it from there
Contributor

same as above.

Contributor Author

👍

"github.com/docker/swarmkit/manager/state/store"
)

// A Deallocator waits for services to fully shutdown (ie no containers left)
Contributor

nit: remove 'A' at the beginning?

Contributor Author

👍

// user.
//
// NOTE: since networks are the only service-level resources as of now,
// it has been deemed over-engineered to have a clean generic way to
Contributor

it has been deemed over-engineered to have a clean generic way

yikes!

Contributor Author

Removed "clean" ;) didn't mean this comment that way

// handle other types of service-level resources; if we ever start
// having more of those and thus want to reconsider this choice, it
// might be worth having a look at this archived branch, that does
// implements a way of separating the code for the deallocator itself
Contributor

implements => implement

Contributor Author

👍

doneChan chan struct{}
}

type serviceWithTaskCounts struct {
Contributor

Please add a comment.

Contributor Author

👍

// Run starts the deallocator, which then starts cleaning up services
// and their resources when relevant (ie when no tasks still exist
// for a given service)
// This is a blocking function
Contributor

thx for calling this out!


api.EventUpdateNetwork{})

if err != nil {
// if we have an error here, we can't proceed any further, let the world burn
Contributor

roflol. Perhaps say something less dramatic?

Contributor Author

👍

anyUpdated := false
// now let's populate our internal taskCounts
for _, service := range allServices {
if updated, _ := deallocator.processService(ctx, service); updated {
Contributor

please add a comment on what processService() does

Contributor Author

👍

}
}

// now we just need to wait for events
Contributor

what events are we waiting for?

Contributor Author

The ones that were listed as args to store.ViewAndWatch just a few lines above? I'm not sure how to say that without being redundant?

<-deallocator.doneChan
}

func (deallocator *Deallocator) notifyEventChan(updated bool) {
Contributor

please add a comment.

Contributor Author

👍

for {
select {
case event := <-eventsChan:
if updated, err := deallocator.processNewEvent(ctx, event); err == nil {
Contributor

add a comment on what processNewEvent() returns and how it's used.

Contributor Author

👍

}

func (deallocator *Deallocator) notifyEventChan(updated bool) {
if deallocator.eventChan != nil {
Contributor

should this ever be called when deallocator.eventChan is nil?

Contributor Author

Yes. Added comments.

// if in doubt, let's proceed to clean up the service anyway
// better to clean up resources that shouldn't be cleaned up yet
// than ending up with a service and some resources lost in limbo forever
return true, deallocator.deallocateService(ctx, service)
Contributor

I don't think this is a good idea, especially because deletion can't be reversed. Also, I think it's better to have a leaked service object than corruption (tasks running without a service object). What do you think?

Contributor Author

The whole point of this patch is to start asynchronously deleting services and their resources. That means that if errors occur when actually deleting, we can't just surface those to the user who requested the deletion in the first place, since their request has long since returned.

That leaves us with just a few options in my opinion:

  1. not actually deleting anything when we encounter an error (in which case it would be nice to give the user say a "request ID" they can use to check on the status of a delete they've requested, and also the ability to force a delete after the normal flow has errored out)
  2. always do a best-effort deletion; in which case, sure, we can end up aggressively deleting services and resources that might actually still be in use; but then that's the current behaviour anyway, so it's not worse - and the hope then (as now) is that the system eventually self-recovers when tasks eventually do shut down anyway
  3. a refinement on 2 would be adding exponential retries instead, but that does strike me as over-engineered, given the low impact of a premature deletion, together with the very low likelihood of that happening in the first place (how often do store operations realistically fail, really? At leadership change?)

Contributor

I see your point with 2 that this behavior is the same as the previous behavior. Looks good as you have it currently.

Contributor Author

👍

return false, nil
}

func (deallocator *Deallocator) deallocateService(ctx context.Context, service *api.Service) (err error) {
Contributor

add a comment.

Contributor Author

👍

}
for _, networkConfig := range networkConfigs {
if network := store.GetNetwork(tx, networkConfig.Target); network != nil {
deallocator.processNetwork(ctx, tx, network, ignoreServiceID)
Contributor

How do you handle the return value from this function, failures etc?


Contributor

Should errors be at least logged?

Contributor Author

Both processNetwork and processService log errors


// and deallocate networks that may be marked for deletion and aren't used any more
for _, network := range allNetworks {
if updated, _ := deallocator.processNetwork(ctx, nil, network, nil); updated {
Contributor

do errors need to be handled or just ignored?


Contributor

Should errors be at least logged?


return
}

updateFunc := func(t store.Tx) error {
Contributor

add comments for updateFunc. Also do we really need a function object?

Contributor Author

We do. This method can be called either with or without an open transaction, depending on whether it's done as part of a service cleanup or not. See how it's used in the if tx == nil ... else just below.
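
In other words, something along these lines (a simplified sketch of the pattern described above, not the verbatim code):

// reuse the caller's open transaction if there is one, otherwise run
// updateFunc in a fresh update transaction
if tx == nil {
	err = deallocator.store.Update(updateFunc)
} else {
	err = updateFunc(tx)
}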

return
}

// proceeds to deallocating a network if it's pending deletion and there no
Contributor

nit: deallocating => deallocate

Contributor Author

👍


// proceeds to deallocating a network if it's pending deletion and there no
// longer are any services using it
func (deallocator *Deallocator) processNetwork(ctx context.Context, tx store.Tx, network *api.Network, ignoreServiceID *string) (updated bool, err error) {
Contributor

Please add comments for each of the arguments accepted by the function. e.g. it's not obvious what ignoreServiceID does.

Contributor Author

👍

return
}

func (deallocator *Deallocator) processNewEvent(ctx context.Context, event events.Event) (bool, error) {
Contributor

plz add a comment. All functions should have comments unless it's blatantly obvious.

Contributor Author

👍

// PendingDelete indicates that this network's deletion has been requested.
// Services, as well as all service-level resources, can only be deleted
// after all of the service's containers have properly shut down.
// When a user requests a deletion, we just flip this flag
Contributor

nit:

When a user requests a deletion

When a user requests a deletion of this network



Contributor

@anshulpundir anshulpundir left a comment

A few minor comments. Looks good otherwise!





// proceeds to deallocate a network if it's pending deletion and there no
// longer are any services using it
// actually deletes the network if it's marked for deletion and no services are
// using it any more (or the only one using it has ID `ignoreServiceID`, if not
Contributor

Thanks for adding this, but I don't think it's obvious to a first-time reader. Why is it OK to deallocate the network when the only one using it has ID ignoreServiceID?

@wk8 wk8 force-pushed the wk8/network_rm branch 2 times, most recently from f731bf7 to b828d81 on October 30, 2018 19:53
This patch adds a new manager/deallocator package.

The deallocator's job is to take over deletion of services and service-level
resources (as of right now, only networks, but e.g. volumes would also fall into
the same category).

The deallocator relies on the reaper correctly deleting tasks for services that
are shutting down, and only when no tasks are left within a service does it
then proceed to delete that service and its resources.

This patch does not yet actually put that new component to use; that will be
done in a future, separate patch, for the sake of easier reviews.

Special care has been taken to separate the deallocator itself from each of the
service-level resource implementations.

Includes unit tests.

Signed-off-by: Jean Rouge <[email protected]>
@anshulpundir anshulpundir merged commit 46cb7b3 into moby:master Oct 30, 2018
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Nov 1, 2018
Changes included;

- moby/swarmkit#2735 Assign secrets individually to each task
- moby/swarmkit#2759 Adding a new `Deallocator` component
- moby/swarmkit#2738 Add additional info for secret drivers
- moby/swarmkit#2775 Increase grpc max recv message size
  - addresses moby#37941
  - addresses moby#37997
  - follow-up to moby#38103

Signed-off-by: Sebastiaan van Stijn <[email protected]>
wk8 added a commit to wk8/swarmkit that referenced this pull request Nov 5, 2018
**- What I did**

This patch actually starts to use the deallocator to clean up services, as
well as service-level resources (as of now, only networks).

**- How I did it**

A previous patch (moby#2759) introduced a
new component, the deallocator, responsible for cleaning up services and
service-level resources. This patch is actually starting to make use of that
component.

Since the deallocator used to rely on the reaper deleting the tasks belonging
to services that had been marked for removal, a previous version of this patch
was modifying the task reaper quite heavily to also keep track of such services
(needed since otherwise the reaper would fail to clean up all of them, instead
keeping some for history tracking purposes). However, it soon appeared that
this was not the best approach:
 * this creates a hidden coupling between the reaper and the deallocator
 * it's also not the best user experience to suddenly remove all of a service's
   task history while shutting down, for no apparent reason to the user

For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many live tasks remain for a service.

**- How to test it**

Updated tests.

**- Description for the changelog**

Services & networks are no longer deleted immediately when a user requests their
deletion; instead, they are deleted when all their tasks are actually shut down.

Signed-off-by: Jean Rouge <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Nov 12, 2018
Changes included;

- moby/swarmkit#2735 Assign secrets individually to each task
- moby/swarmkit#2759 Adding a new `Deallocator` component
- moby/swarmkit#2738 Add additional info for secret drivers
- moby/swarmkit#2775 Increase grpc max recv message size
  - addresses moby/moby#37941
  - addresses moby/moby#37997
  - follow-up to moby/moby#38103

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: be3843c8c8fb30b4a604dae9d0dad3d393db717c
Component: engine
wk8 added a commit to wk8/moby that referenced this pull request Jan 12, 2019
Namely the `PendingDelete` fields added to both `Services`
and `NetworkResources` in
moby/swarmkit#2759

Also amended relevant tests to now wait for services to be
fully shut down after removing them.

Signed-off-by: Jean Rouge <[email protected]>
wk8 added a commit to wk8/swarmkit that referenced this pull request Feb 1, 2019
**- What I did**

This patch allows users to choose to make removals of services and networks
asynchronous, which means that they only get deleted once all the containers
belonging to them or using them are safely shut down.

**- How I did it**

A previous patch (moby#2759) introduced a
new component, the deallocator, responsible for cleaning up services and
service-level resources. This patch is actually starting to make use of that
component.

Since the deallocator used to rely on the reaper deleting the tasks belonging
to services that had been marked for removal, a previous version of this patch
was modifying the task reaper quite heavily to also keep track of such services
(needed since otherwise the reaper would fail to clean up all of them, instead
keeping some for history tracking purposes). However, it soon appeared that
this was not the best approach:
 * this creates a hidden coupling between the reaper and the deallocator
 * it's also not the best user experience to suddenly remove all of a service's
   task history while shutting down, for no apparent reason to the user

For these reasons, this patch instead amends the deallocator to also look at task statuses when keeping track of how many live tasks remain for a service.

Part of this patch was already reviewed in
moby#2778, which subsequently got reverted
following the discussion in moby/moby#38525.
The main difference between this PR and moby#2778 is that asynchronous removal is
an opt-in behaviour here instead of changing the existing behaviour.

**- How to test it**

Updated tests.

**- Description for the changelog**

Services & networks can be deleted asynchronously, in which case the request
for their removal simply marks them for removal, and they are then actually
deleted only when all their tasks are actually shut down.

Signed-off-by: Jean Rouge <[email protected]>