-
Notifications
You must be signed in to change notification settings - Fork 49
feat: add point 1.6.2, 2.5.3 about shutdown details #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Todd Baert <[email protected]>
|
Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked? In other words, what is the scope of Shutdown? Is it meant to just call each provider's Shutdown method and transition states, or do we expect Shutdown to clear all internal state? |
We could make this another point if everyone agrees, but I'm a bit skeptical of the value... Shutdown is meant to be called on application exit, so the state of the API isn't really that important anymore since the entire thing is about to be moved out of memory. The key part with this change was understanding guarantees around the providers' shutdown, which is relevant for finalizing things, disconnecting, flushing telemetry, etc. Can you provide a reason anyone would really care about the state after shutdown, besides for our own internal testing?
We can add a non-normative (descriptive) preamble to the shutdown section. I don't want to start enumerating all the things shutdown will not do (for example, I don't think it should clear all state, such as hook registration, but TBH I don't care that much either because of my point above), that seems like it could be practically infinite. |
This aspect is not covered in the specification. In practice, some developers call the global API shutdown after each their test, expecting a clean state before next test run. |
lukas-reining
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change.
Left 2 questions :)
| > The API's `shutdown` function **MUST NOT** call shutdown on providers which are in state `NOT_READY`. | ||
| With respect to the lifecycle of a given active provider, the global API object's `shutdown` function should be idempotent; multiple calls to `shutdown` should result in only a single execution of the provider's `shutdown` function. | ||
| Implementations should take care to await or cancel the initialization of providers if `shutdown` is called while providers are still being initialized, and have yet to transition to `READY` or some other state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this conflicts with the normative section.
If shutdown is called during initialization, from the normative part only, I would expect the provider never to be shut down.
Especially, as an initializing provider is in NOT_READY state.
Or do I read that wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I really struggle to come up with something that solves what I just said, but still does not result in providers that never shut down because during shut down they were initializing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was imagining using locks or other such mechanisms to make sure an init completed (or failed) before shutdown was called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add... I think this will add some implementation complexity, and we also don't have any recommendation that providers have a timeout (I think we could consider a SHOULD point for this, for providers which take time to start up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukas-reining I've substantially changed the PR as mentioned in the description. Please re-review.
Mh, I think clearing the providers after the SDK has been shutdown could be a good thing to not have undefined behavior. @toddbaert |
@erka @lukas-reining Hmmm... should we detatch all event listeners and hooks as well? This starts to look to me more like a cleanup testing utility and less like a way to manage the lifecycle of the providers, which was the original intent at least on my end ( |
| With respect to the lifecycle of a given active provider, the global API object's `shutdown` function should be idempotent; multiple calls to `shutdown` should result in only a single execution of the provider's `shutdown` function, even if the provider has not yet transitioned into `NOT_READY` state. | ||
| Implementations should take care to await or cancel the initialization of providers if `shutdown` is called while providers are still being initialized, and have yet to transition to `READY` or some other state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me as if implementing this without a race condition somewhere and without the possibility to call shutdown twice on a provider will be very tricky. I think it would be easier to require providers to be able to handle multiple calls to shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been worried about this as well. I was looking into some implementations, and I can see a few where it wouldn't be so bad, and other where it would be difficult... I know you have a knack for concurrency so I think I'll make a change if you also have this concern.
I think maybe we could simplify it to say that we call shutdown on all providers in state NOT_READY, but don't make other guarantees (so it's possible to get re-entrant calls to shutdown, but once shutdown is complete there should be no more).
Actually, I think we have another problem if we only call shutdown on NOT_READY providers but don't await all startu-ps... if a provider is starting up, but hasn't yet started, it's in NOT_READY (and so never receives a shutdown) but then after shutdown it might become READY, leaving us in a somewhat troublesome state.
I think if we can't guarantee that the shutdown won't be called re-entrantly, we have to run it for all providers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrfwow please see my edit to the description, and re-review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As someone who is not familiar with the openfeature spec, but has worked with the Go SDK a little bit, and has experience with concurrency as well as API design, this is how I wish things could behave:
-
Global API
shutdown: this function should callshutdownon all registered providers, regardless of their state. It should synchronously block until shutdown is complete. It should clean up all state associated with the global singleton. This includes any references to the provider objects, global hooks, event listeners, etc. It should be idempotent - so if no providers / hooks etc are registered, it just does nothing.- This is adding more responsibilities to
shutdownof the global API singleton, but the benefit is that it provides a mechanism for cleaning up all global state associated with the global API object, which provides maximum flexibility for applications, and in particular, improves testability.
- This is adding more responsibilities to
-
Provider
shutdown: this function should terminate the provider and clean up any resources associated with the provider. It should not care about the state that the provider is in. It should synchronously block until all resources are freed. (If callers prefer async shutdown, they are free to start their own threads etc. to do so, but providers themselves should shut down synchronously.) Provider shutdown should not have to be idempotent. Idempotency sometimes requires increased complexity (for example, in golang, channels cannot be closed twice). So requiring every provider to be idempotent shifts more burden and complexity onto providers, for no benefit (since the global API object is already keeping track of whether it has called shutdown or not).
Maybe the behavior I'm describing could be instead be called something like "destroy" or "cleanup" but I think it would add confusion to have two similarly named methods (both shutdown and cleanup).
Side note: I believe the current issues at least with the Go SDK mostly stem from the global singleton design. If the API object were not global, and users could instead create their own instances and manage the lifecycle themselves, this problem wouldn't exist. IMO it would be strictly better to allow multiple instances rather than having a singleton. Applications would be free to create a global singleton if they wanted, but that should be up to the application, and not enforced by the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bduffany I think what you've proposed is basically reflected in the PR after my recent update just now, with the exception of the recommendation (RFC 2119 "SHOULD") that provider shutdown be idempotent. I'm open to dropping this recommendation but I don't think it hurts, and again it is only a recommendation.
Side note: I believe the current issues at least with the Go SDK mostly stem from the global singleton design. If the API object were not global, and users could instead create their own instances and manage the lifecycle themselves, this problem wouldn't exist.
Yes. This is probably true, and is true of a few parts of the API. The global singleton presents some challenges, but similar to the OTel global singleton, also is basically the only way to allow for some powerful 3rd party integrations; it was a design trade-off early in the project that we can't go back on now without a very substantial redesign that would compromise features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to dropping this recommendation but I don't think it hurts, and again it is only a recommendation.
I think it might hurt in the sense that it increases the burden on every provider (despite being a recommendation, providers probably will strive to implement all of the recommendations). Also it increases the overall size of the spec, without adding any benefit, since the global API singleton is already guaranteeing idempotence if I'm understanding correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the global API singleton is already guaranteeing idempotence if I'm understanding correctly
Well, since my changes earlier, we aren't guaranteeing this. The more implementations I looked at, the more I became convinced this would be hard ™️ to absolutely guarantee, as @chrfwow agreed here, without a commensurate benefit. I think as written, because the provider is removed after shutdown, it's extremely unlikely, but not guaranteed that a provider might never have it's shutdown called twice.
Signed-off-by: Todd Baert <[email protected]>
| [](https://github.com/open-feature/spec/tree/main/specification#experimental) | ||
|
|
||
| The API's `shutdown` function defines a means of graceful shutdown, calling the `shutdown` function on all providers, allowing them to flush telemetry, clean up connections, and release any relevant resources. | ||
| It also provides a means of resetting the API object to its default state, removing all hooks, event handlers, and providers; this is useful for testing purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have separate APIs to do this in JS, but I think it makes sense for it to be included as part of the shutdown process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we explicitly mention that the noop provider should be the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have separate APIs to do this in JS, but I think it makes sense for it to be included as part of the shutdown process.
Ya I agree.
Should we explicitly mention that the noop provider should be the default?
The noop provider is an implementation detail not mentioned in any normative part of the spec, AFAIK. We only mention that things should "no-op", but we can mention something like that and I think implementations will get the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We mention the no-op provider in this non-normative section so I'm OK with mentioning it in a non-normative section here as well. Will add.
Signed-off-by: Todd Baert <[email protected]>
lukas-reining
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, I am only still not sure about the provider idempotency problem.
Co-authored-by: Sahid Velji <[email protected]> Signed-off-by: Todd Baert <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
|
Looks like we have a consensus here. Thanks everyone! |
This is a change that I suspect will have little practical impact for most users, but helps disambiguate some behavior that recently caused issues and confusion.
This is only one means of addressing this concern, so please don't hesitate to put forward alternative proposals.
"Idempotent" may not be exactly the right word here, since it's only idempotent within the scope of one execution of the life-cycle, so I'm open to copy changes here as well.
EDIT
As @erka and others have pointed out, practically,
shutdownhas been used for more than just cleaning up providers, it's also used frequently to "reset" the API for testing purposes. I think this is a valid use-case and I don't see any reason why it can't be added to what (at least my understanding of) the original intent of the function was... so I've changed 1.6.2 to say that theshutdownnow also resets the state of the API fully (removes hooks, providers, event handlers, etc) from the API.As @chrfwow noted (and I was concerned about as well) guaranteeing that
shutdownis not called while a provider is still starting up will be tricky to implement. Instead I've changed 1.6.1 to say we will unconditionally run shutdown on all providers, and also added a recommendation that providers handle this gracefully in the provider spec.@sahidvelji I've also added a pre-amble about shutdown in general as you requested.
@lukas-reining @erka @beeme1mr @sahidvelji @chrfwow please re-review.