Skip to content

Conversation

@toddbaert
Copy link
Member

@toddbaert toddbaert commented Jul 10, 2025

This is a change that I suspect will have little practical impact for most users, but helps disambiguate some behavior that recently caused issues and confusion.

This is only one means of addressing this concern, so please don't hesitate to put forward alternative proposals.

"Idempotent" may not be exactly the right word here, since it's only idempotent within the scope of one execution of the life-cycle, so I'm open to copy changes here as well.

EDIT

⚠️ I've substantially changed this PR based on feedback, and dismissed all approvals:

As @erka and others have pointed out, practically, shutdown has been used for more than just cleaning up providers, it's also used frequently to "reset" the API for testing purposes. I think this is a valid use-case and I don't see any reason why it can't be added to what (at least my understanding of) the original intent of the function was... so I've changed 1.6.2 to say that the shutdown now also resets the state of the API fully (removes hooks, providers, event handlers, etc) from the API.

As @chrfwow noted (and I was concerned about as well) guaranteeing that shutdown is not called while a provider is still starting up will be tricky to implement. Instead I've changed 1.6.1 to say we will unconditionally run shutdown on all providers, and also added a recommendation that providers handle this gracefully in the provider spec.

@sahidvelji I've also added a pre-amble about shutdown in general as you requested.

@lukas-reining @erka @beeme1mr @sahidvelji @chrfwow please re-review.

@toddbaert toddbaert requested a review from a team as a code owner July 10, 2025 15:41
@sahidvelji
Copy link
Contributor

sahidvelji commented Jul 10, 2025

Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked?
It looks like some SDKs clear the entire internal state of providers (effectively unregistering all providers). Is that what is expected, or do we just want implementations to transition the provider status to NOT_READY? See open-feature/go-sdk#400 (comment)

In other words, what is the scope of Shutdown? Is it meant to just call each provider's Shutdown method and transition states, or do we expect Shutdown to clear all internal state?

beeme1mr
beeme1mr previously approved these changes Jul 10, 2025
@toddbaert
Copy link
Member Author

toddbaert commented Jul 10, 2025

Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked? It looks like some SDKs clear the entire internal state of providers (effectively unregistering all providers). Is that what is expected, or do we just want implementations to transition the provider status to NOT_READY? See open-feature/go-sdk#400 (comment)

We could make this another point if everyone agrees, but I'm a bit skeptical of the value...

Shutdown is meant to be called on application exit, so the state of the API isn't really that important anymore since the entire thing is about to be moved out of memory. The key part with this change was understanding guarantees around the providers' shutdown, which is relevant for finalizing things, disconnecting, flushing telemetry, etc.

Can you provide a reason anyone would really care about the state after shutdown, besides for our own internal testing?

In other words, what is the scope of Shutdown? Is it meant to just call each provider's Shutdown method and transition states, or do we expect Shutdown to clear all internal state?

We can add a non-normative (descriptive) preamble to the shutdown section. I don't want to start enumerating all the things shutdown will not do (for example, I don't think it should clear all state, such as hook registration, but TBH I don't care that much either because of my point above), that seems like it could be practically infinite.

@erka
Copy link
Member

erka commented Jul 10, 2025

Shutdown is meant to be called on application exit

This aspect is not covered in the specification. In practice, some developers call the global API shutdown after each their test, expecting a clean state before next test run.

Copy link
Member

@lukas-reining lukas-reining left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change.
Left 2 questions :)

> The API's `shutdown` function **MUST NOT** call shutdown on providers which are in state `NOT_READY`.
With respect to the lifecycle of a given active provider, the global API object's `shutdown` function should be idempotent; multiple calls to `shutdown` should result in only a single execution of the provider's `shutdown` function.
Implementations should take care to await or cancel the initialization of providers if `shutdown` is called while providers are still being initialized, and have yet to transition to `READY` or some other state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this conflicts with the normative section.
If shutdown is called during initialization, from the normative part only, I would expect the provider never to be shut down.

Especially, as an initializing provider is in NOT_READY state.
Or do I read that wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I really struggle to come up with something that solves what I just said, but still does not result in providers that never shut down because during shut down they were initializing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was imagining using locks or other such mechanisms to make sure an init completed (or failed) before shutdown was called.

Copy link
Member Author

@toddbaert toddbaert Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add... I think this will add some implementation complexity, and we also don't have any recommendation that providers have a timeout (I think we could consider a SHOULD point for this, for providers which take time to start up).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukas-reining I've substantially changed the PR as mentioned in the description. Please re-review.

@lukas-reining
Copy link
Member

Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked? It looks like some SDKs clear the entire internal state of providers (effectively unregistering all providers). Is that what is expected, or do we just want implementations to transition the provider status to NOT_READY? See open-feature/go-sdk#400 (comment)

We could make this another point if everyone agrees, but I'm a bit skeptical of the value...

Mh, I think clearing the providers after the SDK has been shutdown could be a good thing to not have undefined behavior. @toddbaert
For tests or special ways of using the SDK like in some serverless envs it could be good to know what to expect from the SDK.

@toddbaert
Copy link
Member Author

toddbaert commented Jul 10, 2025

Shutdown is meant to be called on application exit

This aspect is not covered in the specification. In practice, some developers call the global API shutdown after each their test, expecting a clean state before next test run.

Mh, I think clearing the providers after the SDK has been shutdown could be a good thing to not have undefined behavior. @toddbaert
For tests or special ways of using the SDK like in some serverless envs it could be good to know what to expect from the SDK.

@erka @lukas-reining Hmmm... should we detatch all event listeners and hooks as well? This starts to look to me more like a cleanup testing utility and less like a way to manage the lifecycle of the providers, which was the original intent at least on my end (shutdown was introduced along with the initialization/shutdown of providers).

Comment on lines 415 to 416
With respect to the lifecycle of a given active provider, the global API object's `shutdown` function should be idempotent; multiple calls to `shutdown` should result in only a single execution of the provider's `shutdown` function, even if the provider has not yet transitioned into `NOT_READY` state.
Implementations should take care to await or cancel the initialization of providers if `shutdown` is called while providers are still being initialized, and have yet to transition to `READY` or some other state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me as if implementing this without a race condition somewhere and without the possibility to call shutdown twice on a provider will be very tricky. I think it would be easier to require providers to be able to handle multiple calls to shutdown.

Copy link
Member Author

@toddbaert toddbaert Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been worried about this as well. I was looking into some implementations, and I can see a few where it wouldn't be so bad, and other where it would be difficult... I know you have a knack for concurrency so I think I'll make a change if you also have this concern.

I think maybe we could simplify it to say that we call shutdown on all providers in state NOT_READY, but don't make other guarantees (so it's possible to get re-entrant calls to shutdown, but once shutdown is complete there should be no more).

Actually, I think we have another problem if we only call shutdown on NOT_READY providers but don't await all startu-ps... if a provider is starting up, but hasn't yet started, it's in NOT_READY (and so never receives a shutdown) but then after shutdown it might become READY, leaving us in a somewhat troublesome state.

I think if we can't guarantee that the shutdown won't be called re-entrantly, we have to run it for all providers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrfwow please see my edit to the description, and re-review.

Copy link

@bduffany bduffany Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As someone who is not familiar with the openfeature spec, but has worked with the Go SDK a little bit, and has experience with concurrency as well as API design, this is how I wish things could behave:

  • Global API shutdown: this function should call shutdown on all registered providers, regardless of their state. It should synchronously block until shutdown is complete. It should clean up all state associated with the global singleton. This includes any references to the provider objects, global hooks, event listeners, etc. It should be idempotent - so if no providers / hooks etc are registered, it just does nothing.

    • This is adding more responsibilities to shutdown of the global API singleton, but the benefit is that it provides a mechanism for cleaning up all global state associated with the global API object, which provides maximum flexibility for applications, and in particular, improves testability.
  • Provider shutdown: this function should terminate the provider and clean up any resources associated with the provider. It should not care about the state that the provider is in. It should synchronously block until all resources are freed. (If callers prefer async shutdown, they are free to start their own threads etc. to do so, but providers themselves should shut down synchronously.) Provider shutdown should not have to be idempotent. Idempotency sometimes requires increased complexity (for example, in golang, channels cannot be closed twice). So requiring every provider to be idempotent shifts more burden and complexity onto providers, for no benefit (since the global API object is already keeping track of whether it has called shutdown or not).

Maybe the behavior I'm describing could be instead be called something like "destroy" or "cleanup" but I think it would add confusion to have two similarly named methods (both shutdown and cleanup).

Side note: I believe the current issues at least with the Go SDK mostly stem from the global singleton design. If the API object were not global, and users could instead create their own instances and manage the lifecycle themselves, this problem wouldn't exist. IMO it would be strictly better to allow multiple instances rather than having a singleton. Applications would be free to create a global singleton if they wanted, but that should be up to the application, and not enforced by the spec.

Copy link
Member Author

@toddbaert toddbaert Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bduffany I think what you've proposed is basically reflected in the PR after my recent update just now, with the exception of the recommendation (RFC 2119 "SHOULD") that provider shutdown be idempotent. I'm open to dropping this recommendation but I don't think it hurts, and again it is only a recommendation.

Side note: I believe the current issues at least with the Go SDK mostly stem from the global singleton design. If the API object were not global, and users could instead create their own instances and manage the lifecycle themselves, this problem wouldn't exist.

Yes. This is probably true, and is true of a few parts of the API. The global singleton presents some challenges, but similar to the OTel global singleton, also is basically the only way to allow for some powerful 3rd party integrations; it was a design trade-off early in the project that we can't go back on now without a very substantial redesign that would compromise features.

Copy link

@bduffany bduffany Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to dropping this recommendation but I don't think it hurts, and again it is only a recommendation.

I think it might hurt in the sense that it increases the burden on every provider (despite being a recommendation, providers probably will strive to implement all of the recommendations). Also it increases the overall size of the spec, without adding any benefit, since the global API singleton is already guaranteeing idempotence if I'm understanding correctly.

Copy link
Member Author

@toddbaert toddbaert Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the global API singleton is already guaranteeing idempotence if I'm understanding correctly

Well, since my changes earlier, we aren't guaranteeing this. The more implementations I looked at, the more I became convinced this would be hard ™️ to absolutely guarantee, as @chrfwow agreed here, without a commensurate benefit. I think as written, because the provider is removed after shutdown, it's extremely unlikely, but not guaranteed that a provider might never have it's shutdown called twice.

@toddbaert toddbaert requested a review from beeme1mr July 11, 2025 14:11
@toddbaert toddbaert dismissed beeme1mr’s stale review July 11, 2025 14:12

I've changed the PR substantially.

@toddbaert toddbaert changed the title feat: add point 1.6.2 about shutdown idempotence feat: add point 1.6.2, 2.5.3 about shutdown details Jul 11, 2025
[![experimental](https://img.shields.io/static/v1?label=Status&message=experimental&color=orange)](https://github.com/open-feature/spec/tree/main/specification#experimental)

The API's `shutdown` function defines a means of graceful shutdown, calling the `shutdown` function on all providers, allowing them to flush telemetry, clean up connections, and release any relevant resources.
It also provides a means of resetting the API object to its default state, removing all hooks, event handlers, and providers; this is useful for testing purposes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have separate APIs to do this in JS, but I think it makes sense for it to be included as part of the shutdown process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we explicitly mention that the noop provider should be the default?

Copy link
Member Author

@toddbaert toddbaert Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have separate APIs to do this in JS, but I think it makes sense for it to be included as part of the shutdown process.

Ya I agree.

Should we explicitly mention that the noop provider should be the default?

The noop provider is an implementation detail not mentioned in any normative part of the spec, AFAIK. We only mention that things should "no-op", but we can mention something like that and I think implementations will get the point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mention the no-op provider in this non-normative section so I'm OK with mentioning it in a non-normative section here as well. Will add.

@toddbaert toddbaert requested a review from bduffany July 11, 2025 14:53
@toddbaert toddbaert requested a review from beeme1mr July 11, 2025 15:09
Copy link
Member

@lukas-reining lukas-reining left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, I am only still not sure about the provider idempotency problem.

Co-authored-by: Sahid Velji <[email protected]>
Signed-off-by: Todd Baert <[email protected]>
@toddbaert toddbaert requested review from chrfwow and sahidvelji July 15, 2025 16:22
@toddbaert
Copy link
Member Author

Looks like we have a consensus here. Thanks everyone!

@toddbaert toddbaert merged commit 2d4b27b into main Jul 16, 2025
7 checks passed
@toddbaert toddbaert deleted the feat/add-162 branch July 16, 2025 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants