feat: add point 1.6.2, 2.5.3 about shutdown details #323

toddbaert · 2025-07-10T15:41:15Z

This is a change that I suspect will have little practical impact for most users, but helps disambiguate some behavior that recently caused issues and confusion.

This is only one means of addressing this concern, so please don't hesitate to put forward alternative proposals.

"Idempotent" may not be exactly the right word here, since it's only idempotent within the scope of one execution of the life-cycle, so I'm open to copy changes here as well.

EDIT

⚠️ I've substantially changed this PR based on feedback, and dismissed all approvals:

As @erka and others have pointed out, in practice shutdown has been used for more than just cleaning up providers; it's also frequently used to "reset" the API for testing purposes. I think this is a valid use-case, and I don't see any reason why it can't be added to what (at least in my understanding) was the original intent of the function... so I've changed 1.6.2 to say that shutdown now also fully resets the state of the API (removing hooks, providers, event handlers, etc.).

As @chrfwow noted (and I was concerned about as well) guaranteeing that shutdown is not called while a provider is still starting up will be tricky to implement. Instead I've changed 1.6.1 to say we will unconditionally run shutdown on all providers, and also added a recommendation that providers handle this gracefully in the provider spec.
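A minimal sketch of the revised behavior described in the two paragraphs above, written in Python for illustration only. `Api`, `RecordingProvider`, and `Status` are hypothetical names and do not come from any OpenFeature SDK; the sketch assumes synchronous provider shutdown and ignores error handling.

```python
from enum import Enum


class Status(Enum):
    NOT_READY = "NOT_READY"
    READY = "READY"


class RecordingProvider:
    """Hypothetical provider that counts how often shutdown is invoked."""

    def __init__(self, status):
        self.status = status
        self.shutdown_calls = 0

    def shutdown(self):
        self.shutdown_calls += 1
        self.status = Status.NOT_READY


class Api:
    """Hypothetical stand-in for the global API object."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.providers = {}  # domain -> provider
        self.hooks = []
        self.handlers = []

    def shutdown(self):
        # Run shutdown on every registered provider, whatever its state.
        for provider in list(self.providers.values()):
            provider.shutdown()
        # Then drop providers, hooks, and event handlers.
        self._reset()


api = Api()
ready = RecordingProvider(Status.READY)
starting = RecordingProvider(Status.NOT_READY)
api.providers = {"a": ready, "b": starting}
api.hooks.append(object())
api.handlers.append(lambda event: None)

api.shutdown()
```

In this sketch a second `api.shutdown()` finds nothing registered and does nothing, because the providers were removed by the first call.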

@sahidvelji I've also added a preamble about shutdown in general as you requested.

@lukas-reining @erka @beeme1mr @sahidvelji @chrfwow please re-review.

Signed-off-by: Todd Baert <[email protected]>

specification/sections/01-flag-evaluation.md

sahidvelji · 2025-07-10T15:58:29Z

Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked?
It looks like some SDKs clear the entire internal state of providers (effectively unregistering all providers). Is that what is expected, or do we just want implementations to transition the provider status to NOT_READY? See open-feature/go-sdk#400 (comment)

In other words, what is the scope of Shutdown? Is it meant to just call each provider's Shutdown method and transition states, or do we expect Shutdown to clear all internal state?

toddbaert · 2025-07-10T16:24:50Z

> Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked? It looks like some SDKs clear the entire internal state of providers (effectively unregistering all providers). Is that what is expected, or do we just want implementations to transition the provider status to NOT_READY? See open-feature/go-sdk#400 (comment)

We could make this another point if everyone agrees, but I'm a bit skeptical of the value...

Shutdown is meant to be called on application exit, so the state of the API isn't really that important anymore since the entire thing is about to be moved out of memory. The key part with this change was understanding guarantees around the providers' shutdown, which is relevant for finalizing things, disconnecting, flushing telemetry, etc.

Can you provide a reason anyone would really care about the state after shutdown, besides for our own internal testing?

> In other words, what is the scope of Shutdown? Is it meant to just call each provider's Shutdown method and transition states, or do we expect Shutdown to clear all internal state?

We can add a non-normative (descriptive) preamble to the shutdown section. I don't want to start enumerating all the things shutdown will not do (for example, I don't think it should clear all state, such as hook registration, but TBH I don't care that much either because of my point above), that seems like it could be practically infinite.

erka · 2025-07-10T17:54:42Z

> Shutdown is meant to be called on application exit

This aspect is not covered in the specification. In practice, some developers call the global API shutdown after each of their tests, expecting a clean state before the next test run.

lukas-reining

I like the change.
Left 2 questions :)

lukas-reining · 2025-07-10T19:52:55Z

specification/sections/01-flag-evaluation.md

+> The API's `shutdown` function **MUST NOT** call shutdown on providers which are in state `NOT_READY`.
+
+With respect to the lifecycle of a given active provider, the global API object's `shutdown` function should be idempotent; multiple calls to `shutdown` should result in only a single execution of the provider's `shutdown` function.
+Implementations should take care to await or cancel the initialization of providers if `shutdown` is called while providers are still being initialized, and have yet to transition to `READY` or some other state.


To me, this conflicts with the normative section.
If shutdown is called during initialization, from the normative part only, I would expect the provider never to be shut down.

Especially, as an initializing provider is in NOT_READY state.
Or do I read that wrong?

But I really struggle to come up with something that solves what I just said, but still does not result in providers that never shut down because during shut down they were initializing.

I was imagining using locks or other such mechanisms to make sure an init completed (or failed) before shutdown was called.

I should add... I think this will add some implementation complexity, and we also don't have any recommendation that providers have a timeout (I think we could consider a SHOULD point for this, for providers which take time to start up).

@lukas-reining I've substantially changed the PR as mentioned in the description. Please re-review.

specification/sections/01-flag-evaluation.md

lukas-reining · 2025-07-10T20:05:38Z

> Should we also add a blurb about what exactly the SDK should do when the API's Shutdown function is invoked? It looks like some SDKs clear the entire internal state of providers (effectively unregistering all providers). Is that what is expected, or do we just want implementations to transition the provider status to NOT_READY? See open-feature/go-sdk#400 (comment)

> We could make this another point if everyone agrees, but I'm a bit skeptical of the value...

Mh, I think clearing the providers after the SDK has been shutdown could be a good thing to not have undefined behavior. @toddbaert
For tests or special ways of using the SDK like in some serverless envs it could be good to know what to expect from the SDK.

toddbaert · 2025-07-10T23:53:24Z

> Shutdown is meant to be called on application exit

> This aspect is not covered in the specification. In practice, some developers call the global API shutdown after each of their tests, expecting a clean state before the next test run.

> Mh, I think clearing the providers after the SDK has been shutdown could be a good thing to not have undefined behavior. @toddbaert
> For tests or special ways of using the SDK like in some serverless envs it could be good to know what to expect from the SDK.

@erka @lukas-reining Hmmm... should we detach all event listeners and hooks as well? This starts to look to me more like a cleanup testing utility and less like a way to manage the lifecycle of the providers, which was the original intent at least on my end (shutdown was introduced along with the initialization/shutdown of providers).

chrfwow · 2025-07-11T05:29:13Z

specification/sections/01-flag-evaluation.md

+With respect to the lifecycle of a given active provider, the global API object's `shutdown` function should be idempotent; multiple calls to `shutdown` should result in only a single execution of the provider's `shutdown` function, even if the provider has not yet transitioned into `NOT_READY` state.
+Implementations should take care to await or cancel the initialization of providers if `shutdown` is called while providers are still being initialized, and have yet to transition to `READY` or some other state.


It seems to me as if implementing this without a race condition somewhere and without the possibility to call shutdown twice on a provider will be very tricky. I think it would be easier to require providers to be able to handle multiple calls to shutdown.

I have been worried about this as well. I was looking into some implementations, and I can see a few where it wouldn't be so bad, and others where it would be difficult... I know you have a knack for concurrency so I think I'll make a change if you also have this concern.

I think maybe we could simplify it to say that we call shutdown on all providers not in state NOT_READY, but don't make other guarantees (so it's possible to get re-entrant calls to shutdown, but once shutdown is complete there should be no more).

Actually, I think we have another problem if we skip shutdown for NOT_READY providers but don't await all startups... if a provider is starting up, but hasn't yet started, it's in NOT_READY (and so never receives a shutdown) but then after shutdown it might become READY, leaving us in a somewhat troublesome state.

I think if we can't guarantee that the shutdown won't be called re-entrantly, we have to run it for all providers.
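The startup problem described above can be illustrated with a small, single-threaded Python sketch. It is an illustration under assumptions, not SDK code: `SlowProvider`, `shutdown_skipping_not_ready`, and `shutdown_all` are hypothetical names, and the late `finish_initialization()` call stands in for an asynchronous init that completes after the API's shutdown has returned. The provider is also assumed to handle an early shutdown gracefully by refusing to come up afterwards.

```python
class SlowProvider:
    """Hypothetical provider whose initialization completes in a later step."""

    def __init__(self):
        self.status = "NOT_READY"
        self.connected = False
        self._closed = False

    def finish_initialization(self):
        if self._closed:
            return  # shutdown arrived first, so do not come up
        self.connected = True
        self.status = "READY"

    def shutdown(self):
        self._closed = True
        self.connected = False
        self.status = "NOT_READY"


def shutdown_skipping_not_ready(providers):
    # Earlier draft: leave NOT_READY providers alone.
    for provider in providers:
        if provider.status != "NOT_READY":
            provider.shutdown()


def shutdown_all(providers):
    # Revised approach: shut down every provider, whatever its state.
    for provider in providers:
        provider.shutdown()


# Skipping NOT_READY providers: the provider is never told to stop.
leaked = SlowProvider()
shutdown_skipping_not_ready([leaked])
leaked.finish_initialization()

# Unconditional shutdown: the provider sees the shutdown and stays down.
clean = SlowProvider()
shutdown_all([clean])
clean.finish_initialization()
```

After this runs, `leaked` is `READY` and still connected even though the API was shut down, while `clean` stays `NOT_READY`.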

@chrfwow please see my edit to the description, and re-review.

As someone who is not familiar with the openfeature spec, but has worked with the Go SDK a little bit, and has experience with concurrency as well as API design, this is how I wish things could behave:

Global API shutdown: this function should call shutdown on all registered providers, regardless of their state. It should synchronously block until shutdown is complete. It should clean up all state associated with the global singleton. This includes any references to the provider objects, global hooks, event listeners, etc. It should be idempotent - so if no providers / hooks etc are registered, it just does nothing.

This is adding more responsibilities to shutdown of the global API singleton, but the benefit is that it provides a mechanism for cleaning up all global state associated with the global API object, which provides maximum flexibility for applications, and in particular, improves testability.

Provider shutdown: this function should terminate the provider and clean up any resources associated with the provider. It should not care about the state that the provider is in. It should synchronously block until all resources are freed. (If callers prefer async shutdown, they are free to start their own threads etc. to do so, but providers themselves should shut down synchronously.) Provider shutdown should not have to be idempotent. Idempotency sometimes requires increased complexity (for example, in golang, channels cannot be closed twice). So requiring every provider to be idempotent shifts more burden and complexity onto providers, for no benefit (since the global API object is already keeping track of whether it has called shutdown or not).

Maybe the behavior I'm describing could instead be called something like "destroy" or "cleanup" but I think it would add confusion to have two similarly named methods (both shutdown and cleanup).

Side note: I believe the current issues at least with the Go SDK mostly stem from the global singleton design. If the API object were not global, and users could instead create their own instances and manage the lifecycle themselves, this problem wouldn't exist. IMO it would be strictly better to allow multiple instances rather than having a singleton. Applications would be free to create a global singleton if they wanted, but that should be up to the application, and not enforced by the spec.

@bduffany I think what you've proposed is basically reflected in the PR after my recent update just now, with the exception of the recommendation (RFC 2119 "SHOULD") that provider shutdown be idempotent. I'm open to dropping this recommendation but I don't think it hurts, and again it is only a recommendation.
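For a sense of what the recommendation asks of a provider, here is a hedged Python sketch of a shutdown that tolerates repeated and concurrent calls. `FlushingProvider` and `flush_count` are hypothetical; the counter stands in for real cleanup such as flushing telemetry or closing connections. Other languages may need a different guard.

```python
import threading


class FlushingProvider:
    """Hypothetical provider that releases its resources at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self.flush_count = 0

    def shutdown(self):
        with self._lock:
            if self._closed:
                return  # a repeated call is a no-op
            self._closed = True
            # Stand-in for flushing telemetry and closing connections.
            self.flush_count += 1


provider = FlushingProvider()
threads = [threading.Thread(target=provider.shutdown) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock makes the check and the flag update atomic, so concurrent callers cannot both pass the `_closed` check.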

> Side note: I believe the current issues at least with the Go SDK mostly stem from the global singleton design. If the API object were not global, and users could instead create their own instances and manage the lifecycle themselves, this problem wouldn't exist.

Yes. This is probably true, and is true of a few parts of the API. The global singleton presents some challenges, but similar to the OTel global singleton, also is basically the only way to allow for some powerful 3rd party integrations; it was a design trade-off early in the project that we can't go back on now without a very substantial redesign that would compromise features.

> I'm open to dropping this recommendation but I don't think it hurts, and again it is only a recommendation.

I think it might hurt in the sense that it increases the burden on every provider (despite being a recommendation, providers probably will strive to implement all of the recommendations). Also it increases the overall size of the spec, without adding any benefit, since the global API singleton is already guaranteeing idempotence if I'm understanding correctly.

> since the global API singleton is already guaranteeing idempotence if I'm understanding correctly

Well, since my changes earlier, we aren't guaranteeing this. The more implementations I looked at, the more I became convinced this would be hard ™️ to absolutely guarantee, as @chrfwow agreed here, without a commensurate benefit. I think as written, because the provider is removed after shutdown, it's extremely unlikely that a provider has its shutdown called twice, but it's not guaranteed that this never happens.

I've changed the PR substantially.

Signed-off-by: Todd Baert <[email protected]>

beeme1mr · 2025-07-11T14:38:00Z

specification/sections/01-flag-evaluation.md

 [![experimental](https://img.shields.io/static/v1?label=Status&message=experimental&color=orange)](https://github.com/open-feature/spec/tree/main/specification#experimental)

+The API's `shutdown` function defines a means of graceful shutdown, calling the `shutdown` function on all providers, allowing them to flush telemetry, clean up connections, and release any relevant resources.
+It also provides a means of resetting the API object to its default state, removing all hooks, event handlers, and providers; this is useful for testing purposes.


I believe we have separate APIs to do this in JS, but I think it makes sense for it to be included as part of the shutdown process.

Should we explicitly mention that the noop provider should be the default?

> I believe we have separate APIs to do this in JS, but I think it makes sense for it to be included as part of the shutdown process.

Ya I agree.

> Should we explicitly mention that the noop provider should be the default?

The noop provider is an implementation detail not mentioned in any normative part of the spec, AFAIK. We only mention that things should "no-op", but we can mention something like that and I think implementations will get the point.

We mention the no-op provider in this non-normative section so I'm OK with mentioning it in a non-normative section here as well. Will add.

Signed-off-by: Todd Baert <[email protected]>

lukas-reining

This looks good to me, I am only still not sure about the provider idempotency problem.

specification/sections/02-providers.md

specification/sections/01-flag-evaluation.md

Co-authored-by: Sahid Velji <[email protected]> Signed-off-by: Todd Baert <[email protected]>

specification/sections/01-flag-evaluation.md

Signed-off-by: Todd Baert <[email protected]>

toddbaert · 2025-07-16T13:24:13Z

Looks like we have a consensus here. Thanks everyone!

toddbaert requested a review from a team as a code owner July 10, 2025 15:41

feat: add point 1.6.2 about shutdown idempotence

5cda5b1

Signed-off-by: Todd Baert <[email protected]>

toddbaert force-pushed the feat/add-162 branch from 276070a to 5cda5b1 Compare July 10, 2025 15:42

toddbaert requested review from aepfli, beeme1mr, erka, kinyoklion, lukas-reining, sahidvelji and thomaspoignant July 10, 2025 15:43

This was referenced Jul 10, 2025

[BUG] Inactive providers are shut down more than once open-feature/go-sdk#397

Closed

fix: implement requirement 1.6.2 open-feature/go-sdk#400

Merged

toddbaert commented Jul 10, 2025

specification/sections/01-flag-evaluation.md Outdated

beeme1mr previously approved these changes Jul 10, 2025

lukas-reining reviewed Jul 10, 2025

chrfwow reviewed Jul 11, 2025

toddbaert force-pushed the feat/add-162 branch from 6659ec2 to 28803ad Compare July 11, 2025 14:11

toddbaert requested a review from beeme1mr July 11, 2025 14:11

toddbaert force-pushed the feat/add-162 branch from 28803ad to 8212c30 Compare July 11, 2025 14:18

fixup: reset state, run on all providers

f2ef078

Signed-off-by: Todd Baert <[email protected]>

toddbaert force-pushed the feat/add-162 branch from 8212c30 to f2ef078 Compare July 11, 2025 14:26

toddbaert requested review from chrfwow and lukas-reining July 11, 2025 14:27

toddbaert changed the title ~~feat: add point 1.6.2 about shutdown idempotence~~ feat: add point 1.6.2, 2.5.3 about shutdown details Jul 11, 2025

beeme1mr reviewed Jul 11, 2025

toddbaert requested a review from bduffany July 11, 2025 14:53

fixup: noop provider mention

d3b121e

Signed-off-by: Todd Baert <[email protected]>

toddbaert requested a review from beeme1mr July 11, 2025 15:09

lukas-reining reviewed Jul 11, 2025

specification/sections/02-providers.md

erka approved these changes Jul 11, 2025

beeme1mr approved these changes Jul 11, 2025

lukas-reining approved these changes Jul 11, 2025

sahidvelji reviewed Jul 12, 2025

specification/sections/01-flag-evaluation.md Outdated

chrfwow reviewed Jul 14, 2025

specification/sections/01-flag-evaluation.md Outdated

Update specification/sections/01-flag-evaluation.md

eab663e

Co-authored-by: Sahid Velji <[email protected]> Signed-off-by: Todd Baert <[email protected]>

toddbaert commented Jul 15, 2025

specification/sections/01-flag-evaluation.md Outdated

Update specification/sections/01-flag-evaluation.md

c112501

Signed-off-by: Todd Baert <[email protected]>

toddbaert requested review from chrfwow and sahidvelji July 15, 2025 16:22

sahidvelji approved these changes Jul 15, 2025

chrfwow approved these changes Jul 16, 2025

Merge branch 'main' into feat/add-162

f4da700

toddbaert merged commit 2d4b27b into main Jul 16, 2025
7 checks passed

toddbaert deleted the feat/add-162 branch July 16, 2025 13:25
