flowcontrol: Support dynamic priority provisioning #2006
```diff
@@ -120,9 +120,14 @@ type FlowRegistry struct {
 	flowStates sync.Map // stores `types.FlowKey` -> *flowState

 	// Globally aggregated statistics, updated atomically via lock-free propagation.
-	totalByteSize        atomic.Int64
-	totalLen             atomic.Int64
-	perPriorityBandStats map[int]*bandStats // Keyed by priority.
+	totalByteSize atomic.Int64
+	totalLen      atomic.Int64
+
+	// perPriorityBandStats tracks aggregated stats per priority.
+	// Key: int (priority), Value: *bandStats
+	// We use sync.Map here to allow for lock-free reads on the hot path (Stats) while allowing dynamic provisioning to
+	// add new keys safely.
+	perPriorityBandStats sync.Map

 	// --- Administrative state (protected by `mu`) ---
```
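The switch from a plain `map[int]*bandStats` to `sync.Map` is what makes lock-free reads coexist with dynamic key provisioning. A minimal, self-contained sketch of the pattern (the `registry` type and `record`/`totals` helpers here are hypothetical simplifications, not the PR's actual `FlowRegistry` API):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// bandStats mirrors the shape of the PR's per-band counters (simplified).
type bandStats struct {
	len      atomic.Int64
	byteSize atomic.Int64
}

// registry is a hypothetical stand-in for the FlowRegistry's stats fields.
type registry struct {
	perPriorityBandStats sync.Map // int (priority) -> *bandStats
}

// record updates counters without a registry-wide lock: LoadOrStore makes
// first-use provisioning of a new priority safe under concurrency, and
// subsequent updates are plain atomic adds.
func (r *registry) record(priority int, lenDelta, byteDelta int64) {
	val, _ := r.perPriorityBandStats.LoadOrStore(priority, &bandStats{})
	stats := val.(*bandStats)
	stats.len.Add(lenDelta)
	stats.byteSize.Add(byteDelta)
}

// totals aggregates across all bands via a lock-free Range.
func (r *registry) totals() (lenTotal, byteTotal int64) {
	r.perPriorityBandStats.Range(func(_, v any) bool {
		s := v.(*bandStats)
		lenTotal += s.len.Load()
		byteTotal += s.byteSize.Load()
		return true
	})
	return lenTotal, byteTotal
}

func main() {
	r := &registry{}
	var wg sync.WaitGroup
	for p := 0; p < 4; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				r.record(p, 1, 10) // bands are provisioned on first use
			}
		}(p)
	}
	wg.Wait()
	l, b := r.totals()
	fmt.Println(l, b) // 400 4000
}
```

Note that `sync.Map` is documented as a good fit precisely for this access pattern: keys written once (band provisioning) and read many times (stats on the hot path).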
```diff
@@ -152,11 +157,10 @@ func withClock(clk clock.WithTickerAndDelayedExecution) RegistryOption {
 func NewFlowRegistry(config *Config, logger logr.Logger, opts ...RegistryOption) (*FlowRegistry, error) {
 	cfg := config.Clone()
 	fr := &FlowRegistry{
-		config:               cfg,
-		logger:               logger.WithName("flow-registry"),
-		activeShards:         []*registryShard{},
-		drainingShards:       make(map[string]*registryShard),
-		perPriorityBandStats: make(map[int]*bandStats, len(cfg.PriorityBands)),
+		config:         cfg,
+		logger:         logger.WithName("flow-registry"),
+		activeShards:   []*registryShard{},
+		drainingShards: make(map[string]*registryShard),
 	}

 	for _, opt := range opts {
```

```diff
@@ -167,7 +171,7 @@ func NewFlowRegistry(config *Config, logger logr.Logger, opts ...RegistryOption)
 	for prio := range cfg.PriorityBands {
-		fr.perPriorityBandStats[prio] = &bandStats{}
+		fr.perPriorityBandStats.Store(prio, &bandStats{})
 	}

 	if err := fr.updateShardCount(cfg.InitialShardCount); err != nil {
```
```diff
@@ -252,9 +256,18 @@ func (fr *FlowRegistry) WithConnection(key types.FlowKey, fn func(conn contracts
 // prepareNewFlow creates a new `flowState` and synchronizes its queues and policies onto all existing shards.
 func (fr *FlowRegistry) prepareNewFlow(key types.FlowKey) (*flowState, error) {
-	// Get a stable snapshot of the shard topology.
-	// An RLock is sufficient because while the list of shards must be stable, the internal state of each shard is
-	// protected by its own lock.
 	fr.mu.RLock()
+	_, exists := fr.config.PriorityBands[key.Priority]
+	fr.mu.RUnlock()
+
+	// If the band was missing, we must acquire the Write Lock to create it.
+	if !exists {
+		if err := fr.ensurePriorityBand(key.Priority); err != nil {
+			return nil, err
+		}
+	}
+
+	// Now we know the band exists (or we errored). Re-acquire Read Lock to safely read the topology and build components.
+	fr.mu.RLock()
 	defer fr.mu.RUnlock()
```

**Contributor** (on the `RLock`/`RUnlock` existence check): This fine-grained management of locking is error prone; do we really need it?

**Author:** In the current JIT implementation, yes. In the controller implementation you suggested, probably not, depending on how we handle the race condition between reconciliation and request serving.

**Contributor** (on the `ensurePriorityBand` call): I was expecting that priority bands are created and deleted as a trigger of InfObj reconcile events: when a new InfObj is created we create a band, and when it is deleted we delete the band. Doing it in the request-processing path is an anti-pattern.

**Author:** Driving this from reconcile events raises questions of consistency and lifecycle management. With the current JIT behavior, it works. This isn't strictly an issue with the CRD reconciliation approach; my current PR only adds bands (which is likely okay for now). The state management issue just becomes more apparent under the CRD model, though.

**Contributor:** This scenario can also happen when we look up the InfObj; it may not have been created by the time the request got to the epp, so it is not a new problem. We only need to be eventually consistent, but creating a new InfObj should trigger a state update in the epp that includes the availability of the InfObj itself as well as all associated state, like the band; this state update in the epp can happen atomically. The default priority is a special case, for which we create a band on startup. All that being said, I am OK proceeding with the current approach you have, but please open an issue to track this. I am sure we will revisit it when we introduce the flow CRD.
```diff
@@ -272,6 +285,34 @@ func (fr *FlowRegistry) prepareNewFlow(key types.FlowKey) (*flowState, error) {
 	return &flowState{key: key}, nil
 }

+// ensurePriorityBand safely provisions a new priority band.
+func (fr *FlowRegistry) ensurePriorityBand(priority int) error {
+	fr.mu.Lock()
+	defer fr.mu.Unlock()
+
+	// Double-Check: Someone might have created it while we swapped locks in prepareNewFlow.
+	if _, ok := fr.config.PriorityBands[priority]; ok {
+		return nil
+	}
+
+	fr.logger.Info("Dynamically provisioning new priority band", "priority", priority)
+
+	newBand := *fr.config.DefaultPriorityBand
+	newBand.Priority = priority
+	newBand.PriorityName = fmt.Sprintf("Dynamic-%d", priority)
+	fr.config.PriorityBands[priority] = &newBand
+
+	fr.perPriorityBandStats.LoadOrStore(priority, &bandStats{})
+
+	fr.repartitionShardConfigsLocked()
+
+	for _, shard := range fr.activeShards {
+		shard.addPriorityBand(priority)
+	}
+
+	return nil
+}
+
 // --- `contracts.FlowRegistryObserver` Implementation ---

 // Stats returns globally aggregated statistics for the entire `FlowRegistry`.
```
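The `prepareNewFlow`/`ensurePriorityBand` pair above is classic double-checked locking: an optimistic existence check under `RLock`, then a re-check under the write lock, because another goroutine may have provisioned the band in the unlocked window between `RUnlock` and `Lock`. A standalone sketch of the pattern (the `registry` type and `bands` map are hypothetical reductions, not the PR's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical reduction of the PR's FlowRegistry: a map of
// priority -> band name guarded by an RWMutex.
type registry struct {
	mu    sync.RWMutex
	bands map[int]string
}

// ensureBand mirrors the locking dance: optimistic check under RLock, then,
// only on a miss, a re-check under the write lock, because another goroutine
// may have created the band while we held no lock at all.
func (r *registry) ensureBand(priority int) string {
	r.mu.RLock()
	name, ok := r.bands[priority]
	r.mu.RUnlock()
	if ok {
		return name // hot path: no write lock taken
	}

	r.mu.Lock()
	defer r.mu.Unlock()
	// Double-check: the world may have changed between RUnlock and Lock.
	if name, ok := r.bands[priority]; ok {
		return name
	}
	name = fmt.Sprintf("Dynamic-%d", priority)
	r.bands[priority] = name
	return name
}

func main() {
	r := &registry{bands: map[int]string{0: "Default"}}
	fmt.Println(r.ensureBand(0)) // Default
	fmt.Println(r.ensureBand(7)) // Dynamic-7
	fmt.Println(r.ensureBand(7)) // Dynamic-7 (idempotent)
}
```

The re-check is mandatory because Go's `sync.RWMutex` offers no atomic read-to-write upgrade; dropping the read lock and acquiring the write lock is the only option, and the gap between the two is where a concurrent provisioner can slip in.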
```diff
@@ -288,16 +329,19 @@ func (fr *FlowRegistry) Stats() contracts.AggregateStats {
 		PerPriorityBandStats: make(map[int]contracts.PriorityBandStats, len(fr.config.PriorityBands)),
 	}

-	for p, s := range fr.perPriorityBandStats {
-		bandCfg := fr.config.PriorityBands[p]
-		stats.PerPriorityBandStats[p] = contracts.PriorityBandStats{
-			Priority:      p,
+	fr.perPriorityBandStats.Range(func(key, value any) bool {
+		priority := key.(int)
+		bandStats := value.(*bandStats)
+		bandCfg := fr.config.PriorityBands[priority]
+		stats.PerPriorityBandStats[priority] = contracts.PriorityBandStats{
+			Priority:      priority,
 			PriorityName:  bandCfg.PriorityName,
 			CapacityBytes: bandCfg.MaxBytes,
-			ByteSize:      uint64(s.byteSize.Load()),
-			Len:           uint64(s.len.Load()),
+			ByteSize:      uint64(bandStats.byteSize.Load()),
+			Len:           uint64(bandStats.len.Load()),
 		}
-	}
+		return true
+	})
 	return stats
 }
```
```diff
@@ -585,11 +629,8 @@ func (fr *FlowRegistry) updateAllShardsCacheLocked() {
 // propagateStatsDelta is the top-level, lock-free aggregator for all statistics.
 func (fr *FlowRegistry) propagateStatsDelta(priority int, lenDelta, byteSizeDelta int64) {
-	stats, ok := fr.perPriorityBandStats[priority]
-	if !ok {
-		panic(fmt.Sprintf("invariant violation: priority band (%d) stats missing during propagation", priority))
-	}
-
+	val, _ := fr.perPriorityBandStats.Load(priority)
+	stats := val.(*bandStats)
 	stats.len.Add(lenDelta)
 	stats.byteSize.Add(byteSizeDelta)
 	fr.totalLen.Add(lenDelta)
```
**Contributor** (on the `PriorityName` assignment): This should be the infObj object name.

**Author:** This would be much better for observability. The JIT path in this PR makes it tricky; if we manage to move to the controller path, this becomes much easier to populate. Quick question, though: how do we handle N InferenceObjectives mapped to the same priority level? Right now, priority bands are unique per int, and it would not be simple (and perhaps not desirable) to break that. I guess we could make the name a composite of the objectives mapped to it, e.g. `"InfObjA,InfObjB,..."`? The name itself then becomes dynamic data and is no longer immutable (though I don't foresee any issues with this).

**Contributor:** Good point on multiple infObjs mapping to the same priority. I think the name has no value, and it is rather confusing since one will naturally expect the infObj name to be logged, so I recommend removing it.