Race conditions in prefix cache indexer lock management #2500

@hexfusion

Description

Add() and RemovePod() each release and re-acquire i.mu mid-operation, so neither runs as a single critical section. A concurrent goroutine can interleave in the gap between unlock and re-lock, leaving stale entries in hashToPods that reference pods no longer present in podToLRU.

sequenceDiagram
    participant A as Goroutine A (Add)
    participant mu as i.mu
    participant B as Goroutine B (RemovePod)

    A->>mu: Lock()
    Note over A: create LRU for pod1
    A->>mu: Unlock()

    B->>mu: RLock()
    Note over B: get LRU ref
    B->>mu: RUnlock()
    Note over B: clean up entries
    B->>mu: Lock()
    Note over B: delete(podToLRU, pod1)
    B->>mu: Unlock()

    Note over A: lruForPod.Add(hash) - LRU is now orphaned
    A->>mu: Lock()
    Note over A: hashToPods[hash][pod1] = ... STALE
    A->>mu: Unlock()