Conversation

@pquerna pquerna commented Sep 3, 2025

benchmark                                           old ns/op        new ns/op     delta
BenchmarkSCC_FixCycles_Large/N=10000_M=100000-8     29685173578      304004762     -98.98%
BenchmarkSCC_FixCycles_Large/N=20000_M=200000-8     137692330860     702017885     -99.49%

benchmark                                           old allocs     new allocs     delta
BenchmarkSCC_FixCycles_Large/N=10000_M=100000-8     44297909       175621         -99.60%
BenchmarkSCC_FixCycles_Large/N=20000_M=200000-8     176603238      350899         -99.80%

benchmark                                           old bytes       new bytes     delta
BenchmarkSCC_FixCycles_Large/N=10000_M=100000-8     3627974760      53200072      -98.53%
BenchmarkSCC_FixCycles_Large/N=20000_M=200000-8     14439696344     106345592     -99.26%

Summary by CodeRabbit

  • New Features

    • Cycle detection now uses SCC-based component analysis and supports context-aware, cancellable operations for more responsive syncs.
  • Refactor

    • Cycle handling rewritten to iterate over computed components (instead of recursive/coloring), improving scalability, determinism, and concurrency friendliness.
  • Tests

    • Tests and benchmarks updated for context-aware APIs, deterministic SCC validation, and added large-scale SCC benchmarks.

coderabbitai bot commented Sep 3, 2025

Walkthrough

Replaces the DFS/coloring cycle detector with a new SCC-based pipeline. Adds SCC utilities and adjacency/reachability helpers, threads context through the cycle APIs and tests, and exposes component-based cycle computation and fixing with cancellation support, which the syncer now uses.

Changes

  • Cycle detection & fixing (SCC-based) — pkg/sync/expand/cycle.go
    Replaced the DFS/coloring detector with SCC-based detection; added context-aware APIs: GetFirstCycle(ctx), HasCycles(ctx), ComputeCyclicComponents(ctx), FixCycles(ctx), FixCyclesFromComponents(ctx, cyclic); iterative fix loop honoring ctx.Done(); updated internal sets; removed old DFS/color scaffolding.
  • Graph utilities — pkg/sync/expand/graph.go
    Added private helpers toAdjacency(nodesSubset map[int]struct{}) map[int]map[int]int and reachableFrom(start int) map[int]struct{} for adjacency construction and reachability.
  • SCC implementation (new package) — pkg/sync/expand/scc/scc.go, pkg/sync/expand/scc/scc_test.go
    New scc package implementing parallel FW–BW condensation: CSR builder, deterministic indexing option, parallel BFS, trimming, Tarjan fallback, concurrency optimizations, Options/DefaultOptions, and CondenseFWBWGroupsFromAdj(ctx, adj, opts); tests for topologies and determinism.
  • Sync pipeline integration — pkg/sync/syncer.go
    Replaced single-cycle detection with ComputeCyclicComponents(ctx) and a call to FixCyclesFromComponents(ctx, comps); logs a sample component; context threaded through.
  • Tests & benchmarks updated — pkg/sync/expand/graph_test.go, pkg/sync/expand/cycle_benchmark_test.go, pkg/sync/expand/large_scc_benchmark_test.go
    Threaded context.Context into cycle-related calls and benchmarks (GetFirstCycle(ctx), FixCycles(ctx)); added a large FixCycles benchmark generator and deterministic edge builders.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Syncer as Syncer / Tests
  participant EG as EntitlementGraph
  participant SCC as scc.CondenseFWBWGroupsFromAdj
  participant Fixer as FixCyclesFromComponents

  Syncer->>EG: ComputeCyclicComponents(ctx)
  EG->>EG: toAdjacency(nodesSubset)
  EG->>SCC: CondenseFWBWGroupsFromAdj(ctx, adj, opts)
  SCC-->>EG: return components ([][]int)
  alt no components
    EG-->>Syncer: no cycles
  else components found
    Syncer->>EG: FixCyclesFromComponents(ctx, components)
    EG->>Fixer: iterate components (check ctx.Done())
    loop per component
      Fixer->>EG: fixCycle(component)
      alt ctx canceled
        Fixer-->>Syncer: return ctx.Err()
      end
    end
    Fixer-->>Syncer: return nil / error
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers
Suggested reviewers

  • mstanbCO
  • jirwin

Poem

I hop through nodes with whiskers keen,
I sniff the loops where cycles preen.
With ctx in paw I pause, then mend,
One component fixed, then on I send.
A rabbit nibbles bugs — now cycles end. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 24cea5c and 4d3bc62.

📒 Files selected for processing (1)
  • pkg/sync/expand/large_scc_benchmark_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/sync/expand/large_scc_benchmark_test.go


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sync/expand/cycle.go (1)

140-148: Bug: deleting the wrong adjacency map entry in removeNode

The final delete targets SourcesToDestinations again, leaving DestinationsToSources[nodeID] behind. This can leak state and corrupt future traversals.

-   delete(g.SourcesToDestinations, nodeID)
+   delete(g.DestinationsToSources, nodeID)
🧹 Nitpick comments (1)
pkg/sync/expand/graph_test.go (1)

131-144: Strengthen TestRemoveNode to catch adjacency-map leaks

Add assertions to ensure both adjacency indexes are cleared after removal.

 func TestRemoveNode(t *testing.T) {
   ctx, cancel := context.WithCancel(context.Background())
   defer cancel()

   graph := parseExpression(t, ctx, "1")

   node := graph.GetNode("1")
   require.NotNil(t, node)

   graph.removeNode(1)

   node = graph.GetNode("1")
   require.Nil(t, node)
+  // Ensure adjacency maps don't retain stale entries.
+  _, ok := graph.SourcesToDestinations[1]
+  require.False(t, ok, "SourcesToDestinations leaked entry for removed node")
+  _, ok = graph.DestinationsToSources[1]
+  require.False(t, ok, "DestinationsToSources leaked entry for removed node")
 }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 782625f and 2f35b90.

📒 Files selected for processing (3)
  • pkg/sync/expand/cycle.go (2 hunks)
  • pkg/sync/expand/graph_test.go (3 hunks)
  • pkg/sync/syncer.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/sync/expand/cycle.go (1)
pkg/sync/expand/graph.go (1)
  • EntitlementGraph (41-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: go-test (1.23.x, windows-latest)
🔇 Additional comments (6)
pkg/sync/expand/graph_test.go (3)

204-208: LGTM: context plumbed into FixCycles

Call site updated correctly to pass ctx and assert no error.


224-228: LGTM: context-aware FixCycles in complex cycle test

Looks correct and consistent with the new signature.


251-255: LGTM: context-aware FixCycles in clique test

Matches the new API and expectations.

pkg/sync/syncer.go (1)

1376-1379: LGTM: propagate context to FixCycles

Correctly forwards ctx; errors on cancellation will bubble up.

pkg/sync/expand/cycle.go (2)

4-5: LGTM: add context import

Required for the new FixCycles signature.


152-171: Iterative, cancellable FixCycles: good change

Loop + ctx check avoids recursion blowups and enables cancellation.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sync/expand/cycle.go (1)

94-103: Bug: wrong map deleted in removeNode.

Second delete should target DestinationsToSources, not SourcesToDestinations (already deleted above). This can leave stale reverse entries and corrupt invariants.

 	// Delete all incoming edges.
 	if sources, ok := g.DestinationsToSources[nodeID]; ok {
 		for sourceID, edgeID := range sources {
 			delete(g.SourcesToDestinations[sourceID], nodeID)
 			delete(g.Edges, edgeID)
 		}
 	}
-	delete(g.SourcesToDestinations, nodeID)
+	delete(g.DestinationsToSources, nodeID)
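The invariant the fix restores — both the forward and reverse adjacency indexes must drop the removed node's own entry — can be illustrated with a minimal, simplified sketch (types and names here are hypothetical, not the package's actual ones; edge IDs are omitted):

```go
package main

import "fmt"

// graph keeps two mirrored adjacency indexes, as the review describes.
type graph struct {
	srcToDst map[int]map[int]struct{} // SourcesToDestinations analogue
	dstToSrc map[int]map[int]struct{} // DestinationsToSources analogue
}

func (g *graph) removeNode(id int) {
	// Drop the node from every neighbor's mirror entry.
	for dst := range g.srcToDst[id] {
		delete(g.dstToSrc[dst], id)
	}
	for src := range g.dstToSrc[id] {
		delete(g.srcToDst[src], id)
	}
	// Both top-level entries must go; deleting srcToDst twice (the bug)
	// would leave dstToSrc[id] behind as stale state.
	delete(g.srcToDst, id)
	delete(g.dstToSrc, id)
}

func main() {
	g := &graph{
		srcToDst: map[int]map[int]struct{}{1: {2: {}}, 2: {}},
		dstToSrc: map[int]map[int]struct{}{2: {1: {}}, 1: {}},
	}
	g.removeNode(2)
	_, fwd := g.srcToDst[2]
	_, rev := g.dstToSrc[2]
	fmt.Println(fwd, rev) // false false: no stale entries remain
}
```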
🧹 Nitpick comments (11)
pkg/sync/expand/cycle_benchmark_test.go (2)

154-156: Context setup OK; minor benchmark hygiene.

Consider moving ctx creation inside each sub-benchmark to avoid any risk of cross-sub-benchmark interaction, or just use context.Background() since it is never cancelled. Also call b.ReportAllocs() inside each sub-benchmark for allocation visibility (the setting does not propagate from the parent benchmark to its b.Run children).

 func BenchmarkGetFirstCycle(b *testing.B) {
-	ctx, cancel := context.WithCancel(context.Background())
-	defer cancel()
+	ctx := context.Background()

163-164: Avoid dead-code elimination in benchmarks.

Assign the result to a package-level sink.

-				_ = g.GetFirstCycle(ctx)
+				benchCycleSink = g.GetFirstCycle(ctx)

Add at file scope:

var benchCycleSink []int

Also applies to: 173-174

pkg/sync/expand/graph_test.go (1)

146-149: Add a cancellation test.

Consider a test that cancels ctx mid-fix to ensure early exit surfaces ctx.Err().

pkg/sync/expand/scc/scc_test.go (2)

10-21: Helpers: mark as test helpers.

Pass t *testing.T into makeAdj/normalizeGroups/defaultOpts and call t.Helper() to improve failure reporting. (Helper is a method on *testing.T; there is no package-level testing.Helper function.)

-func makeAdj(nodes []int, edges [][2]int) map[int]map[int]int {
+func makeAdj(t *testing.T, nodes []int, edges [][2]int) map[int]map[int]int {
+	t.Helper()

Apply similarly to normalizeGroups and defaultOpts, updating call sites.


220-239: Determinism test: nice.

Covers ordering; consider adding a cancellation test (cancelled ctx before call) to assert timely return.

pkg/sync/expand/cycle.go (3)

32-47: Make cycleDetectionHelper context-aware.

It hardcodes context.Background(); pass a ctx param so benches and callers can cancel.

-func (g *EntitlementGraph) cycleDetectionHelper(
-	nodeID int,
-) ([]int, bool) {
+func (g *EntitlementGraph) cycleDetectionHelper(ctx context.Context, nodeID int) ([]int, bool) {
 	...
-	groups := scc.CondenseFWBWGroupsFromAdj(context.Background(), adj, scc.DefaultOptions())
+	groups := scc.CondenseFWBWGroupsFromAdj(ctx, adj, scc.DefaultOptions())

And update call sites in benchmarks accordingly.


53-68: ComputeCyclicComponents(ctx): OK; consider deterministic mode for tests.

Optional: set Deterministic=true in options for stable grouping when tests depend on group ordering.

-	groups := scc.CondenseFWBWGroupsFromAdj(ctx, adj, scc.DefaultOptions())
+	opts := scc.DefaultOptions()
+	opts.Deterministic = true
+	groups := scc.CondenseFWBWGroupsFromAdj(ctx, adj, opts)

130-238: Avoid creating edges that will be deleted immediately.

Filter intra-component edges when wiring to reduce churn.

-	for destinationID, resourceTypeIDs := range outgoingEdgesToResourceTypeIDs {
+	inCycle := make(map[int]struct{}, len(nodeIDs))
+	for _, id := range nodeIDs { inCycle[id] = struct{}{} }
+	for destinationID, resourceTypeIDs := range outgoingEdgesToResourceTypeIDs {
+		if _, ok := inCycle[destinationID]; ok {
+			continue
+		}
 		g.NextEdgeID++
 		...
 	}
-	for sourceID, resourceTypeIDs := range incomingEdgesToResourceTypeIDs {
+	for sourceID, resourceTypeIDs := range incomingEdgesToResourceTypeIDs {
+		if _, ok := inCycle[sourceID]; ok {
+			continue
+		}
 		g.NextEdgeID++
 		...
 	}
pkg/sync/expand/scc/scc.go (3)

503-524: Sequential BFS step has unsafe concurrent bitset operations.

In the sequential BFS step (lines 503-524), the visited bitset is modified without atomic operations. This is safe only because execution is single-threaded at that point, as the comment at line 513 acknowledges; it becomes a maintenance risk if the code is later modified without understanding that constraint.

Consider using atomic operations consistently, or at least add more prominent documentation about the safety assumptions:

 			for i := 0; i < len(frontier); i++ {
 				u := frontier[i]
 				rs, re := getRow(u)
 				for p := rs; p < re; p++ {
 					v := getCol(p)
 					if !active.test(v) {
 						continue
 					}
-					// Non-atomic is safe because we are single-threaded on this step
+					// SAFETY: Non-atomic operations are safe here because we're in a
+					// single-threaded context (len(frontier) <= 64 or maxWorkers == 1).
+					// This is a performance optimization to avoid atomic overhead.
 					w := v >> 6
 					mask := uint64(1) << (uint(v) & 63)
 					if (visited.w[w] & mask) == 0 {
 						visited.w[w] |= mask
 						next = append(next, v)
 					}
 				}
 			}

730-781: Tarjan implementation could overflow stack on deep recursion.

The recursive Tarjan implementation could cause stack overflow on graphs with very deep paths. While this is only used for small graphs (N <= SmallCutoff), even small graphs can have deep paths.

Consider converting to an iterative implementation using an explicit stack to avoid stack overflow issues. However, given the SmallCutoff limit of 8192, this may be acceptable for most use cases.
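If conversion is desired, an explicit-stack Tarjan over a plain adjacency list might look like this (a sketch under simplified types; the real package operates on CSR arrays):

```go
package main

import "fmt"

// sccTarjanIterative computes strongly connected components using an explicit
// call stack instead of recursion, so deep paths cannot overflow the stack.
func sccTarjanIterative(adj map[int][]int) [][]int {
	index := map[int]int{}
	low := map[int]int{}
	onStack := map[int]bool{}
	var stack []int
	var comps [][]int
	next := 0

	type frame struct {
		v  int
		ei int // next edge index to explore for v
	}

	for start := range adj {
		if _, seen := index[start]; seen {
			continue
		}
		call := []frame{{v: start}}
		index[start], low[start] = next, next
		next++
		stack = append(stack, start)
		onStack[start] = true
		for len(call) > 0 {
			f := &call[len(call)-1]
			if f.ei < len(adj[f.v]) {
				w := adj[f.v][f.ei]
				f.ei++
				if _, seen := index[w]; !seen {
					index[w], low[w] = next, next
					next++
					stack = append(stack, w)
					onStack[w] = true
					call = append(call, frame{v: w})
				} else if onStack[w] && index[w] < low[f.v] {
					low[f.v] = index[w]
				}
				continue
			}
			// All edges of f.v explored: pop the frame, propagate low-link.
			call = call[:len(call)-1]
			if len(call) > 0 {
				p := &call[len(call)-1]
				if low[f.v] < low[p.v] {
					low[p.v] = low[f.v]
				}
			}
			if low[f.v] == index[f.v] { // f.v is the root of an SCC
				var comp []int
				for {
					w := stack[len(stack)-1]
					stack = stack[:len(stack)-1]
					onStack[w] = false
					comp = append(comp, w)
					if w == f.v {
						break
					}
				}
				comps = append(comps, comp)
			}
		}
	}
	return comps
}

func main() {
	adj := map[int][]int{1: {2}, 2: {3}, 3: {1}, 4: {3}}
	comps := sccTarjanIterative(adj)
	fmt.Println(len(comps)) // 2: the 3-cycle {1,2,3} plus the singleton {4}
}
```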


47-68: Consider making EnableTrim default to true.

The trimming optimization can significantly improve performance on sparse graphs with many source/sink nodes. Since it has minimal overhead, consider enabling it by default.

 func DefaultOptions() Options {
 	return Options{
 		MaxWorkers:    runtime.GOMAXPROCS(0),
-		EnableTrim:    false,
+		EnableTrim:    true,
 		SmallCutoff:   8192,
 		Deterministic: false,
 	}
 }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 2f35b90 and 8e14722.

📒 Files selected for processing (7)
  • pkg/sync/expand/cycle.go (4 hunks)
  • pkg/sync/expand/cycle_benchmark_test.go (2 hunks)
  • pkg/sync/expand/graph.go (1 hunks)
  • pkg/sync/expand/graph_test.go (7 hunks)
  • pkg/sync/expand/scc/scc.go (1 hunks)
  • pkg/sync/expand/scc/scc_test.go (1 hunks)
  • pkg/sync/syncer.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/sync/syncer.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/sync/expand/scc/scc_test.go (1)
pkg/sync/expand/scc/scc.go (3)
  • Options (48-58)
  • DefaultOptions (61-68)
  • CondenseFWBWGroupsFromAdj (317-366)
pkg/sync/expand/cycle.go (3)
pkg/sync/expand/graph.go (1)
  • EntitlementGraph (41-53)
pkg/sync/expand/scc/scc.go (1)
  • CondenseFWBWGroupsFromAdj (317-366)
vendor/github.com/deckarep/golang-set/v2/set.go (1)
  • NewThreadUnsafeSet (218-224)
🪛 GitHub Check: go-lint
pkg/sync/expand/scc/scc.go

[failure] 515-515:
G115: integer overflow conversion int -> uint (gosec)


[failure] 309-309:
G115: integer overflow conversion int -> uint (gosec)


[failure] 249-249:
G115: integer overflow conversion int -> uint (gosec)


[failure] 232-232:
G115: integer overflow conversion int -> uint (gosec)


[failure] 225-225:
G115: integer overflow conversion int -> uint (gosec)


[failure] 218-218:
G115: integer overflow conversion int -> uint (gosec)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: go-test (1.23.x, windows-latest)
🔇 Additional comments (11)
pkg/sync/expand/graph.go (1)

348-372: BFS helper looks good.

Nil-on-missing start is fine given current callers; no changes requested.

pkg/sync/expand/graph_test.go (1)

163-164: Context threaded through cycle APIs: LGTM.

Good migration to GetFirstCycle(ctx)/FixCycles(ctx).

Also applies to: 192-193, 205-205, 209-209, 224-224, 233-234, 251-251, 260-261

pkg/sync/expand/cycle.go (3)

10-22: GetFirstCycle(ctx): LGTM.

Early-return on HasNoCycles is a good fast path.


24-30: HasCycles(ctx): LGTM.

Straightforward and cancellation-safe via underlying ComputeCyclicComponents.


104-125: FixCyclesFromComponents(ctx): LGTM.

Honors cancellation and sets HasNoCycles at end.

pkg/sync/expand/scc/scc.go (6)

1-5: Package documentation looks good.

The package comment clearly describes the purpose and functionality of the parallel FW-BW SCC condensation implementation.


86-183: CSR construction handles deterministic ordering well.

The CSR construction correctly implements deterministic node ordering when requested, which is important for reproducible results in testing and debugging scenarios.
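For readers unfamiliar with the layout, a minimal CSR builder with the deterministic-ordering idea might be sketched as follows (buildCSR and its field names are hypothetical, not the package's API):

```go
package main

import (
	"fmt"
	"sort"
)

// csr is a compressed sparse row graph: rowStart[i]..rowStart[i+1] indexes
// into cols for vertex i's out-neighbors. Sorting node IDs before indexing
// makes the vertex numbering deterministic.
type csr struct {
	rowStart    []int
	cols        []int
	idxToNodeID []int
	nodeIDToIdx map[int]int
}

func buildCSR(adj map[int]map[int]int) csr {
	ids := make([]int, 0, len(adj))
	for id := range adj {
		ids = append(ids, id)
	}
	sort.Ints(ids) // deterministic indexing
	idx := make(map[int]int, len(ids))
	for i, id := range ids {
		idx[id] = i
	}
	rowStart := make([]int, len(ids)+1)
	for i, id := range ids {
		rowStart[i+1] = rowStart[i] + len(adj[id])
	}
	cols := make([]int, rowStart[len(ids)])
	for i, id := range ids {
		p := rowStart[i]
		for dst := range adj[id] {
			cols[p] = idx[dst]
			p++
		}
		sort.Ints(cols[rowStart[i]:rowStart[i+1]]) // stable neighbor order
	}
	return csr{rowStart, cols, ids, idx}
}

func main() {
	adj := map[int]map[int]int{10: {20: 1, 30: 1}, 20: {30: 1}, 30: {}}
	g := buildCSR(adj)
	fmt.Println(g.rowStart) // [0 2 3 3]
	fmt.Println(g.cols)     // [1 2 2]
}
```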


185-312: Bitset implementation has excellent concurrency documentation.

The bitset implementation has comprehensive documentation about thread-safety guarantees and usage patterns. The atomic operations are correctly implemented with CAS loops.
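A test-and-set via CAS loop of the kind described can be sketched as (hypothetical names; not the package's actual code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// bitset with an atomic test-and-set implemented as a CAS loop.
type bitset struct{ w []uint64 }

func newBitset(n int) *bitset { return &bitset{w: make([]uint64, (n+63)/64)} }

// testAndSetAtomic sets bit i and reports whether it was already set.
// Safe for concurrent use by multiple goroutines.
func (b *bitset) testAndSetAtomic(i int) bool {
	addr := &b.w[i>>6]
	mask := uint64(1) << (uint(i) & 63)
	for {
		old := atomic.LoadUint64(addr)
		if old&mask != 0 {
			return true // already set; nothing to do
		}
		if atomic.CompareAndSwapUint64(addr, old, old|mask) {
			return false // we won the race to set it
		}
		// CAS lost to a concurrent writer of the same word: retry.
	}
}

func main() {
	b := newBitset(1024)
	var wins int64
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !b.testAndSetAtomic(500) {
				atomic.AddInt64(&wins, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(wins) // exactly one goroutine observes the bit unset
}
```

The CAS loop only retries when another goroutine mutated the same 64-bit word between the load and the swap, which is exactly the contention FW–BW's parallel BFS produces on shared frontiers.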


629-726: trimSingletons implementation is correct and efficient.

The singleton trimming optimization correctly maintains in/out degrees within the active set and properly updates them as vertices are peeled. The use of slice pools helps reduce allocations.
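The peeling idea — a vertex with zero in-degree or zero out-degree is trivially its own SCC — can be sketched over adjacency maps as follows (a simplified sketch with a naive predecessor scan; the real implementation works over CSR and updates degrees incrementally):

```go
package main

import "fmt"

// trimSingletons repeatedly peels vertices with zero in-degree or zero
// out-degree; each is its own SCC, shrinking the graph before the more
// expensive FW-BW phase.
func trimSingletons(adj map[int]map[int]int) (singletons []int, remaining map[int]struct{}) {
	indeg := map[int]int{}
	outdeg := map[int]int{}
	remaining = map[int]struct{}{}
	for u, row := range adj {
		remaining[u] = struct{}{}
		outdeg[u] += len(row)
		for v := range row {
			indeg[v]++
		}
	}
	queue := []int{}
	for u := range remaining {
		if indeg[u] == 0 || outdeg[u] == 0 {
			queue = append(queue, u)
		}
	}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		if _, ok := remaining[u]; !ok {
			continue // already peeled
		}
		delete(remaining, u)
		singletons = append(singletons, u)
		for v := range adj[u] {
			indeg[v]--
			if _, ok := remaining[v]; ok && indeg[v] == 0 {
				queue = append(queue, v)
			}
		}
		// Naive O(V) predecessor scan; a real implementation keeps a
		// reverse index so out-degrees update in O(1) per edge.
		for p := range remaining {
			if _, ok := adj[p][u]; ok {
				outdeg[p]--
				if outdeg[p] == 0 {
					queue = append(queue, p)
				}
			}
		}
	}
	return singletons, remaining
}

func main() {
	// 1->2->3->1 cycle plus a pure source 4->1 and a pure sink 2->5.
	adj := map[int]map[int]int{
		1: {2: 1}, 2: {3: 1, 5: 1}, 3: {1: 1}, 4: {1: 1}, 5: {},
	}
	s, rem := trimSingletons(adj)
	fmt.Println(len(s), len(rem)) // 2 3: nodes 4 and 5 trimmed, 3-cycle remains
}
```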


17-45: Sync.Pool usage for slice reuse is well implemented.

The use of sync.Pool for int slices and bucket slices helps reduce GC pressure in the hot path. The implementation correctly handles capacity checks and slice reuse.


368-466: Parallel recursion in sccFWBW is well-structured.

The parallel FW-BW implementation correctly partitions the graph into F\C, B\C, and U regions, with appropriate goroutine-based parallelization for independent subproblems. The context cancellation support is properly integrated.

Comment on lines +313 to +346
// toAdjacency builds an adjacency map for SCC. If nodesSubset is non-nil, only
// include those nodes (and edges between them). Always include all nodes in the
// subset as keys, even if they have zero outgoing edges.
func (g *EntitlementGraph) toAdjacency(nodesSubset map[int]struct{}) map[int]map[int]int {
adj := make(map[int]map[int]int, len(g.Nodes))
include := func(id int) bool {
if nodesSubset == nil {
return true
}
_, ok := nodesSubset[id]
return ok
}

// Ensure keys for all included nodes.
for id := range g.Nodes {
if include(id) {
adj[id] = make(map[int]int)
}
}

// Add edges where both endpoints are included.
for src, dsts := range g.SourcesToDestinations {
if !include(src) {
continue
}
row := adj[src]
for dst := range dsts {
if include(dst) {
row[dst] = 1
}
}
}
return adj
}
🛠️ Refactor suggestion

Build adjacency in O(E_subset) when a subset is provided.

Current loop scans all edges even when nodesSubset is small. Iterate only sources in the subset.

-	// Add edges where both endpoints are included.
-	for src, dsts := range g.SourcesToDestinations {
-		if !include(src) {
-			continue
-		}
-		row := adj[src]
-		for dst := range dsts {
-			if include(dst) {
-				row[dst] = 1
-			}
-		}
-	}
+	// Add edges where both endpoints are included.
+	if nodesSubset == nil {
+		for src, dsts := range g.SourcesToDestinations {
+			row := adj[src]
+			for dst := range dsts {
+				row[dst] = 1
+			}
+		}
+	} else {
+		for src := range nodesSubset {
+			dsts, ok := g.SourcesToDestinations[src]
+			if !ok {
+				continue
+			}
+			row := adj[src]
+			for dst := range dsts {
+				if _, ok := nodesSubset[dst]; ok {
+					row[dst] = 1
+				}
+			}
+		}
+	}

// test reads a bit without synchronization. Do not call concurrently with any
// writer to the same bitset.
func (b *bitset) test(i int) bool {
	w := i >> 6
	return (b.w[w] & (1 << (uint(i) & 63))) != 0
⚠️ Potential issue

Address potential integer overflow in bitwise operations.

The static analysis correctly identifies potential integer overflows when converting int to uint in bitset operations. While these are unlikely to cause issues in practice (since bit positions should be within valid bounds), it's worth adding bounds checks or using consistent types.

Apply this diff to add bounds checking and handle potential overflows safely:

 // test reads a bit without synchronization. Do not call concurrently with any
 // writer to the same bitset.
 func (b *bitset) test(i int) bool {
+	if i < 0 || i >= len(b.w)*64 {
+		return false
+	}
 	w := i >> 6
 	return (b.w[w] & (1 << (uint(i) & 63))) != 0
 }

 // set writes a bit without synchronization. Not safe to race with other
 // accesses (reads or writes) to the same bitset.
 func (b *bitset) set(i int) {
+	if i < 0 || i >= len(b.w)*64 {
+		return
+	}
 	w := i >> 6
 	b.w[w] |= 1 << (uint(i) & 63)
 }

 // testAndSetAtomic atomically sets bit i and returns true if it was already
 // set. Safe for concurrent use by multiple goroutines.
 func (b *bitset) testAndSetAtomic(i int) bool {
+	if i < 0 || i >= len(b.w)*64 {
+		return false
+	}
 	w := i >> 6
 	mask := uint64(1) << (uint(i) & 63)
 	addr := &b.w[w]

 // clearAtomic atomically clears bit i. Safe for concurrent use by multiple
 // goroutines.
 func (b *bitset) clearAtomic(i int) {
+	if i < 0 || i >= len(b.w)*64 {
+		return
+	}
 	w := i >> 6
 	mask := ^(uint64(1) << (uint(i) & 63))
 	addr := &b.w[w]

Also applies to: 225-225, 232-232, 249-249, 309-309, 515-515

🧰 Tools
🪛 GitHub Check: go-lint

[failure] 218-218:
G115: integer overflow conversion int -> uint (gosec)

🤖 Prompt for AI Agents
In pkg/sync/expand/scc/scc.go around lines 218, 225, 232, 249, 309 and 515, the
bitset operations convert int to uint and shift without bounds checks which can
overflow; add explicit bounds checking to ensure i is within 0..(len(b.w)*64-1)
(or non-negative and less than the expected bit capacity) before performing the
shift and index; return false or handle error when out of bounds, and use
consistent unsigned index types after validation to perform (b.w[w] & (1 <<
(uint(i) & 63))) safely (apply same pattern to all listed lines).

Comment on lines +353 to +374
// Defensive: assign singleton if anything slipped through
cid = nextID
nextID++
comp[idx] = cid
if cid >= len(groups) {
tmp := make([][]int, cid+1)
copy(tmp, groups)
groups = tmp
}
}
🛠️ Refactor suggestion

Defensive singleton handling may mask underlying bugs.

The defensive code at lines 353-362 that assigns singletons for any vertices with comp[idx] < 0 could hide bugs in the SCC algorithm. If the algorithm is working correctly, all vertices should have been assigned to components.

Consider either:

  1. Removing the defensive code and fixing any underlying issues that cause unassigned vertices
  2. Logging a warning when this case occurs to help debug the root cause
 	for idx := 0; idx < csr.N; idx++ {
 		cid := comp[idx]
 		if cid < 0 {
-			// Defensive: assign singleton if anything slipped through
-			cid = nextID
-			nextID++
-			comp[idx] = cid
-			if cid >= len(groups) {
-				tmp := make([][]int, cid+1)
-				copy(tmp, groups)
-				groups = tmp
-			}
+			// This should not happen if the algorithm is correct
+			panic(fmt.Sprintf("vertex %d (node %d) was not assigned to any component", idx, csr.IdxToNodeID[idx]))
 		}
 		groups[cid] = append(groups[cid], csr.IdxToNodeID[idx])
 	}

@ggreer ggreer force-pushed the pq/large_cycle_test branch from 8e14722 to 3725a3e Compare September 3, 2025 19:12
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
pkg/sync/expand/graph.go (1)

312-346: Build adjacency in O(E_subset); init keys from subset; avoid nil row.

  • When nodesSubset != nil, scanning all edges is O(E). Iterate sources in the subset to get O(E_subset).
  • Initialize adj capacity and keys from nodesSubset ∩ g.Nodes to reduce work.
  • Guard against a possible nil row if an unexpected src lacks a preinitialized entry.
  • Optional: pre-size row by out-degree for fewer map grows.
 func (g *EntitlementGraph) toAdjacency(nodesSubset map[int]struct{}) map[int]map[int]int {
-	adj := make(map[int]map[int]int, len(g.Nodes))
+	capHint := len(g.Nodes)
+	if nodesSubset != nil {
+		capHint = len(nodesSubset)
+	}
+	adj := make(map[int]map[int]int, capHint)
 	include := func(id int) bool {
 		if nodesSubset == nil {
 			return true
 		}
 		_, ok := nodesSubset[id]
 		return ok
 	}
 
-	// Ensure keys for all included nodes.
-	for id := range g.Nodes {
-		if include(id) {
-			adj[id] = make(map[int]int)
-		}
-	}
+	// Ensure keys for all included nodes.
+	if nodesSubset == nil {
+		for id := range g.Nodes {
+			adj[id] = make(map[int]int)
+		}
+	} else {
+		for id := range nodesSubset {
+			if _, ok := g.Nodes[id]; ok {
+				adj[id] = make(map[int]int)
+			}
+		}
+	}
 
 	// Add edges where both endpoints are included.
-	for src, dsts := range g.SourcesToDestinations {
-		if !include(src) {
-			continue
-		}
-		row := adj[src]
-		for dst := range dsts {
-			if include(dst) {
-				row[dst] = 1
-			}
-		}
-	}
+	if nodesSubset == nil {
+		for src, dsts := range g.SourcesToDestinations {
+			row := adj[src]
+			if row == nil {
+				row = make(map[int]int, len(dsts))
+				adj[src] = row
+			}
+			for dst := range dsts {
+				row[dst] = 1
+			}
+		}
+	} else {
+		for src := range nodesSubset {
+			dsts, ok := g.SourcesToDestinations[src]
+			if !ok {
+				continue
+			}
+			row := adj[src]
+			if row == nil {
+				row = make(map[int]int, len(dsts))
+				adj[src] = row
+			}
+			for dst := range dsts {
+				if _, ok := nodesSubset[dst]; ok {
+					row[dst] = 1
+				}
+			}
+		}
+	}
 	return adj
 }
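As a standalone illustration of the subset-first iteration suggested above (hypothetical types, not the package's `EntitlementGraph`): cost is proportional to the out-edges of subset members, not to all edges in the graph.

```go
package main

import "fmt"

// subsetAdjacency builds an adjacency map restricted to subset, iterating
// only sources inside the subset so work is O(edges out of the subset).
func subsetAdjacency(edges map[int]map[int]struct{}, subset map[int]struct{}) map[int]map[int]int {
	adj := make(map[int]map[int]int, len(subset))
	for src := range subset {
		dsts, ok := edges[src]
		if !ok {
			continue
		}
		row := make(map[int]int, len(dsts))
		for dst := range dsts {
			if _, in := subset[dst]; in {
				row[dst] = 1
			}
		}
		adj[src] = row
	}
	return adj
}

func main() {
	edges := map[int]map[int]struct{}{
		1: {2: {}, 3: {}},
		2: {3: {}},
	}
	subset := map[int]struct{}{1: {}, 2: {}}
	fmt.Println(subsetAdjacency(edges, subset)) // only the 1->2 edge survives the filter
}
```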
🧹 Nitpick comments (1)
pkg/sync/expand/graph.go (1)

348-372: BFS micro-opts: no reslicing; consistent empty-set return; clarify comment.

  • Avoid queue reslicing to reduce allocator/GC pressure on large graphs.
  • Return an empty set (not nil) when start is absent, and update the doc comment so callers need not nil-check.
-// reachableFrom computes the set of node IDs reachable from start over
-// SourcesToDestinations using an iterative BFS.
+// reachableFrom computes the set of node IDs reachable from start over
+// SourcesToDestinations using an iterative BFS. Returns an empty set if start is not present.
 func (g *EntitlementGraph) reachableFrom(start int) map[int]struct{} {
-	if _, ok := g.Nodes[start]; !ok {
-		return nil
-	}
-	visited := make(map[int]struct{}, 16)
-	queue := make([]int, 0, 16)
-	queue = append(queue, start)
-	visited[start] = struct{}{}
-	for len(queue) > 0 {
-		u := queue[0]
-		queue = queue[1:]
+	if _, ok := g.Nodes[start]; !ok {
+		return map[int]struct{}{}
+	}
+	visited := make(map[int]struct{}, 32)
+	queue := make([]int, 0, 32)
+	queue = append(queue, start)
+	visited[start] = struct{}{}
+	for head := 0; head < len(queue); head++ {
+		u := queue[head]
 		if nbrs, ok := g.SourcesToDestinations[u]; ok {
 			for v := range nbrs {
 				if _, seen := visited[v]; seen {
 					continue
 				}
 				visited[v] = struct{}{}
 				queue = append(queue, v)
 			}
 		}
 	}
 	return visited
 }
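For reference, the head-index queue pattern suggested above can be sketched standalone (hypothetical adjacency type, not the package's graph): the queue's backing array grows once and is never resliced, so no extra allocations come from dequeuing.

```go
package main

import "fmt"

// reachable runs an iterative BFS over adj from start, advancing a head
// index instead of reslicing the queue, so the backing array is reused.
func reachable(adj map[int][]int, start int) map[int]struct{} {
	visited := map[int]struct{}{start: {}}
	queue := []int{start}
	for head := 0; head < len(queue); head++ {
		u := queue[head]
		for _, v := range adj[u] {
			if _, seen := visited[v]; !seen {
				visited[v] = struct{}{}
				queue = append(queue, v)
			}
		}
	}
	return visited
}

func main() {
	adj := map[int][]int{1: {2}, 2: {3}, 4: {1}}
	fmt.Println(len(reachable(adj, 1))) // 3: nodes 1, 2, 3
}
```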
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8e14722 and 3725a3e.

📒 Files selected for processing (7)
  • pkg/sync/expand/cycle.go (4 hunks)
  • pkg/sync/expand/cycle_benchmark_test.go (2 hunks)
  • pkg/sync/expand/graph.go (1 hunks)
  • pkg/sync/expand/graph_test.go (7 hunks)
  • pkg/sync/expand/scc/scc.go (1 hunks)
  • pkg/sync/expand/scc/scc_test.go (1 hunks)
  • pkg/sync/syncer.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/sync/syncer.go
  • pkg/sync/expand/scc/scc_test.go
  • pkg/sync/expand/graph_test.go
  • pkg/sync/expand/cycle.go
  • pkg/sync/expand/cycle_benchmark_test.go
  • pkg/sync/expand/scc/scc.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: go-test (1.23.x, windows-latest)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (7)
pkg/sync/expand/large_scc_benchmark_test.go (7)

13-13: Silence gosec G404 for deterministic benchmark RNG

math/rand is appropriate here; the randomness is non-security and seeded for reproducibility. Add a nolint to keep CI green.

-	rng := rand.New(rand.NewSource(seed))
+	rng := rand.New(rand.NewSource(seed)) //nolint:gosec // deterministic, non-crypto RNG for benchmarks

23-40: Make embedded cycles consistently length-cycleLen and avoid degeneration near N

The current stride/idx approach can yield truncated cycles and many duplicates when idx approaches N. A modulo-based wrap keeps cycles full and evenly spread.

-	if cycleLen > 1 && numCycles > 0 {
-		stride := N / (numCycles + 1)
-		idx := 1
-		for c := 0; c < numCycles; c++ {
-			start := idx
-			for k := 0; k < cycleLen-1 && idx < N; k++ {
-				add(idx, idx+1)
-				idx++
-			}
-			if idx <= N {
-				add(idx, start)
-			}
-			idx += stride
-			if idx > N {
-				idx = N
-			}
-		}
-	}
+	if cycleLen > 1 && numCycles > 0 {
+		stride := N / numCycles
+		if stride == 0 {
+			stride = 1
+		}
+		for c := 0; c < numCycles; c++ {
+			start := 1 + (c*stride)%N
+			for k := 0; k < cycleLen; k++ {
+				u := ((start + k - 1) % N) + 1
+				v := ((start + k) % N) + 1
+				if k == cycleLen-1 {
+					v = start // close the cycle back to its first node
+				}
+				add(u, v)
+			}
+		}
+	}

Would you like me to add a quick post-generation check that at least one cycle is present?
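Such a check can be sketched self-contained, assuming a cycle planted with the modulo wrap and closed back to its first node (names here are illustrative, not the benchmark's):

```go
package main

import "fmt"

// plantCycle returns edges forming one directed cycle of length cycleLen
// over node IDs 1..n, starting at start and wrapping modulo n.
func plantCycle(n, start, cycleLen int) map[[2]int]struct{} {
	edges := make(map[[2]int]struct{}, cycleLen)
	for k := 0; k < cycleLen; k++ {
		u := ((start + k - 1) % n) + 1
		v := ((start + k) % n) + 1
		if k == cycleLen-1 {
			v = start // close the cycle back to its first node
		}
		edges[[2]int{u, v}] = struct{}{}
	}
	return edges
}

// hasCycleFrom follows the successor chain from start and reports whether
// it returns to start within maxSteps hops.
func hasCycleFrom(edges map[[2]int]struct{}, start, maxSteps int) bool {
	next := make(map[int]int, len(edges))
	for e := range edges {
		next[e[0]] = e[1]
	}
	cur := start
	for i := 0; i < maxSteps; i++ {
		nxt, ok := next[cur]
		if !ok {
			return false
		}
		if nxt == start {
			return true
		}
		cur = nxt
	}
	return false
}

func main() {
	edges := plantCycle(10000, 7, 5)
	fmt.Println(len(edges), hasCycleFrom(edges, 7, 5)) // 5 true
}
```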


41-46: Guard against pathological M to prevent an endless loop

If M exceeds the max unique non-self edges (N*(N-1)), the fill loop never terminates. Clamp the target or fail fast.

-	for len(edgeSet) < M {
+	maxEdges := N * (N - 1)
+	target := M
+	if M > maxEdges {
+		target = maxEdges // alternatively: panic or return early
+	}
+	for len(edgeSet) < target {
 		u := 1 + rng.Intn(N)
 		v := 1 + rng.Intn(N)
 		add(u, v)
 	}
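The bound itself is easy to sanity-check: a simple directed graph on n nodes with no self-loops has at most n·(n−1) distinct edges.

```go
package main

import "fmt"

// maxSimpleDirectedEdges returns the number of distinct ordered pairs
// (u, v) with u != v over n nodes — the cap on unique non-self edges.
func maxSimpleDirectedEdges(n int) int {
	return n * (n - 1)
}

func main() {
	fmt.Println(maxSimpleDirectedEdges(3)) // 6: every ordered pair among 3 nodes
}
```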

57-57: Set NextNodeID to next free ID

Minor invariant: NextNodeID typically points past the highest allocated node.

-		NextNodeID:            N,
+		NextNodeID:            N + 1,

101-106: Exclude graph build from timing to measure FixCycles precisely

Building the graph each iteration skews ns/op and allocs toward setup. Stop the timer during build if you want to benchmark FixCycles itself.

-	for i := 0; i < b.N; i++ {
-		g := buildGraphFromEdges(N, edges)
-		if err := g.FixCycles(ctx); err != nil {
-			b.Fatalf("FixCycles error: %v", err)
-		}
-	}
+	for i := 0; i < b.N; i++ {
+		b.StopTimer()
+		g := buildGraphFromEdges(N, edges)
+		b.StartTimer()
+		if err := g.FixCycles(ctx); err != nil {
+			b.Fatalf("FixCycles error: %v", err)
+		}
+	}

Note: StopTimer pauses allocation accounting as well as wall time, so with the change above ReportAllocs reflects FixCycles alone.


109-109: Make long runs opt-in via env var instead of a const

Toggling with BENCH_LONG keeps defaults fast while enabling big cases in CI/local when desired. Requires importing os.

--- a/pkg/sync/expand/large_scc_benchmark_test.go
+++ b/pkg/sync/expand/large_scc_benchmark_test.go
@@
 import (
 	"context"
 	"fmt"
 	"math/rand"
+	"os"
 	"testing"
 )
@@
-const enableLongTests = false
+var enableLongTests = os.Getenv("BENCH_LONG") != ""
@@
 	if !testing.Short() && enableLongTests {
 		cases = append(cases, fullCases...)
 	}

Also applies to: 121-123, 3-8


12-12: Address gocritic captLocal: prefer lower-case param names

The linter flags capitalized locals (N/M). Either rename them to n/m or add a nolint directive at the declarations. Renaming keeps the signal-to-noise ratio high.

Example (apply similarly to other funcs):

-func benchmarkFixCycles(b *testing.B, N, M, cycleLen, numCycles int) {
+func benchmarkFixCycles(b *testing.B, n, m, cycleLen, numCycles int) {
@@
-	edges := generateEdges(N, M, cycleLen, numCycles, 1)
+	edges := generateEdges(n, m, cycleLen, numCycles, 1)
@@
-		g := buildGraphFromEdges(N, edges)
+		g := buildGraphFromEdges(n, edges)

Also applies to: 55-55, 95-95

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 3725a3e and 24cea5c.

📒 Files selected for processing (1)
  • pkg/sync/expand/large_scc_benchmark_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/sync/expand/large_scc_benchmark_test.go (1)
pkg/sync/expand/graph.go (2)
  • EntitlementGraph (41-53)
  • Edge (21-28)
🪛 GitHub Check: go-lint
pkg/sync/expand/large_scc_benchmark_test.go

[failure] 13-13:
G404: Use of weak random number generator (math/rand or math/rand/v2 instead of crypto/rand) (gosec)


[failure] 95-95:
captLocal: `N' should not be capitalized (gocritic)


[failure] 55-55:
captLocal: `N' should not be capitalized (gocritic)


[failure] 12-12:
captLocal: `N' should not be capitalized (gocritic)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: go-test (1.23.x, windows-latest)
🔇 Additional comments (1)
pkg/sync/expand/large_scc_benchmark_test.go (1)

111-119: Sub-benchmark matrix looks good

Clear sizing, readable names, and short/full split are sensible.

@ggreer ggreer merged commit 8210db3 into main Sep 4, 2025
6 checks passed
@ggreer ggreer deleted the pq/large_cycle_test branch September 4, 2025 16:30