-
Notifications
You must be signed in to change notification settings - Fork 4
Expand grants faster. #255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…pdate grant expansion benchmark to create more groups and return correct data in ListEntitlements and ListGrants.
This drastically reduces the amount of time we spend serializing and saving state.
…bers instead of directly messing with the data.
… maxDepth in syncer and results are consistent.
WalkthroughThe changes in this pull request involve modifications to several files related to the Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (9)
pkg/sync/expand/cycle.go (2)
98-98: Consider moving HasNoCycles flag to the base caseWhile setting the flag here is correct, consider moving it to after the recursive call returns. This would set the flag only once when all cycles are fixed, rather than potentially setting it multiple times during recursion.
Here's a suggested refactor:
func (g *EntitlementGraph) FixCycles() error { if g.HasNoCycles { return nil } cycle := g.GetFirstCycle() if cycle == nil { - g.HasNoCycles = true return nil } if err := g.fixCycle(cycle); err != nil { return err } // Recurse! - return g.FixCycles() + if err := g.FixCycles(); err != nil { + return err + } + g.HasNoCycles = true + return nil }
Line range hint
92-107: Consider adding recursion depth protectionThe recursive implementation of FixCycles could potentially lead to stack overflow with deeply nested cycles. Consider adding a maximum recursion depth or converting to an iterative implementation.
Example protection mechanism:
func (g *EntitlementGraph) FixCycles() error { + return g.fixCyclesWithDepth(0) +} + +func (g *EntitlementGraph) fixCyclesWithDepth(depth int) error { + const maxDepth = 1000 // Adjust based on your needs + if depth > maxDepth { + return fmt.Errorf("maximum cycle depth of %d exceeded", maxDepth) + } + if g.HasNoCycles { return nil } cycle := g.GetFirstCycle() if cycle == nil { g.HasNoCycles = true return nil } if err := g.fixCycle(cycle); err != nil { return err } - return g.FixCycles() + return g.fixCyclesWithDepth(depth + 1) }pkg/sync/expand/graph.go (2)
11-33: LGTM! Consider enhancing documentation for shortened JSON tagsThe shortened JSON tags effectively reduce serialized data size for checkpointing. However, to maintain long-term maintainability, consider adding a mapping table in the package documentation that explains the shortened tag meanings (e.g.,
sid→source_entitlement_id).
162-162: Add test coverage for HasNoCycles reset behaviorThe
HasNoCyclesfield is correctly reset when the graph is modified throughAddEntitlementandAddEdge. However, this new behavior should be covered by unit tests to ensure it remains working as expected.Test cases should verify that:
HasNoCyclesis reset when adding a new entitlementHasNoCyclesis reset when adding a new edgeHasNoCyclesremains unchanged when adding duplicate edges (early return case inAddEdge)Would you like me to help generate the test cases?
Also applies to: 273-274
pkg/sync/syncer.go (1)
62-63: Consider adding documentation about checkpointing behavior.While the 10-second checkpoint interval is a good balance between performance and data safety, it would be helpful to document this behavior and its implications for recovery in case of interruptions. This helps users understand the potential trade-offs.
Consider adding a comment like:
+// minCheckpointInterval defines the minimum time between checkpoints. +// This helps improve performance by reducing checkpoint frequency, but means +// up to 10 seconds of progress could be lost if the sync is interrupted. const minCheckpointInterval = 10 * time.Secondpkg/sync/syncer_test.go (4)
69-75: Ensure safe access tomc.entDBentriesAccessing
mc.entDB[groupId][0]andmc.entDB[nextGroupId][0]assumes that each group has at least one entitlement. To prevent potential panics due to empty slices, consider adding checks to ensure these slices are not empty before accessing the first element.You can modify the code as follows:
for i := 0; i < circleSize; i += 1 { groupId := "group_" + strconv.Itoa(i) nextGroupId := "group_" + strconv.Itoa((i+1)%circleSize) // Wrap around to the start for the last element - currentEnt := mc.entDB[groupId][0] - nextEnt := mc.entDB[nextGroupId][0] + currentEnts := mc.entDB[groupId] + nextEnts := mc.entDB[nextGroupId] + if len(currentEnts) == 0 || len(nextEnts) == 0 { + return fmt.Errorf("no entitlements found for group %s or %s", groupId, nextGroupId) + } + currentEnt := currentEnts[0] + nextEnt := nextEnts[0] _ = mc.AddGroupMember(ctx, currentEnt.Resource, nextEnt.Resource, nextEnt) }
189-195: Consider using a switch statement for resource type handlingUsing a switch statement can enhance readability and scalability when dealing with multiple resource types.
You can refactor the code as follows:
func (mc *mockConnector) ListResources(ctx context.Context, in *v2.ResourcesServiceListResourcesRequest, opts ...grpc.CallOption) (*v2.ResourcesServiceListResourcesResponse, error) { - if in.ResourceTypeId == "group" { - return &v2.ResourcesServiceListResourcesResponse{List: mc.resourceDB}, nil - } - if in.ResourceTypeId == "user" { - return &v2.ResourcesServiceListResourcesResponse{List: mc.userDB}, nil - } + switch in.ResourceTypeId { + case "group": + return &v2.ResourcesServiceListResourcesResponse{List: mc.resourceDB}, nil + case "user": + return &v2.ResourcesServiceListResourcesResponse{List: mc.userDB}, nil + default: + return nil, fmt.Errorf("unknown resource type %s", in.ResourceTypeId) + } }
199-200: EnsureListEntitlementsreturns an empty slice instead of nilWhen
mc.entDB[in.Resource.Id.Resource]is nil (i.e., no entitlements for the given resource), the returnedListfield will be nil. This could lead tonilslice issues in code that expects an empty slice. Consider initializing the slice to an empty slice when no entitlements are found.You can modify the code as follows:
func (mc *mockConnector) ListEntitlements(ctx context.Context, in *v2.EntitlementsServiceListEntitlementsRequest, opts ...grpc.CallOption) (*v2.EntitlementsServiceListEntitlementsResponse, error) { - return &v2.EntitlementsServiceListEntitlementsResponse{List: mc.entDB[in.Resource.Id.Resource]}, nil + ents, ok := mc.entDB[in.Resource.Id.Resource] + if !ok { + ents = []*v2.Entitlement{} + } + return &v2.EntitlementsServiceListEntitlementsResponse{List: ents}, nil }
203-204: EnsureListGrantsreturns an empty slice instead of nilSimilar to
ListEntitlements, ifmc.grantDB[in.Resource.Id.Resource]is nil,Listwill be nil. Returning an empty slice can prevent potentialnildereference issues.You can modify the code as follows:
func (mc *mockConnector) ListGrants(ctx context.Context, in *v2.GrantsServiceListGrantsRequest, opts ...grpc.CallOption) (*v2.GrantsServiceListGrantsResponse, error) { - return &v2.GrantsServiceListGrantsResponse{List: mc.grantDB[in.Resource.Id.Resource]}, nil + grants, ok := mc.grantDB[in.Resource.Id.Resource] + if !ok { + grants = []*v2.Grant{} + } + return &v2.GrantsServiceListGrantsResponse{List: grants}, nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
pkg/sync/expand/cycle.go(2 hunks)pkg/sync/expand/graph.go(5 hunks)pkg/sync/syncer.go(1 hunks)pkg/sync/syncer_test.go(5 hunks)
🔇 Additional comments (13)
pkg/sync/expand/cycle.go (2)
93-95: LGTM! Consistent optimization with GetFirstCycle
The early return optimization is consistent with the GetFirstCycle method and prevents unnecessary recursive calls when we know there are no cycles.
11-13: LGTM! Verify HasNoCycles flag maintenance
The early return optimization is a good performance improvement when we know the graph has no cycles.
Let's verify that the HasNoCycles flag is properly maintained when the graph is modified:
✅ Verification successful
HasNoCycles flag is properly maintained
The flag is correctly managed throughout the codebase:
- Initialized as
falsein graph creation - Reset to
falsewhen graph structure changes - Set to
trueonly when cycle detection confirms no cycles exist
Key evidence from the codebase:
HasNoCyclesis reset in graph modification operations- Only set to
trueinFixCycles()after confirming no cycles exist - Early returns in both
GetFirstCycle()andFixCycles()are safe to use
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all places where the graph structure is modified to ensure HasNoCycles is reset
ast-grep --pattern 'func (g *EntitlementGraph) $method($_) $_ {
$$$
g.HasNoCycles = false
$$$
}'
Length of output: 104
Script:
#!/bin/bash
# Let's try a different approach to find HasNoCycles modifications
# First, let's find the EntitlementGraph struct definition
echo "=== EntitlementGraph struct definition ==="
rg "type EntitlementGraph struct" -A 5
echo -e "\n=== All modifications to HasNoCycles ==="
rg "HasNoCycles" -B 2 -A 2
echo -e "\n=== Methods that modify the graph structure ==="
rg "func \(g \*EntitlementGraph\)" -A 1
Length of output: 6379
pkg/sync/expand/graph.go (1)
52-52: LGTM! Well-structured optimization for cycle detection
The HasNoCycles field is properly added and initialized, providing a foundation for the cycle detection optimizations mentioned in the PR objectives.
Also applies to: 62-62
pkg/sync/syncer.go (3)
57-57: LGTM: Checkpointing optimization additions.
The new field lastCheckPointTime and constant minCheckpointInterval are well-designed additions that support the PR's objective of optimizing checkpointing frequency. The 10-second interval is a reasonable default that balances between performance and data safety.
Also applies to: 62-62
65-69: LGTM: Efficient checkpoint throttling implementation.
The checkpoint throttling logic is well-implemented:
- Uses efficient time comparison
- Updates timestamp only when actually performing checkpoint
- Clean early return pattern
65-69: Verify checkpoint throttling implementation uniqueness.
Let's verify that this is the only place implementing checkpoint throttling and that all checkpoint calls go through this method.
✅ Verification successful
Checkpoint throttling implementation is correctly centralized
The verification confirms:
- All Checkpoint calls go through the syncer's Checkpoint method
- The throttling implementation using
minCheckpointIntervalandlastCheckPointTimeis unique to this location - No other parts of the codebase implement similar checkpoint throttling mechanisms
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for direct calls to Checkpoint method
rg -A 2 '\.Checkpoint\('
# Search for other potential checkpoint throttling implementations
rg -A 2 'minCheckpointInterval|lastCheckPointTime'
Length of output: 892
pkg/sync/syncer_test.go (7)
5-5: Importing "fmt" is necessary for error formatting
The newly added import "fmt" is used for creating formatted error messages, which enhances error readability.
20-24: Global declaration of groupResourceType enhances reusability
Defining groupResourceType globally improves code clarity and allows for reuse across multiple functions without redundant definitions.
25-30: Global declaration of userResourceType with annotations simplifies resource management
By declaring userResourceType globally and adding the SkipEntitlementsAndGrants annotation, you streamline user resource handling and avoid unnecessary entitlement and grant processing for users.
37-40: Adjusted benchmark parameters for better performance testing
Modifying circleSize to 7 and introducing groupCount as 100 allows for a more extensive and realistic performance benchmark of grant expansion.
46-64: Efficient creation of groups and users using new helper methods
The introduction of AddGroup, AddUser, and AddGroupMember methods in the mockConnector simplifies the setup process in the benchmark and improves code readability.
98-100: Conversion of entDB and grantDB to maps improves data access efficiency
Changing entDB and grantDB from slices to maps keyed by resource ID allows for faster lookups and better organization of entitlements and grants associated with specific resources.
161-182: Proper handling of expandable entitlements in AddGroupMember
The updated AddGroupMember method correctly adds annotations for expandable entitlements, ensuring that grant expansion works as intended during the sync process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
pkg/sync/syncer.go (1)
66-69: Consider adding debug logging for skipped checkpoints.While the checkpoint throttling logic is correct, adding debug logging when skipping checkpoints would help with observability and debugging.
if !s.lastCheckPointTime.IsZero() && time.Since(s.lastCheckPointTime) < minCheckpointInterval { + l := ctxzap.Extract(ctx) + l.Debug("skipping checkpoint due to throttling", + zap.Duration("time_since_last", time.Since(s.lastCheckPointTime)), + zap.Duration("min_interval", minCheckpointInterval)) return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
pkg/sync/syncer.go(1 hunks)
🔇 Additional comments (2)
pkg/sync/syncer.go (2)
47-57: LGTM: Checkpointing optimization changes.
The addition of lastCheckPointTime field and minCheckpointInterval constant effectively implements the checkpointing frequency limitation. The 10-second interval is a reasonable default that should significantly reduce the overhead of frequent checkpointing while maintaining data safety.
Also applies to: 62-63
66-69: Verify impact on critical checkpoints.
The 10-second throttling affects all checkpoint calls. Let's verify that this doesn't impact data consistency during critical operations like error recovery or sync completion.
✅ Verification successful
No critical checkpoint operations are impacted by throttling
The code analysis reveals that the 10-second throttling on checkpoints is safe because:
- Checkpoints are used to persist sync state during normal operation, not for data consistency/recovery
- The throttling only skips intermediate checkpoints, not critical ones:
- Initial state is always checkpointed (lastCheckPointTime is zero)
- Final state after each major operation is checkpointed (FinishAction calls)
- Error paths immediately return without checkpointing
- The sync process is idempotent - if interrupted, it can safely resume from the last checkpoint
The throttling optimization improves performance while maintaining data consistency and recovery capabilities.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all critical checkpoint calls that might be affected by throttling
# Search for checkpoint calls in error handling paths
rg -B 3 -A 1 'if err .* {.*Checkpoint.*}'
# Search for checkpoint calls before/after state changes
rg -B 3 -A 1 '(FinishAction|NextPage).*Checkpoint'
Length of output: 216
Script:
#!/bin/bash
# Search for checkpoint calls in error handling paths with proper regex
rg -B 3 -A 1 'if.*err.*\{.*[Cc]heckpoint'
# Search for checkpoint calls in the syncer implementation
rg -B 2 -A 2 'Checkpoint\(' pkg/sync/
# Find all method calls that might trigger checkpoints
ast-grep --pattern 'Checkpoint($$$)'
Length of output: 131194
Script:
#!/bin/bash
# Search for Checkpoint method implementation and its usage
ast-grep --pattern 'func $_ Checkpoint($$$) { $$$ }'
# Find direct calls to Checkpoint
rg -B 2 'Checkpoint\(' pkg/sync/ --type go
# Look for error handling around Checkpoint calls
rg -B 2 'if err.*=.*Checkpoint' pkg/sync/ --type go
Length of output: 591
Script:
#!/bin/bash
# Get more context around the Checkpoint implementation
rg -B 5 -A 10 'func \(s \*syncer\) Checkpoint' pkg/sync/syncer.go
# Find the state transitions and error handling
rg -B 3 -A 3 'state\.' pkg/sync/syncer.go
# Look for any error handling or recovery mechanisms
rg -B 3 -A 3 'if err.*!= nil' pkg/sync/syncer.go
Length of output: 23857
… writing lots of grants in service mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
pkg/dotc1z/manager/local/local.go (1)
87-87: LGTM! Enabling WAL mode will improve performance.The addition of WAL (Write-Ahead Logging) journal mode is a good optimization that aligns with the PR's performance objectives. This should significantly improve write performance during checkpointing operations by reducing write contention and enabling better concurrency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
pkg/dotc1z/manager/local/local.go(1 hunks)
🔇 Additional comments (1)
pkg/dotc1z/manager/local/local.go (1)
87-87: Verify WAL file cleanup during Close()
While WAL mode improves performance, it creates additional files (*.wal, *.shm) alongside the database. Let's verify that these files are properly cleaned up.
| ) | ||
|
|
||
| return dotc1z.NewC1ZFile(ctx, l.tmpPath, dotc1z.WithTmpDir(l.tmpDir)) | ||
| return dotc1z.NewC1ZFile(ctx, l.tmpPath, dotc1z.WithTmpDir(l.tmpDir), dotc1z.WithPragma("journal_mode", "WAL")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, are there any side effects with this? When we re-compress the sqlite data file, wouldn't we also need the WAL file(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WAL file is deleted when the last DB connection is closed. We always close the DB before calling saveC1z(), so the WAL file doesn't exist when we compress. If I comment out the calls to os.Remove() and os.RemoveAll(), the only file in the temp dir is the DB.
ggreer@gmac:/var/folders/by/00zvpw316q50n2r1jlwqkqgr0000gn/T/c1z2968114110% ls -la
total 18880
drwx------ 3 ggreer staff 96 Nov 22 11:16 ./
drwx------@ 570 ggreer staff 18240 Nov 22 11:16 ../
-rw-r--r-- 1 ggreer staff 9666560 Nov 22 11:16 db
| } | ||
|
|
||
| type Edge struct { | ||
| EdgeID int `json:"edge_id"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we used these previously as a effective pagination token, does this mean ideally for in-place upgades we throw away "old" pagination state, and start over? should "version" the pagination token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure we've broken this before. In an ideal world we'd version this stuff and write migrations, but I think the moneyball solution would be to end the sync if we error unmarshaling the json. I pushed a change that does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don;'t think we need a migration, we just need to start the specified sync step over from zero
Improve syncing and grant expansion performance.
After these changes, we spend almost all of our time saving grants instead of checkpointing.
Summary by CodeRabbit
New Features
HasNoCyclesproperty to streamline cycle detection in the entitlement graph.Bug Fixes
Refactor
LoadC1Zmethod to include additional parameters for improved functionality.Tests