-
-
Notifications
You must be signed in to change notification settings - Fork 16
support gitignore filter #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
59b9ed9 to
23bf339
Compare
0e885fc to
c0c48df
Compare
b358c93 to
f9beb47
Compare
85952e8 to
0a4d869
Compare
408a2d7 to
9e01903
Compare
|
/gemini help |
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback. |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for .gitignore style filtering, which is a valuable addition. The implementation includes a state machine for pattern matching and a parser for ignore rules. The code is generally well-structured.
I've found a couple of areas that need attention, including a critical issue in the parser logic and a medium-severity issue in how directory-specific rules are checked. Addressing these will improve the correctness and robustness of the new feature.
Summary of Findings
- Parser Logic Error: A critical error in
IgnoreParser.parse(lines 151-163) causes double processing of the first path component in certain patterns, leading to incorrect ignore rule state machines. - Directory-Specific Rule Handling: In
IgnoreRule.check(lines 88-90), directory-specific rules might incorrectly ignore or exclude non-directory entries instead of returning.None.
Merge Readiness
This pull request makes good progress on adding .gitignore support. However, due to the critical parsing issue identified (and a medium-severity logic concern), I recommend that these changes be addressed before merging. Once these issues are resolved, the feature will be much more robust. I am not authorized to approve pull requests, so please ensure further review and approval after the changes are made.
| var first_item = it.first(); | ||
| if (!std.mem.eql(u8, "", first_item)) { | ||
| try rule.pushState(State{ .anything = true }); | ||
| try rule.pushState(State{ .exact = first_item }); | ||
| } | ||
|
|
||
| while (it.next()) |item| { | ||
| if (std.mem.eql(u8, "**", item)) { | ||
| try rule.pushState(State{ .anything = true }); | ||
| } else { | ||
| try rule.pushState(State{ .exact = item }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a critical issue in the parsing logic for rule components. The current approach using it.first() (line 151) to peek at the first component, potentially processing it in lines 153-154, and then iterating with while (it.next()) (line 157) can lead to the first component being processed twice. This is because it.first() does not advance the iterator.
For example, with a pattern like "a/b":
input[start..end]would be"a/b".first_item(line 151) would be"a".- The
ifcondition on line 152 would be true. - Lines 153-154 would push
State{ .anything = true }and thenState{ .exact = "a" }. - The
while (it.next())loop on line 157 would then again process"a"as its first item, followed by"b".
This would result in an incorrect state machine like[any, a, a, b]instead of the expected[any, a, b]for a pattern like"a/b"(which implies**/a/b).
This could lead to incorrect matching behavior. Consider refactoring this section to correctly handle the implicit ** for non-anchored patterns and then iterate through all components once.
A revised approach might involve:
- Checking if
input[start..end]starts with/. - If not (and the pattern is not empty), add the initial
State{ .anything = true }. - Then, use a single
while (it.next())loop to process all components, ensuring to skip any empty components that arise (e.g., from a leading/or//).
// Corrected logic to avoid double-processing of the first component.
const current_pattern_slice = input[start..end];
if (current_pattern_slice.len > 0 && !std.mem.startsWith(u8, current_pattern_slice, "/")) {
// Pattern is not anchored and not empty, so prepend 'anything' (like '**')
try rule.pushState(State{ .anything = true });
}
// 'it' is already initialized from input[start..end] on line 150.
// Now, iterate through its components.
while (it.next()) |component| {
if (component.len == 0) {
// Skip empty components. These arise from:
// - A leading slash (e.g., "/foo" -> "" then "foo"). The "" is skipped.
// - Consecutive slashes (e.g., "foo//bar" -> "foo", "", "bar"). The "" is skipped.
continue;
}
if (std.mem.eql(u8, "**", component)) {
// Add 'anything' state, but avoid duplicates if the last state was also 'anything'.
if (rule.state_machine.items.len == 0 or rule.state_machine.items[rule.state_machine.items.len - 1] != .{.anything = true}) {
try rule.pushState(State{ .anything = true });
}
} else {
try rule.pushState(State{ .exact = component });
}
}
| if (self.is_dir and file_entry.kind != .directory) { | ||
| return if (self.is_exclude) .Exclude else .Ignore; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic for directory-specific rules seems to have an issue. If a rule is specific to directories (i.e., self.is_dir is true, often from a pattern ending in /), and the file_entry being checked is not a directory, this rule should simply not apply to the entry. In such a case, the function should return .None.
Currently, if self.is_dir is true and file_entry.kind != .directory, it returns .Ignore (or .Exclude if self.is_exclude is true). This could lead to incorrectly ignoring or excluding files.
For example, if the rule is build/ (ignore the build directory) and we are checking a file named other_file.txt, this rule should not cause other_file.txt to be ignored. However, the current logic would return .Ignore.
Could you clarify if this behavior is intended, or if it should return .None when a directory-specific rule encounters a non-directory entry?
// If the rule is for a directory, but the entry is not a directory,
// this rule does not apply to this entry.
return .None;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for gitignore-style filtering by introducing a state-machine-based path matcher and a parser for ignore rules.
- Implements
match_iterfor matching path segments against patterns including**wildcards, with unit tests. - Adds
IgnoreRuleto represent parsed ignore/exclude rules and acheckmethod to apply them to file entries. - Introduces
IgnoreParserto parse gitignore syntax intoIgnoreRuleinstances, with parser tests.
Comments suppressed due to low confidence (2)
src/walk_dir.zig:9
- [nitpick] The alias
PathIteris declared but never used; consider removing it to reduce dead code.
const PathIter = mem.SplitIterator(u8, .sequence);
src/walk_dir.zig:87
- [nitpick] The
checkmethod ofIgnoreRuleis not covered by any unit tests. Consider adding tests for both.Ignoreand.Excludescenarios on files and directories to ensure the matching logic works as intended.
fn check(self: Self, path: []const u8, file_entry: fs.IterableDir.Entry) !CheckResult {
|
|
||
| const remainings = mem.trimLeft(u8, path, self.dir); | ||
| var path_iter = mem.splitSequence(u8, remainings, "/"); | ||
| var paths = std.ArrayList([]const u8); |
Copilot
AI
May 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The paths ArrayList is declared without calling .init(allocator) and is never deinitialized, leading to a panic on append and a memory leak. Initialize it with the allocator (e.g., std.ArrayList([]const u8).init(self.allocator)) and defer paths.deinit().
| var paths = std.ArrayList([]const u8); | |
| var paths = std.ArrayList([]const u8).init(self.state_machine.allocator); | |
| defer paths.deinit(); |
No description provided.