-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[flake8-bugbear] Ignore non-NFKC attribute names in B009 and B010 (B009, B010)
#21131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
| // Ignore non-NFKC attribute names. Python normalizes identifiers using NFKC, so using | ||
| // attribute syntax (e.g., `obj.attr`) would normalize the name and potentially change | ||
| // program behavior. | ||
| let attr_name = value.to_str(); | ||
| if attr_name.nfkc().collect::<String>() != attr_name { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this after line 78 as this comparison is more expensive than testing if it is the builtin getattr
| setattr(foo, "__debug__", 0) | ||
|
|
||
| # Regression test for: https://github.com/astral-sh/ruff/issues/21126 | ||
| # Non-NFKC attribute names should be ignored (e.g., "ſ" normalizes to "s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest including a short explanation why they should be ignored
Moved the builtin function check earlier in both `getattr_with_constant` and `setattr_with_constant` to avoid unnecessary processing. Updated test fixture comments to clarify NFKC normalization behavior and its impact on attribute access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards marking the fix as unsafe if the attribute is different after nfkc normalization instead of omitting the diagnostic entirely as I doubt that it's very intentional.
The counter-argument to this is that someone might deliberately use getattr because they're aware of the difference and it matters to them. The synta then acts as an explicit expression of their intent and forcing them to add a noqa comment feels redundant.
I'm leaning towards doing the former because the second seems extremely rare and a noqa suppression (with an explanation why) can serve as additional documentation for future readers. But I'm curious to hear what others think
|
+1 to just marking it unsafe |
Updates the flake8-bugbear rules for `getattr` and `setattr` with constant attributes to mark fixes as unsafe when the attribute name is not NFKC-normalized. This prevents silent behavior changes when rewriting to attribute access, and updates tests and documentation accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. This looks good. The only thing left to do is to update the rule documentation with an explanation when the fix is marked as unsafe
Added detailed explanations to the doc comments of `getattr_with_constant` and `setattr_with_constant` rules about the unsafe nature of fixes when attribute names are not in NFKC normalization. This clarifies that Python normalizes identifiers in attribute access syntax but not in string arguments, which can lead to behavioral changes if fixes are applied to non-NFKC names.
Summary
This PR updates the
B009(get-attr-with-constant) andB010(set-attr-with-constant) rules to ignore non-NFKC attribute names, preventing refactoring suggestions that could change program behavior.Fixes #21126
Problem Analysis
Python normalizes identifiers using NFKC (Normalization Form KC) normalization. When using attribute access syntax (e.g.,
obj.attr), Python automatically normalizes the identifier. However, when usinggetattrorsetattrwith a string literal, the identifier is not normalized.The issue occurs when an attribute name contains characters that normalize differently under NFKC. For example:
"ſ"normalizes to"s"in NFKCsetattr(ns, "ſ", 1)creates a distinct attribute fromns.ssetattr(ns, "ſ", 1)withns.ſ = 1, Python would normalizens.ſtons.s, which changes the program's behaviorThe bug was that Ruff's B009 and B010 rules didn't account for this normalization difference, potentially suggesting unsafe refactorings.
Approach
The fix adds a check in both
B009andB010rule implementations to detect non-NFKC attribute names. Before suggesting a refactoring, the code now:This ensures that:
Test cases were added to verify that non-NFKC attribute names (like
"ſ") are correctly ignored by both rules.