
@danparizher
Contributor

Summary

This PR updates the B009 (get-attr-with-constant) and B010 (set-attr-with-constant) rules to ignore non-NFKC attribute names, preventing refactoring suggestions that could change program behavior.

Fixes #21126

Problem Analysis

Python normalizes identifiers to NFKC (Normalization Form KC) when parsing source code, so with attribute access syntax (e.g., obj.attr) the attribute name is normalized automatically. However, when getattr or setattr is called with a string literal, the string is used verbatim as the attribute name and is not normalized.
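A quick way to see NFKC in action is the standard library's `unicodedata.normalize`. The last lines show the parser applying the same normalization to an identifier written in source code (as specified in PEP 3131). The helper name `nfkc` is just for illustration:

```python
import unicodedata


def nfkc(name: str) -> str:
    # Illustrative helper: return the NFKC form of a string.
    return unicodedata.normalize("NFKC", name)


# Compatibility characters fold to their plain equivalents under NFKC.
print(nfkc("ſ"))     # s     (U+017F LATIN SMALL LETTER LONG S)
print(nfkc("ﬁle"))   # file  (U+FB01 LATIN SMALL LIGATURE FI)
print(nfkc("attr"))  # attr  (plain ASCII is already NFKC)

# Identifiers in source code are normalized at parse time,
# so this assignment binds the name "s", not "ſ".
ſ = 1
print(s)  # 1
```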

The issue occurs when an attribute name contains characters that NFKC normalization changes. For example:

  • The long s character "ſ" normalizes to "s" in NFKC
  • Using setattr(ns, "ſ", 1) creates a distinct attribute from ns.s
  • If Ruff suggests replacing setattr(ns, "ſ", 1) with ns.ſ = 1, Python would normalize ns.ſ to ns.s, which changes the program's behavior
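The divergence described in the list above can be reproduced directly. This sketch uses `types.SimpleNamespace` as a stand-in for `ns`:

```python
import types

ns = types.SimpleNamespace()

# setattr stores the string exactly as given: the key is "ſ" (U+017F).
setattr(ns, "ſ", 1)

# Attribute syntax is NFKC-normalized at parse time, so this is `ns.s = 2`.
ns.ſ = 2

print(vars(ns))          # {'ſ': 1, 's': 2}
print(getattr(ns, "ſ"))  # 1, the attribute created by setattr is untouched
```

Two distinct attributes now exist, which is exactly why rewriting `setattr(ns, "ſ", 1)` to `ns.ſ = 1` is not behavior-preserving.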

The bug was that Ruff's B009 and B010 rules didn't account for this normalization difference, potentially suggesting unsafe refactorings.

Approach

The fix adds a check in both B009 and B010 rule implementations to detect non-NFKC attribute names. Before suggesting a refactoring, the code now:

  1. Normalizes the attribute name using NFKC
  2. Compares the normalized form with the original
  3. If they differ, the rule skips the check (returns early) to avoid suggesting an unsafe refactoring
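The three steps above amount to a single comparison. A minimal Python sketch of the same check follows; the function name is hypothetical, the actual rules are implemented in Rust, and their other preconditions are omitted:

```python
import unicodedata


def should_suggest_rewrite(attr_name: str) -> bool:
    # 1. Normalize the attribute name using NFKC.
    normalized = unicodedata.normalize("NFKC", attr_name)
    # 2. Compare the normalized form with the original.
    # 3. If they differ, skip: attribute syntax would silently use `normalized`.
    if normalized != attr_name:
        return False
    # Other checks the real rules perform are omitted from this sketch.
    return True


print(should_suggest_rewrite("value"))  # True
print(should_suggest_rewrite("ſ"))      # False
```

On Python 3.8+, `unicodedata.is_normalized("NFKC", attr_name)` performs the same test and can avoid building the normalized string when the input is already normalized.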

This ensures that:

  • Rules only suggest refactorings when it's safe (i.e., when NFKC normalization wouldn't change the attribute name)
  • The fix is minimal and doesn't affect other rule functionality
  • Edge cases are handled correctly (non-NFKC names are silently ignored)

Test cases were added to verify that non-NFKC attribute names (like "ſ") are correctly ignored by both rules.

@github-actions
Contributor

github-actions bot commented Oct 29, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Comment on lines 69 to 75
```rust
// Ignore non-NFKC attribute names. Python normalizes identifiers using NFKC, so using
// attribute syntax (e.g., `obj.attr`) would normalize the name and potentially change
// program behavior.
// `.nfkc()` comes from the `unicode_normalization::UnicodeNormalization` trait, which must be in scope.
let attr_name = value.to_str();
if attr_name.nfkc().collect::<String>() != attr_name {
    return;
}
```
Member


Let's move this after line 78, as this comparison is more expensive than testing whether the callee is the builtin `getattr`.
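The suggested reordering relies on short-circuiting: if the cheap builtin check fails first, the normalization never runs. A small Python model with a call counter makes this visible. The function names and the boolean `is_builtin_getattr` flag are hypothetical simplifications; the real Rust rule determines this from the call expression:

```python
import unicodedata

normalize_calls = 0  # counts how often the "expensive" check runs


def changes_under_nfkc(name: str) -> bool:
    global normalize_calls
    normalize_calls += 1
    return unicodedata.normalize("NFKC", name) != name


def rule_applies(is_builtin_getattr: bool, attr_name: str) -> bool:
    # Cheap check first: most calls are not to the builtin getattr at all.
    if not is_builtin_getattr:
        return False
    # Expensive check second: normalizes the whole attribute name.
    if changes_under_nfkc(attr_name):
        return False
    return True


calls = [(False, "x"), (False, "y"), (True, "z"), (True, "ſ")]
results = [rule_applies(builtin, name) for builtin, name in calls]
print(results, normalize_calls)  # [False, False, True, False] 2
```

Only the two calls that pass the builtin check pay for normalization.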

```python
setattr(foo, "__debug__", 0)

# Regression test for: https://github.com/astral-sh/ruff/issues/21126
# Non-NFKC attribute names should be ignored (e.g., "ſ" normalizes to "s")
```
Member


I suggest including a short explanation of why they should be ignored.

Moved the builtin function check earlier in both `getattr_with_constant` and `setattr_with_constant` to avoid unnecessary processing. Updated test fixture comments to clarify NFKC normalization behavior and its impact on attribute access.
Member

@MichaReiser left a comment


I'm leaning towards marking the fix as unsafe if the attribute name is different after NFKC normalization, instead of omitting the diagnostic entirely, as I doubt that it's very intentional.

The counter-argument to this is that someone might deliberately use getattr because they're aware of the difference and it matters to them. The syntax then acts as an explicit expression of their intent, and forcing them to add a noqa comment feels redundant.

I'm leaning towards the former because the latter seems extremely rare, and a noqa suppression (with an explanation why) can serve as additional documentation for future readers. But I'm curious to hear what others think.

@amyreese
Contributor

+1 to just marking it unsafe

Updates the flake8-bugbear rules for `getattr` and `setattr` with constant attributes to mark fixes as unsafe when the attribute name is not NFKC-normalized. This prevents silent behavior changes when rewriting to attribute access, and updates tests and documentation accordingly.
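With this change the diagnostic is still reported for non-NFKC names; only the fix's safety changes. A rough Python model of that decision follows. `Applicability`, `Fix`, and `rewrite_getattr` are hypothetical names for illustration, not Ruff's Rust API:

```python
import unicodedata
from dataclasses import dataclass
from enum import Enum


class Applicability(Enum):
    # Hypothetical stand-in for a linter's "safe" vs "unsafe" fix classification.
    SAFE = "safe"
    UNSAFE = "unsafe"


@dataclass
class Fix:
    replacement: str
    applicability: Applicability


def rewrite_getattr(obj_src: str, attr_name: str) -> Fix:
    """Model of a B009-style fix: `getattr(obj, "name")` -> `obj.name`."""
    replacement = f"{obj_src}.{attr_name}"
    # unicodedata.is_normalized requires Python 3.8+.
    if unicodedata.is_normalized("NFKC", attr_name):
        return Fix(replacement, Applicability.SAFE)
    # The parser would normalize the attribute syntax (e.g. `foo.ſ` is `foo.s`),
    # so the rewrite could change behavior: offer it only as an unsafe fix.
    return Fix(replacement, Applicability.UNSAFE)


print(rewrite_getattr("foo", "bar").applicability.value)  # safe
print(rewrite_getattr("foo", "ſ").applicability.value)    # unsafe
```

With Ruff, unsafe fixes are only applied when explicitly enabled (for example with `--unsafe-fixes`), so a default `ruff check --fix` run leaves such calls unchanged while still reporting them.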
Member

@MichaReiser left a comment


Thank you. This looks good. The only thing left to do is to update the rule documentation to explain when the fix is marked as unsafe.

Added detailed explanations to the doc comments of the `getattr_with_constant` and `setattr_with_constant` rules about why fixes are unsafe when attribute names are not NFKC-normalized. This clarifies that Python normalizes identifiers in attribute access syntax but not in string arguments, so applying the fix to a non-NFKC name can change behavior.
@MichaReiser added the bug (Something isn't working) and fixes (Related to suggested fixes for violations) labels on Nov 3, 2025
@MichaReiser merged commit 6ddfb51 into astral-sh:main on Nov 3, 2025
39 checks passed
@danparizher deleted the fix-21126 branch on November 3, 2025, 15:07
ibraheemdev pushed a commit that referenced this pull request Nov 3, 2025

Labels

bug (Something isn't working), fixes (Related to suggested fixes for violations)


Development

Successfully merging this pull request may close these issues.

B009 and B010 should ignore non-NFKC attribute names

3 participants