@ntBre ntBre commented Jul 24, 2025

Summary

This was previously the last commit in #19415, split out to make it easier to review. This applies the fixes from c9b99e4, 5021f32, and 2922490 to the new rendering code in ruff_db. I initially intended only to fix the empty span after a line terminator (as you can see in the branch name), but the two fixes were tied pretty closely together, and my initial fix for the empty spans needed a big change after trying to handle unprintable characters too. I can still split this up if it would help with review. I would just start with the unprintable characters first.

The implementation here is essentially copy-pasted from ruff_linter::message::text.rs, with the SourceCode struct renamed to EscapedSourceCode since there's already a SourceCode in scope in render.rs. It's also updated slightly to account for the multiple annotations for a single snippet. The original implementation used some types from the line_width module from ruff_linter. I copied over heavily stripped-down versions of these instead of trying to import them. We could inline the remaining code entirely, if we want, but I thought it was nice enough to keep.

I also moved over ceil_char_boundary, which is unchanged except to make it a free function taking a &str instead of a Locator method. All of this code could be deleted from ruff_linter if we also move over the grouped output format, which will be the last user after #19415.
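
For readers unfamiliar with the helper, here is a minimal sketch of what a free-function `ceil_char_boundary` over a `&str` could look like. This is an illustration written against the standard library only, not the code moved in this PR, and the clamping behavior for out-of-range offsets is an assumption.

```rust
/// Sketch: smallest char boundary in `text` that is >= `offset`.
/// Offsets past the end are clamped to `text.len()` (an assumption here).
fn ceil_char_boundary(text: &str, offset: usize) -> usize {
    if offset >= text.len() {
        return text.len();
    }
    // Walk forward until we land on the first byte of a character.
    (offset..text.len())
        .find(|&i| text.is_char_boundary(i))
        .unwrap_or(text.len())
}
```

An offset that falls inside a multi-byte character is rounded up to the start of the next character, so the result is always safe to slice with.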

Test Plan

I added new tests in ruff_linter that call into the new rendering code to snapshot the diagnostics for the affected cases. These are copies of existing snapshots in Ruff, so it's helpful to compare them. These are a bit noisy because of the other rendering differences in the header, but all of the ^^^ indicators should be the same.

`empty_span_after_line_terminator` diff
diff --git a/crates/ruff_linter/src/rules/pycodestyle/snapshots/ruff_linter__rules__pycodestyle__tests__E112_E11.py.snap b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__empty_span_after_line_terminator.snap
index 5ade4346e0..6df75c16f0 100644
--- a/crates/ruff_linter/src/rules/pycodestyle/snapshots/ruff_linter__rules__pycodestyle__tests__E112_E11.py.snap
+++ b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__empty_span_after_line_terminator.snap
@@ -1,17 +1,20 @@
 ---
-source: crates/ruff_linter/src/rules/pycodestyle/mod.rs
+source: crates/ruff_linter/src/message/text.rs
+expression: value.to_string()
 ---
-E11.py:9:1: E112 Expected an indented block
+error[no-indented-block]: Expected an indented block
+  --> E11.py:9:1
    |
  7 | #: E112
  8 | if False:
  9 | print()
-   | ^ E112
+   | ^
 10 | #: E113
 11 | print()
    |
 
-E11.py:9:1: SyntaxError: Expected an indented block after `if` statement
+error[invalid-syntax]: SyntaxError: Expected an indented block after `if` statement
+  --> E11.py:9:1
    |
  7 | #: E112
  8 | if False:
@@ -21,7 +24,8 @@ E11.py:9:1: SyntaxError: Expected an indented block after `if` statement
 11 | print()
    |
 
-E11.py:12:1: SyntaxError: Unexpected indentation
+error[invalid-syntax]: SyntaxError: Unexpected indentation
+  --> E11.py:12:1
    |
 10 | #: E113
 11 | print()
@@ -31,7 +35,8 @@ E11.py:12:1: SyntaxError: Unexpected indentation
 14 | mimetype = 'application/x-directory'
    |
 
-E11.py:14:1: SyntaxError: Expected a statement
+error[invalid-syntax]: SyntaxError: Expected a statement
+  --> E11.py:14:1
    |
 12 |     print()
 13 | #: E114 E116
@@ -41,17 +46,19 @@ E11.py:14:1: SyntaxError: Expected a statement
 16 | create_date = False
    |
 
-E11.py:45:1: E112 Expected an indented block
+error[no-indented-block]: Expected an indented block
+  --> E11.py:45:1
    |
 43 | #: E112
 44 | if False:  #
 45 | print()
-   | ^ E112
+   | ^
 46 | #:
 47 | if False:
    |
 
-E11.py:45:1: SyntaxError: Expected an indented block after `if` statement
+error[invalid-syntax]: SyntaxError: Expected an indented block after `if` statement
+  --> E11.py:45:1
    |
 43 | #: E112
 44 | if False:  #
`unprintable_characters` diff
diff --git a/crates/ruff_linter/src/rules/pylint/snapshots/ruff_linter__rules__pylint__tests__PLE2512_invalid_characters.py.snap b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__unprintable_characters.snap
index 52cfdf9cce..fcfa1ac9f1 100644
--- a/crates/ruff_linter/src/rules/pylint/snapshots/ruff_linter__rules__pylint__tests__PLE2512_invalid_characters.py.snap
+++ b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__unprintable_characters.snap
@@ -1,161 +1,115 @@
 ---
-source: crates/ruff_linter/src/rules/pylint/mod.rs
+source: crates/ruff_linter/src/message/text.rs
+expression: value.to_string()
 ---
-invalid_characters.py:24:12: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:24:12
    |
 22 | cr_ok = f'\\r'
 23 |
 24 | sub = 'sub �'
-   |            ^ PLE2512
+   |            ^
 25 | sub = f'sub �'
    |
-   = help: Replace with escape sequence
+help: Replace with escape sequence
 
-ℹ Safe fix
-21 21 | cr_ok = '\\r'
-22 22 | cr_ok = f'\\r'
-23 23 | 
-24    |-sub = 'sub �'
-   24 |+sub = 'sub \x1A'
-25 25 | sub = f'sub �'
-26 26 | 
-27 27 | sub_ok = '\x1a'
-
-invalid_characters.py:25:13: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:25:13
    |
 24 | sub = 'sub �'
 25 | sub = f'sub �'
-   |             ^ PLE2512
+   |             ^
 26 |
 27 | sub_ok = '\x1a'
    |
-   = help: Replace with escape sequence
-
-ℹ Safe fix
-22 22 | cr_ok = f'\\r'
-23 23 | 
-24 24 | sub = 'sub �'
-25    |-sub = f'sub �'
-   25 |+sub = f'sub \x1A'
-26 26 | 
-27 27 | sub_ok = '\x1a'
-28 28 | sub_ok = f'\x1a'
+help: Replace with escape sequence
 
-invalid_characters.py:55:25: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:55:25
    |
 53 | zwsp_after_multicharacter_grapheme_cluster = f"ಫ್ರಾನ್ಸಿಸ್ಕೊ ​​"
 54 |
 55 | nested_fstrings = f'␈{f'�{f'␛'}'}'
-   |                         ^ PLE2512
+   |                         ^
 56 |
 57 | # https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741998106
    |
-   = help: Replace with escape sequence
-
-ℹ Safe fix
-52 52 | zwsp_after_multicharacter_grapheme_cluster = "ಫ್ರಾನ್ಸಿಸ್ಕೊ ​​"
-53 53 | zwsp_after_multicharacter_grapheme_cluster = f"ಫ್ರಾನ್ಸಿಸ್ಕೊ ​​"
-54 54 | 
-55    |-nested_fstrings = f'␈{f'�{f'␛'}'}'
-   55 |+nested_fstrings = f'␈{f'\x1A{f'␛'}'}'
-56 56 | 
-57 57 | # https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741998106
-58 58 | x = f"""}}a�b"""
+help: Replace with escape sequence
 
-invalid_characters.py:58:12: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:58:12
    |
 57 | # https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741998106
 58 | x = f"""}}a�b"""
-   |            ^ PLE2512
+   |            ^
 59 | # https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741998256
 60 | x = f"""}}a␛b"""
    |
-   = help: Replace with escape sequence
+help: Replace with escape sequence
 
-ℹ Safe fix
-55 55 | nested_fstrings = f'␈{f'�{f'␛'}'}'
-56 56 | 
-57 57 | # https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741998106
-58    |-x = f"""}}a�b"""
-   58 |+x = f"""}}a\x1Ab"""
-59 59 | # https://github.com/astral-sh/ruff/issues/7455#issuecomment-1741998256
-60 60 | x = f"""}}a␛b"""
-61 61 | 
-
-invalid_characters.py:64:12: PLE2512 Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:64:12
    |
 63 | # https://github.com/astral-sh/ruff/issues/13294
 64 | print(r"""␈�␛�​
-   |            ^ PLE2512
+   |            ^
 65 | """)
 66 | print(fr"""␈�␛�​
    |
-   = help: Replace with escape sequence
+help: Replace with escape sequence
 
-invalid_characters.py:66:13: PLE2512 Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:66:13
    |
 64 | print(r"""␈�␛�​
 65 | """)
 66 | print(fr"""␈�␛�​
-   |             ^ PLE2512
+   |             ^
 67 | """)
 68 | print(Rf"""␈�␛�​
    |
-   = help: Replace with escape sequence
+help: Replace with escape sequence
 
-invalid_characters.py:68:13: PLE2512 Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:68:13
    |
 66 | print(fr"""␈�␛�​
 67 | """)
 68 | print(Rf"""␈�␛�​
-   |             ^ PLE2512
+   |             ^
 69 | """)
    |
-   = help: Replace with escape sequence
+help: Replace with escape sequence
 
-invalid_characters.py:73:9: PLE2512 Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:73:9
    |
 71 | # https://github.com/astral-sh/ruff/issues/18815
 72 | b = "\␈"
 73 | sub = "\�"
-   |         ^ PLE2512
+   |         ^
 74 | esc = "\␛"
 75 | zwsp = "\​"
    |
-   = help: Replace with escape sequence
+help: Replace with escape sequence
 
-invalid_characters.py:80:25: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:80:25
    |
 78 | # tstrings
 79 | esc = t'esc esc ␛'
 80 | nested_tstrings = t'␈{t'�{t'␛'}'}'
-   |                         ^ PLE2512
+   |                         ^
 81 | nested_ftstrings = t'␈{f'�{t'␛'}'}'
    |
-   = help: Replace with escape sequence
-
-ℹ Safe fix
-77 77 | 
-78 78 | # tstrings
-79 79 | esc = t'esc esc ␛'
-80    |-nested_tstrings = t'␈{t'�{t'␛'}'}'
-   80 |+nested_tstrings = t'␈{t'\x1A{t'␛'}'}'
-81 81 | nested_ftstrings = t'␈{f'�{t'␛'}'}'
-82 82 | 
+help: Replace with escape sequence
 
-invalid_characters.py:81:26: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead
+error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead
+  --> invalid_characters.py:81:26
    |
 79 | esc = t'esc esc ␛'
 80 | nested_tstrings = t'␈{t'�{t'␛'}'}'
 81 | nested_ftstrings = t'␈{f'�{t'␛'}'}'
-   |                          ^ PLE2512
+   |                          ^
    |
-   = help: Replace with escape sequence
-
-ℹ Safe fix
-78 78 | # tstrings
-79 79 | esc = t'esc esc ␛'
-80 80 | nested_tstrings = t'␈{t'�{t'␛'}'}'
-81    |-nested_ftstrings = t'␈{f'�{t'␛'}'}'
-   81 |+nested_ftstrings = t'␈{f'\x1A{t'␛'}'}'
-82 82 |
+help: Replace with escape sequence

@ntBre ntBre added internal An internal refactor or improvement diagnostics Related to reporting of diagnostics. labels Jul 24, 2025
github-actions bot commented Jul 24, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

ntBre commented Jul 24, 2025

After looking at this again, I think we can probably do both this check and the unprintable check in RenderableSnippet::new as I mentioned in my other comment. I think I'll try that before taking this out of draft.

@ntBre ntBre force-pushed the brent/empty-span-after-line-terminator branch from d0c9194 to b9d27c6 Compare July 25, 2025 15:47
@ntBre ntBre force-pushed the brent/empty-span-after-line-terminator branch from b9d27c6 to 665909e Compare July 25, 2025 18:35
github-actions bot commented Jul 25, 2025

mypy_primer results

No ecosystem changes detected ✅
No memory usage changes detected ✅

@ntBre ntBre changed the title Fix empty spans following a line terminator in ruff_db Fix empty spans following a line terminator and unprintable character spans in ruff_db Jul 25, 2025
@ntBre ntBre marked this pull request as ready for review July 25, 2025 19:47
@ntBre ntBre requested a review from BurntSushi July 25, 2025 19:47
@MichaReiser MichaReiser added the ty Multi-file analysis & type inference label Jul 26, 2025
@MichaReiser MichaReiser changed the title Fix empty spans following a line terminator and unprintable character spans in ruff_db Fix empty spans following a line terminator and unprintable character spans in Diagnostics Jul 26, 2025
@MichaReiser MichaReiser changed the title Fix empty spans following a line terminator and unprintable character spans in Diagnostics Fix empty spans following a line terminator and unprintable character spans in diagnostics Jul 26, 2025
Comment on lines 470 to 480
fn empty_span_after_line_terminator() -> anyhow::Result<()> {
    let path = Path::new("pycodestyle").join("E11.py");
    let settings = LinterSettings::for_rule(Rule::NoIndentedBlock);
    let diagnostics = test_path(path, &settings)?;
    let config = DisplayDiagnosticConfig::default().format(DiagnosticFormat::Full);
    let notebook_indexes = FxHashMap::default();
    let context = EmitterContext::new(&notebook_indexes);
    let value = DisplayDiagnostics::new(&context, &config, &diagnostics);
    insta::assert_snapshot!(value.to_string());
    Ok(())
}
Member

Could these be tests in render::tests instead?

Contributor Author

Not very easily. For the tests in ruff_db we have to synthesize diagnostics:

env.builder("unused-import", Severity::Error, "`os` imported but unused")
    .primary("fib.py", "1:7", "1:9", "")
    .help("Remove unused import: `os`")
    .secondary_code("F401")
    .fix(Fix::unsafe_edit(Edit::range_deletion(TextRange::new(
        TextSize::from(0),
        TextSize::from(10),
    ))))
    .noqa_offset(TextSize::from(7))
    .build(),

I started out doing that and then realized it would be much easier to use Ruff's actual infrastructure to get the diagnostics and not accidentally test the wrong thing.

It should be safe to delete these tests once Ruff uses the new renderer in general anyway, since they'll be duplicates of existing snapshots.

Member

@MichaReiser MichaReiser Jul 26, 2025

I still think it's worth having dedicated tests for this, or we'll lose the assertion if the rule changes or gets removed. I'm aware that it requires building the diagnostics manually, but I'd expect that it isn't too hard. It only requires calling a few diagnostic builder methods (and it's definitely easier to debug). But maybe I'm missing something?

(Looking at the test. The builder code is roughly as much code as what you have here)

Contributor Author

That's the builder code for one diagnostic, whereas the current test has 6 diagnostics in this case and 10 diagnostics for unprintable_characters. I'll strip down the two interesting cases and move them to ruff_db. I thought it was nice to have the full snapshot for diffing against Ruff, but now that that's resolved we can just keep minimized versions of the tests.

I'm still a bit wary of having to manually input the ranges in the builder, which could also fall out of sync with real diagnostics, but I see the benefits otherwise.

Contributor Author

Alright, that wasn't so bad: a4f7434. I also applied the patch to main and checked that they failed there as expected.

The stripped down versions are also nice because they fit better as inline snapshots.

Member

> I'm still a bit wary of having to manually input the ranges in the builder

What I like to do in those cases is to use str.find to get the offset
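
A minimal sketch of that idea, using only the standard library: locate the text to annotate with `str::find` and derive the byte range from the match instead of hard-coding offsets. The helper name `span_of` and the sample source are hypothetical and are not part of the ruff_db test builder.

```rust
use std::ops::Range;

/// Hypothetical helper: byte range of the first occurrence of `needle` in
/// `source`, or `None` if it does not occur.
fn span_of(source: &str, needle: &str) -> Option<Range<usize>> {
    let start = source.find(needle)?;
    Some(start..start + needle.len())
}
```

Because the range is computed from the source text itself, it keeps pointing at the right token if the fixture is edited later.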

Member

@MichaReiser MichaReiser left a comment

Nice. I only have a few nit comments. The only thing I would change is to make these unit tests in ruff_db. We do have the infrastructure to write those (I think?)

Comment on lines 891 to 892
for (ann, &original_range) in annotations.iter_mut().zip(&original_ranges) {
    if index < usize::from(original_range.start()) {
Member

I don't think I fully understand why we need to collect the ranges first, only to then zip them here again. Can't we call ann.range() inside the loop?

for ann in annotations.iter_mut() {
    let original_range = ann.range();
    if index < usize::from(original_range.start()) {

Contributor Author

I think we need the initial range from the start of the function. I ran into a double-counting issue yesterday when I was using the current ann.range throughout (I was ending up with a range of 64-65 instead of the expected 62-63, starting from 60-61). I think capturing the initial range should match the other version of this code, which saves the input annotation_range and mutates the range copy:

fn replace_whitespace_and_unprintable(source: &str, annotation_range: TextRange) -> SourceCode {
    let mut result = String::new();
    let mut last_end = 0;
    let mut range = annotation_range;
    let mut line_width = LineWidthBuilder::new(IndentWidth::default());

though it pained me to collect here too.

Contributor Author

I realized that we didn't have a test for this in the stripped-down version, so I just added another test case. Without using original_ranges, this input:

���  # rendered as ^H^Z^[ in my editor

causes

    thread 'diagnostic::render::full::tests::multiple_unprintable_characters' panicked at crates/ruff_annotate_snippets/src/renderer/display_list.rs:1450:29:
    byte index 5 is not a char boundary; it is inside '␛' (bytes 4..7) of `␈␛`

From my earlier debugging, the issue was some kind of double counting in replace_whitespace_and_unprintable. The input annotation range here is 1..1, and the output in this panicking version is 5..5, instead of the correct 3..3, so it gets shifted twice if you compare against the current range.
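
The arithmetic behind that panic can be checked in isolation. The sketch below assumes, based on the panic message, that the rendered line is `␈` (3 bytes), followed by the raw SUB byte, followed by `␛` (3 bytes); the helper name `shift_once` is hypothetical.

```rust
/// Sketch: shift `offset` to account for one replacement of `original` by
/// `replacement` that occurs earlier in the text.
fn shift_once(offset: usize, original: char, replacement: char) -> usize {
    // A 1-byte control character replaced by a 3-byte control picture
    // moves every later offset forward by 2 bytes.
    offset + replacement.len_utf8() - original.len_utf8()
}
```

Applying the shift once to offset 1 gives 3, which is a valid boundary in the rendered text. Applying it a second time gives 5, which lands inside `␛` and is what makes slicing panic.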

Member

Ohh, I see now. This is a callback function. That's why it matters.

It seems unfortunate that we have to collect all original ranges even if the source code doesn't contain a single tab character (which should be the most common case).

Would it make sense to change the for (index, c) in source.char_indices() loop to track the relative offset compared to the source offset? That way, you could call update_ranges with index + relative_offset and remove the original ranges thing entirely.

Member

Thinking about this: I think the relative offset is already known. It's result.text_len() - source.text_len(). That's how many characters were inserted by this function.

Member

result is likely empty until the very end of the function because we try to return a Cow::Borrowed(source) if nothing is modified, so I don't think we can get the offset from there.

It seems we update result every time before we call update_ranges, which makes sense to me. We update the source text and that, in turn, requires updating the annotations.

Member

Alternatively, we could just do a regex search for tab and unprintable characters (looks like 5 distinct bytes?). If none are found, then you can just quit early with the Cow::Borrowed case. This search won't need to do UTF-8 decoding and regex might even vectorize it, so it could save you from allocating and UTF-8 decoding.

With all that said, even without that search to bail early, the up-front alloc here is almost certainly marginal. We are in the rendering code here, which does all sorts of allocs (including in annotate-snippets).
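
A rough sketch of the early-exit idea, with two substitutions that should be read as assumptions: a plain byte scan stands in for the regex search, and the set of five bytes (tab, BEL, BS, ESC, DEL) is a guess at what the renderer replaces. The function names are hypothetical, and the tab expansion in the slow path is only a placeholder.

```rust
use std::borrow::Cow;

/// Assumed trigger bytes. None of them can occur inside a multi-byte UTF-8
/// sequence, so scanning raw bytes without decoding is safe.
fn needs_rewrite(source: &str) -> bool {
    source
        .bytes()
        .any(|b| matches!(b, b'\t' | 0x07 | 0x08 | 0x1B | 0x7F))
}

/// Borrow the input unchanged in the common case; allocate only when a
/// trigger byte is present.
fn render_line(source: &str) -> Cow<'_, str> {
    if !needs_rewrite(source) {
        return Cow::Borrowed(source);
    }
    // Placeholder slow path: the real code would also replace unprintable
    // characters and update annotation ranges.
    Cow::Owned(source.replace('\t', "    "))
}
```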

Contributor Author

Ah maybe I'm misunderstanding the suggestion. In my naive attempt I tried result.text_len() - source.text_len() and kept getting underflow errors because we're building up the result over time, but source is always quite long from the beginning. Did you mean some sub-slice of source? Or maybe I injected this at the wrong point in the function.

What I meant to say above is "result doesn't have a comparable length to source until the very end of the function," not that it's totally empty.

Member

Oh I see. In that case, isn't the relative offset result.text_len() - i?

Contributor Author

Ah, you're right, thanks! I had just missed accounting for the width of the character being inserted when I tried this initially.

let mut result = String::new();
let mut last_end = 0;
let original_ranges: Vec<TextRange> = annotations.iter().map(|ann| ann.range).collect();
let mut line_width = LineWidthBuilder::default();
Member

Nit: Move the declaration of this variable closer to where it's used.

Same for last_end. It helps readability because, as a reader, I have to page fewer variables into my memory :)

Comment on lines 916 to 932
if matches!(c, '\t') {
    let tab_width = u32::try_from(line_width.get() - old_width)
        .expect("small width because of tab size");
    result.push_str(&source[last_end..index]);
    for _ in 0..tab_width {
        result.push(' ');
    }
    last_end = index + 1;
    update_ranges(index, tab_width);
} else if let Some(printable) = unprintable_replacement(c) {
    result.push_str(&source[last_end..index]);
    result.push(printable);
    last_end = index + 1;

    let len = printable.text_len().to_u32();
    update_ranges(index, len);
}
Member

Given that we have to match on tabs here anyway, do you think the LineWidthBuilder is worth it or would it simplify the code if it were inlined instead?

Contributor Author

I think it does look nicer inlined now that I tried it, thanks! You can see the whole looping logic on one screen now.

I don't see a great way to combine the two matches, but I still think it's an improvement.

Contributor Author

Oh, I see how to combine them. Obviously \n and \r won't hit the unprintable branch 🤦

{
continue;
}
if self.text.as_bytes()[range.start().to_usize() - 1] != b'\n' {
Member

This is probably a pre-existing issue, but we should also handle \r here.

Contributor Author

I thought this would be okay in this context because checking \n as the preceding character should cover both \n and \r\n line endings, but it's easy enough to add \r either way.

Member

That's true. But Ruff still supports \r on its own as a line ending (because Python does).

Contributor Author

Added!
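
For illustration, a check that accepts all three line-ending styles might look like the sketch below. The function name `follows_line_terminator` is hypothetical and the code is not taken from the PR. It also guards against an offset of 0, which the indexing in the snippet under review would otherwise need to rule out beforehand.

```rust
/// Sketch: true if the byte immediately before `offset` is `\n` or `\r`.
/// Checking the preceding byte covers `\n`, `\r\n`, and bare `\r` endings.
fn follows_line_terminator(text: &str, offset: usize) -> bool {
    offset
        .checked_sub(1) // offset 0 has no preceding byte
        .and_then(|i| text.as_bytes().get(i))
        .is_some_and(|&b| matches!(b, b'\n' | b'\r'))
}
```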

@MichaReiser MichaReiser removed the internal An internal refactor or improvement label Jul 26, 2025
@MichaReiser MichaReiser changed the title Fix empty spans following a line terminator and unprintable character spans in diagnostics [ty] Fix empty spans following a line terminator and unprintable character spans in diagnostics Jul 26, 2025
We can just track this as the offset between the original source length and the
current or `result` length. This offset is given by:

```rust
let offset = result.text_len().to_usize() - index;
```

which when added to the original `update_ranges` call:

```rust
update_ranges(index + offset, tab_width);
```

simplifies to just `result.text_len()` (`index` is added and then subtracted).

The other slight nuance here is that in the original `update_ranges` call
locations, we would also need to subtract the length of the new character from
the `index` argument, so I instead opted just to move the calls to before we
added the new character. This makes `result.text_len()` alone exactly the value
we need.
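
The approach described in this commit message can be sketched in a self-contained form as follows. This is a simplified illustration, not the ruff_db implementation: it uses plain `Range<usize>` byte ranges instead of `TextRange`, skips tab expansion, always allocates, and the replacement table is an assumed subset. The function name `escape` is hypothetical.

```rust
use std::ops::Range;

/// Assumed replacement table (control pictures for BEL, BS, ESC, DEL).
fn unprintable_replacement(c: char) -> Option<char> {
    match c {
        '\u{7}' => Some('\u{2407}'),
        '\u{8}' => Some('\u{2408}'),
        '\u{1b}' => Some('\u{241b}'),
        '\u{7f}' => Some('\u{2421}'),
        _ => None,
    }
}

/// Replace unprintable characters and shift `ranges` (byte offsets into
/// `source`) so they cover the same text in the returned string.
fn escape(source: &str, ranges: &mut [Range<usize>]) -> String {
    let mut result = String::new();
    let mut last_end = 0;
    for (index, c) in source.char_indices() {
        let Some(printable) = unprintable_replacement(c) else {
            continue;
        };
        result.push_str(&source[last_end..index]);
        // Before the replacement is pushed, `result.len()` is the position
        // of `c` in output coordinates, which is the coordinate system the
        // already-shifted ranges live in. Comparing against it avoids the
        // double counting seen when comparing against the source index.
        let position = result.len();
        let growth = printable.len_utf8() - c.len_utf8();
        for range in ranges.iter_mut() {
            if position < range.start {
                range.start += growth;
            }
            if position < range.end {
                range.end += growth;
            }
        }
        result.push(printable);
        last_end = index + c.len_utf8();
    }
    result.push_str(&source[last_end..]);
    result
}
```

With the three-character input from the earlier panic and an annotation at `1..1`, this sketch yields `3..3`, which is a valid char boundary in the output.
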
@github-actions

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@ntBre ntBre merged commit a54061e into main Jul 29, 2025
38 checks passed
@ntBre ntBre deleted the brent/empty-span-after-line-terminator branch July 29, 2025 12:26
ntBre added a commit that referenced this pull request Jul 30, 2025
ntBre added a commit that referenced this pull request Jul 30, 2025
dcreager added a commit that referenced this pull request Aug 1, 2025
* main: (24 commits)
  Add `Checker::context` method, deduplicate Unicode checks (#19609)
  [`flake8-pyi`] Preserve inline comment in ellipsis removal (`PYI013`) (#19399)
  [ty] Add flow diagram for import resolution
  [ty] Add comments to some core resolver functions
  [ty] Add missing ticks and use consistent quoting
  [ty] Reflow some long lines
  [ty] Unexport helper function
  [ty] Remove offset from `CompletionTargetTokens::Unknown`
  [`pyupgrade`] Fix `UP030` to avoid modifying double curly braces in format strings (#19378)
  [ty] fix a typo  (#19621)
  [ty] synthesize `__replace__` for dataclasses (>=3.13) (#19545)
  [ty] Discard `Definition`s when normalizing `Signature`s (#19615)
  [ty] Fix empty spans following a line terminator and unprintable character spans in diagnostics (#19535)
  Add `LinterContext::settings` to avoid passing separate settings (#19608)
  Support `.pyi` files in ruff analyze graph (#19611)
  [ty] Sync vendored typeshed stubs (#19607)
  [ty] Bump docstring-adder pin (#19606)
  [`perflint`] Ignore rule if target is `global` or `nonlocal` (`PERF401`) (#19539)
  Add license classifier back to pyproject.toml (#19599)
  [ty] Add stub mapping support to signature help (#19570)
  ...
Labels

diagnostics Related to reporting of diagnostics. ty Multi-file analysis & type inference
