Skip to content

Commit a54061e

Browse files
authored
[ty] Fix empty spans following a line terminator and unprintable character spans in diagnostics (#19535)
## Summary This was previously the last commit in #19415, split out to make it easier to review. This applies the fixes from c9b99e4, 5021f32, and 2922490 to the new rendering code in `ruff_db`. I initially intended only to fix the empty span after a line terminator (as you can see in the branch name), but the two fixes were tied pretty closely together, and my initial fix for the empty spans needed a big change after trying to handle unprintable characters too. I can still split this up if it would help with review. I would just start with the unprintable characters first. The implementation here is essentially copy-pasted from `ruff_linter::message::text.rs`, with the `SourceCode` struct renamed to `EscapedSourceCode` since there's already a `SourceCode` in scope in `render.rs`. It's also updated slightly to account for the multiple annotations for a single snippet. The original implementation used some types from the `line_width` module from `ruff_linter`. I copied over heavily stripped-down versions of these instead of trying to import them. We could inline the remaining code entirely, if we want, but I thought it was nice enough to keep. I also moved over `ceil_char_boundary`, which is unchanged except to make it a free function taking a `&str` instead of a `Locator` method. All of this code could be deleted from `ruff_linter` if we also move over the `grouped` output format, which will be the last user after #19415. ## Test Plan I added new tests in `ruff_linter` that call into the new rendering code to snapshot the diagnostics for the affected cases. These are copies of existing snapshots in Ruff, so it's helpful to compare them. These are a bit noisy because of the other rendering differences in the header, but all of the `^^^` indicators should be the same. <details><summary>`empty_span_after_line_terminator` diff</summary> ```diff diff --git a/crates/ruff_linter/src/rules/pycodestyle/snapshots/ruff_linter__rules__pycodestyle__tests__E112_E11.py.snap b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__empty_span_after_line_terminator.snap index 5ade434..6df75c1 100644 --- a/crates/ruff_linter/src/rules/pycodestyle/snapshots/ruff_linter__rules__pycodestyle__tests__E112_E11.py.snap +++ b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__empty_span_after_line_terminator.snap @@ -1,17 +1,20 @@ --- -source: crates/ruff_linter/src/rules/pycodestyle/mod.rs +source: crates/ruff_linter/src/message/text.rs +expression: value.to_string() --- -E11.py:9:1: E112 Expected an indented block +error[no-indented-block]: Expected an indented block + --> E11.py:9:1 | 7 | #: E112 8 | if False: 9 | print() - | ^ E112 + | ^ 10 | #: E113 11 | print() | -E11.py:9:1: SyntaxError: Expected an indented block after `if` statement +error[invalid-syntax]: SyntaxError: Expected an indented block after `if` statement + --> E11.py:9:1 | 7 | #: E112 8 | if False: @@ -21,7 +24,8 @@ E11.py:9:1: SyntaxError: Expected an indented block after `if` statement 11 | print() | -E11.py:12:1: SyntaxError: Unexpected indentation +error[invalid-syntax]: SyntaxError: Unexpected indentation + --> E11.py:12:1 | 10 | #: E113 11 | print() @@ -31,7 +35,8 @@ E11.py:12:1: SyntaxError: Unexpected indentation 14 | mimetype = 'application/x-directory' | -E11.py:14:1: SyntaxError: Expected a statement +error[invalid-syntax]: SyntaxError: Expected a statement + --> E11.py:14:1 | 12 | print() 13 | #: E114 E116 @@ -41,17 +46,19 @@ E11.py:14:1: SyntaxError: Expected a statement 16 | create_date = False | -E11.py:45:1: E112 Expected an indented block +error[no-indented-block]: Expected an indented block + --> E11.py:45:1 | 43 | #: E112 44 | if False: # 45 | print() - | ^ E112 + | ^ 46 | #: 47 | if False: | -E11.py:45:1: SyntaxError: Expected an indented block after `if` statement +error[invalid-syntax]: SyntaxError: Expected an indented block after `if` statement + --> E11.py:45:1 | 43 | #: E112 44 | if False: # ``` </details> <details><summary>`unprintable_characters` diff</summary> ```diff diff --git a/crates/ruff_linter/src/rules/pylint/snapshots/ruff_linter__rules__pylint__tests__PLE2512_invalid_characters.py.snap b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__unprintable_characters.snap index 52cfdf9..fcfa1ac 100644 --- a/crates/ruff_linter/src/rules/pylint/snapshots/ruff_linter__rules__pylint__tests__PLE2512_invalid_characters.py.snap +++ b/crates/ruff_linter/src/message/snapshots/ruff_linter__message__text__tests__unprintable_characters.snap @@ -1,161 +1,115 @@ --- -source: crates/ruff_linter/src/rules/pylint/mod.rs +source: crates/ruff_linter/src/message/text.rs +expression: value.to_string() --- -invalid_characters.py:24:12: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:24:12 | 22 | cr_ok = f'\\r' 23 | 24 | sub = 'sub �' - | ^ PLE2512 + | ^ 25 | sub = f'sub �' | - = help: Replace with escape sequence +help: Replace with escape sequence -ℹ Safe fix -21 21 | cr_ok = '\\r' -22 22 | cr_ok = f'\\r' -23 23 | -24 |-sub = 'sub �' - 24 |+sub = 'sub \x1A' -25 25 | sub = f'sub �' -26 26 | -27 27 | sub_ok = '\x1a' - -invalid_characters.py:25:13: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:25:13 | 24 | sub = 'sub �' 25 | sub = f'sub �' - | ^ PLE2512 + | ^ 26 | 27 | sub_ok = '\x1a' | - = help: Replace with escape sequence - -ℹ Safe fix -22 22 | cr_ok = f'\\r' -23 23 | -24 24 | sub = 'sub �' -25 |-sub = f'sub �' - 25 |+sub = f'sub \x1A' -26 26 | -27 27 | sub_ok = '\x1a' -28 28 | sub_ok = f'\x1a' +help: Replace with escape sequence -invalid_characters.py:55:25: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:55:25 | 53 | zwsp_after_multicharacter_grapheme_cluster = f"ಫ್ರಾನ್ಸಿಸ್ಕೊ ​​" 54 | 55 | nested_fstrings = f'␈{f'�{f'␛'}'}' - | ^ PLE2512 + | ^ 56 | 57 | # #7455 (comment) | - = help: Replace with escape sequence - -ℹ Safe fix -52 52 | zwsp_after_multicharacter_grapheme_cluster = "ಫ್ರಾನ್ಸಿಸ್ಕೊ ​​" -53 53 | zwsp_after_multicharacter_grapheme_cluster = f"ಫ್ರಾನ್ಸಿಸ್ಕೊ ​​" -54 54 | -55 |-nested_fstrings = f'␈{f'�{f'␛'}'}' - 55 |+nested_fstrings = f'␈{f'\x1A{f'␛'}'}' -56 56 | -57 57 | # #7455 (comment) -58 58 | x = f"""}}a�b""" +help: Replace with escape sequence -invalid_characters.py:58:12: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:58:12 | 57 | # #7455 (comment) 58 | x = f"""}}a�b""" - | ^ PLE2512 + | ^ 59 | # #7455 (comment) 60 | x = f"""}}a␛b""" | - = help: Replace with escape sequence +help: Replace with escape sequence -ℹ Safe fix -55 55 | nested_fstrings = f'␈{f'�{f'␛'}'}' -56 56 | -57 57 | # #7455 (comment) -58 |-x = f"""}}a�b""" - 58 |+x = f"""}}a\x1Ab""" -59 59 | # #7455 (comment) -60 60 | x = f"""}}a␛b""" -61 61 | - -invalid_characters.py:64:12: PLE2512 Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:64:12 | 63 | # #13294 64 | print(r"""␈�␛�​ - | ^ PLE2512 + | ^ 65 | """) 66 | print(fr"""␈�␛�​ | - = help: Replace with escape sequence +help: Replace with escape sequence -invalid_characters.py:66:13: PLE2512 Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:66:13 | 64 | print(r"""␈�␛�​ 65 | """) 66 | print(fr"""␈�␛�​ - | ^ PLE2512 + | ^ 67 | """) 68 | print(Rf"""␈�␛�​ | - = help: Replace with escape sequence +help: Replace with escape sequence -invalid_characters.py:68:13: PLE2512 Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:68:13 | 66 | print(fr"""␈�␛�​ 67 | """) 68 | print(Rf"""␈�␛�​ - | ^ PLE2512 + | ^ 69 | """) | - = help: Replace with escape sequence +help: Replace with escape sequence -invalid_characters.py:73:9: PLE2512 Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:73:9 | 71 | # #18815 72 | b = "\␈" 73 | sub = "\�" - | ^ PLE2512 + | ^ 74 | esc = "\␛" 75 | zwsp = "\​" | - = help: Replace with escape sequence +help: Replace with escape sequence -invalid_characters.py:80:25: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:80:25 | 78 | # tstrings 79 | esc = t'esc esc ␛' 80 | nested_tstrings = t'␈{t'�{t'␛'}'}' - | ^ PLE2512 + | ^ 81 | nested_ftstrings = t'␈{f'�{t'␛'}'}' | - = help: Replace with escape sequence - -ℹ Safe fix -77 77 | -78 78 | # tstrings -79 79 | esc = t'esc esc ␛' -80 |-nested_tstrings = t'␈{t'�{t'␛'}'}' - 80 |+nested_tstrings = t'␈{t'\x1A{t'␛'}'}' -81 81 | nested_ftstrings = t'␈{f'�{t'␛'}'}' -82 82 | +help: Replace with escape sequence -invalid_characters.py:81:26: PLE2512 [*] Invalid unescaped character SUB, use "\x1A" instead +error[invalid-character-sub]: Invalid unescaped character SUB, use "\x1A" instead + --> invalid_characters.py:81:26 | 79 | esc = t'esc esc ␛' 80 | nested_tstrings = t'␈{t'�{t'␛'}'}' 81 | nested_ftstrings = t'␈{f'�{t'␛'}'}' - | ^ PLE2512 + | ^ | - = help: Replace with escape sequence - -ℹ Safe fix -78 78 | # tstrings -79 79 | esc = t'esc esc ␛' -80 80 | nested_tstrings = t'␈{t'�{t'␛'}'}' -81 |-nested_ftstrings = t'␈{f'�{t'␛'}'}' - 81 |+nested_ftstrings = t'␈{f'\x1A{t'␛'}'}' -82 82 | +help: Replace with escape sequence ``` </details>
1 parent 19569bf commit a54061e

File tree

7 files changed

+362
-92
lines changed

7 files changed

+362
-92
lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

crates/ruff_db/Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ serde_json = { workspace = true, optional = true }
4242
thiserror = { workspace = true }
4343
tracing = { workspace = true }
4444
tracing-subscriber = { workspace = true, optional = true }
45+
unicode-width = { workspace = true }
4546
zip = { workspace = true }
4647

4748
[target.'cfg(target_arch="wasm32")'.dependencies]

crates/ruff_db/src/diagnostic/mod.rs

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,9 @@ use ruff_source_file::{LineColumn, SourceCode, SourceFile};
66
use ruff_annotate_snippets::Level as AnnotateLevel;
77
use ruff_text_size::{Ranged, TextRange, TextSize};
88

9-
pub use self::render::{DisplayDiagnostic, DisplayDiagnostics, FileResolver, Input};
9+
pub use self::render::{
10+
DisplayDiagnostic, DisplayDiagnostics, FileResolver, Input, ceil_char_boundary,
11+
};
1012
use crate::{Db, files::File};
1113

1214
mod render;

crates/ruff_db/src/diagnostic/render.rs

Lines changed: 237 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
use std::borrow::Cow;
12
use std::collections::BTreeMap;
23
use std::path::Path;
34

@@ -7,7 +8,7 @@ use ruff_annotate_snippets::{
78
};
89
use ruff_notebook::{Notebook, NotebookIndex};
910
use ruff_source_file::{LineIndex, OneIndexed, SourceCode};
10-
use ruff_text_size::{TextRange, TextSize};
11+
use ruff_text_size::{TextLen, TextRange, TextSize};
1112

1213
use crate::diagnostic::stylesheet::DiagnosticStylesheet;
1314
use crate::{
@@ -520,7 +521,7 @@ impl<'r> RenderableSnippets<'r> {
520521
#[derive(Debug)]
521522
struct RenderableSnippet<'r> {
522523
/// The actual snippet text.
523-
snippet: &'r str,
524+
snippet: Cow<'r, str>,
524525
/// The absolute line number corresponding to where this
525526
/// snippet begins.
526527
line_start: OneIndexed,
@@ -580,6 +581,13 @@ impl<'r> RenderableSnippet<'r> {
580581
.iter()
581582
.map(|ann| RenderableAnnotation::new(snippet_start, ann))
582583
.collect();
584+
585+
let EscapedSourceCode {
586+
text: snippet,
587+
annotations,
588+
} = replace_whitespace_and_unprintable(snippet, annotations)
589+
.fix_up_empty_spans_after_line_terminator();
590+
583591
RenderableSnippet {
584592
snippet,
585593
line_start,
@@ -590,7 +598,7 @@ impl<'r> RenderableSnippet<'r> {
590598

591599
/// Convert this to an "annotate" snippet.
592600
fn to_annotate<'a>(&'a self, path: &'a str) -> AnnotateSnippet<'a> {
593-
AnnotateSnippet::source(self.snippet)
601+
AnnotateSnippet::source(&self.snippet)
594602
.origin(path)
595603
.line_start(self.line_start.get())
596604
.annotations(
@@ -820,6 +828,230 @@ fn relativize_path<'p>(cwd: &SystemPath, path: &'p str) -> &'p str {
820828
path
821829
}
822830

831+
/// Given some source code and annotation ranges, this routine replaces tabs
832+
/// with ASCII whitespace, and unprintable characters with printable
833+
/// representations of them.
834+
///
835+
/// The source code and annotations returned are updated to reflect changes made
836+
/// to the source code (if any).
837+
fn replace_whitespace_and_unprintable<'r>(
838+
source: &'r str,
839+
mut annotations: Vec<RenderableAnnotation<'r>>,
840+
) -> EscapedSourceCode<'r> {
841+
// Updates the annotation ranges given by the caller whenever a single byte (at `index` in
842+
// `source`) is replaced with `len` bytes.
843+
//
844+
// When the index occurs before the start of the range, the range is
845+
// offset by `len`. When the range occurs after or at the start but before
846+
// the end, then the end of the range only is offset by `len`.
847+
let mut update_ranges = |index: usize, len: u32| {
848+
for ann in &mut annotations {
849+
if index < usize::from(ann.range.start()) {
850+
ann.range += TextSize::new(len - 1);
851+
} else if index < usize::from(ann.range.end()) {
852+
ann.range = ann.range.add_end(TextSize::new(len - 1));
853+
}
854+
}
855+
};
856+
857+
// If `c` is an unprintable character, then this returns a printable
858+
// representation of it (using a fancier Unicode codepoint).
859+
let unprintable_replacement = |c: char| -> Option<char> {
860+
match c {
861+
'\x07' => Some('␇'),
862+
'\x08' => Some('␈'),
863+
'\x1b' => Some('␛'),
864+
'\x7f' => Some('␡'),
865+
_ => None,
866+
}
867+
};
868+
869+
const TAB_SIZE: usize = 4;
870+
let mut width = 0;
871+
let mut column = 0;
872+
let mut last_end = 0;
873+
let mut result = String::new();
874+
for (index, c) in source.char_indices() {
875+
let old_width = width;
876+
match c {
877+
'\n' | '\r' => {
878+
width = 0;
879+
column = 0;
880+
}
881+
'\t' => {
882+
let tab_offset = TAB_SIZE - (column % TAB_SIZE);
883+
width += tab_offset;
884+
column += tab_offset;
885+
886+
let tab_width =
887+
u32::try_from(width - old_width).expect("small width because of tab size");
888+
result.push_str(&source[last_end..index]);
889+
890+
update_ranges(result.text_len().to_usize(), tab_width);
891+
892+
for _ in 0..tab_width {
893+
result.push(' ');
894+
}
895+
last_end = index + 1;
896+
}
897+
_ => {
898+
width += unicode_width::UnicodeWidthChar::width(c).unwrap_or(0);
899+
column += 1;
900+
901+
if let Some(printable) = unprintable_replacement(c) {
902+
result.push_str(&source[last_end..index]);
903+
904+
let len = printable.text_len().to_u32();
905+
update_ranges(result.text_len().to_usize(), len);
906+
907+
result.push(printable);
908+
last_end = index + 1;
909+
}
910+
}
911+
}
912+
}
913+
914+
// No tabs or unprintable chars
915+
if result.is_empty() {
916+
EscapedSourceCode {
917+
annotations,
918+
text: Cow::Borrowed(source),
919+
}
920+
} else {
921+
result.push_str(&source[last_end..]);
922+
EscapedSourceCode {
923+
annotations,
924+
text: Cow::Owned(result),
925+
}
926+
}
927+
}
928+
929+
struct EscapedSourceCode<'r> {
930+
text: Cow<'r, str>,
931+
annotations: Vec<RenderableAnnotation<'r>>,
932+
}
933+
934+
impl<'r> EscapedSourceCode<'r> {
935+
// This attempts to "fix up" the spans on each annotation in the case where
936+
// it's an empty span immediately following a line terminator.
937+
//
938+
// At present, `annotate-snippets` (both upstream and our vendored copy)
939+
// will render annotations of such spans to point to the space immediately
940+
// following the previous line. But ideally, this should point to the space
941+
// immediately preceding the next line.
942+
//
943+
// After attempting to fix `annotate-snippets` and giving up after a couple
944+
// hours, this routine takes a different tact: it adjusts the span to be
945+
// non-empty and it will cover the first codepoint of the following line.
946+
// This forces `annotate-snippets` to point to the right place.
947+
//
948+
// See also: <https://github.com/astral-sh/ruff/issues/15509> and
949+
// `ruff_linter::message::text::SourceCode::fix_up_empty_spans_after_line_terminator`,
950+
// from which this was adapted.
951+
fn fix_up_empty_spans_after_line_terminator(mut self) -> EscapedSourceCode<'r> {
952+
for ann in &mut self.annotations {
953+
let range = ann.range;
954+
if !range.is_empty()
955+
|| range.start() == TextSize::from(0)
956+
|| range.start() >= self.text.text_len()
957+
{
958+
continue;
959+
}
960+
if !matches!(
961+
self.text.as_bytes()[range.start().to_usize() - 1],
962+
b'\n' | b'\r'
963+
) {
964+
continue;
965+
}
966+
let start = range.start();
967+
let end = ceil_char_boundary(&self.text, start + TextSize::from(1));
968+
ann.range = TextRange::new(start, end);
969+
}
970+
971+
self
972+
}
973+
}
974+
975+
/// Finds the closest [`TextSize`] not less than the offset given for which
976+
/// `is_char_boundary` is `true`. Unless the offset given is greater than
977+
/// the length of the underlying contents, in which case, the length of the
978+
/// contents is returned.
979+
///
980+
/// Can be replaced with `str::ceil_char_boundary` once it's stable.
981+
///
982+
/// # Examples
983+
///
984+
/// From `std`:
985+
///
986+
/// ```
987+
/// use ruff_db::diagnostic::ceil_char_boundary;
988+
/// use ruff_text_size::{Ranged, TextLen, TextSize};
989+
///
990+
/// let source = "❤️🧡💛💚💙💜";
991+
/// assert_eq!(source.text_len(), TextSize::from(26));
992+
/// assert!(!source.is_char_boundary(13));
993+
///
994+
/// let closest = ceil_char_boundary(source, TextSize::from(13));
995+
/// assert_eq!(closest, TextSize::from(14));
996+
/// assert_eq!(&source[..closest.to_usize()], "❤️🧡💛");
997+
/// ```
998+
///
999+
/// Additional examples:
1000+
///
1001+
/// ```
1002+
/// use ruff_db::diagnostic::ceil_char_boundary;
1003+
/// use ruff_text_size::{Ranged, TextRange, TextSize};
1004+
///
1005+
/// let source = "Hello";
1006+
///
1007+
/// assert_eq!(
1008+
/// ceil_char_boundary(source, TextSize::from(0)),
1009+
/// TextSize::from(0)
1010+
/// );
1011+
///
1012+
/// assert_eq!(
1013+
/// ceil_char_boundary(source, TextSize::from(5)),
1014+
/// TextSize::from(5)
1015+
/// );
1016+
///
1017+
/// assert_eq!(
1018+
/// ceil_char_boundary(source, TextSize::from(6)),
1019+
/// TextSize::from(5)
1020+
/// );
1021+
///
1022+
/// let source = "α";
1023+
///
1024+
/// assert_eq!(
1025+
/// ceil_char_boundary(source, TextSize::from(0)),
1026+
/// TextSize::from(0)
1027+
/// );
1028+
///
1029+
/// assert_eq!(
1030+
/// ceil_char_boundary(source, TextSize::from(1)),
1031+
/// TextSize::from(2)
1032+
/// );
1033+
///
1034+
/// assert_eq!(
1035+
/// ceil_char_boundary(source, TextSize::from(2)),
1036+
/// TextSize::from(2)
1037+
/// );
1038+
///
1039+
/// assert_eq!(
1040+
/// ceil_char_boundary(source, TextSize::from(3)),
1041+
/// TextSize::from(2)
1042+
/// );
1043+
/// ```
1044+
pub fn ceil_char_boundary(text: &str, offset: TextSize) -> TextSize {
1045+
let upper_bound = offset
1046+
.to_u32()
1047+
.saturating_add(4)
1048+
.min(text.text_len().to_u32());
1049+
(offset.to_u32()..upper_bound)
1050+
.map(TextSize::from)
1051+
.find(|offset| text.is_char_boundary(offset.to_usize()))
1052+
.unwrap_or_else(|| TextSize::from(upper_bound))
1053+
}
1054+
8231055
#[cfg(test)]
8241056
mod tests {
8251057

@@ -2359,7 +2591,7 @@ watermelon
23592591
}
23602592

23612593
/// Returns a builder for tersely constructing diagnostics.
2362-
fn builder(
2594+
pub(super) fn builder(
23632595
&mut self,
23642596
identifier: &'static str,
23652597
severity: Severity,
@@ -2426,7 +2658,7 @@ watermelon
24262658
///
24272659
/// See the docs on `TestEnvironment::span` for the meaning of
24282660
/// `path`, `line_offset_start` and `line_offset_end`.
2429-
fn primary(
2661+
pub(super) fn primary(
24302662
mut self,
24312663
path: &str,
24322664
line_offset_start: &str,

0 commit comments

Comments
 (0)