Skip to content

Conversation

@Jarcho
Copy link
Contributor

@Jarcho Jarcho commented May 2, 2025

This is a further rework of our snippet accessing/creating code.

The general design goals with this:

  • Avoid allocating when not strictly necessary.
  • Make it easy to avoid linting if an invalid span is used.
  • Make it easier to debug when an invalid span is used.
  • Make it easy to minimize the number of lookups in the source map.
  • Make it easy to avoid creating intermediary compressed spans.
  • Make it as easy as possible to compose span adjustments/checks while maintaining the previous goals.

changelog: None

@rustbot
Copy link
Collaborator

rustbot commented May 2, 2025

r? @Alexendoo

rustbot has assigned @Alexendoo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label May 2, 2025
@Jarcho
Copy link
Contributor Author

Jarcho commented May 2, 2025

The first two commits are mainly renames. The third commit has all the actual changes.

@Jarcho Jarcho force-pushed the source_rework branch 2 times, most recently from 91b8913 to b336277 Compare May 2, 2025 05:13
@Jarcho
Copy link
Contributor Author

Jarcho commented May 2, 2025

Looking at MISSING_DOCS_IN_PRIVATE_ITEMS, that lint needs to be restructured a bit. Macros currently mess up it's implementation quite a bit.

Why is this not part of rustc in the first place?

@rustbot

This comment has been minimized.

Comment on lines 272 to 469
/// Handle to a source file's text and a range within that file.
///
/// With debug assertions the range is checked to be a valid substring of the source text. Without
/// assertions `None` will be returned from various functions when accessing the substring of the
/// source text fails.
#[derive(Clone)]
pub struct SourceFileRange {
file: SourceText,
range: Range<RelativeBytePos>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both SourceFileRange and SourceText can represent subsets of a source file, could they be merged into a single type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing so would require that SourceFileRange always validate it's range. The main point of SourceText is that it's definitely a valid string.

They're also serving two different purpose. One is a substring of the source text, the other is a movable view of a whole file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always validating doesn't seem so bad to me, API wise it's some extra ?ing when using with_lo/with_hi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're still modelling two completely different things. Even if only one of them is exposed using both to implement it is still useful.

Comment on lines 287 to 301
&& let Some(span) = new_lhs.span.map_span(cx, |file| {
let src = file.with_hi(span.hi()).src_text()?;
// Do not continue if we have mismatched number of parens, otherwise the suggestion is wrong
src.matches('(').count() == src.matches(')').count()
(src.matches('(').count() == src.matches(')').count()).then_some(file)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_hi mutating file here is confusing, there's similar elsewhere with trim_start/trim_end

Reusing names of non mutating methods while also returning a value makes it difficult to realise what's happening

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, those need to be renamed.

@rustbot

This comment has been minimized.

@rustbot rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) labels Jul 18, 2025
@github-actions
Copy link

github-actions bot commented Jul 18, 2025

No changes for 1eddd75

@Jarcho Jarcho added the S-blocked Status: marked as blocked ❌ on something else such as an RFC or other implementation work label Jul 18, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jul 19, 2025
This is blocking #14724

changelog: none
github-merge-queue bot pushed a commit that referenced this pull request Jul 19, 2025
This is blocking #14724

changelog: none
@rustbot rustbot removed S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) has-merge-commits PR has merge commits, merge with caution. labels Jul 19, 2025
@rustbot

This comment has been minimized.

@rustbot

This comment has been minimized.

@Jarcho Jarcho force-pushed the source_rework branch 2 times, most recently from b21e0f0 to 3f4330b Compare November 16, 2025 23:36
@rustbot

This comment has been minimized.

@Jarcho Jarcho force-pushed the source_rework branch 2 times, most recently from 3f6aaae to bf9f4e9 Compare November 17, 2025 01:02
@Jarcho
Copy link
Contributor Author

Jarcho commented Nov 17, 2025

The API has changed again; this time to support splitting a range. Changes include:

  • get_source_text has been renamed to get_text. The "source" part of the name doesn't really add anything and just makes it longer. check_source_text was renamed similarly.
  • SourceFileRange has been replaced with SpanEditCx which no longer contains the range.
  • map_range's callback now takes both the edit context and the range as separate parameters and returns either a single range or an array of ranges.
  • Functions for shrinking a range based on it's contents have been replaced by a general &str -> &str transformation that recalculates the range positions. This can also return an array of strings.
  • Some entry points no longer load external sources. get_text and check_text are still loading external sources since they hit debug assertions otherwise.
  • A couple of general utils have been added (display and StrExt). They aren't specific to touching the source code, but they will mainly be used when working with it.

I might switch the range editing functions to be on the range, but that means an extra trait will need to be imported everywhere. range.with_leading_prefix(scx, "prefix") is nicer when chaining multiple, so it's probably worth it.

github-merge-queue bot pushed a commit that referenced this pull request Nov 17, 2025
This is blocking #14724 as it triggers the debug
assertions it adds.

The inter item span parsing was introduced to solve
#12197. This is better handled by just not linting
anything within bodies.

changelog: [`missing_docs_in_private_items`]: Don't lint items in bodies
and automatically derived impls
changelog: [`missing_docs_in_private_items`]: Better detect when things
are accessible from the crate root
changelog: [`missing_docs_in_private_items`]: Lint unnameable items
which are accessible outside the crate
@rustbot

This comment has been minimized.

@Jarcho Jarcho removed the S-blocked Status: marked as blocked ❌ on something else such as an RFC or other implementation work label Nov 18, 2025
@Jarcho Jarcho force-pushed the source_rework branch 6 times, most recently from 51934e2 to 0cff672 Compare November 19, 2025 19:36
@rustbot

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Dec 11, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@rustbot
Copy link
Collaborator

rustbot commented Dec 11, 2025

☔ The latest upstream changes (possibly 9e3e964) made this pull request unmergeable. Please resolve the merge conflicts.

Comment on lines +47 to +58
/// Creates a type which implements `Display` by calling the specified function.
#[inline]
#[must_use]
pub fn display(f: impl Fn(&mut fmt::Formatter<'_>) -> fmt::Result) -> impl fmt::Display {
struct S<T>(T);
impl<T: Fn(&mut fmt::Formatter<'_>) -> fmt::Result> fmt::Display for S<T> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0(f)
}
}
S(f)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be std::fmt::from_fn

Comment on lines 233 to 250
/// A type representing a range in the `SourceMap`.
pub trait SourceRange: Sized {
#[must_use]
fn into_range(self) -> Range<BytePos>;
}
impl SourceRange for Range<BytePos> {
#[inline]
fn into_range(self) -> Range<BytePos> {
self
}
}
impl<T: SpanLike> SourceRange for T {
#[inline]
fn into_range(self) -> Range<BytePos> {
let data = self.data();
data.lo..data.hi
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SourceRange has one usage (in get_external_text), and it doesn't appear to be called on Range<BytePos>. We could simplify things by removing this trait and making SpanExt: SpanLike

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpanExt can't require SpanLike since it needs to be implemented for Range<BytePos>. into_range could just be part of SpanExt though. That separation is leftover from other designs.

Comment on lines 379 to 383
fn map_range<'sm, T: IntoSpans>(
self,
sm: impl HasSourceMap<'sm>,
f: impl for<'a> FnOnce(&'a SourceFile, &'a str, Range<usize>) -> Option<Range<usize>>,
) -> Option<Range<BytePos>> {
map_range(sm.source_map(), self.into_range(), f)
f: impl for<'a> FnOnce(&'a SpanEditCx<'sm>, FileRange) -> Option<T>,
) -> Option<T::Output>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the T here is a bit too much abstraction, we could keep this concrete in Span/FileRange to make it easier to understand, then add a second method for the multiple case

The new method could also be changed to be concrete in (FileRange, FileRange), all uses are currently [FileRange; 2] as far as I can see, the tuple would allow e.g. split_once to work as well. More advanced cases (3+) could always be covered by mk_edit_cx

If we did the same for FileRangeExt::map_range_text we could remove both IntoSpans and IntoSubRanges

/// |`0` |`m1!`, `m2!`|None |
#[inline]
#[must_use]
fn walk_to_ctxt(self, ctxt: SyntaxContext) -> Option<Span>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could do with some documentation on how it differs from hygiene::walk_chain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the most part it doesn't differ. It only returns None when the target context isn't in the current expansion chain and works around a dumb annoyance with range sugarings.

if expn.call_site.ctxt() != ctxt {
let sp = hygiene::walk_chain(expn.call_site, ctxt);
(sp.ctxt() == ctxt).then_some(sp)
} else if matches!(expn.kind, ExpnKind::Desugaring(DesugaringKind::RangeExpr)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's unique about range desugaring that it's handled here? Should it be changed in rustc instead?

If I'm reading it right it also only applies when the outermost expn is said desugaring, what about a range in a macro?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to keep the parenthesis on a range expression. The call site of (x..y) is only x..y.

You are reading this wrong. The special case only works happens if the the target context is the immediate parent of the range desugaring. Note the outer_expn_data is a badly named function. The "outer" part is outermost from the root.

Comment on lines +468 to +473
/// Walks this span up the macro call chain to the root context.
///
/// See `walk_to_ctxt` for details.
#[inline]
#[must_use]
fn walk_to_root(self) -> Span
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to walk_to_ctxt, could do with answering how it differs from Span::source_callsite

/// validation.
#[inline]
#[cfg_attr(debug_assertions, track_caller)]
pub fn dbg_check_range(&self, old: Option<FileRange>, new: FileRange) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be private

Comment on lines +1335 to +1359
pub trait StrExt {
/// Gets the substring which ranges from the start of the first match of the pattern to the end
/// of the second match. Returns `None` if the pattern doesn't occur twice.
fn find_bounded_inclusive(&self, pat: impl Pattern) -> Option<&Self>;

/// Gets the non-overlapping prefix and suffix. Returns `None` if the string doesn't start with
/// the prefix or end with the suffix.
///
/// The prefix will be taken first, with the suffix taken from the remainder of the string.
fn get_prefix_suffix<P>(&self, prefix: impl Pattern, suffix: P) -> Option<[&Self; 2]>
where
P: Pattern,
for<'a> P::Searcher<'a>: ReverseSearcher<'a>;

/// Splits a string into a prefix and everything proceeding it. Returns `None` if the string
/// doesn't start with the prefix.
fn split_prefix(&self, pat: impl Pattern) -> Option<[&Self; 2]>;

/// Splits a string into a suffix and everything preceding it. Returns `None` if the string
/// doesn't end with the suffix.
fn split_suffix<P>(&self, pat: P) -> Option<[&Self; 2]>
where
P: Pattern,
for<'a> P::Searcher<'a>: ReverseSearcher<'a>;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some examples would be great

Comment on lines +1342 to +1343
///
/// The prefix will be taken first, with the suffix taken from the remainder of the string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the order change the result here?

Copy link
Contributor Author

@Jarcho Jarcho Dec 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Pattern impl could cause this to matter, but none of the ones from std will.

Comment on lines +107 to +111
if let Some((scx, range)) = inner.span.mk_edit_cx(cx)
&& let Some(range) = range.with_trailing_whitespace(&scx)
&& let Some(range) = range.with_leading_whitespace(&scx)
&& let Some(range) = range.with_trailing_match(&scx, ')')
&& range.with_leading_match(&scx, '(').is_some()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this isn't a map_range? Either way it would be good to add some advice to mk_edit_cx about when you might want to use it compared to the other utilities

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of map_range wouldn't be used. This avoids the FileRange to Span transition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants