-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Preserve original trailing slash if present in find-links URLs
#14387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f527ad5 to
caa4b8d
Compare
caa4b8d to
5233821
Compare
|
I think relying on |
find-links requestfind-links URLs
41143c0 to
2cc4ce5
Compare
cdbc20f to
a5ac13d
Compare
|
|
||
| /// Construct an [`IndexUrl`] from a [`VerbatimUrl`], applying a [`TrailingSlashPolicy`] | ||
| /// to non-file URLs. | ||
| fn from_verbatim_url_with_trailing_slash_policy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic was moved from the From<VerbatimUrl> implementation. It's unchanged except that the trailing slash is only removed if TrailingSlashPolicy::Remove is passed in.
Updated. I've also updated the |
find-links URLsfind-links URLs
a5ac13d to
6012b9d
Compare
| /// directory. | ||
| /// | ||
| /// Preserves trailing slash if present in `path`. | ||
| pub fn parse_preserving_trailing_slash(path: &str) -> Result<Self, IndexUrlError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this name. Originally I had it as parse_without_slash_normalization. Could be parse_preserve_trailing_slash but I think parse_preserve_slash isn't enough information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided things like parse_as_given or parse_exact since there could have already been different kinds of normalization prior to this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we just use:
parse_simple_apiparse_find_links
And thus not require callers to pass in a trailing slash policy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's better. I can follow a similar pattern with the verbatim URL cases: e.g., simple_api_from_verbatim_url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah or from_simple_api_url?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
|
||
| /// Construct an [`IndexUrl`] from a [`VerbatimUrl`], preserving a trailing | ||
| /// slash if present. | ||
| pub fn from_verbatim_url_preserving_trailing_slash(url: VerbatimUrl) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point as above about the function name
…erialization (#14456) This came up in [discussion](#14387 (comment)) on #14387.
|
Actually, hmm, there may be one bug here. We call |
|
We should probably try to find a way to remove |
I think we still need a way to normalize the registry sources we read in from the lockfile, but at that point we no longer know whether it was a Simple API or find-links URL. This leads me to think we should only be normalizing at the point of lockfile validation. I've opened an alternative PR that takes that approach instead. |
|
I think normalizing the Simple API URLs is still a good thing, to be honest, since it minimizes churn. Should we just do both things? Normalize, but also make the lockfile validation more lax? I had this in Discord: diff --git a/crates/uv-distribution-types/src/file.rs b/crates/uv-distribution-types/src/file.rs
index 948378c0c..c227b4266 100644
--- a/crates/uv-distribution-types/src/file.rs
+++ b/crates/uv-distribution-types/src/file.rs
@@ -169,15 +169,6 @@ impl UrlString {
.map(|(path, _)| Cow::Owned(UrlString(SmallString::from(path))))
.unwrap_or(Cow::Borrowed(self))
}
-
- /// Return the [`UrlString`] (as a [`Cow`]) with trailing slash removed.
- #[must_use]
- pub fn without_trailing_slash(&self) -> Cow<'_, Self> {
- self.as_ref()
- .strip_suffix('/')
- .map(|path| Cow::Owned(UrlString(SmallString::from(path))))
- .unwrap_or(Cow::Borrowed(self))
- }
}
impl AsRef<str> for UrlString {
@@ -270,18 +261,4 @@ mod tests {
Cow::Owned(UrlString("https://example.com/path?query".into()))
);
}
-
- #[test]
- fn without_trailing_slash() {
- // Borrows a URL without a slash
- let url = UrlString("https://example.com/path".into());
- assert_eq!(url.without_trailing_slash(), Cow::Borrowed(&url));
-
- // Removes the trailing slash if present on the URL
- let url = UrlString("https://example.com/path/".into());
- assert_eq!(
- url.without_trailing_slash(),
- Cow::Owned(UrlString("https://example.com/path".into()))
- );
- }
}
diff --git a/crates/uv-resolver/src/lock/mod.rs b/crates/uv-resolver/src/lock/mod.rs
index beeadc912..9ae3f0e99 100644
--- a/crates/uv-resolver/src/lock/mod.rs
+++ b/crates/uv-resolver/src/lock/mod.rs
@@ -1437,7 +1437,7 @@ impl Lock {
}
IndexUrl::Path(_) => None,
})
- .collect::<BTreeSet<_>>()
+ .collect::<Vec<_>>()
});
let locals = indexes.map(|locations| {
@@ -1479,10 +1479,9 @@ impl Lock {
match index {
RegistrySource::Url(url) => {
// Normalize URL before validating.
- let url = url.without_trailing_slash();
if remotes
.as_ref()
- .is_some_and(|remotes| !remotes.contains(&url))
+ .is_some_and(|remotes| !remotes.iter().any(|remote| matches!(remote.as_ref().strip_prefix(url.as_ref()), '' | '/')))
{
let name = &package.id.name;
let version = &package |
|
If this change would invalidate the lockfile, we should add a test case covering that. |
This is separate, but... we should probably track this in the lockfile, right? |
I'm a little concerned that we'll find the trailing slash difference could end up being significant for some implementation of the Simple API down the line, but maybe it's not likely. I've updated this PR to follow the more lax approach, though we need to check if |
Do you mean the more lax approach change? What scenario are you thinking of? There are tests that check different combinations of trailing slashes in user-provided URLs vs. lockfile URLs to ensure validation still succeeds. Those still pass with this change. |
424c636 to
08025b0
Compare
|
I'm referring to the scenario described in this comment: #14387 (comment) Is that not correct? |
|
Regarding the "lax" approach, if a trailing slash on the find-links URL has semantic meaning, we should invalidate the lockfile if someone adds a trailing slash or removes it, right? |
|
I considered that too, but I don't see how it would succeed in the first place if the trailing slash is required (or vice versa). |
|
It could just change the result of the resolution, right? i.e., the find-links index returns no packages vs some packages? |
|
I think that's possible in theory, it just seems really unlikely in practice. I agree that the right solution here is probably to track |
060967b to
9ce0b63
Compare
9ce0b63 to
08025b0
Compare
|
Closing in favor of #14511 |
Reverts: - #14349 - #14346 - #14245 Retains the test cases. Includes a `find-links` test case. Supersedes - #14387 - #14503 We originally got a report at #13707 that inclusion of a trailing slash on an index URL was causing lockfile churn despite having no semantic meaning and resolved the issue by adding normalization that stripped trailing slashes at parse time. We then discovered that, while there are not semantic differences for trailing slashes on Simple API index URLs, there are differences for some flat (or find links) indexes. As reported in #14367, the change in #14245 caused a regression for at least one user. We attempted to fix the regression via a few approaches. #14387 attempted to differentiate between Simple API and flat index URL parsing, but failed to account for the `Deserialize` implementation, which always assumed Simple API-style index URLs and incorrectly trimmed trailing slashes in various cases where we deserialized the `IndexUrl` type from a file. I attempted to resolve this by performing a larger refactor, but it ended up being quite painful. In particular, the `Index` type was a blocker — we don't know the `IndexUrl` variant until we've parsed the `IndexFormat` and having a multi-stage deserializer is not appealing but adding a new intermediate type (i.e., `RawIndex`) is painful due to the pervasiveness of `Index`. Given that we've regressed behavior here and there's not a straight-forward fix, we're reverting the normalization entirely. #14503 attempted to perform normalization at compare-time, but that means we'd fail to invalidate the lockfile when the a trailing slash was added or removed and given that a trailing slash has semantic meaning for a find-links URL... we'd have another correctness problem. After this revert, we'll retain all index URLs verbatim. The downside to this approach is that we'll be adding a bunch of trailing slashes back to lockfiles that we previously normalized out, and we'll be reverting our fix for users with inconsistent trailing slashes on their index URLs. Users affected by the original motivating issue should use consistent trailing slashes on their URLs, as they do frequently have semantic meaning. We may want to revisit normalization and type aware index URL parsing as part of a larger change. Closes #14367
As part of #14245 and #14346, uv was normalizing trailing slashes for find-links URLs by removing them. However, find-links URLs have different meanings depending on whether a trailing slash is present (as evidenced in 301 redirects from PyPI and #14367). This PR preserves trailing slashes if present in find-links URLs.
Fixes #14367