Skip to content

Conversation

@jtfmumm
Copy link
Contributor

@jtfmumm jtfmumm commented Jul 1, 2025

As part of #14245 and #14346, uv was normalizing trailing slashes for find-links URLs by removing them. However, find-links URLs have different meanings depending on whether a trailing slash is present (as evidenced in 301 redirects from PyPI and #14367). This PR preserves trailing slashes if present in find-links URLs.

Fixes #14367

@jtfmumm jtfmumm assigned jtfmumm and unassigned jtfmumm Jul 1, 2025
@jtfmumm jtfmumm added the bug Something isn't working label Jul 1, 2025
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from f527ad5 to caa4b8d Compare July 1, 2025 09:27
@jtfmumm jtfmumm added the do-not-merge Pull request is not ready to merge label Jul 1, 2025
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from caa4b8d to 5233821 Compare July 1, 2025 09:27
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 1, 2025 09:34 — with GitHub Actions Inactive
@charliermarsh
Copy link
Member

I think relying on url.given() is dangerous, I don't think that persists through serialization / deserialization. I think we need to avoid stripping the trailing slash for --find-links URLs in the first place.

@jtfmumm jtfmumm changed the title Add original trailing slash if missing before find-links request Preserve original trailing slash if present on find-links URLs Jul 4, 2025
@jtfmumm jtfmumm had a problem deploying to uv-test-registries July 4, 2025 08:08 — with GitHub Actions Error
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from 41143c0 to 2cc4ce5 Compare July 4, 2025 08:08
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 4, 2025 08:10 — with GitHub Actions Inactive
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch 2 times, most recently from cdbc20f to a5ac13d Compare July 4, 2025 08:50
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 4, 2025 08:52 — with GitHub Actions Inactive

/// Construct an [`IndexUrl`] from a [`VerbatimUrl`], applying a [`TrailingSlashPolicy`]
/// to non-file URLs.
fn from_verbatim_url_with_trailing_slash_policy(
Copy link
Contributor Author

@jtfmumm jtfmumm Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic was moved from the From<VerbatimUrl> implementation. It's unchanged except that the trailing slash is only removed if TrailingSlashPolicy::Remove is passed in.

@jtfmumm
Copy link
Contributor Author

jtfmumm commented Jul 4, 2025

I think relying on url.given() is dangerous, I don't think that persists through serialization / deserialization. I think we need to avoid stripping the trailing slash for --find-links URLs in the first place.

Updated. I've also updated the VerbatimUrl rustdocs in #14456 to make this behavior clearer.

@jtfmumm jtfmumm changed the title Preserve original trailing slash if present on find-links URLs Preserve original trailing slash if present in find-links URLs Jul 4, 2025
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from a5ac13d to 6012b9d Compare July 4, 2025 09:06
@jtfmumm jtfmumm removed the do-not-merge Pull request is not ready to merge label Jul 4, 2025
@jtfmumm jtfmumm marked this pull request as ready for review July 4, 2025 09:06
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 4, 2025 09:07 — with GitHub Actions Inactive
/// directory.
///
/// Preserves trailing slash if present in `path`.
pub fn parse_preserving_trailing_slash(path: &str) -> Result<Self, IndexUrlError> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this name. Originally I had it as parse_without_slash_normalization. Could be parse_preserve_trailing_slash but I think parse_preserve_slash isn't enough information?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided things like parse_as_given or parse_exact since there could have already been different kinds of normalization prior to this point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just use:

  • parse_simple_api
  • parse_find_links

And thus not require callers to pass in a trailing slash policy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's better. I can follow a similar pattern with the verbatim URL cases: e.g., simple_api_from_verbatim_url

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah or from_simple_api_url?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated


/// Construct an [`IndexUrl`] from a [`VerbatimUrl`], preserving a trailing
/// slash if present.
pub fn from_verbatim_url_preserving_trailing_slash(url: VerbatimUrl) -> Self {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point as above about the function name

jtfmumm added a commit that referenced this pull request Jul 4, 2025
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 7, 2025 14:28 — with GitHub Actions Inactive
@jtfmumm jtfmumm temporarily deployed to uv-test-publish July 7, 2025 14:28 — with GitHub Actions Inactive
@charliermarsh
Copy link
Member

Actually, hmm, there may be one bug here. We call without_trailing_slash in satisfies on all index URLs. But I think that will now be wrong for --find-links URLs? So we'll incorrectly think that the lockfile needs to be updated, since there will be an index URL in the lockfile that doesn't match the "collection" of index URLs.

@charliermarsh
Copy link
Member

We should probably try to find a way to remove without_trailing_slash there.

@jtfmumm
Copy link
Contributor Author

jtfmumm commented Jul 8, 2025

Actually, hmm, there may be one bug here. We call without_trailing_slash in satisfies on all index URLs. But I think that will now be wrong for --find-links URLs? So we'll incorrectly think that the lockfile needs to be updated, since there will be an index URL in the lockfile that doesn't match the "collection" of index URLs.

I think we still need a way to normalize the registry sources we read in from the lockfile, but at that point we no longer know whether it was a Simple API or find-links URL. This leads me to think we should only be normalizing at the point of lockfile validation. I've opened an alternative PR that takes that approach instead.

@charliermarsh
Copy link
Member

I think normalizing the Simple API URLs is still a good thing, to be honest, since it minimizes churn. Should we just do both things? Normalize, but also make the lockfile validation more lax? I had this in Discord:

diff --git a/crates/uv-distribution-types/src/file.rs b/crates/uv-distribution-types/src/file.rs
index 948378c0c..c227b4266 100644
--- a/crates/uv-distribution-types/src/file.rs
+++ b/crates/uv-distribution-types/src/file.rs
@@ -169,15 +169,6 @@ impl UrlString {
             .map(|(path, _)| Cow::Owned(UrlString(SmallString::from(path))))
             .unwrap_or(Cow::Borrowed(self))
     }
-
-    /// Return the [`UrlString`] (as a [`Cow`]) with trailing slash removed.
-    #[must_use]
-    pub fn without_trailing_slash(&self) -> Cow<'_, Self> {
-        self.as_ref()
-            .strip_suffix('/')
-            .map(|path| Cow::Owned(UrlString(SmallString::from(path))))
-            .unwrap_or(Cow::Borrowed(self))
-    }
 }
 
 impl AsRef<str> for UrlString {
@@ -270,18 +261,4 @@ mod tests {
             Cow::Owned(UrlString("https://example.com/path?query".into()))
         );
     }
-
-    #[test]
-    fn without_trailing_slash() {
-        // Borrows a URL without a slash
-        let url = UrlString("https://example.com/path".into());
-        assert_eq!(url.without_trailing_slash(), Cow::Borrowed(&url));
-
-        // Removes the trailing slash if present on the URL
-        let url = UrlString("https://example.com/path/".into());
-        assert_eq!(
-            url.without_trailing_slash(),
-            Cow::Owned(UrlString("https://example.com/path".into()))
-        );
-    }
 }
diff --git a/crates/uv-resolver/src/lock/mod.rs b/crates/uv-resolver/src/lock/mod.rs
index beeadc912..9ae3f0e99 100644
--- a/crates/uv-resolver/src/lock/mod.rs
+++ b/crates/uv-resolver/src/lock/mod.rs
@@ -1437,7 +1437,7 @@ impl Lock {
                     }
                     IndexUrl::Path(_) => None,
                 })
-                .collect::<BTreeSet<_>>()
+                .collect::<Vec<_>>()
         });
 
         let locals = indexes.map(|locations| {
@@ -1479,10 +1479,9 @@ impl Lock {
                 match index {
                     RegistrySource::Url(url) => {
                         // Normalize URL before validating.
-                        let url = url.without_trailing_slash();
                         if remotes
                             .as_ref()
-                            .is_some_and(|remotes| !remotes.contains(&url))
+                            .is_some_and(|remotes| !remotes.iter().any(|remote| matches!(remote.as_ref().strip_prefix(url.as_ref()), '' | '/')))
                         {
                             let name = &package.id.name;
                             let version = &package

@zanieb
Copy link
Member

zanieb commented Jul 8, 2025

If this change would invalidate the lockfile, we should add a test case covering that.

@zanieb
Copy link
Member

zanieb commented Jul 8, 2025

at that point we no longer know whether it was a Simple API or find-links URL

This is separate, but... we should probably track this in the lockfile, right?

@jtfmumm
Copy link
Contributor Author

jtfmumm commented Jul 8, 2025

I think normalizing the Simple API URLs is still a good thing, to be honest, since it minimizes churn. Should we just do both things? Normalize, but also make the lockfile validation more lax?

I'm a little concerned that we'll find the trailing slash difference could end up being significant for some implementation of the Simple API down the line, but maybe it's not likely. I've updated this PR to follow the more lax approach, though we need to check if remote is a prefix of url (for the case that the lockfile contains a slash for a Simple API URL).

@jtfmumm
Copy link
Contributor Author

jtfmumm commented Jul 8, 2025

If this change would invalidate the lockfile, we should add a test case covering that.

Do you mean the more lax approach change? What scenario are you thinking of?

There are tests that check different combinations of trailing slashes in user-provided URLs vs. lockfile URLs to ensure validation still succeeds. Those still pass with this change.

@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from 424c636 to 08025b0 Compare July 8, 2025 13:02
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 8, 2025 13:03 — with GitHub Actions Inactive
@zanieb
Copy link
Member

zanieb commented Jul 8, 2025

I'm referring to the scenario described in this comment: #14387 (comment)

Is that not correct?

@zanieb
Copy link
Member

zanieb commented Jul 8, 2025

Regarding the "lax" approach, if a trailing slash on the find-links URL has semantic meaning, we should invalidate the lockfile if someone adds a trailing slash or removes it, right?

@charliermarsh
Copy link
Member

I considered that too, but I don't see how it would succeed in the first place if the trailing slash is required (or vice versa).

@zanieb
Copy link
Member

zanieb commented Jul 8, 2025

It could just change the result of the resolution, right? i.e., the find-links index returns no packages vs some packages?

@charliermarsh
Copy link
Member

I think that's possible in theory, it just seems really unlikely in practice. I agree that the right solution here is probably to track --find-links indexes with a different prefix in the lockfile. I don't know if we want to do that here, or in v0.8, or what. But we should definitely get a fix out for this ASAP even if imperfect since it's breaking real user workflows.

@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 9, 2025 08:52 — with GitHub Actions Inactive
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from 060967b to 9ce0b63 Compare July 9, 2025 09:06
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 9, 2025 09:10 — with GitHub Actions Inactive
@jtfmumm jtfmumm force-pushed the jtfm/add-trailing-slash-for-find-links branch from 9ce0b63 to 08025b0 Compare July 9, 2025 09:30
@jtfmumm jtfmumm temporarily deployed to uv-test-registries July 9, 2025 09:33 — with GitHub Actions Inactive
@jtfmumm
Copy link
Contributor Author

jtfmumm commented Jul 9, 2025

Closing in favor of #14511

@jtfmumm jtfmumm closed this Jul 9, 2025
zanieb added a commit that referenced this pull request Jul 9, 2025
Reverts:

- #14349
- #14346
- #14245

Retains the test cases. Includes a `find-links` test case.

Supersedes

- #14387
- #14503

We originally got a report at
#13707 that inclusion of a
trailing slash on an index URL was causing lockfile churn despite having
no semantic meaning and resolved the issue by adding normalization that
stripped trailing slashes at parse time.

We then discovered that, while there are not semantic differences for
trailing slashes on Simple API index URLs, there are differences for
some flat (or find links) indexes. As reported in
#14367, the change in
#14245 caused a regression for at
least one user.

We attempted to fix the regression via a few approaches.

#14387 attempted to differentiate
between Simple API and flat index URL parsing, but failed to account for
the `Deserialize` implementation, which always assumed Simple API-style
index URLs and incorrectly trimmed trailing slashes in various cases
where we deserialized the `IndexUrl` type from a file. I attempted to
resolve this by performing a larger refactor, but it ended up being
quite painful. In particular, the `Index` type was a blocker — we don't
know the `IndexUrl` variant until we've parsed the `IndexFormat` and
having a multi-stage deserializer is not appealing but adding a new
intermediate type (i.e., `RawIndex`) is painful due to the pervasiveness
of `Index`. Given that we've regressed behavior here and there's not a
straight-forward fix, we're reverting the normalization entirely.

#14503 attempted to perform
normalization at compare-time, but that means we'd fail to invalidate
the lockfile when the a trailing slash was added or removed and given
that a trailing slash has semantic meaning for a find-links URL... we'd
have another correctness problem.

After this revert, we'll retain all index URLs verbatim. The downside to
this approach is that we'll be adding a bunch of trailing slashes back
to lockfiles that we previously normalized out, and we'll be reverting
our fix for users with inconsistent trailing slashes on their index
URLs. Users affected by the original motivating issue should use
consistent trailing slashes on their URLs, as they do frequently have
semantic meaning. We may want to revisit normalization and type aware
index URL parsing as part of a larger change.

Closes  #14367
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--find-links URL Handling Regression in uv 0.7.17

4 participants