Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 65 additions & 0 deletions text/0000-rustdoc-url-conflicts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
- Feature Name: rustdoc_url_conflicts
- Start Date: 2021-03-29
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

This RFC describes how the rustdoc URLs will be modified to fix the current issue on case insensitive file systems.
For example, in the libstd there is:
* keyword.self.html
* keyword.Self.html

If macOS or Windows users try to access to both URLs, they will always see the same content because for both file systems, it's the same file.

# Motivation
[motivation]: #motivation

This is one of the oldest rustdoc issue and has been problematic for years and a lot of users.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide more details about those places?

My first instinct here is that having two structs in the same module that differ only by case is something that people should never be doing, and thus this should never come up. TBH I'd be inclined to make a deny-by-default lint for it, and say that if you do it and the docs don't work I don't really care. But maybe there's an important situation I'm missing, so having that called out here could easily change my mind about it.

The places where things differ in only case that don't bother me are things like fn.default.html and trait.Default.html, where the item-kind disambiguator is already making the URL unique.

(Then we could special-case keyword.*.html to make it work. Or just make one doc for both of them.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems like a major change for very very little benefit in a case people should not be doing at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as soon as we enter into the "binding zone", it's just conflicts everywhere. What to do then? Rename items? That could work, but that doesn't sound like a good idea... This is one of the oldest rustdoc issue for a good reason after all. I think it's a really good change because it keeps a recognizable URL scheme while preventing name conflicts. But well, debates are what RFCs are for after all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, special case things is rarely a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuillaumeGomez what do you mean by the binding zone?

I do not think using dashes randomly is a recognizeable URL scheme.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not random (If the explanation in the RFC isn't good, please tell meso I can update it!).

It replaces all capitalized letters by adding a dash before them and then putting the non-capital letter behind. So hELoPeople becomes h-e-lo-people.

For the "binding zone", I'll take a const example:

/usr/include/openssl/crypto.h
49:#  define SSLeay_version          OpenSSL_version
51:#  define SSLEAY_VERSION          OPENSSL_VERSION

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, by "randomly" I meant "not in a way that is intuitively predictable"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, special case things is rarely a good idea.

While I agree in general, I see no issue with special-casing keyword, since it's already a magic special case anyway. It's not like users can add keyword yolo; items in their crates that we need to document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They actually can. :p

The idea behind doc(keyword) is to allow proc-macro crates to add documentation for their own keywords if they introduce new ones.


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The idea here is to replace every capitalized letter in the item with `-[minimized capital]`. So for example, `enum.FooBar.html` will become `enum.-foo-bar.html`.
Copy link
Member

@scottmcm scottmcm Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems really unfortunate that this would make even incredibly common things like struct.Vec.html have URLs that are materially worse (in my opinion).

Could it at least use a slightly more complicated scheme to take advantage of the usual rust naming conventions? Like have struct vec be the one that gets struct.-vec.html, since it's normal for structs to start with an upper-case letter, and let struct Vec continue to be struct.Vec.html?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I don't think it's good that common types get affected by this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't help with cases where the case differs mid-identifier and could be valid either way struct FooBar; struct FoObar;.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your point @Nemo157. It would be -fo-obar. It would prevent name conflict, which is the primary issue here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to @scottmcm's alternative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad. Then yes, I completely agree with you. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nemo157 Yes, that's not a complete fix, as it'd still change struct.FooBar.html to struct.Foo-Bar.html or similar.

But I don't think we should break non-agglutinative names just because we have to break agglutinative ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I guess my explanations are really bad... All capital letters are replaced by a dash followed by the lowercase version. So FooBar will become struct.-foo-bar.html. With this, no more conflicts.


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Multiple attempts were made but all of them had huge drawbacks. The last one suggested to handle "conflicted" paths by doing the following:
* Change name of conflicted files and store them somewhere else.
* Create one file which will then load the correct file based on the URL semantic.

There are multiple issues with this approach:
* It requires JS to work.
* It doesn't work on case-sensitive file systems (unless we make links to support all URLs but then it would be problematic on case-insensitive file systems).

Other approaches were discussed but their drawbacks were even bigger and more numerous. This approach seems to be the most viable.

# Drawbacks
[drawbacks]: #drawbacks

It'll make the URL harder to read.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this break almost every link, like https://doc.rust-lang.org/std/string/struct.String.html? Seems like that should be listed as a drawback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is my primary concern here. We'd be breaking a lot of URLs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it.


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The other alternatives require either JS to be enabled all the time or just don't work on both case-sensitive and case-insensitive file systems.

# Prior art
[prior-art]: #prior-art

Like said previously, there were multiple attempts done before:
* https://github.com/rust-lang/rust/pull/83612
* https://github.com/rust-lang/rust/pull/64699
* https://github.com/rust-lang/rust/pull/59785

# Unresolved questions
[unresolved-questions]: #unresolved-questions

None.

# Future possibilities
[future-possibilities]: #future-possibilities

Maybe change the way we replace capitalized letters?