Skip to content
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
4a3a2a1
add index_files option
katrinafyi Jul 24, 2025
a1fa4b4
fix compilation
katrinafyi Jul 26, 2025
e8d07c3
phrasing
katrinafyi Jul 26, 2025
5296d23
fjdisaofjdisoajfidosa
katrinafyi Jul 26, 2025
73a6a4d
early switch between fallback extensions and index files
katrinafyi Jul 26, 2025
23b8092
docs and it compiles now
katrinafyi Jul 26, 2025
c3fab82
COW
katrinafyi Jul 26, 2025
51afc96
update readme text
katrinafyi Jul 27, 2025
2fc3f29
Merge remote-tracking branch 'origin/master' into katrinafyi-patch-1
katrinafyi Jul 27, 2025
a59ea60
increment errors due to not searching index.html by default
katrinafyi Jul 27, 2025
5173748
tests and fixtures
katrinafyi Jul 27, 2025
118a6f0
fmt and fix test
katrinafyi Jul 27, 2025
c2504dc
lint
katrinafyi Jul 27, 2025
67322d7
add more corner cases to tests
katrinafyi Jul 28, 2025
b55f493
separate test cases for corner cases
katrinafyi Jul 28, 2025
b217503
fix index_files default in ClientBuilder (oops!)
katrinafyi Jul 30, 2025
bb55866
Merge remote-tracking branch 'origin/master' into katrinafyi-patch-1
katrinafyi Jul 30, 2025
4362cdf
tests: now expected to fail `#a-link-inside-index-html-inside-sub-dir`
katrinafyi Jul 31, 2025
8d25acd
revert README.md change where we hardcoded the version
katrinafyi Jul 31, 2025
f6b678e
review: make `index_files` an Option for clearer default behavior
katrinafyi Jul 31, 2025
1d76955
docs: add `--index-files ''` and `--index-files index.html,.`
katrinafyi Aug 1, 2025
ed499f7
tests: add cli tests for `--index-files`
katrinafyi Aug 1, 2025
a4f7497
review: confirm path traversal is allowed in `--index-files`
katrinafyi Aug 2, 2025
0fd19b7
review: typo
katrinafyi Aug 7, 2025
3490256
fix tests bug, where an assertion was weaker than desired
katrinafyi Aug 7, 2025
82b537b
Merge branch 'master' into katrinafyi-patch-1
katrinafyi Aug 8, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 25 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -452,12 +452,34 @@ Options:
Remap URI matching pattern to different URI

--fallback-extensions <FALLBACK_EXTENSIONS>
Test the specified file extensions for URIs when checking files locally.
Multiple extensions can be separated by commas. Extensions will be checked in
order of appearance.
When checking locally, attempts to locate missing files by trying the given
fallback extensions. Multiple extensions can be separated by commas. Extensions
will be checked in order of appearance.

Example: --fallback-extensions html,htm,php,asp,aspx,jsp,cgi

Note: This option only takes effect on `file://` URIs which do not exist.

--index-files <INDEX_FILES>
When checking locally, resolves directory links to an index file within the
directory. The argument is a comma-separated list of index file names to search
for. Index files are attempted in the order given, and at least one must exist
in order for a directory link to be considered valid.

The special name `.` can be used to resolve directory links to the directory
itself. When `.` is specified, a directory link will be accepted as long as the
directory exists. This is the default behavior.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty clever

Copy link
Copy Markdown
Member Author

@katrinafyi katrinafyi Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's too clever. For instance, I have to apply this default value at lots of different levels - CLI parsing and ClientBuilder both set it.

Is this okay? Or maybe the API should take Options?

Edit: In fact, I just found a bug to do with setting the default value 😖

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting idea. But I think it's overly complicated. Also if I didn't misunderstand there does not seem to be any use-case for this ever. If a user ever provides . as a value (e.g. --index-files index.html,index.md,.) it should be exactly equivalent to not providing the whole flag at all.

So the only reason for .'s existence is for internal representation which is much cleaner represented with an empty list and probably a new condition withing apply_index_files. (or Option if really necessary)

Copy link
Copy Markdown
Member Author

@katrinafyi katrinafyi Jul 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--index-files index.html,index.md,. is different to omitting the flag because it will use an index file if it exists before falling back to the directory. this could be useful if somebody has a setup like this.

file names after a ., however, will have no effect

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field has been changed to an option :) See f6b678e for more commentary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification and for the new commit. I'm starting to think that we should now either:

Rename the flag from --index-files to something like --remap-directory. (maybe you have a another idea) This captures the current behaviour better IMO, as this is not necessarily about index files. (though this might be the most common use-case) Especially since we can provide arbitrary values such as ., .. and some/subdirectory/random.file. Or only allow values of simple file names that don't affect the directory in which the remapped files lie. But we pretty much already agreed that the latter case is not a priority. So my preference is renaming the flag. The flag description and documentation of course can still explain the common "index" use-case.

Copy link
Copy Markdown
Member Author

@katrinafyi katrinafyi Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, I can't imagine a use for this flag which doesn't include index files in some way. . is the only directory name which will be accepted, and this is to support the case for optional index files. If you want, I can add this use case to the help and test cases to make it more visible. If you pass .. or some other path to a directory, those entries will never be considered a valid index file. This is recorded by one of the corner test cases.

And we've established that path traversal into nested directories is accidental rather than intentional, so I wouldn't want to generalise the option name to include this unsupported use. In the other comment thread, I talked about eventually restricting this with validation.

So, my recommendation is to keep the current name. It's clear to users who will be looking for this feature, and I don't think anyone will be surprised by its name or functionality.

But again, if you want to overrule me, let me know your desired name and I will make it so.

Copy link
Copy Markdown
Member Author

@katrinafyi katrinafyi Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other problem is remap is already a defined term in the Lycheeverse. remap-directories has the connotation that this is a directory-specific version of remap, but its functionality is not nearly that powerful. It can't rewrite directories in the middle of paths, for instance. Nor can it change its behaviour based on the URL being checked.

I'd rather have a name which is very slightly too narrow than one which is too broad. (And I don't want to think of another name rn :))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay let's keep the flag name then. Ah true, in the current state a parameter like .. doesn't make sense because directories are not accepted, except for .. But I actually think it's not necessary then to handle . differently than all other directories


An empty string can be passed to reject all local directory links. This will
require all links to explicitly name an HTML file, rather than linking to a
directory.

Example: --index-files index.html,readme.md

Note: This option only takes effect on `file://` URIs which exist and point to a directory.

[default: .]

-H, --header <HEADER:VALUE>
Set custom header for requests

Expand Down
Empty file.
Empty file.
1 change: 1 addition & 0 deletions fixtures/filechecker/index_dir/index.html
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
<p id="fragment">boop</p>
Empty file.
Empty file.
Empty file.
4 changes: 2 additions & 2 deletions fixtures/fragments-fallback-extensions/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@
</head>

<body>
<a href="sub_dir#valid-id">1</a>
<a href="sub_dir#invalid-id">2</a>
<a href="sub_dir/index#valid-id">1</a>
<a href="sub_dir/index#invalid-id">2</a>
<a href="empty_dir#invalid-id">3</a>
</body>

Expand Down
1 change: 1 addition & 0 deletions lychee-bin/src/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ pub(crate) fn create(cfg: &Config, cookie_jar: Option<&Arc<CookieStoreMutex>>) -
.min_tls_version(cfg.min_tls.clone().map(Into::into))
.include_fragments(cfg.include_fragments)
.fallback_extensions(cfg.fallback_extensions.clone())
.index_files(cfg.index_files.clone())
.build()
.client()
.context("Failed to create request client")
Expand Down
37 changes: 32 additions & 5 deletions lychee-bin/src/options.rs
Original file line number Diff line number Diff line change
Expand Up @@ -544,19 +544,46 @@ and 501."
#[arg(long)]
pub(crate) remap: Vec<String>,

/// Automatically append file extensions to `file://` URIs as needed
/// Automatically append file extensions to `file://` URIs for non-existing paths
#[serde(default)]
#[arg(
long,
value_delimiter = ',',
long_help = "Test the specified file extensions for URIs when checking files locally.
Multiple extensions can be separated by commas. Extensions will be checked in
order of appearance.
long_help = "When checking locally, attempts to locate missing files by trying the given
fallback extensions. Multiple extensions can be separated by commas. Extensions
will be checked in order of appearance.

Example: --fallback-extensions html,htm,php,asp,aspx,jsp,cgi

Example: --fallback-extensions html,htm,php,asp,aspx,jsp,cgi"
Note: This option only takes effect on `file://` URIs which do not exist."
)]
pub(crate) fallback_extensions: Vec<String>,

/// Resolve local directory links to specified index files within the directory
#[serde(default)]
#[arg(
long,
default_value = ".",
value_delimiter = ',',
long_help = "When checking locally, resolves directory links to an index file within the
directory. The argument is a comma-separated list of index file names to search
for. Index files are attempted in the order given, and at least one must exist
in order for a directory link to be considered valid.

The special name `.` can be used to resolve directory links to the directory
itself. When `.` is specified, a directory link will be accepted as long as the
directory exists. This is the default behavior.

An empty string can be passed to reject all local directory links. This will
require all links to explicitly name an HTML file, rather than linking to a
directory.

Example: --index-files index.html,readme.md

Note: This option only takes effect on `file://` URIs which exist and point to a directory."
)]
pub(crate) index_files: Vec<String>,

/// Set custom header for requests
#[arg(
short = 'H',
Expand Down
2 changes: 1 addition & 1 deletion lychee-bin/tests/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1908,7 +1908,6 @@ mod cli {
"fixtures/fragments/file.html#top",
"fixtures/fragments/file.html#Upper-%C3%84%C3%96%C3%B6",
"fixtures/fragments/sub_dir",
"fixtures/fragments/sub_dir#a-link-inside-index-html-inside-sub-dir",
"fixtures/fragments/zero.bin",
"fixtures/fragments/zero.bin#",
"fixtures/fragments/zero.bin#fragment",
Expand All @@ -1922,6 +1921,7 @@ mod cli {
let expected_failures = vec![
"fixtures/fragments/sub_dir_non_existing_1",
"fixtures/fragments/sub_dir#non-existing-fragment-2",
"fixtures/fragments/sub_dir#a-link-inside-index-html-inside-sub-dir",
"fixtures/fragments/empty_dir#non-existing-fragment-3",
"fixtures/fragments/file2.md#missing-fragment",
"fixtures/fragments/sub_dir#non-existing-fragment-1",
Expand Down
Loading
Loading