
Prevent duplicate requests to the same URLs#2067

Merged
thomas-zahner merged 4 commits into lycheeverse:master from thomas-zahner:prevent-duplicate-requests
Mar 3, 2026

Conversation

@thomas-zahner
Member

This is done by locking a Mutex for each Uri.
Previously, duplicate Uris were sometimes checked redundantly, depending on when the other duplicates were cached.

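The per-URI locking idea can be sketched with a small registry that hands out one shared mutex per URI. This is a minimal, hypothetical sketch using std's blocking `Mutex` for brevity (the PR itself uses tokio's async `Mutex`; `UriLocks` and `mutex_for` are illustrative names, not lychee's actual API):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hands out one shared mutex per URI: two tasks checking the same URI
/// receive the same Arc and therefore serialize on the same lock.
#[derive(Default)]
struct UriLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl UriLocks {
    /// Return the mutex associated with `uri`, creating it on first use.
    fn mutex_for(&self, uri: &str) -> Arc<Mutex<()>> {
        self.locks
            .lock()
            .unwrap()
            .entry(uri.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

fn main() {
    let registry = UriLocks::default();
    let a = registry.mutex_for("https://example.com");
    let b = registry.mutex_for("https://example.com");
    let c = registry.mutex_for("https://example.org");
    // Same URI yields the same lock; a different URI yields a new one.
    assert!(Arc::ptr_eq(&a, &b));
    assert!(!Arc::ptr_eq(&a, &c));
    println!("per-URI locks deduplicate");
}
```

Whichever task holds the guard for a URI is the only one performing the request for it; duplicates block on `lock()` and, once woken, find the result already cached.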
@thomas-zahner thomas-zahner requested review from katrinafyi and mre March 2, 2026 10:32
@thomas-zahner thomas-zahner mentioned this pull request Mar 2, 2026
This prevents backoff and rate limiting cache hits
Member

@katrinafyi katrinafyi left a comment


Looks like it should work. I noticed this could happen while looking at this code for #2035 but I thought that the benefit was small.

This is because there is already the top-level cache in check.rs, which prevents most duplicated requests where the second request starts after the first has finished. This duplication only happens when the same URI is picked up simultaneously by 2 tasks and they're both in-progress at the same time.

But in that case, adding a mutex just blocks the second task until the first completes and won't increase overall throughput - the blocked task still counts towards max_concurrency. It does slightly reduce the number of requests, but host concurrency already applies and already keeps it reasonable.

To increase throughput, I think we'd need something higher up at the check.rs level. It should keep track of URLs in-progress and, if duplicates are seen, it should divert them to a side-channel and wait using something like https://docs.rs/tokio/latest/tokio/sync/struct.SetOnce.html#method.wait

But anyway.... the PR is fine :)
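For reference, the side-channel idea could look roughly like the following, with std's blocking `OnceLock` standing in for tokio's async `SetOnce` (`InFlight` and its method are hypothetical names, and the real version would live at the check.rs level): the first task to see a URL runs the request inside `get_or_init`, and concurrent duplicates block on the same slot until the result is set.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

type Status = String;

/// Hypothetical sketch: each URL gets a write-once slot. The first
/// caller's closure runs; concurrent duplicates block in get_or_init
/// until the result is stored, then reuse it.
#[derive(Default)]
struct InFlight {
    slots: Mutex<HashMap<String, Arc<OnceLock<Status>>>>,
}

impl InFlight {
    fn check(&self, url: &str, request: impl FnOnce() -> Status) -> Status {
        let slot = self
            .slots
            .lock()
            .unwrap()
            .entry(url.to_string())
            .or_insert_with(|| Arc::new(OnceLock::new()))
            .clone();
        // OnceLock guarantees the closure runs at most once; all other
        // callers wait for, then clone, the stored value.
        slot.get_or_init(request).clone()
    }
}

fn main() {
    let inflight = Arc::new(InFlight::default());
    let requests = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let (inflight, requests) = (inflight.clone(), requests.clone());
            thread::spawn(move || {
                inflight.check("https://example.com", || {
                    requests.fetch_add(1, Ordering::SeqCst);
                    "200 OK".to_string()
                })
            })
        })
        .collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), "200 OK");
    }
    // Four tasks raced on the same URL, but the request ran exactly once.
    assert_eq!(requests.load(Ordering::SeqCst), 1);
    println!("duplicates diverted to the side channel");
}
```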

Comment on lines 153 to 157
let _permit = self.acquire_semaphore().await;
let uri_mutex = self.acquire_uri_mutex(&uri);
let _uri_guard = uri_mutex.lock().await;

if let Some(cached) = self.get_cached_status(&uri, needs_body) {
Member


The first cache check could be done even earlier, before the semaphore acquire since it doesn't need to be limited by host concurrency.

Also, acquire_uri_mutex is named similarly to acquire_semaphore but they do different things - acquire_semaphore also takes the lock, while acquire_uri_mutex doesn't. Could acquire_uri_mutex be changed to just return the lock guard?

Member Author

@thomas-zahner thomas-zahner Mar 2, 2026


The first cache check could be done even earlier

Ah true, see 78634dd

Could acquire_uri_mutex be changed to just return the lock guard?

Yeah, it would be nice to call the function in a single line, but unfortunately I didn't get that to work.

Oh right, the name might be weird. What about 0c6bece?
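The ordering being discussed - early cache check, per-URI lock, then a re-check - can be sketched synchronously (a hypothetical simplification: lychee's checker is async and also holds a host-concurrency semaphore permit, elided here; the names are illustrative):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Checker {
    cache: Mutex<HashMap<String, String>>,
    uri_locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl Checker {
    fn check(&self, uri: &str, request: impl FnOnce() -> String) -> String {
        // 1. Early cache check: hits return before touching any lock.
        if let Some(status) = self.cache.lock().unwrap().get(uri).cloned() {
            return status;
        }
        // 2. Take this URI's lock so in-flight duplicates serialize.
        let lock = self
            .uri_locks
            .lock()
            .unwrap()
            .entry(uri.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone();
        let _guard = lock.lock().unwrap();
        // 3. Re-check: a duplicate may have cached the status while we waited.
        if let Some(status) = self.cache.lock().unwrap().get(uri).cloned() {
            return status;
        }
        // 4. We are the first: perform the request and cache the result.
        let status = request();
        self.cache
            .lock()
            .unwrap()
            .insert(uri.to_string(), status.clone());
        status
    }
}

fn main() {
    let checker = Checker::default();
    let mut requests = 0;
    let first = checker.check("https://example.com", || {
        requests += 1;
        "200 OK".to_string()
    });
    let second = checker.check("https://example.com", || {
        requests += 1;
        "200 OK".to_string()
    });
    assert_eq!(first, "200 OK");
    assert_eq!(second, "200 OK");
    // The second call was served from cache: only one request went out.
    assert_eq!(requests, 1);
    println!("cached after first request");
}
```

The re-check in step 3 is what makes the lock pay off: while a task waited on the per-URI lock, the duplicate holding it may have completed and cached the status, so the waiter returns from cache instead of re-requesting.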

Member


Cache check looks good now.

Is there a reason to prefer a function that returns the mutex rather than a function that returns the lock guard? (To make the lifetimes work, I think you'd need to use lock_owned)

Member Author


Oooh, that's awesome! So yeah, the reason for it was that I couldn't get it to work because I was using lock instead of lock_owned. See 86c7f2a

@thomas-zahner
Member Author

Looks like it should work. I noticed this could happen while looking at this code for #2035 but I thought that the benefit was small.

Ah cool. Yeah, you are totally right. But at the same time I think it's not that unlikely to happen. The chance that URLs are duplicated, potentially across multiple files, is quite high. Especially if you consider that the Host struct applies not to all URLs but only to the specific host/subdomain.

To increase throughput, I think we'd need something higher up at the check.rs level.

Yeah, thanks for the idea. True, it does not increase throughput at all. But it should save resources and might make things a bit faster, especially when encountering rate limiting.

@thomas-zahner thomas-zahner force-pushed the prevent-duplicate-requests branch from 0c6bece to 86c7f2a Compare March 3, 2026 09:51
Member

@katrinafyi katrinafyi left a comment


Thanks for the changes!

@thomas-zahner
Member Author

Thank you for reviewing and helping me with the lock_owned trick!

@thomas-zahner thomas-zahner merged commit a3591de into lycheeverse:master Mar 3, 2026
7 checks passed
@mre mre mentioned this pull request Feb 25, 2026
@katrinafyi katrinafyi mentioned this pull request Mar 16, 2026