send HTTP caching headers for index pages to further reduce bandwidth usage #12257
cosmicexplorer wants to merge 5 commits into pypa:main
@ewdurbin @uranusjr (not sure who to tag): this change adds
I'm actually going to take out the interpreter compatibility caching, since that part adds significant complexity without any effect on network bandwidth usage.
The primary bandwidth concern for PyPI is files.pythonhosted.org, by a factor of about 15. But everything is marginal, so this would have a consequential impact over time. Nice stuff!
This looks reasonable to me, assuming we accept the dependent PRs. (Still some discussion needed on #12186, it seems.)
When performing `install --dry-run` and PEP 658 .metadata files are available to guide the resolve, do not download the associated wheels. Rather use the distribution information directly from the .metadata files when reporting the results on the CLI and in the --report file.

- describe the new --dry-run behavior
- finalize linked requirements immediately after resolve
- introduce is_concrete
- funnel InstalledDistribution through _get_prepared_distribution() too
- add test for new install --dry-run functionality (no downloading)
- catch an exception when parsing metadata which only occurs in CI
- handle --no-cache-dir
- call os.makedirs() before writing to cache too
- catch InvalidSchema when attempting git urls with BatchDownloader
- fix other test failures
- reuse should_cache(req) logic
- gzip compress link metadata for a slight reduction in disk space
- only cache built sdists
- don't check should_cache() when fetching
- cache lazy wheel dists
- add news
- turn debug logs in fetching from cache into exceptions
- use scandir over listdir when searching normal wheel cache
- handle metadata email parsing errors
- correctly handle mutable cached requirement
- use bz2 over gzip for an extremely slight improvement in disk usage
- handle new google_paste encoding breakage
- pipe in headers arg
- provide full context in Link.comes_from
- pull in etag and date and cache the outputs
- handle --no-cache-dir
- add NEWS
- remove quotes from etag and use binary checksum to save a few bytes
- parse http modified date to compress the cached representation
- fix cache-control clobbering
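The last few items in this commit mention storing the `ETag` without its quotes and parsing the HTTP date into a smaller form. A minimal sketch of what such a compact encoding might look like follows; it is not the PR's actual code. The helper names `strip_etag_quotes`, `pack_http_date`, and `unpack_http_date` are hypothetical, and the 8-byte big-endian timestamp is an assumed representation chosen only for illustration.

```python
import struct
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def strip_etag_quotes(etag: str) -> str:
    """Drop the surrounding double quotes of an ETag value, if present.

    Hypothetical helper: the quotes would need to be restored before the
    value is sent back in an If-None-Match header.
    """
    if len(etag) >= 2 and etag[0] == '"' and etag[-1] == '"':
        return etag[1:-1]
    return etag


def pack_http_date(value: str) -> bytes:
    """Parse an HTTP date string and pack it as 8 bytes of epoch seconds.

    Assumed representation: unsigned 64-bit big-endian integer.
    """
    parsed = parsedate_to_datetime(value)
    return struct.pack(">Q", int(parsed.timestamp()))


def unpack_http_date(packed: bytes) -> str:
    """Rebuild the HTTP date string from the packed epoch seconds."""
    (seconds,) = struct.unpack(">Q", packed)
    as_utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # usegmt=True emits the "GMT" zone name that HTTP dates use.
    return format_datetime(as_utc, usegmt=True)
```

Under these assumptions a 29-character date header shrinks to 8 bytes on disk and can still be reproduced for a later `If-Modified-Since` header.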
This PR is on top of #12256, see the `+316/-36` diff against it at https://github.com/cosmicexplorer/pip/compare/link-metadata-cache...cosmicexplorer:pip:link-parsing-cache?expand=1.

Background: Learning More about HTTP Requests

After taking up a suggestion from @dholth in #12208 to consider handling a `304 Not Modified` response in HTTP requests, I began to consider whether we could make use of HTTP caching headers to further reduce the time we wait for the network, without reintroducing the delays to see new package uploads described in #5670.

Proposal: Send HTTP Caching Headers and Record `ETag`

This change records the `ETag` and `Date` headers from the HTTP response, then sets the `If-None-Match` and `If-Modified-Since` headers on future requests against project pages (e.g. https://pypi.org/simple/tensorflow). This allows the server to respond with a zero-length `304 Not Modified` instead of a several-hundred KB HTML page.

Result: Slight Performance Improvement
Recording these HTTP headers adds only ~3KB of disk space after a large resolve.
It has only a very slight (3.4%) performance benefit on top of #12256, converting a 6.1 second resolve to 5.9 seconds.
But more importantly, as described above, it avoids making multiple ~600KB requests against PyPI on each resolve.
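The conditional-request flow from the proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not pip's implementation: `CachedPage`, `conditional_headers`, and `resolve_response` are hypothetical names, and the HTTP layer is reduced to a status code, a header mapping, and a body so the example needs no network access.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class CachedPage:
    """Hypothetical cache entry for one project page."""

    etag: Optional[str]  # value of the ETag response header, if any
    date: Optional[str]  # value of the Date response header, if any
    body: bytes          # page content received with the last 200 response


def conditional_headers(cached: Optional[CachedPage]) -> Dict[str, str]:
    """Build the request headers that let the server answer 304."""
    headers: Dict[str, str] = {}
    if cached is None:
        return headers
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.date:
        headers["If-Modified-Since"] = cached.date
    return headers


def resolve_response(
    status: int,
    headers: Mapping[str, str],
    body: bytes,
    cached: Optional[CachedPage],
) -> CachedPage:
    """Return the page to use: the cached one on 304, a new entry otherwise."""
    if status == 304:
        if cached is None:
            raise ValueError("got 304 Not Modified without a cached page")
        # Zero-length response: reuse the body stored earlier.
        return cached
    return CachedPage(etag=headers.get("ETag"), date=headers.get("Date"), body=body)
```

On a first fetch `conditional_headers(None)` is empty and the full page is stored. On later fetches the stored `ETag` and `Date` are sent back, and a `304` means the cached body is reused without downloading the page again.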
TODO