feat: support multipart upload #52
fix: upload >5GB artefacts
apdavison
left a comment
If I understand correctly, an interrupted upload can be resumed manually by the user. I think this is fine for now, but it would be nice to handle temporary network failures by retrying failed chunks automatically.
Please also add a test using mocks, to avoid the reduction in test coverage.
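For illustration, automatic chunk retry together with a mock-based test could look roughly like the sketch below. It is not the code in this PR: `upload_chunk_with_retry`, the `send` callable, the retry count and the backoff delays are all hypothetical, and it assumes a temporary network failure surfaces as `ConnectionError`.

```python
import time
from typing import Callable
from unittest import mock


def upload_chunk_with_retry(
    send: Callable[[bytes], None],  # hypothetical callable that uploads one chunk
    chunk: bytes,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(chunk), retrying on ConnectionError; return the attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send(chunk)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def test_retries_then_succeeds() -> None:
    # The mock fails twice with a network error, then succeeds.
    send = mock.Mock(side_effect=[ConnectionError(), ConnectionError(), None])
    delays = []
    attempts = upload_chunk_with_retry(send, b"data", sleep=delays.append)
    assert attempts == 3
    assert send.call_count == 3
    assert delays == [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the test fast, since the mock-based test records the delays instead of actually waiting.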
    from tqdm import tqdm
    from typing import Union

    MULTIPART_CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB
Maybe document the reasons for these choices? I think you said the limit was 5 GB, so why a threshold of 1 GB?
^ I am still trying to get documentation on the 5GB upload limit. I only found out about it by trial and error.
They are undocumented at the moment.
I have stated this explicitly in the newest commit.
I also made their behavior customizable via environment variables.
    def _get_filesize(self, filelike: Union[str, IOBase]) -> int:
        if isinstance(filelike, str):
            with open(filelike, "rb") as fp:
Can't you get the filesize from the operating system when filelike is a path? e.g., os.path.getsize(filelike)
I want to ensure that this method also works for BufferedIO (e.g. from stdin). os.path.getsize will likely fail on BufferedIO, I understand?
Won't isinstance(filelike, str) return False if filelike is a BufferedIO object?
Good call. I fixed it in the newest commit.
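A minimal sketch of the approach discussed in this thread, written as a standalone function rather than the PR's `_get_filesize` method: paths go through `os.path.getsize`, while file-like objects are measured by seeking to the end. The non-seekable branch is an assumption added here, since a piped stdin typically cannot be seeked and its size cannot be determined this way.

```python
import io
import os
from typing import Union


def get_filesize(filelike: Union[str, io.IOBase]) -> int:
    """Return the size in bytes of a path or of a seekable file-like object."""
    if isinstance(filelike, str):
        # A path: ask the operating system, no need to open the file.
        return os.path.getsize(filelike)
    if not filelike.seekable():
        # Assumption: non-seekable streams (e.g. a pipe) are rejected outright.
        raise ValueError("cannot determine the size of a non-seekable stream")
    current = filelike.tell()
    size = filelike.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    filelike.seek(current)  # restore the caller's read position
    return size
```

Restoring the original position matters because the same object is presumably read again afterwards for the actual upload.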
doc: add doc to upper chunk size and threshold
fix: use os.path.getsize instead
edit: done
Until now, uploading artefacts larger than 5 GB fails due to undocumented gateway settings. Additionally, uploading large artefacts runs the risk of interruption.
This PR adds multipart upload. Uploads automatically default to multipart when the file size exceeds 1 GB.
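The size check and chunking described above could be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `should_use_multipart`, `iter_chunks` and `MULTIPART_THRESHOLD` are hypothetical names, and 1 GB is taken to mean 1024³ bytes.

```python
import io
from typing import BinaryIO, Iterator

MULTIPART_CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, as in the diff
MULTIPART_THRESHOLD = 1024 ** 3  # 1 GB (assumed to be 1024**3 bytes)


def should_use_multipart(size: int, threshold: int = MULTIPART_THRESHOLD) -> bool:
    """Multipart is used only when the size strictly exceeds the threshold."""
    return size > threshold


def iter_chunks(fp: BinaryIO, chunk_size: int = MULTIPART_CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage: a 7-byte stream split into 3-byte chunks; the last chunk is shorter.
parts = list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
```

Reading one chunk at a time keeps memory use bounded by the chunk size, regardless of how large the artefact is.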