Skip to content

feat: support multipart upload#52

Merged
apdavison merged 4 commits into
masterfrom
feat/multipartUpload
May 25, 2026
Merged

feat: support multipart upload#52
apdavison merged 4 commits into
masterfrom
feat/multipartUpload

Conversation

@xgui3783
Copy link
Copy Markdown
Collaborator

Up until this point, artefact upload >5G will fail due to undocumented gateway settings. Additionally, uploading large artefacts run into the risk of interruption.

This PR adds multipart upload. upload will automatically default to multipart upload if the size > 1GB.

fix: upload >5GB artefacts
@xgui3783 xgui3783 requested a review from apdavison May 22, 2026 12:53
Copy link
Copy Markdown
Member

@apdavison apdavison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, upload can be resumed manually by the user. I think this is fine for now, but it would be nice to handle possible temporary network failures by retrying chunks automatically.

Please also add a test using mocks, to avoid the reduction in test coverage.

Comment thread ebrains_drive/bucket.py Outdated
from tqdm import tqdm
from typing import Union

MULTIPART_CHUNK_SIZE = 10 * 1024 * 1024 # 10 MB
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe document the reasons for these choices? I think you said the limit was 5 GB, so why a threshold of 1 GB?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ I am still trying to get documentation on the 5GB upload limit. I only found out about it by trial and error.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are at the moment undocumented.

I have made it clear of such in the newest commit.

I also made it so that their behavior can be customized based on envvar

Comment thread ebrains_drive/bucket.py Outdated

def _get_filesize(self, filelike: Union[str, IOBase]) -> int:
if isinstance(filelike, str):
with open(filelike, "rb") as fp:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you get the filesize from the operating system when filelike is a path? e.g., os.path.getsize(filelike)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to ensure that this method also works for BufferedIO . I (e.g. from stdin). os.path.getsize will likely fail on BufferedIO, I understand?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't isinstance(filelike, str) return False if filelike is a BufferedIO object?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I fixed it in the newest commit.

doc: add doc to upper chunk size and threshold
fix: use os.path.getsize instead
@xgui3783
Copy link
Copy Markdown
Collaborator Author

xgui3783 commented May 25, 2026

Please also add a test using mocks, to avoid the reduction in test coverage.

I will endavour to do that tomorrow (or later today)

edit: done

@apdavison apdavison merged commit fa59fe1 into master May 25, 2026
10 checks passed
@xgui3783 xgui3783 deleted the feat/multipartUpload branch May 25, 2026 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants