@@ -431,6 +431,49 @@ In addition to [`upload_file`] and [`upload_folder`], the following functions al

For more detailed information, take a look at the [`HfApi`] reference.

### Preupload LFS files before commit

In some cases, you might want to upload huge files to S3 **before** making the commit call. For example, if you are
committing a dataset in several shards that are generated in-memory, you would need to upload the shards one by one
to avoid an out-of-memory issue. One solution is to upload each shard as a separate commit on the repo. While
perfectly valid, this solution has the drawback of potentially cluttering the git history with tens of commits.
To overcome this issue, you can upload your files one by one to S3 and then create a single commit at the end. This
is possible using [`preupload_lfs_files`] in combination with [`create_commit`].

<Tip warning={true}>

This is a power-user method. Directly using [`upload_file`], [`upload_folder`] or [`create_commit`] instead of handling
the low-level logic of pre-uploading files is the way to go in the vast majority of cases. If you have a question,
feel free to ping us on our Discord or in a GitHub issue.

</Tip>

Here is a simple example illustrating how to pre-upload files:

```py
>>> from huggingface_hub import CommitOperationAdd, preupload_lfs_files, create_commit, create_repo

>>> repo_id = create_repo("test_preupload").repo_id

>>> operations = []  # List of all `CommitOperationAdd` objects that will be generated
>>> for i in range(5):
...     content = ...  # generate binary content
...     addition = CommitOperationAdd(path_in_repo=f"shard_{i}_of_5.bin", path_or_fileobj=content)
...     preupload_lfs_files(repo_id, additions=[addition])
...     operations.append(addition)

# Create commit
>>> create_commit(repo_id, operations=operations, commit_message="Commit all shards")
```

First, we create the [`CommitOperationAdd`] objects one by one. In a real-world example, those would contain the
generated shards. Each file is uploaded before generating the next one. During the [`preupload_lfs_files`] step, **the
`CommitOperationAdd` object is mutated**. You should only use it to pass it directly to [`create_commit`]. The main
update of the object is that **the binary content is removed** from it, meaning that it will be garbage-collected if
you don't store another reference to it. This is expected, as we don't want to keep in memory content that has
already been uploaded. Finally, we create the commit by passing all the operations to [`create_commit`]. You can pass
additional operations (add, delete or copy) that have not been processed yet and they will be handled correctly.

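As a sketch of that last point, here is how a pre-upload-style addition could be mixed with other operation types in a single commit. The file paths below are hypothetical, and the final [`create_commit`] call is left commented out since it requires an existing repo:

```python
from huggingface_hub import CommitOperationAdd, CommitOperationDelete

# Hypothetical shard: an addition carrying binary content (this is the kind of
# operation you would pass to `preupload_lfs_files` before committing)
addition = CommitOperationAdd(path_in_repo="shard_0_of_5.bin", path_or_fileobj=b"\x00" * 16)

# Hypothetical file to remove as part of the same commit
deletion = CommitOperationDelete(path_in_repo="obsolete_file.bin")

# Both operation types can be mixed in a single `create_commit` call
operations = [addition, deletion]
# create_commit(repo_id, operations=operations, commit_message="Add shard, drop obsolete file")
```

Only large additions benefit from pre-uploading; deletions and copies carry no binary content, so they can simply be passed to [`create_commit`] as-is.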
## Tips and tricks for large uploads

There are some limitations to be aware of when dealing with a large amount of data in your repo. Given the time it takes to stream the data,