This uses a blocking ring buffer so that the file is not read fully into memory, which would otherwise fail with an out-of-memory error on large files.
The encryption itself and the upload seem to work great!
I do have some concerns though.
- The checksums that we produce will be difficult to use with the current naming format.
For example, I ran a recursive upload of a folder with two files, and it produced the following files in the directory of the executable:
-rw-------. 1 pan pan 1111 Jul 23 12:55 checksum_unencrypted.sha256
-rw-------. 1 pan pan 663 Jul 23 12:55 checksum_unencrypted.md5
-rw-------. 1 pan pan 1221 Jul 23 12:55 checksum_encrypted.sha256
-rw-------. 1 pan pan 773 Jul 23 12:55 checksum_encrypted.md5
There should be a naming convention that links each checksum file to its original file, and perhaps the checksum files should be placed in the same directory as the originals.
- The progress bar seems to be broken with this implementation.
Nothing has changed there compared with how it is currently done.
I get the same visual appearance when uploading both encrypted and unencrypted files.
ea57576 to 772c63e
@pahatz I think I managed to fix the visuals now.
772c63e to f0d1e24
Fair enough, I hadn't checked how our checksums looked.
nanjiangshu left a comment
Looks good in general, but I have two comments: the first one must be fixed, and the second one would be good to fix.
- The uploaded c4gh files are only 124 bytes, i.e. they contain only the header, when the file uploaded with -encrypt-with-key is smaller than 1 MB.
- The command fails when the folder contains encrypted versions of files. In the following example, I have a folder called test_dir2 that contains both file_100.txt and file_100.txt.c4gh. The command exits with the error "Error: aborting, file is already encrypted".
It would be better to also show the name of the file that is already encrypted, i.e. file_100.txt.c4gh. Otherwise, it looks like the process was aborted because of file_100.txt, which is not the case.
Previously, files with the .c4gh extension were ignored, but now every file in the folder is processed, including those with the .c4gh extension. The error message should therefore point out the exact file that caused the abort.
$ ../sda-cli -config s3cmd-bp-staging.conf upload -encrypt-with-key mykey.pub.pem -r test_dir2 -targetDir test_dir2_try2
Remote server (host_base): https://staging-inbox.bp.nbis.se
File file_100.txt: uploading 124.0 b / 292.0 b [========================>-----------------------------------] 42 %
Error: aborting, file is already encrypted
2a118ff to c90e07f
6529745 to 7ca7729
7ca7729 to 5b7f0e5
While this was not originally part of the change, it is easy to fix.
It was related to an unclosed reader/writer. Extra tests have been added.
5b7f0e5 to 9fa1133
Related issue(s) and PR(s)
This PR closes #473.
Description
This PR adds a streaming encryption function that is used when uploading files with the -encrypt-with-key flag set. Files are encrypted in memory during the upload using a 1 MB blocking ring buffer. The ring buffer blocks reads while it is empty and holds at most 1 MB of data at any time, so uploading large files does not consume all system memory. Once the buffer is full, reads and writes are matched, i.e. for each chunk that is read from the buffer, one chunk can be written to it.
The PR also includes a number of linter fixes.
How to test