Skip to content

Feature/streaming encryption#555

Merged
jbygdell merged 13 commits intomainfrom
feature/streaming-encryption
Aug 12, 2025
Merged

Feature/streaming encryption#555
jbygdell merged 13 commits intomainfrom
feature/streaming-encryption

Conversation

@jbygdell
Copy link
Copy Markdown
Contributor

@jbygdell jbygdell commented Jul 21, 2025

Related issue(s) and PR(s)
This PR closes #473 .

Description
This PR creates a streaming encryption function that is used when uploading files with the -encrypt-with-key flag set. The files are encrypted in memory during the upload process using a 1MB blocking ring buffer.
The ring buffer will block reading of data if it is empty and will at most contain 1MB of data during the upload process in order not to consume all system memory when uploading large files. Reads and writes are matched to the buffer once it is full i.e. for each chunk that is read one can be written to the buffer.

The PR also contains a bunch of linter fixes that needed to be fixed.

How to test

This uses a blocking ring buffer in order to prevent the file to be read fully into memory and thus fail due to out of memory on large files.
@jbygdell jbygdell requested review from aaperis and pahatz July 21, 2025 09:00
@jbygdell jbygdell self-assigned this Jul 21, 2025
@jbygdell jbygdell added the go Pull requests that update Go code label Jul 21, 2025
@jbygdell jbygdell enabled auto-merge July 22, 2025 13:33
Copy link
Copy Markdown
Contributor

@pahatz pahatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encryption itself and the upload seem to work great!
I do have some concerns though.

  1. The checksums that we produce will be difficult to use in the current naming format.
    For example, I ran a recursive upload of a folder with two files and this produces the following files in the directory of the executable:
-rw-------. 1 pan  pan      1111 Jul 23 12:55 checksum_unencrypted.sha256
-rw-------. 1 pan  pan       663 Jul 23 12:55 checksum_unencrypted.md5
-rw-------. 1 pan  pan      1221 Jul 23 12:55 checksum_encrypted.sha256
-rw-------. 1 pan  pan       773 Jul 23 12:55 checksum_encrypted.md5

There should be some naming convention to be able to link them to the original and maybe place them in the same directory.

  1. The progress bar seems to be broken with this implementation.

@jbygdell
Copy link
Copy Markdown
Contributor Author

  1. The checksums that we produce will be difficult to use in the current naming format.
    For example, I ran a recursive upload of a folder with some files and this produces the following files in the directory of the executable:

Nothing has changed there with regards to how it currently is done.
The files contain the relevant information i.e. hash and file name.

  1. The progress bar seems to be broken with this implementation.
    In what sense is it broken?

I get the same visual appearance when uploading both for encrypted and unencrypted:

File README.md: uploading 0.0 b / 20.2 KiB [------]

@jbygdell jbygdell force-pushed the feature/streaming-encryption branch from ea57576 to 772c63e Compare July 23, 2025 12:36
@jbygdell
Copy link
Copy Markdown
Contributor Author

@pahatz I think I managed to fix the visuals now.

@jbygdell jbygdell force-pushed the feature/streaming-encryption branch from 772c63e to f0d1e24 Compare July 23, 2025 12:38
@pahatz
Copy link
Copy Markdown
Contributor

pahatz commented Jul 23, 2025

Fair enough, I hadn't checked before how our checksums look.
The progress bar looks good now, great work!

Copy link
Copy Markdown
Contributor

@nanjiangshu nanjiangshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. But I have two comments, the first one must be fixed and the second one is better to fixed.

  1. The uploaded c4gh files have only 124 Bytes, that is, only with the header, when the files to be uploaded with -encrypt-with-key is less than 1Mb.

  2. When there are encrypted version of files in the folder, e.g. in the following example, I have a folder called test_dir2 and it contains both file_100.txt and file_100.txt.c4gh. The command exit with the error Error: aborting, file is already encrypted.

It is better to show also the name of the file that is already encrypted is file_100.txt.c4gh. Otherwise, it looks like the process is aborted because of file_100.txt.c4gh` but it's not.

Previously files with .c4gh extension will be ignored, but now any files in the folder will be processed, even those with .c4gh extension. It is better to point out the exact file that caused the abortion.

$ ../sda-cli -config s3cmd-bp-staging.conf upload -encrypt-with-key mykey.pub.pem  -r test_dir2  -targetDir test_dir2_try2
Remote server (host_base): https://staging-inbox.bp.nbis.se
File file_100.txt: uploading 124.0 b / 292.0 b [========================>-----------------------------------]  42 %
Error: aborting, file is already encrypted

@jbygdell jbygdell force-pushed the feature/streaming-encryption branch 7 times, most recently from 2a118ff to c90e07f Compare August 11, 2025 08:35
@jbygdell jbygdell force-pushed the feature/streaming-encryption branch 6 times, most recently from 6529745 to 7ca7729 Compare August 11, 2025 12:36
@jbygdell jbygdell requested a review from nanjiangshu August 11, 2025 12:36
@jbygdell jbygdell force-pushed the feature/streaming-encryption branch from 7ca7729 to 5b7f0e5 Compare August 11, 2025 12:39
@jbygdell
Copy link
Copy Markdown
Contributor Author

2. When there are encrypted version of files in the folder, e.g. in the following example, I have a folder called test_dir2 and it contains both file_100.txt and file_100.txt.c4gh. The command exit with the error Error: aborting, file is already encrypted.

While not part of the change originally it is easy to fix.

@jbygdell
Copy link
Copy Markdown
Contributor Author

  1. The uploaded c4gh files have only 124 Bytes, that is, only with the header, when the files to be uploaded with -encrypt-with-key is less than 1Mb.

Related to an unclosed reader/Writer. Extra tests has been added

@jbygdell jbygdell requested a review from kostas-kou August 11, 2025 13:56
Comment thread encrypt/encrypt.go Outdated
@jbygdell jbygdell force-pushed the feature/streaming-encryption branch from 5b7f0e5 to 9fa1133 Compare August 12, 2025 06:39
@jbygdell jbygdell requested review from a team and nanjiangshu August 12, 2025 06:39
Copy link
Copy Markdown
Contributor

@nanjiangshu nanjiangshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@jbygdell jbygdell added this pull request to the merge queue Aug 12, 2025
Merged via the queue into main with commit 718f055 Aug 12, 2025
6 checks passed
@jbygdell jbygdell deleted the feature/streaming-encryption branch August 12, 2025 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go Pull requests that update Go code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[upload] perform streamed encryption while uploading

3 participants