Improve encode c api #145
Conversation
The gist is that we currently have to read the whole file into contiguous memory before handing it off to the API. *Additionally*, as a hard requirement for wirehair, the *compressed* file needs to exist in contiguous memory. You might think (as I did) that ~15 MB would not be enough to run afoul of the browser memory constraints, but apparently this is not so.

We can't do anything about the second (post-compression) requirement, but the first can be relaxed if the API accepts files in chunks. So:

* `cimbare_init_encode()` <-- to initialize the encode. Provide a filename, set encode_id (or -1 to auto-increment)
* `cimbare_encode_bufsize()` <-- the desired chunk size. Currently a constant.
* `cimbare_encode()` <-- to provide data. When the data size != the chunk size, the encode is ready. (This also means we can still achieve the old behavior of providing the whole file at once.)

`cimbare_encode(null, 0)` can act as a flush() to make sure the encode is ready -- though this is only relevant when the filesize is a multiple of the chunk size.
…file Should help avert (some) OOM errors, hopefully
```cpp
}
else if (res == 1 and cimbare_encode(nullptr, 0) != 0) // fallback finish encode
{
	std::cerr << "failed to encode for file" << filename << std::endl;
```
Example flow.
```cpp
	++_encodeId; // increment _encodeId every time we change files
else
	enc.set_encode_id(static_cast<uint8_t>(encode_id));
	_encodeId = encode_id;
```
This might be the strangest side effect of this change -- _encodeId will now persist if you override it, so you can change the auto-increment starting point. You probably shouldn't, though. The right behavior will generally be to use -1 (for auto-increment).
(should probably define a constant for that)
```cpp
std::optional<cv::Mat> _next;

// compressing the file
std::unique_ptr<cimbar::zstd_compressor<std::stringstream>> _comp;
```
The downside here is that the compression is now stateful. But it must be, given that we want to persist state across calls.
```cpp
_comp->set_compression_level(_compressionLevel);

if (fnsize > 0 and filename != nullptr)
	_comp->write_header(filename, fnsize);
```
All this "new" code is refactored from Encoder::create_fountain_encoder() -- that may go away at some point, now that it seems redundant.
```cpp
_comp->pad(fountainChunkSize - compressedSize + 1);

// create the encoder stream
_fes = fountain_encoder_stream::create(*_comp, fountainChunkSize, _encodeId);
```
More from Encoder::create_fountain_encoder()
```cpp
// create the encoder stream
_fes = fountain_encoder_stream::create(*_comp, fountainChunkSize, _encodeId);
_comp.reset();
```
After we return 0 from cimbare_encode, the next call to it will return -1 (as if we never called cimbare_init_encode)
```cpp
std::string filename = "/tmp/foobar-c语言版.txt";
assertEquals( 0, cimbare_encode(reinterpret_cast<unsigned char*>(contents.data()), contents.size(), filename.data(), filename.size(), 100) );
assertEquals( 0, cimbare_init_encode(filename.data(), filename.size(), 100) );
assertEquals( 0, cimbare_encode(reinterpret_cast<unsigned char*>(contents.data()), contents.size()) );
```
Another usage example. In this case, we encode the whole file at once, as the old api used to require.
```js
if (datalen > 0) {
	// copy to wasm buff and write
	const uint8View = new Uint8Array(event.target.result);
	_compressBuff.set(uint8View);
```
This reads awkwardly due to JS apis being awkward. But all we're doing here is copying bytes somewhere the wasm code can work with them.
```js
// this null call is functionally a flush()
// so a no-op, unless it isn't
const nullBuff = new Uint8Array(Module.HEAPU8.buffer, _compressBuff.byteOffset, 0);
Main.encode_bytes(nullBuff);
```
Using the flush/no-op path for cimbare_encode()
Let's have cimbare_init_encode() reset the fountain stream as well. If we don't, there's a weird (failure-path) state where it looks like we've switched the encoder to a new file, but every encoded frame will be for the previous file (since the fountain stream setup occurs during the final cimbare_encode() call).
```cpp
assertEquals( 0, cimbare_encode(nullptr, 0) );
assertEquals( 1, cimbare_next_frame() );

assertEquals( -1, cimbare_encode(nullptr, 0) );
```
Demonstrating the corner case.
The flip side of #144: complex-ify the encoder. 🫠
We (users of the cimbar.js encoder, including me) have run into situations where the browser hits an out of memory (OOM) error on large input files. Example issues:
#141
#132
#118
#96
While I can't solve all OOM issues -- wirehair needs the compressed representation stored in contiguous memory -- writing the decoder (#134) reminded/convinced me that we can do better when loading the uncompressed data for encoding.
So:
The new API surface:
* `cimbare_init_encode(filename, encode_id)` -> starts an encode, waits for data
* `cimbare_encode(data)` -> provides data. When the size of the data provided is == `cimbare_encode_bufsize()`, we treat this as an incremental write and don't finalize anything (and return `1`). Otherwise, we consider the write complete. `cimbare_encode(nullptr, 0)` therefore functions as a sort of "flush()" operation if you want to make sure the data is ready.
* Once `cimbare_encode()` returns `0`, we can start doing our `cimbare_next_frame()` calls