Improve encode c api #145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

sz3 merged 3 commits into master from improve-encode-capi

Sep 29, 2025

Owner

sz3 commented Sep 28, 2025 •

edited

Loading

The flip side of #144: complex-ify the encoder. 🫠

We (users of the cimbar.js encoder, including me) have run into situations where the browser hits an out of memory (OOM) error on large input files. Example issues:
#141
#132
#118
#96

While I can't solve all OOM issues -- wirehair needs the compressed representation stored in contiguous memory -- writing the decoder (#134) reminded/convinced me that we can do better when loading the uncompressed data for encoding.

So:

we're now loading input files in chunks
this requires us to change the c api for encoding
as a result, OOM errors in the browser should be less frequent. I won't say solved; but, better.

The new API surface:

cimbare_init_encode(filename, encode_id) -> starts an encode, waits for data
cimbare_encode(data) -> provides data. When the size of data provided is == cimbare_encode_bufsize(), we treat this as an incremental write and don't finalize anything (and return 1). Otherwise, we consider the write complete.
cimbare_encode(nullptr, 0) therefore functions as a sort of "flush()" operation if you want to make sure the data is ready.
when cimbare_encode() returns 0, we can start doing our cimbare_next_frame() calls

sz3 added 2 commits

September 28, 2025 08:45


          Reorganizing the encoder c api to (hopefully) lessen OOM errors

00a7f47

The gist is that we currently have to read the whole file into
contiguous memory before handing it off the API. *Additionally*, as a
hard requirement for wirehair, the *compressed* file needs to exist in
contiguous memory.

You might think (as I did) that ~15 MB would not be enough to run afoul
of the browser memory constraints, but apparently this is not so.

We can't do anything about the second (post compressed) requirement, but
the first can be relaxed if the API accepts files in chunks. So:

* cimbare_init_encode() <-- to initialize the encode. Provide filename,
set encode_id (or -1 to auto-increment)
* cimbare_encode_bufsize() <-- the desired chunk size. Currently a
constant.
* cimbare_encode() <-- to provide data. When the data size != the chunk
size, the encode is ready. (this also means we can still achieve the old
behavior of providing the whole file at once)

cimbar_encode(null, 0) can act as a flush() to make sure the encode is
ready -- though this is only relevant when the filesize is a multiple of
the chunk size.


          JS encoder: split file encode into multiple calls, and loop over the …

7793a33

…file

Should help avert (some) OOM errors, hopefully

sz3 commented

View reviewed changes

src/exe/cimbar_send/send.cpp

    
              				}

              				else if (res == 1 and cimbare_encode(nullptr, 0) != 0) // fallback finish encode

              				{

              					std::cerr << "failed to encode for file" << filename << std::endl;

Owner Author

sz3 Sep 28, 2025

Example flow.

sz3 commented

View reviewed changes

src/lib/cimbar_js/cimbar_js.cpp

    
              		++_encodeId; // increment _encodeId every time we change files

              	else

              		enc.set_encode_id(static_cast<uint8_t>(encode_id));

              		_encodeId = encode_id;

Owner Author

sz3 Sep 28, 2025

This might be the strangest side effect of this change -- _encodeId will now persist if you override it, so you can change the auto-increment starting point. You probably shouldn't, though. The right behavior will generally be to use -1 (for auto-increment).

(should probably define a constant for that)

sz3 commented

View reviewed changes

src/lib/cimbar_js/cimbar_js.cpp

    
              	std::optional<cv::Mat> _next;

              	// compressing the file

              	std::unique_ptr<cimbar::zstd_compressor<std::stringstream>> _comp;

Owner Author

sz3 Sep 29, 2025

The downside here is that the compression is now stateful. But it must be, given that want to persist state across calls.

sz3 commented

View reviewed changes

src/lib/cimbar_js/cimbar_js.cpp

    
              	_comp->set_compression_level(_compressionLevel);

              	if (fnsize > 0 and filename != nullptr)

              		_comp->write_header(filename, fnsize);

Owner Author

sz3 Sep 29, 2025

All this "new" code is refactored from Encoder::create_fountain_encoder() -- that may go away at some point, now that it seems redundant.

sz3 commented

View reviewed changes

src/lib/cimbar_js/cimbar_js.cpp

    
              		_comp->pad(fountainChunkSize - compressedSize + 1);

              	// create the encoder stream

              	_fes = fountain_encoder_stream::create(*_comp, fountainChunkSize, _encodeId);

Owner Author

sz3 Sep 29, 2025

More from Encoder::create_fountain_encoder()

sz3 commented

View reviewed changes

src/lib/cimbar_js/cimbar_js.cpp

    
              	// create the encoder stream

              	_fes = fountain_encoder_stream::create(*_comp, fountainChunkSize, _encodeId);

              	_comp.reset();

Owner Author

sz3 Sep 29, 2025

After we return 0 from cimbare_encode, the next call to it will return -1 (as if we never called cimbare_init_encode)

sz3 commented

View reviewed changes

src/lib/cimbar_js/test/cimbar_jsTest.cpp

    
              	std::string filename = "/tmp/foobar-c语言版.txt";

              	assertEquals( 0, cimbare_encode(reinterpret_cast<unsigned char*>(contents.data()), contents.size(), filename.data(), filename.size(), 100) );

              	assertEquals( 0, cimbare_init_encode(filename.data(), filename.size(), 100) );

              	assertEquals( 0, cimbare_encode(reinterpret_cast<unsigned char*>(contents.data()), contents.size()) );

Owner Author

sz3 Sep 29, 2025

Another usage example. In this case, we encode the whole file at once, as the old api used to require.

sz3 commented

View reviewed changes

web/main.js

    
                    if (datalen > 0) {

                      // copy to wasm buff and write

                      const uint8View = new Uint8Array(event.target.result);

                      _compressBuff.set(uint8View);

Owner Author

sz3 Sep 29, 2025

This reads awkwardly due to JS apis being awkward. But all we're doing here is copying bytes somewhere the wasm code can work with them.

sz3 commented

View reviewed changes

web/main.js

    
                      // this null call is functionally a flush()

                      // so a no-op, unless it isn't

                      const nullBuff = new Uint8Array(Module.HEAPU8.buffer, _compressBuff.byteOffset, 0);

                      Main.encode_bytes(nullBuff);

Owner Author

sz3 Sep 29, 2025

Using the flush/no-op path for cimbare_encode()


          Add a test for the no-op/flush cimbare_encode() usage + ...

4086d16

Let's have cimbare_init_encode() reset the fountain stream as well.

If we don't, there's a weird (failure path) state where it looks like
we've switched the encoder to a new file, but every encoded frame will
be for the previous file (since the fountain stream setup occurs during
the final cimbare_encode() call.

sz3 commented

View reviewed changes

src/lib/cimbar_js/test/cimbar_jsTest.cpp

    
              	assertEquals( 0, cimbare_encode(nullptr, 0) );

              	assertEquals( 1, cimbare_next_frame() );

              	assertEquals( -1, cimbare_encode(nullptr, 0) );

Owner Author

sz3 Sep 29, 2025

Demonstrating the corner case.

sz3 merged commit 98fca1c into master

11 checks passed

sz3 deleted the improve-encode-capi branch

September 29, 2025 03:18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet