Closed
31 commits
3b9e5e3
Prototype binary encoding.
juj Mar 5, 2024
059699a
Fix Closure minification around binaryDecode() function.
juj Mar 5, 2024
735af53
Make binary encoding work in default runtime.
juj Mar 6, 2024
ed9b7e1
Default enable SINGLE_FILE_BINARY_ENCODE
juj Mar 6, 2024
926fde4
Remove old code
juj Mar 6, 2024
85d6ea4
Cleanup encoding code
juj Mar 6, 2024
e199eb0
Flake
juj Mar 6, 2024
23007c1
Flake
juj Mar 6, 2024
451fc77
tools\maint\update_settings_docs.py
juj Mar 6, 2024
cdd1719
Simplify
juj Mar 6, 2024
d4c8067
Fix Closure invocation.
juj Mar 6, 2024
6435c68
Rebaseline code size tests.
juj Mar 6, 2024
426c9d2
Fix typo
juj Mar 6, 2024
ca63b7b
Fix browser.test_single_file_worker_js
juj Mar 6, 2024
a27937f
Fix browser.test_modularize
juj Mar 6, 2024
7eb94f2
Proper ifdef gate.
juj Mar 6, 2024
66db56c
Remove code duplication.
juj Mar 6, 2024
46fa3b8
Resolve todo
juj Mar 6, 2024
0fad312
eslint
juj Mar 6, 2024
bb8f0f5
Disable SINGLE_FILE_BINARY_ENCODE by default
juj Mar 7, 2024
f8d9641
Fix refactoring.
juj Mar 8, 2024
0317531
Include test for -sSINGLE_FILE_BINARY_ENCODE=0/1 modes.
juj Mar 8, 2024
3ac3aab
Address review
juj Mar 26, 2024
514759b
Merge remote-tracking branch 'origin/main' into binary_encode
juj Mar 26, 2024
77badc9
Merge remote-tracking branch 'origin/main' into binary_encode
juj Aug 27, 2024
ea729f0
Fix merge
juj Aug 27, 2024
1b163d2
Clean up merge
juj Aug 27, 2024
09c5aa8
Run tools\maint\update_settings_docs.py
juj Aug 27, 2024
da757b9
Rebaseline code size tests
juj Aug 27, 2024
9abe835
Add new test for WebGL 2 code size in binary encoded singlefile mode.…
juj Aug 27, 2024
b7e9a2c
Fix call to binaryDecode() after merge
juj Aug 27, 2024
12 changes: 12 additions & 0 deletions site/source/docs/tools_reference/settings_reference.rst
@@ -2857,6 +2857,18 @@ then you can safely ignore this warning.

Default value: false

.. _single_file_binary_encode:

SINGLE_FILE_BINARY_ENCODE
Collaborator

How about calling this SUPPORT_UTF8_EMBEDDING to match the existing SUPPORT_BASE64_EMBEDDING?

I do wish we could make this change without adding yet another setting, but I guess it's not reasonable to make it unconditional. Do you think we could make it unconditional at some point in the future?

=========================

If true, binary Wasm content is encoded using a custom UTF-8 embedding
instead of base64. This generates smaller binary.
Set this to false to revert back to earlier base64 encoding if you run into
issues with the binary encoding. (and please let us know of any such issues)

Default value: true

.. _auto_js_libraries:

AUTO_JS_LIBRARIES
10 changes: 10 additions & 0 deletions src/binaryDecode.js
@@ -0,0 +1,10 @@
// Prevent Closure from minifying the binaryDecode() function, or otherwise
// Closure may analyze through the WASM_BINARY_DATA placeholder string into this
// function, leading into incorrect results.
/** @noinline */
function binaryDecode(bin) {
  for(var i = 0, l = bin.length, o = new Uint8Array(l); i < l; ++i) {
    o[i] = bin.charCodeAt(i) - 1;
  }
  return o;
}
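For readers following the decoder outside a browser, the loop above can be mirrored in Python. This is only a sketch: `binary_decode` is a hypothetical name, and it assumes the input is the string as the JavaScript engine sees it, that is, after the parser has already resolved escapes such as `\n` and `\\`.

```python
def binary_decode(text: str) -> bytes:
    """Mirror of binaryDecode(): subtract 1 from every character code."""
    return bytes(ord(ch) - 1 for ch in text)


# Character codes 1, 2 and 256 map back to the bytes 0, 1 and 255.
decoded = binary_decode("\x01\x02\u0100")
print(list(decoded))
```

Because every byte is shifted up by one before encoding, the largest character code the decoder can meet is 256, so the subtraction always lands in the 0..255 range.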
14 changes: 14 additions & 0 deletions src/preamble.js
@@ -603,6 +603,10 @@ function instrumentWasmTableWithAbort() {
}
#endif

#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE && !WASM2JS
#include "binaryDecode.js"
#endif

function findWasmBinary() {
#if EXPORT_ES6 && USE_ES6_IMPORT_META && !SINGLE_FILE && !AUDIO_WORKLET
if (Module['locateFile']) {
@@ -613,7 +617,13 @@ function findWasmBinary() {
return locateFile(f);
}
#endif

#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE && !WASM2JS
return binaryDecode(f);
#else
return f;
#endif

#if EXPORT_ES6 && USE_ES6_IMPORT_META && !SINGLE_FILE && !AUDIO_WORKLET // In single-file mode, repeating WASM_BINARY_FILE would emit the contents again. For an Audio Worklet, we cannot use `new URL()`.
}
#if ENVIRONMENT_MAY_BE_SHELL
@@ -628,6 +638,9 @@ function findWasmBinary() {
var wasmBinaryFile;

function getBinarySync(file) {
#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE
return file;
#else
if (file == wasmBinaryFile && wasmBinary) {
return new Uint8Array(wasmBinary);
}
@@ -645,6 +658,7 @@ function getBinarySync(file) {
#else
throw 'sync fetching of the wasm failed: you can preload it to Module["wasmBinary"] manually, or emcc.py will do that for you when generating HTML (but not JS)';
#endif
#endif
}

function getBinaryPromise(binaryFile) {
7 changes: 7 additions & 0 deletions src/preamble_minimal.js
@@ -45,10 +45,17 @@ if (Module['doWasm2JS']) {
#endif

#if SINGLE_FILE && WASM == 1 && !WASM2JS

#if SINGLE_FILE_BINARY_ENCODE
#include "binaryDecode.js"
Module['wasm'] = binaryDecode('<<< WASM_BINARY_DATA >>>');
#else
#include "base64Decode.js"
Module['wasm'] = base64Decode('<<< WASM_BINARY_DATA >>>');
#endif

#endif

var HEAP8, HEAP16, HEAP32, HEAPU8, HEAPU16, HEAPU32, HEAPF32, HEAPF64,
#if WASM_BIGINT
HEAP64, HEAPU64,
9 changes: 9 additions & 0 deletions src/proxyClient.js
@@ -129,13 +129,18 @@ var SUPPORT_BASE64_EMBEDDING;
var filename;
filename ||= '<<< filename >>>';

#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE
#include "binaryDecode.js"
var workerURL = URL.createObjectURL(new Blob([binaryDecode(filename)], {type: 'application/javascript'}));
#else
var workerURL = filename;
if (SUPPORT_BASE64_EMBEDDING) {
Collaborator

Unrelated to this change but this line looks very strange. Shouldn't this be #if SUPPORT_BASE64_EMBEDDING?

var fileBytes = tryParseAsDataURI(filename);
if (fileBytes) {
workerURL = URL.createObjectURL(new Blob([fileBytes], {type: 'application/javascript'}));
}
}
#endif
var worker = new Worker(workerURL);

#if ENVIRONMENT_MAY_BE_NODE
@@ -166,7 +171,11 @@ worker.onmessage = (event) => {
if (!workerResponded) {
workerResponded = true;
Module.setStatus?.('');
#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE
URL.revokeObjectURL(workerURL);
#else
if (SUPPORT_BASE64_EMBEDDING && workerURL !== filename) URL.revokeObjectURL(workerURL);
#endif
}

var data = event.data;
6 changes: 6 additions & 0 deletions src/settings.js
@@ -1859,6 +1859,12 @@ var WASMFS = false;
// [link]
var SINGLE_FILE = false;

// If true, binary Wasm content is encoded using a custom UTF-8 embedding
// instead of base64. This generates smaller binary.
// Set this to false to revert back to earlier base64 encoding if you run into
// issues with the binary encoding. (and please let us know of any such issues)
var SINGLE_FILE_BINARY_ENCODE = true;
Collaborator

If this is true by default why have a setting for this?

Is it because some folks can't handle UTF-8? If so, it might be worth mentioning there. e.g. "Disable this if you can't handle UTF-8 chars in the generated JS".

Collaborator Author

The idea is that it would allow developers to pass -sSINGLE_FILE_BINARY_ENCODE=0 to revert back to the previous base64 encoding, should issues arise.

Updated the text doc.


// If set to 1, all JS libraries will be automatically available at link time.
// This gets set to 0 in STRICT mode (or with MINIMAL_RUNTIME) which mean you
// need to explicitly specify -lfoo.js in at link time in order to access
4 changes: 2 additions & 2 deletions test/code_size/embind_hello_wasm.json
@@ -4,7 +4,7 @@
"a.js": 9920,
"a.js.gz": 4354,
"a.wasm": 7715,
-"a.wasm.gz": 3512,
+"a.wasm.gz": 3508,
"total": 18187,
-"total_gz": 8246
+"total_gz": 8242
}
6 changes: 6 additions & 0 deletions test/code_size/hello_webgl2_wasm_singlefile_wasm.json
@@ -0,0 +1,6 @@
{
"a.html": 17586,
"a.html.gz": 10152,
"total": 17586,
"total_gz": 10152
}
4 changes: 2 additions & 2 deletions test/code_size/math_wasm.json
@@ -4,7 +4,7 @@
"a.js": 110,
"a.js.gz": 125,
"a.wasm": 2719,
-"a.wasm.gz": 1674,
+"a.wasm.gz": 1673,
"total": 3381,
-"total_gz": 2179
+"total_gz": 2178
}
8 changes: 4 additions & 4 deletions test/code_size/random_printf_wasm.json
@@ -1,6 +1,6 @@
{
-"a.html": 12690,
-"a.html.gz": 6857,
-"total": 12690,
-"total_gz": 6857
+"a.html": 11058,
+"a.html.gz": 5724,
+"total": 11058,
+"total_gz": 5724
}
}
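To put the rebaselined numbers for `random_printf_wasm.json` in perspective, the savings can be worked out directly. The sketch below assumes that the first four values listed in this hunk are the old baseline and the last four are the new one; `savings` is a hypothetical helper, not part of the test suite.

```python
def savings(before: int, after: int) -> tuple[int, float]:
    """Return the absolute and relative (percent) size reduction."""
    saved = before - after
    return saved, 100 * saved / before


for name, before, after in [("a.html", 12690, 11058), ("a.html.gz", 6857, 5724)]:
    saved, percent = savings(before, after)
    print(f"{name}: {saved} bytes saved ({percent:.1f}%)")
```

Under that assumption the uncompressed HTML shrinks by 1632 bytes (about 12.9%) and the gzipped HTML by 1133 bytes (about 16.5%).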
4 changes: 2 additions & 2 deletions test/code_size/random_printf_wasm2js.json
@@ -1,6 +1,6 @@
{
"a.html": 17277,
-"a.html.gz": 7489,
+"a.html.gz": 7486,
"total": 17277,
-"total_gz": 7489
+"total_gz": 7486
}
2 changes: 1 addition & 1 deletion test/other/codesize/test_codesize_files_wasmfs.size
@@ -1 +1 @@
-50948
+50942
4 changes: 3 additions & 1 deletion test/test_browser.py
@@ -3270,10 +3270,12 @@ def test_modularize(self, opts):
# this test is synchronous, so avoid async startup due to wasm features
self.compile_btest('browser_test_hello_world.c', ['-sMODULARIZE', '-sSINGLE_FILE'] + args + opts)
create_file('a.html', '''
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></head><body>
<script src="a.out.js"></script>
<script>
%s
</script>
</body></html>
''' % code)
self.run_browser('a.html', '/report_result?0')

@@ -4667,7 +4669,7 @@ def test_single_file_locate_file(self):
# Tests that SINGLE_FILE works as intended in a Worker in JS output
def test_single_file_worker_js(self):
self.compile_btest('browser_test_hello_world.c', ['-o', 'test.js', '--proxy-to-worker', '-sSINGLE_FILE'])
-create_file('test.html', '<script src="test.js"></script>')
+create_file('test.html', '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></head><body><script src="test.js"></script></body></html>')
Collaborator

Does this test break without this change?

Does the charset in the <head> somehow affect the script tag?

Does this mean that some/all users of SINGLE_FILE would need to add an explicit charset to their HTML?

Collaborator Author

Our existing shell files have already ~from the dawn of time had a <meta charset='utf-8'> directive. That is part of the general "best practices" boilerplate.

So we can expect quite safely that all the custom shell files that users have monkeyed off of any of the existing shell files we provide, will be UTF-8 (unless they have explicitly removed the meta charset directive, which is a bad idea in general)

About 98.2% of all websites utilize UTF-8, it is practically the only encoding used on the web. I don't really know of anyone who would use any other encoding on their web pages anymore.

It is just that our test scripts have omitted the meta charset='utf-8' directive.

(If for some reason users did not want to specify that meta charset directive, they can also save the HTML file with a UTF-8 BOM, or they can send a Content-Type: text/html; charset=utf-8 HTTP header, but typically the meta charset directive is the simplest way)
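Why the charset declaration matters for this encoding can be illustrated with a small Python sketch. It assumes, purely for illustration, that a page without a declared charset is interpreted as windows-1252 (`cp1252` in Python); the actual fallback a browser picks depends on its locale and heuristics. `decode_as` is a hypothetical helper that applies the same "character code minus one" rule as the decoder in this PR.

```python
# Bytes 0 and 255 after the +1 offset become the characters U+0001 and U+0100.
utf8_bytes = "\x01\u0100".encode("utf-8")  # b'\x01\xc4\x80'


def decode_as(charset: str) -> list[int]:
    """Interpret the file bytes in the given charset, then undo the +1 offset."""
    text = utf8_bytes.decode(charset)
    return [ord(ch) - 1 for ch in text]


print(decode_as("utf-8"))   # two values, the original bytes
print(decode_as("cp1252"))  # three values, none of them the original 255
```

With the wrong charset, the two-byte sequence for U+0100 is read as two separate characters, so the decoded binary has both the wrong length and the wrong contents.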

Collaborator

To be clear, without this change, this test is broken?

I wonder if we should write a test to verify that -sSINGLE_FILE output is broken-by-default without UTF-8 encoding but that adding <meta charset='utf-8'> or disabling SINGLE_FILE_BINARY_ENCODE fixes it?

I guess we don't have any other non-ascii strings in our generated code otherwise <meta charset='utf-8'> would be required regardless of SINGLE_FILE and / or SINGLE_FILE_BINARY_ENCODE?

Collaborator Author

> To be clear, without this change, this test is broken?

That's correct, the test would not work otherwise.

> I wonder if we should write a test to verify that -sSINGLE_FILE output is broken-by-default without UTF-8 encoding but that adding <meta charset='utf-8'> or disabling SINGLE_FILE_BINARY_ENCODE fixes it?

I am not sure what the benefit of such a test would be.

If we wanted to go the extra mile, I think we would want to either generate a UTF-8 BOM into the HTML file, or issue a link time warning if the generated HTML file contains UTF-8 code points, but the HTML file does not contain a <meta charset='utf-8'> directive.

> I guess we don't have any other non-ascii strings in our generated code otherwise <meta charset='utf-8'> would be required regardless of SINGLE_FILE and / or SINGLE_FILE_BINARY_ENCODE?

That is true. We (Emscripten) don't, but if users have any non-ASCII characters in their JS libraries, then they would need to specify the charset.

Collaborator

> issue a link time warning if generated HTML file contains UTF-8 code points, but the HTML file does not contain a <meta charset='utf-8'> directive.

This would be a useful warning for users who generate HTML output, but my understanding is that the vast majority of our users generate JavaScript and then write their own HTML separately. I'm curious, does Unity use Emscripten to generate HTML? If so, I guess that would be a big use case for HTML output in production. Otherwise I don't know of any.

self.run_browser('test.html', '/report_result?0')
self.assertExists('test.js')
self.assertNotExists('test.worker.js')
12 changes: 9 additions & 3 deletions test/test_other.py
@@ -8536,7 +8536,7 @@ def test_unoptimized_code_size(self):
# We don't care too about unoptimized code size but we would like to keep it
# under control to a certain extent. This test allows us to track major
# changes to the size of the unoptimized and unminified code size.
-# Run with `--rebase` when this test fails.
+# Run with `--rebaseline` when this test fails.
self.build(test_file('hello_world.c'), emcc_args=['-O0', '--output_eol=linux'])
self.check_expected_size_in_file('wasm',
test_file('other/test_unoptimized_code_size.wasm.size'),
@@ -9308,8 +9308,9 @@ def test_standalone_system_headers(self):

@is_slow_test
@parameterized({
-'': (True,),
-'disabled': (False,),
+'': (1,),
+'disabled': (0,),
+'binary_encode': (2,),
})
@also_with_wasm2js
def test_single_file(self, single_file_enabled):
@@ -9327,6 +9328,8 @@ def test_single_file(self, single_file_enabled):
else:
expect_wasm = self.is_wasm()

cmd += [f'-sSINGLE_FILE_BINARY_ENCODE={int(single_file_enabled == 2)}']

if debug_enabled:
cmd += ['-g']
if closure_enabled:
@@ -10917,6 +10920,7 @@ def test_function_exports_are_small(self, args, opt, closure):
'random_printf_wasm2js': ('random_printf', True),
'hello_webgl_wasm': ('hello_webgl', False),
'hello_webgl_wasm2js': ('hello_webgl', True),
'hello_webgl2_wasm_singlefile': ('hello_webgl2_wasm_singlefile', False),
'hello_webgl2_wasm': ('hello_webgl2', False),
'hello_webgl2_wasm2js': ('hello_webgl2', True),
'math': ('math', False),
@@ -10963,6 +10967,7 @@ def test_minimal_runtime_code_size(self, test_name, js, compare_js_output=False)
'-lGL',
'-sMODULARIZE']
hello_webgl2_sources = hello_webgl_sources + ['-sMAX_WEBGL_VERSION=2']
hello_webgl2_wasm_singlefile_sources = hello_webgl2_sources + ['-sSINGLE_FILE']
hello_wasm_worker_sources = [test_file('wasm_worker/wasm_worker_code_size.c'), '-sWASM_WORKERS', '-sENVIRONMENT=web,worker']
embind_hello_sources = [test_file('code_size/embind_hello_world.cpp'), '-lembind']
embind_val_sources = [test_file('code_size/embind_val_hello_world.cpp'),
@@ -10977,6 +10982,7 @@ def test_minimal_runtime_code_size(self, test_name, js, compare_js_output=False)
'hello_webgl': hello_webgl_sources,
'math': math_sources,
'hello_webgl2': hello_webgl2_sources,
'hello_webgl2_wasm_singlefile': hello_webgl2_wasm_singlefile_sources,
'hello_wasm_worker': hello_wasm_worker_sources,
'embind_val': embind_val_sources,
'embind_hello': embind_hello_sources,
5 changes: 4 additions & 1 deletion tools/building.py
@@ -599,6 +599,8 @@ def closure_compiler(filename, advanced=True, extra_closure_args=None):
args += ['--language_out', 'NO_TRANSPILE']
# Tell closure never to inject the 'use strict' directive.
args += ['--emit_use_strict=false']
# Always output UTF-8 files, this helps generate UTF-8 code points instead of escaping code points with \uxxxx inside strings. https://github.com/google/closure-compiler/issues/4158
args += ['--charset=UTF8']
Collaborator

I'd like to see this land separately too if possible, along with a test that closure no longer produces \uxxx for UTF-8 strings in the input.

Collaborator Author

Collaborator

I'd still love to see this change (along with the test it affects) land separately, but I won't force you to do it.

Collaborator

Oh wait, the change in file size you are referring to is due to SINGLE_FILE_BINARY_ENCODE, right?

I was wondering if this change would affect other tests. In theory it should affect any JS code which includes a UTF-8 string, right? Regardless of SINGLE_FILE usage.

Collaborator Author

> I was wondering if this change would affect other tests. In theory it should affect any JS code which includes a UTF-8 string, right? Regardless of SINGLE_FILE usage.

Yeah, this would specify the encoding to be UTF-8 for all output when pushed through Closure.
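The size cost of `\uxxxx` escapes that `--charset=UTF8` is meant to avoid can be seen with a rough analogy. The sketch below does not invoke Closure; it uses Python's `json.dumps` with `ensure_ascii` switched on and off as a stand-in for an output step that either escapes non-ASCII characters or emits them as raw UTF-8.

```python
import json

text = "\u0100" * 4  # four copies of the character that encodes the byte 255

escaped = json.dumps(text, ensure_ascii=True)   # each character written as a \u0100 escape
raw = json.dumps(text, ensure_ascii=False)      # each character written as raw UTF-8

print(len(escaped.encode("utf-8")), len(raw.encode("utf-8")))
```

Each escaped character costs six bytes on disk, while the raw UTF-8 form costs two, which is why escaping would cancel out most of the benefit of the binary encoding for high byte values.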


if settings.IGNORE_CLOSURE_COMPILER_ERRORS:
args.append('--jscomp_off=*')
@@ -649,7 +651,8 @@ def move_to_safe_7bit_ascii_filename(filename):
# 7-bit ASCII range. Therefore make sure the command line we pass does not contain any such
# input files by passing all input filenames relative to the cwd. (user temp directory might
# be in user's home directory, and user's profile name might contain unicode characters)
-proc = run_process(cmd, stderr=PIPE, check=False, env=env, cwd=tempfiles.tmpdir)
+# https://github.com/google/closure-compiler/issues/4159: Closure outputs stdout/stderr in iso-8859-1 on Windows.
+proc = run_process(cmd, stderr=PIPE, check=False, env=env, cwd=tempfiles.tmpdir, encoding='iso-8859-1' if WINDOWS else 'utf-8')
Collaborator

Is this an existing bug, or does it only show up when --charset=UTF8 is added above? (What is the default charset used by Closure?)

Collaborator Author

This is a Closure Compiler bug. It occurs both with and without --charset=UTF8, so it is not related to that.

I don't know what is the default charset used by Closure.
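The effect of the `encoding=` argument in the hunk above can be sketched without running Closure at all. `decode_output` is a hypothetical helper that applies the same platform switch to a byte string standing in for the captured stderr.

```python
def decode_output(raw: bytes, is_windows: bool) -> str:
    """Decode captured process output the way the patched call selects it."""
    return raw.decode('iso-8859-1' if is_windows else 'utf-8')


raw_output = "caf\u00e9".encode('iso-8859-1')  # b'caf\xe9', as a Latin-1 emitter would write it

print(ascii(decode_output(raw_output, is_windows=True)))
try:
    decode_output(raw_output, is_windows=False)
except UnicodeDecodeError as err:
    print('utf-8 decoding failed:', err.reason)
```

Latin-1 output that contains any non-ASCII character is generally not valid UTF-8, so decoding it with the wrong codec fails outright instead of merely producing odd characters.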


# XXX Closure bug: if Closure is invoked with --create_source_map, Closure should create a
# outfile.map source map file (https://github.com/google/closure-compiler/wiki/Source-Maps)
44 changes: 41 additions & 3 deletions tools/link.py
@@ -149,6 +149,10 @@ def base64_encode(b):
return b64.decode('ascii')


def base64_or_binary_encode(b):
return binary_encode(b) if settings.SINGLE_FILE and settings.SINGLE_FILE_BINARY_ENCODE else base64_encode(b)


def align_to_wasm_page_boundary(address):
page_size = webassembly.WASM_PAGE_SIZE
return ((address + (page_size - 1)) // page_size) * page_size
@@ -2339,7 +2343,7 @@ def phase_binaryen(target, options, wasm_target):
js = read_file(final_js)

if settings.MINIMAL_RUNTIME:
-js = do_replace(js, '<<< WASM_BINARY_DATA >>>', base64_encode(read_binary(wasm_target)))
+js = do_replace(js, '<<< WASM_BINARY_DATA >>>', base64_or_binary_encode(read_binary(wasm_target)))
else:
js = do_replace(js, '<<< WASM_BINARY_FILE >>>', get_subresource_location(wasm_target))
delete_file(wasm_target)
@@ -2981,11 +2985,45 @@ def move_file(src, dst):
shutil.move(src, dst)


def binary_encode(data):
  """This function encodes the given binary byte array to a UTF-8 string, by
  first adding +1 to all the bytes [0, 255] to form values [1, 256], and then
  encoding each of those values as UTF-8, except for specific byte values that
  are escaped as two bytes. This kind of encoding results in a string that will
  compress well by both gzip and brotli, unlike base64 encoding binary data
  would do, and avoids emitting the null byte inside a string.
  """

  out = bytearray(len(data) * 2) # Size output buffer conservatively
  i = 0
  for d in data:
    d += 1 # Offset all bytes up by +1 to make zero (a very common value) be encoded with only one byte as 0x01. This is possible since byte 255 then becomes code point 0x100, which UTF-8 encodes in two bytes.
    if d == ord("'"):
      buf = [ord('\\'), d] # Escape single quote ' character with a backslash since we are writing a string inside single quotes. (' -> 2 bytes)
    elif d == ord('"'):
      buf = [ord('\\'), d] # Escape double quote " character with a backslash since optimizer may turn the string into being delimited with double quotes. (" -> 2 bytes)
    elif d == ord('\r'):
      buf = [ord('\\'), ord('r')] # Escape carriage return 0x0D as \r -> 2 bytes
    elif d == ord('\n'):
      buf = [ord('\\'), ord('n')] # Escape newline 0x0A as \n -> 2 bytes
    elif d == ord('\\'):
      buf = [ord('\\'), ord('\\')] # Escape backslash \ as \\ -> 2 bytes
    else:
      buf = chr(d).encode('utf-8') # Otherwise write the original value encoded in UTF-8 (1 or 2 bytes).
    for b in buf: # Write the bytes to output buffer
      out[i] = b
      i += 1
  return out[0:i].decode('utf-8') # Crop output buffer to the actual used size
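How the encoded size compares with base64 depends on the byte distribution of the input, which a small sketch can make concrete. `encoded_size` below is a hypothetical, simplified re-implementation of the sizing logic of `binary_encode` (it only counts bytes), and the two samples are synthetic: the 8-byte header that every Wasm module starts with, and a worst case made entirely of 0xFF bytes.

```python
import base64

# Values (after the +1 offset) that are written as two-character escapes.
ESCAPES = {
    ord("'"): b"\\'",
    ord('"'): b'\\"',
    ord('\r'): b'\\r',
    ord('\n'): b'\\n',
    ord('\\'): b'\\\\',
}


def encoded_size(data: bytes) -> int:
    """Number of UTF-8 bytes the string literal body would occupy."""
    size = 0
    for byte in data:
        value = byte + 1
        size += len(ESCAPES.get(value) or chr(value).encode('utf-8'))
    return size


wasm_header = b'\x00asm\x01\x00\x00\x00'  # magic number and version of a Wasm module
high_bytes = b'\xff' * 8                  # worst case: every byte needs two UTF-8 bytes

for sample in (wasm_header, high_bytes):
    print(len(sample), encoded_size(sample), len(base64.b64encode(sample)))
```

The header sample stays at 8 bytes against 12 for base64, while the all-0xFF sample grows to 16 bytes. Input bytes of 127 and above always cost two bytes, so before compression the scheme only beats base64 when low byte values dominate.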


# Returns the subresource location for run-time access
def get_subresource_location(path):
  if settings.SINGLE_FILE:
-    data = base64.b64encode(utils.read_binary(path))
-    return 'data:application/octet-stream;base64,' + data.decode('ascii')
+    if settings.SINGLE_FILE_BINARY_ENCODE:
+      return binary_encode(utils.read_binary(path))
Collaborator

Does this not need some kind of data:application/octet-stream; here?

Collaborator Author

No, these binary embedded strings are not passed to the browser to decode as Data URIs, so specifying a MIME type would be meaningless.

+    else:
+      data = base64.b64encode(utils.read_binary(path))
+      return 'data:application/octet-stream;base64,' + data.decode('ascii')
  else:
    return os.path.basename(path)
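The review exchange above about the missing `data:application/octet-stream;` prefix comes down to who decodes the string. A data URI carries its own header because a generic URI parser has to interpret it, whereas the binary-encoded string is only ever handed to the PR's own decoder. A minimal sketch of the data URI side, with hypothetical helper names:

```python
import base64

PREFIX = 'data:application/octet-stream;base64,'


def to_data_uri(data: bytes) -> str:
    """Build the kind of string the base64 branch returns."""
    return PREFIX + base64.b64encode(data).decode('ascii')


def from_data_uri(uri: str) -> bytes:
    """Parse the header, then decode the payload it describes."""
    header, payload = uri.split(',', 1)
    if not header.endswith(';base64'):
        raise ValueError('only base64 data URIs are handled in this sketch')
    return base64.b64decode(payload)


uri = to_data_uri(b'\x00asm\x01\x00\x00\x00')
print(uri)
print(from_data_uri(uri))
```

The binary-encoded branch needs no such header, since nothing other than `binaryDecode()` is expected to read the string.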
