Skip to content

Conversation

@peaBerberian
Copy link
Collaborator

@peaBerberian peaBerberian commented Feb 13, 2024

CURRENT STATUS: I kept both the DOMParser-based xml parser (now in src/parsers/manifest/dash/native-parser ) and the new full-js MPD parser (in src/parsers/manifest/dash/fast-js-parser), duplicating a lot of code (which is the reason behind that huge diff, you might want to look at the diff BEFORE the native parser was re-added - there's a commit for that - to get a better idea of real diffs).

The idea would be to release the "fast js" parser as a default in MULTI_THREAD cases (even removing the necessity of having to import the WebAssembly-based parser) but still rely on the ""native"" one on main thread, as it is much more battle-tested. If the fast js one shows no issue and is indeed always faster, we will remove the native one (and even rely on the fast on all XML parsing cases).


Motivation

Currently, we rely on our WebAssembly MPD parser for two different scenarios:

  1. performance reasons (on HUGE MPD of tens of MB, with often a lot of SegmentTimeline data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue.

  2. Multithread scenarios as the browser's own fast XML parser, DOMParser, is not usable in other threads.

Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was written because of the first scenario).
Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use DOMParser there.

Having to provide the WebAssembly code to the MULTI_THREAD feature is also a little awkward. We recommend to applications that they use our "embedded" version to make it simpler, yet it weighs 400+KB. Even if it compresses very very well, it is still a huge file.

It also turns out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the v4.0.0-rc.1 made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser inside the WebWorker, like we already have on main thread today.

It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which I guess almost everyone could prefer for their applications.

Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature.

Previous work

In a previous work (I never made the Pull Request for it), I compiled down the WebAssembly file to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintainance so I quickly abandonned it (I may re-explore that way in the future). This could have been nice as it prevented adding yet another MPD parser to the codebase.

I also made quick tests with dependencies like fast-xml-parser but performances appeared poor so I did not continue in this path.

This solution

To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their ALL of their XML parsing need.
It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and it seems to be on their side for performance reason - very interestingly.

So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and quite amazingly it actually was, sometimes it was even faster (though still slower than our WebAssembly parser).

If this goes very well, we could even imagine doing like the Shaka-player and completely remove reliance on the DOMParser native API - even opening the way to also do things like parsing subtitles in a worker. For that, there is still a lot to do though.

The code was a little hard to integrate through npm in a TypeScript client-side project (for various reasons) so I made the same choice than Shaka-player by completely copying its code (keeping the licence in the file) into src/utils/xml-parser.ts.

I also had to update its code, this means that code updates on their side will have to be backported on ours.
However, the code seems to not be much maintained anymore, so this is not that much of an issue.

@peaBerberian peaBerberian added work-in-progress This Pull Request or issue is not finished yet poc Proof Of Concept - This is just an attempt to ponder on labels Feb 13, 2024
@peaBerberian peaBerberian changed the title [POC]: Experiment with "txml" parser to get rid of the DOMParser [POC] DASH: Experiment with "txml" parser to get rid of the DOMParser Feb 13, 2024
@peaBerberian
Copy link
Collaborator Author

To note that it will also make it possible to play smooth contents in multithread mode.

@peaBerberian peaBerberian added the skip-performance-checks Performance tests are not run on this PR label Feb 13, 2024
@peaBerberian peaBerberian force-pushed the feat/txml branch 9 times, most recently from ca0fe2c to 0210ec4 Compare February 16, 2024 11:51
@peaBerberian peaBerberian removed the skip-performance-checks Performance tests are not run on this PR label Feb 16, 2024
@peaBerberian peaBerberian changed the title [POC] DASH: Experiment with "txml" parser to get rid of the DOMParser DASH: rely on "txml" parser for Multithreading usages Feb 19, 2024
Comment on lines 75 to 79
const dashWasmParser = new DashWasmParser();
features.dashParsers.wasm = dashWasmParser;
features.dashParsers.fastJs = DashFastJsParser;
features.transports.dash = createDashPipelines;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to prefer assigning those property over calling the exposed methods in src/features/list like addDashFeature, dashWasmFeature .. ?

Copy link
Collaborator Author

@peaBerberian peaBerberian Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question and I've thought about it.

However it seems that features exported in src/features/list, because they are the exact same files imported by applications, assume a main thread environment. This is to reduce the number of features to import for an application (the end goal being a simpler API).

As such, importing src/features/lib/dash.ts also adds a MediaSourceContentInitializer to play content in main thread and a MainCodecSupportProber to perform codec checks on main thread.

Because we don't want those in a worker, in the end I kept just manually selecting the exact feature we want in worker_main.ts

Motivation
----------

Currently, we relied on our WebAssembly MPD parser for two different
scenarios:

  1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of
     `SegmentTimeline` data to parse). Also Relying on WebAssembly here
     instead of DOM parsing led us to much less GC pressure which was also
     a big issue.

  2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`,
     is not usable in other threads.

Though that second scenario only relied on the WebAssembly parser
because it was already written before (and it was because of the first
scenario).
Nothing stops us from relying on a JavaScript MPD parser in Multithread
mode, we only cannot use `DOMParser` there.

Having to provide the WebAssembly code to the `MULTI_THREAD` feature is
a little awkward. We recommend to applications that they use our
"embedded" versions to make it more simple, though it weighs in the 400+KB.
Even if it compresses very very well, it is still a huge file.

It also turn out that WebAssembly is much more recent than the WebWorker
API and as such we're currently not able to rely on the Multithread
mode on very old devices like old smart TV models and old game consoles.
Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device
might be compatible to WebAssembly but might fail to compile it for
various reasons, leading to a fallback to main thread - it would be
better to have a fallback to a JS Parser, like we already have on main
thread today.

It may even make Multithread-only RxPlayer builds (e.g. not having to import
both monothread code and multithread code, just in case multithreading is
not possible) much more doable, which [I guess almost everyone could prefer
for their applications](https://caniuse.com/webworkers).

Thus, relying on a JavaScript parser in a Multithread scenario could be
a very nice feature.

Previous work
-------------

In a previous work (I never made the Pull Request for it yet), I
compiled down the WebAssembly file to JavaScript (through binaryen's
wasm2js util), but it involved a lot of manual maintainance so I quickly
abandonned it (I may re-explore that way in the future). This could have
been nice as it prevented adding yet another MPD parser to the
codebase.

I also made quick tests with dependencies like `fast-xml-parser` but
performances appeared poor so I did not continue in this path.

This solution
-------------

To be perfectly honest, it was only after looking at some Shaka-player
code that I noticed that they now rely on a "txml" dependency for their
XML parsing.
It's actually very recent: shaka-project/shaka-player@7116a34
(the recent-ness of it made me feel that I may be looking at their codebase a
little too much ^^) and it seems to be on their side for performance
reason - very interestingly.

So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/).
It is a fairly minimal XML DOM parser with a specific focus on speed.
It advertises speed competitive with the native DOMParser API and quite
amazingly it actually was, sometimes it was even faster (though still slower
than our WebAssembly parser).

If this goes very well, we could even imagine doing like the
Shaka-player and completely remove the DOMParser - even opening the way
to also do things like parsing subtitles in a worker. For that, there is
still a lot to do though.

The code was a little hard to integrate through `npm` in a TypeScript
client-side project (for various reasons) so I made the same choice than
Shaka-player by completely copying its code (keeping the licence in the
file) into `src/utils/xml-parser.ts`.

I also had to update its code, this means that code updates on their
side will have to be backported on ours.
However, the code seems to not be much maintained anymore, so this is
not that much of an issue.

Remaining issues
----------------

There are some remaining issues:

  - First I did not yet add parsing for `EventStream` elements nor for
    `SegmentTimeline` elements yet. Both seems doable, and the latter
    will be the real-world test (as it can be incredibly huge on some
    contents).

  - From what I understand from TobiasNickel/tXml#44,
    It doesn't translate entities (like `>` to `>`). This doesn't seem to
    hard to implement though and is rarely important.

Maybe others. There doesn't seem to be a lot of issues (but it doesn't
seem to be a hugely-relied on project either) so I'll look at each of
them in the future.
@peaBerberian peaBerberian added this to the 4.1.0 milestone Feb 23, 2024
@peaBerberian peaBerberian merged commit 753400e into dev Feb 23, 2024
@peaBerberian peaBerberian mentioned this pull request Jun 13, 2024
@peaBerberian peaBerberian deleted the feat/txml branch July 26, 2024 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

poc Proof Of Concept - This is just an attempt to ponder on work-in-progress This Pull Request or issue is not finished yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants