Description
Describe the bug
The blingfire sentence tokenizer is only available in Python right now, but there is a "quite easy" option to bring it to TypeScript via WASM.
To have this written down somewhere, here are the steps I followed to get this running.
- clone blingfire repo
- follow https://github.com/microsoft/BlingFire/blob/master/wasm/readme.md
- change the Makefile to run:
em++ ../blingfiretools/blingfiretokdll/blingfiretokdll.cpp ../blingfiretools/blingfiretokdll/*.cxx ../blingfireclient.library/src/*.cpp -s WASM=1 -s EXPORTED_FUNCTIONS="[_GetBlingFireTokVersion, _TextToSentences, _TextToWords, _TextToIds, _SetModel, _FreeModel, _WordHyphenationWithModel, _malloc, _free]" -s "EXPORTED_RUNTIME_METHODS=['lengthBytesUTF8', 'stackAlloc', 'stringToUTF8', 'UTF8ToString', 'cwrap']" -s ALLOW_MEMORY_GROWTH=1 -s DISABLE_EXCEPTION_CATCHING=0 -I ../blingfireclient.library/inc/ -I ../blingfirecompile.library/inc/ -DHAVE_ICONV_LIB -DHAVE_NO_SPECSTRINGS -D_VERBOSE -DBLING_FIRE_NOAP -DBLING_FIRE_NOWINDOWS -DNDEBUG -O3 -s MODULARIZE=1 -s EXPORT_ES6 --std=c++11 -o blingfire.js
(this adds -s MODULARIZE=1 and -s EXPORT_ES6, and fixes the malloc/free exports)
- copy blingfire.js + blingfire.wasm to livekit :)
- get blingfire_wrapper and adapt how it loads the module:
import createModule from './blingfire.js';
const Module = await createModule();
- use the module wrapper:
import { TextToSentences } from './blingfire_wrapper.js';
console.log('TextToSentences', TextToSentences('This is a sentence. And another one.'));
Relevant log output
No response
Describe your environment
linux
Minimal reproducible example
No response
Additional information
No response
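For anyone who wants to skip the upstream wrapper, here is a rough TypeScript sketch of what the wrapper step boils down to. It assumes the exported C function has the shape `TextToSentences(inPtr, inByteCount, outPtr, maxOutByteCount)`, returns the number of bytes written, and separates sentences with newlines; I have not verified this against every BlingFire version. The `BlingFireModule` interface and the `textToSentences` helper are names made up for this sketch, and the output buffer size (twice the input) is a guess.

```typescript
// Hypothetical interface: only the parts of the Emscripten module this sketch uses.
// The function names match the EXPORTED_FUNCTIONS / EXPORTED_RUNTIME_METHODS lists above.
interface BlingFireModule {
  _malloc(size: number): number;
  _free(ptr: number): void;
  // Assumed signature: (input ptr, input byte count, output ptr, output capacity) -> bytes written
  _TextToSentences(inPtr: number, inLen: number, outPtr: number, maxOutLen: number): number;
  lengthBytesUTF8(s: string): number;
  stringToUTF8(s: string, ptr: number, maxBytes: number): number;
  UTF8ToString(ptr: number, maxBytes?: number): string;
}

// Split text into sentences by calling into the WASM module.
function textToSentences(mod: BlingFireModule, text: string): string[] {
  const inLen = mod.lengthBytesUTF8(text);
  if (inLen === 0) return [];

  // +1 for the null terminator that stringToUTF8 appends.
  const inPtr = mod._malloc(inLen + 1);
  // Assumption: output is never more than about twice the input size.
  const maxOutLen = inLen * 2 + 1;
  const outPtr = mod._malloc(maxOutLen);
  try {
    mod.stringToUTF8(text, inPtr, inLen + 1);
    const written = mod._TextToSentences(inPtr, inLen, outPtr, maxOutLen);
    // Treat a negative or oversized result as failure.
    if (written < 0 || written > maxOutLen) return [];
    return mod
      .UTF8ToString(outPtr, written)
      .split('\n')
      .filter((s) => s.length > 0);
  } finally {
    // Always release the WASM heap buffers, even on failure.
    mod._free(inPtr);
    mod._free(outPtr);
  }
}
```

With the build from the steps above, this would be called as `textToSentences(await createModule(), 'This is a sentence. And another one.')`. Passing the module in as a parameter keeps the helper independent of how the .wasm file gets loaded.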