Node.js: Loading corrupted language trained data does not throw an error #602

@Razzmatazzz

If the traineddata cache becomes corrupted, tesseract.js will still load it without throwing an error. Then, when the recognize function is called, it results in an uncatchable fatal error.

Steps to reproduce the behavior:

  1. Place a copy of eng.traineddata.gz in the local project folder
  2. Create an empty file named eng.traineddata in the same folder to simulate a corrupted cache
  3. Run the following script:
const { createWorker, OEM } = require('tesseract.js');
const Jimp = require('jimp');

(async () => {
    const worker = createWorker({
        langPath: __dirname,
        logger: message => {
            //console.log(message);
        },
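        // errorHandler is left disabled for the first run; enabling it
        // changes whether the recognize() error can be caught.
        /*errorHandler: error => {
            console.log('error from worker:', error);
        }*/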
    });
    try {
        const img = await Jimp.read('https://tesseract.projectnaptha.com/img/eng_bw.png');
        await worker.load();
        await worker.loadLanguage('eng');
        await worker.initialize('eng', OEM.LSTM_ONLY);
        console.log('Recognizing text...');
        const {data: { text } } = await worker.recognize(await img.getBufferAsync(Jimp.AUTO));
        console.log(text);
    } catch (error){
        console.log('caught error:', error);
    }
    process.exit();
})();

This results in the following output:

> [email protected] start
> node index.js

Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Recognizing text...
AdaptedTemplates != nullptr:Error:Assert failed:in file /workspace/tesseract/src/classify/adaptmatch.cpp, line 196
undefined
undefined
C:\Users\Razz\Documents\Visual Studio Code Projects\Razzmatazzz\tesstest\node_modules\tesseract.js\src\createWorker.js:173
        throw Error(data);
        ^

Error: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at ChildProcess.<anonymous> (C:\Users\Razz\Documents\Visual Studio Code Projects\Razzmatazzz\tesstest\node_modules\tesseract.js\src\createWorker.js:173:15)
    at ChildProcess.emit (node:events:390:28)
    at emit (node:internal/child_process:917:12)
    at processTicksAndRejections (node:internal/process/task_queues:84:21)

Note the absence of "caught error", indicating that the error is not being caught. The "Error opening data file" output occurs on the worker.initialize() call, but it does not result in an exception being thrown at that point.

If, however, the errorHandler function is enabled, this is what happens:

> [email protected] start
> node index.js

Error opening data file ./eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Recognizing text...
AdaptedTemplates != nullptr:Error:Assert failed:in file /workspace/tesseract/src/classify/adaptmatch.cpp, line 196
undefined
undefined
error from worker: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
caught error: RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.

The worker's errorHandler function doesn't receive an error when the initialize function is called, but it does when recognize is called. Also, interestingly, the error triggered by calling the recognize function now becomes catchable.
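Based only on the behavior observed above, one interim workaround is to always supply an errorHandler so that the recognize failure stays catchable. The following is a minimal sketch, not a fix: buildWorkerOptions and workerErrors are hypothetical names, process.cwd() stands in for the __dirname used in the script, and the final call only simulates a worker error so the snippet runs without tesseract.js installed.

```javascript
// Hypothetical helper (not part of tesseract.js): builds the options object
// passed to createWorker, always including an errorHandler that records
// every error the worker reports.
function buildWorkerOptions(langPath, workerErrors) {
    return {
        langPath,
        logger: () => {},
        errorHandler: error => {
            workerErrors.push(error);
        },
    };
}

const workerErrors = [];
const options = buildWorkerOptions(process.cwd(), workerErrors);

// In the reproduction script this would replace the inline options:
//   const worker = createWorker(options);

// Simulate the worker reporting a failure to show what the handler records.
options.errorHandler('RuntimeError: abort(undefined)');
console.log('errors recorded:', workerErrors.length);
```

Since initialize does not report anything to the handler in the run above, an empty workerErrors after initialize would not prove that the language data loaded.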

I would expect the worker.recognize function to throw a catchable error, regardless of whether the user has specified an errorHandler for the worker. I would also expect the worker.initialize function to either throw an error when it can't load the specified traineddata or at least send an error to the errorHandler. Neither is currently done.
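Until initialize reports the failure itself, a pre-flight check on the cache can catch the specific case reproduced here. This is a rough sketch under stated assumptions: findEmptyTrainedData is a hypothetical helper, not a tesseract.js API, and it only detects zero-byte files like the blank one from step 2. Other kinds of corruption would pass this check unnoticed.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of tesseract.js): returns the cached
// <lang>.traineddata files under langPath that exist but are zero bytes long.
function findEmptyTrainedData(langPath, langs) {
    return langs
        .map(lang => path.join(langPath, `${lang}.traineddata`))
        .filter(file => fs.existsSync(file) && fs.statSync(file).size === 0);
}

// Usage: recreate the blank cache file from the reproduction steps in a
// temporary directory, then scan for it. 'deu' has no cache file at all,
// so it is not reported.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'tessdata-'));
fs.writeFileSync(path.join(dir, 'eng.traineddata'), '');

const bad = findEmptyTrainedData(dir, ['eng', 'deu']);
console.log(bad.map(file => path.basename(file)));
```

A caller could delete the files this returns before calling loadLanguage, or refuse to call recognize at all while the list is non-empty.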
