Memory usage high with lemmatizer #14

@EmilStenstrom

Description

I'm running the lemmatizer that's part of swe-pipeline on a very limited online host. It only gives me 500 MB of RAM into which I have to cram all my NLP stuff.

Here's a small test script that just loads the lemmatizer into memory and uses psutil to measure the memory used:

import os
import psutil

def memory_usage_psutil():
    # Return this process's resident memory usage in MB
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / float(2 ** 20)

if __name__ == '__main__':
    print("Base memory usage: %.2f MB" % memory_usage_psutil())

    import lemmatize
    lemmatizer = lemmatize.SUCLemmatizer()
    lemmatizer.load('swe-pipeline/suc-saldo.lemmas')
    print("Lemmatize memory usage: %.2f MB" % memory_usage_psutil())

To run it, save the script in the efselab root directory, install psutil with pip install psutil, and execute it:

(efselab) ~/Projects/efselab $ python test.py 
Base memory usage: 9.10 MB
Lemmatize memory usage: 492.65 MB
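So loading the lemma table costs nearly all of the 500 MB budget by itself. One direction that might help (a sketch only — SUCLemmatizer and the suc-saldo.lemmas format are efselab internals, and this does not use them) is keeping the word-to-lemma mapping on disk in a dbm store rather than in an in-memory dict, trading lookup speed for a near-zero resident footprint:

```python
import dbm

def build_store(lemmas, path="lemmas.db"):
    # Hypothetical one-off conversion: write a word -> lemma mapping
    # into an on-disk dbm store instead of holding it in RAM.
    with dbm.open(path, "n") as db:
        for word, lemma in lemmas.items():
            db[word.encode("utf-8")] = lemma.encode("utf-8")

def lookup(word, path="lemmas.db"):
    # Each lookup reads from disk; only the dbm cache stays resident.
    with dbm.open(path, "r") as db:
        lemma = db.get(word.encode("utf-8"))
        return lemma.decode("utf-8") if lemma is not None else None

# Toy example with two Swedish word forms:
build_store({"hundarna": "hund", "katterna": "katt"})
print(lookup("hundarna"))  # hund
```

Whether this is fast enough for the pipeline's throughput is an open question, but it would bound memory by the dbm page cache rather than the size of the lemma table.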
