Memory usage high with lemmatizer #14

@EmilStenstrom

Description

I'm running the lemmatizer that's part of swe-pipeline on a very limited online host. It only gives me 500 MB of RAM into which I have to cram all my NLP stuff.

Here's a small test script that just loads the lemmatizer into memory and uses psutil to measure the memory used:

import os
import psutil

def memory_usage_psutil():
    # Return this process's resident memory usage in MB
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / float(2 ** 20)

if __name__ == '__main__':
    print("Base memory usage: %.2f MB" % memory_usage_psutil())

    import lemmatize
    lemmatizer = lemmatize.SUCLemmatizer()
    lemmatizer.load('swe-pipeline/suc-saldo.lemmas')
    print("Lemmatize memory usage: %.2f MB" % memory_usage_psutil())

To run it, save the script in the efselab root directory, install psutil with pip install psutil, and execute it:

(efselab) ~/Projects/efselab $ python test.py 
Base memory usage: 9.10 MB
Lemmatize memory usage: 492.65 MB
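So loading the lemma table costs nearly all of the 500 MB budget by itself. One direction that might help (a sketch only — SUCLemmatizer and the suc-saldo.lemmas format are efselab internals, and this does not use them) is keeping the word-to-lemma mapping on disk in a dbm store rather than in an in-memory dict, trading lookup speed for a near-zero resident footprint:

```python
import dbm

def build_store(lemmas, path="lemmas.db"):
    # Hypothetical one-off conversion: write a word -> lemma mapping
    # into an on-disk dbm store instead of holding it in RAM.
    with dbm.open(path, "n") as db:
        for word, lemma in lemmas.items():
            db[word.encode("utf-8")] = lemma.encode("utf-8")

def lookup(word, path="lemmas.db"):
    # Each lookup reads from disk; only the dbm cache stays resident.
    with dbm.open(path, "r") as db:
        lemma = db.get(word.encode("utf-8"))
        return lemma.decode("utf-8") if lemma is not None else None

# Toy example with two Swedish word forms:
build_store({"hundarna": "hund", "katterna": "katt"})
print(lookup("hundarna"))  # hund
```

Whether this is fast enough for the pipeline's throughput is an open question, but it would bound memory by the dbm page cache rather than the size of the lemma table.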
