lucene 10 (needs to be between 7 and 9) org.apache.lucene.index.IndexFormatTooNewException when using self created corpus #32

@Reijarmo

Hello all,

I tried to use Bertserini for question answering with a self-created corpus. The base example works perfectly (with transformers == 3.4.0), but I cannot find a solution for the Lucene problem from the title (org.apache.lucene.index.IndexFormatTooNewException). I know Bertserini depends on Lucene 8, while pyserini switched to Lucene 9 in its latest version, so I installed pyserini 0.16.0 (https://pypi.org/project/pyserini/0.16.0/) in a separate conda environment and created a new index with it, but the problem stays the same.

When I tried to build an index with the pyserini version that was installed along with bertserini, I was stopped by "/home/user/anaconda3/envs/bertserini/bin/python: No module named pyserini.index.lucene". The only solution I found for that is upgrading pyserini, which isn't an option because of the base bertserini problem.
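To make the environment mismatch easier to pin down, a small check like the following could be run with the Python interpreter of each conda environment. It is only a diagnostic sketch built on the Python standard library: the helper names `installed_version` and `has_module` are made up for illustration and are not part of pyserini or bertserini, and it only reports what is installed; it does not fix the index format problem.

```python
import importlib.util
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_module(dotted_name):
    """Return True if `dotted_name` can be located in this environment.

    Note: for a dotted name, find_spec imports the parent package first,
    so a missing or broken parent is reported as "not available" here.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Shows which interpreter (and therefore which conda env) is in use.
    print("interpreter:", sys.executable)
    print("pyserini version:", installed_version("pyserini"))
    for name in ("pyserini.index", "pyserini.index.lucene"):
        print(name, "available:", has_module(name))
```

Running it once in the bertserini environment and once in the separate pyserini environment would show which pyserini version each one has and whether the `pyserini.index.lucene` entry point exists there.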

Is there an easy way around this? Sorry if this is a stupid question, but as a psychologist I have only a weak background in computer science.

edit1:
I forgot to mention which command I used to create the index:
python -m pyserini.index.lucene \
  --collection JsonCollection \
  --input tests/resources/sample_collection_jsonl \
  --index indexes/sample_collection_jsonl \
  --generator DefaultLuceneDocumentGenerator \
  --threads 1 \
  --storePositions --storeDocvectors --storeRaw
