[Bug] Invalid vertex ID error when processing many files in fast-graphrag #105

@shota-kizawa

Describe the bug
When the number of input files increases, fast-graphrag triggers an error related to invalid vertex IDs during graph construction. This eventually leads to a crash with the following message:

Error at src/graph/type_indexededgelist.c:1436 : Cannot get edge ID. - Invalid vertex ID.

To Reproduce
Steps to reproduce the behavior:
1. Prepare a project using fast-graphrag.
2. Increase the number of input files beyond a certain threshold (e.g., by scaling up the dataset size or the document batch input).
3. Run the pipeline that builds the graph.
4. Observe the error.
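
Since the exact threshold is not known, one way to narrow it down is a binary search over the file count. The sketch below is only an illustration: `run_ok` is a hypothetical callable that you would write yourself to run the graph-building pipeline on the first `n` input files and return `True` on success. It assumes the failure is monotonic (if `n` files fail, any larger count fails too), which may not hold if the extraction step is nondeterministic.

```python
from typing import Callable, Optional


def smallest_failing_count(run_ok: Callable[[int], bool], upper: int) -> Optional[int]:
    """Return the smallest n in [1, upper] for which run_ok(n) is False.

    run_ok is a hypothetical, user-supplied function: it should build the
    graph from the first n input files and report whether that succeeded.
    Returns None if even `upper` files succeed.
    """
    if run_ok(upper):
        return None
    lo, hi = 1, upper  # invariant: run_ok(hi) is False
    while lo < hi:
        mid = (lo + hi) // 2
        if run_ok(mid):
            lo = mid + 1  # mid files still work, threshold is higher
        else:
            hi = mid  # mid files already fail, threshold is mid or lower
    return lo
```

Because the reported error comes from the C extension and may terminate the interpreter, `run_ok` is probably best implemented by launching each attempt in a separate process and checking its exit code.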

Expected behavior
The library should handle larger numbers of files gracefully without producing invalid vertex ID errors or semaphore leaks. Graph construction should succeed regardless of dataset size (within memory constraints).

Additional context
• Appears to be related to edge/vertex indexing within the C extension (type_indexededgelist.c).
• Error may occur only when scaling beyond a certain dataset size.
• Also triggers multiprocessing cleanup warnings (presumably the semaphore leaks mentioned under Expected behavior), suggesting potential issues with resource management.
• Environment: macOS (Apple Silicon), Python 3.11.10, fast-graphrag (latest release at time of testing).
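
The message "Cannot get edge ID. - Invalid vertex ID." is consistent with an edge lookup receiving a vertex index that does not exist in the graph. As a debugging aid, a pre-check along the following lines could locate the offending edges before they reach the C layer. This is a minimal sketch, assuming edges are available as pairs of integer vertex indices; `find_invalid_edges` and its parameters are hypothetical names, not part of fast-graphrag.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def find_invalid_edges(num_vertices: int, edges: Iterable[Edge]) -> List[Tuple[int, Edge]]:
    """Return (position, edge) for every edge with an out-of-range endpoint.

    Valid vertex indices are assumed to be 0 .. num_vertices - 1.
    """
    bad = []
    for pos, (src, dst) in enumerate(edges):
        if not (0 <= src < num_vertices and 0 <= dst < num_vertices):
            bad.append((pos, (src, dst)))
    return bad


# Example: a graph with 3 vertices and one edge pointing at a missing vertex.
print(find_invalid_edges(3, [(0, 1), (1, 2), (2, 3)]))
```

If such a check reports stale or out-of-range indices only for large inputs, that would suggest the vertex-to-index mapping goes out of sync somewhere during batch insertion, although this issue does not confirm that as the cause.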
