Description
Describe the bug
When the number of input files increases, fast-graphrag triggers an error related to invalid vertex IDs during graph construction. This eventually leads to a crash with the following message:
Error at src/graph/type_indexededgelist.c:1436 : Cannot get edge ID. - Invalid vertex ID.
To Reproduce
Steps to reproduce the behavior:
1. Prepare a project using fast-graphrag.
2. Increase the number of input files beyond a certain threshold (e.g., by scaling up the dataset size or the document batch input).
3. Run the pipeline that builds the graph.
4. Observe the error.
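Step 2 leaves the threshold unspecified. Each pipeline run can be slow, so one way to pin the threshold down is to bisect over the file count. The sketch below is a generic helper, not part of fast-graphrag. It assumes that failure is monotonic (if N files fail, any larger N also fails). `fails(n)` is a hypothetical callback you would write yourself: it builds the graph from the first `n` input files, ideally in a fresh working directory so earlier runs leave no state behind, and returns `True` if the invalid vertex ID error appears.

```python
from typing import Callable


def find_failure_threshold(fails: Callable[[int], bool], low: int, high: int) -> int:
    """Return the smallest file count in (low, high] for which `fails` is True.

    Assumes (hypothetically) that fails(low) is False, fails(high) is True,
    and that failure is monotonic in the number of input files.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if fails(mid):
            high = mid  # error reproduced: threshold is at or below mid
        else:
            low = mid  # build succeeded: threshold is above mid
    return high


if __name__ == "__main__":
    # Stand-in predicate for illustration only: pretend 37+ files trigger the crash.
    print(find_failure_threshold(lambda n: n >= 37, 0, 100))  # prints 37
```

This takes roughly log2(high - low) pipeline runs. An exact file count would make the report easier for maintainers to reproduce.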
Expected behavior
The library should handle larger numbers of files gracefully without producing invalid vertex ID errors or semaphore leaks. Graph construction should succeed regardless of dataset size (within memory constraints).
Additional context
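• Appears to be related to vertex/edge indexing in igraph's C core (src/graph/type_indexededgelist.c), which fast-graphrag uses through python-igraph for its graph storage.
• The error seems to occur only once the dataset grows beyond a certain size.
• The crash is also accompanied by multiprocessing cleanup warnings about leaked semaphores, which may point to a resource-management problem as well.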
• Environment: macOS (Apple Silicon), Python 3.11.10, fast-graphrag (latest release at time of testing).
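The file `type_indexededgelist.c` belongs to the igraph C core. There, vertex IDs are always the contiguous integers `0 .. vcount - 1`, and deleting vertices renumbers the ones that remain. One plausible but unconfirmed way to hit "Invalid vertex ID" is an integer ID that was cached before the graph changed, or computed against a different graph. The pure-Python sketch below models that range check and a stale-ID scenario, so that logged edges could be screened before they reach igraph. `invalid_edges` and `reindex_after_delete` are hypothetical helpers, not fast-graphrag or python-igraph APIs.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def invalid_edges(vcount: int, edges: Iterable[Edge]) -> List[Edge]:
    """Return edges with an endpoint outside the valid ID range [0, vcount).

    Models the range check behind igraph's "Invalid vertex ID" error.
    """
    return [(s, t) for s, t in edges if not (0 <= s < vcount and 0 <= t < vcount)]


def reindex_after_delete(names: List[str], deleted: str) -> Dict[str, int]:
    """Rebuild a name -> ID map after deleting one vertex.

    Remaining vertices are renumbered so IDs stay contiguous, as in igraph.
    """
    remaining = [name for name in names if name != deleted]
    return {name: vid for vid, name in enumerate(remaining)}


if __name__ == "__main__":
    names = ["A", "B", "C"]
    cached = {name: vid for vid, name in enumerate(names)}  # IDs taken before the deletion
    fresh = reindex_after_delete(names, "A")  # graph now holds 2 vertices: B -> 0, C -> 1

    stale_edge = (cached["B"], cached["C"])  # (1, 2): ID 2 no longer exists
    fresh_edge = (fresh["B"], fresh["C"])    # (0, 1): valid
    print(invalid_edges(len(fresh), [stale_edge, fresh_edge]))  # prints [(1, 2)]
```

A range check only catches part of the problem. A stale ID that still falls inside the range raises no error but silently points at the wrong vertex (above, stale ID 1 now refers to "C").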