fix: pass entity_chunks/relation_chunks storages to merge_nodes_and_edges (fixes #241) #247
Closed
sjhddh wants to merge 1 commit into
…dges

Three call sites in the multimodal ingestion path call lightrag.operate.merge_nodes_and_edges without forwarding the entity_chunks_storage / relation_chunks_storage kwargs (and, in the modalprocessors case, the full_entities_storage / full_relations_storage kwargs as well). Those kwargs default to None, so the calls succeed silently, but entity-to-chunk and relation-to-chunk mappings are never persisted for multimodal content.

The result: multimodal entities get created in the knowledge graph and vector DB, but their chunk association metadata is lost. Text-only ingestion (which LightRAG itself drives) populates these mappings correctly, so retrieval quality diverges between text and multimodal content.

Forward the missing storages in all three sites so multimodal ingestion matches LightRAG's own usage pattern.

Fixes HKUDS#241
Abdeltoto (Contributor) approved these changes on Apr 22, 2026 and left a comment:
Hi @sjhddh 👋
I want to call this PR description out as exemplary — the table of fix sites, the line-level links to the LightRAG source, the reference to the related #228, and the explicit test plan made it very fast to verify. That's the kind of PR description maintainers love.
I cross-checked the diff against lightrag/operate.py and lightrag/lightrag.py:
- ✅ `merge_nodes_and_edges` accepts `entity_chunks_storage` and `relation_chunks_storage` as optional kwargs that default to `None`, so the additions are append-only and cannot break existing call sites.
- ✅ `LightRAG` exposes `full_entities`, `full_relations`, `entity_chunks`, `relation_chunks` as `BaseKVStorage` instances (set up in `__post_init__`), so `self.lightrag.<attr>` is the correct accessor.
- ✅ The three sites listed in the table match the call sites in current `main`.
- ✅ Same pattern as #228; extending it consistently is the right move.
LGTM 👍
Disclosure: I opened #260 a few hours after this one with the same fix, before noticing yours. Since your PR landed first and is functionally identical, I'm closing #260 in favor of this one.
Summary
Fix a silent data-loss bug in multimodal ingestion: three `merge_nodes_and_edges` call sites in `processor.py` and `modalprocessors.py` were not forwarding the `entity_chunks_storage` / `relation_chunks_storage` kwargs (and in one case also `full_entities_storage` / `full_relations_storage`). Since those kwargs default to `None`, the calls succeeded silently but entity-to-chunk and relation-to-chunk mappings were never persisted for multimodal content.

Fixes #241.
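The failure mode described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `InMemoryKV` and `merge_sketch` are invented for illustration, are not LightRAG's actual classes or function, and the `{"chunk_ids": [...]}` record shape is an assumption. The only behaviour taken from this PR is that the storage kwarg is optional, defaults to `None`, and the call succeeds either way.

```python
class InMemoryKV:
    """Hypothetical stand-in for a key-value storage backend."""

    def __init__(self):
        self.data = {}

    def upsert(self, records):
        self.data.update(records)


def merge_sketch(entity_to_chunks, graph, entity_chunks_storage=None):
    """Toy merge step (NOT the real merge_nodes_and_edges).

    Graph nodes are always written; chunk mappings are written only
    when a storage object is actually passed in.
    """
    for name in entity_to_chunks:
        graph.add(name)
    if entity_chunks_storage is not None:
        entity_chunks_storage.upsert(
            {name: {"chunk_ids": list(ids)} for name, ids in entity_to_chunks.items()}
        )


extracted = {"Figure 3": ["chunk-1", "chunk-7"]}

# Before the fix: the store exists on the caller but is never forwarded.
graph_before, store_before = set(), InMemoryKV()
merge_sketch(extracted, graph_before)

# After the fix: the same call, with the store forwarded as a kwarg.
graph_after, store_after = set(), InMemoryKV()
merge_sketch(extracted, graph_after, entity_chunks_storage=store_after)

print(graph_before == graph_after)  # both graphs contain the entity
print(store_before.data)            # empty: mapping silently dropped
print(store_after.data)             # mapping persisted
```

Neither call raises, and both graphs end up identical, which is why the missing kwarg is easy to overlook: the only observable difference is the content of the chunk-mapping store.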
What was wrong
`lightrag.operate.merge_nodes_and_edges` takes `entity_chunks_storage` and `relation_chunks_storage` as optional kwargs (LightRAG `lightrag/operate.py` L571-572). When they are non-None, it upserts the full per-entity and per-relation chunk-id lists into those stores. When they are None, the call still succeeds but those mappings are never written.

Three sites in RAG-Anything were calling `merge_nodes_and_edges` without forwarding these kwargs:

| Site | Function | Missing kwargs |
| --- | --- | --- |
| `raganything/processor.py:745` | `_process_multimodal_content_individual` | `entity_chunks_storage`, `relation_chunks_storage` |
| `raganything/processor.py:1349` | `_batch_merge_lightrag_style_type_aware` | `entity_chunks_storage`, `relation_chunks_storage` |
| `raganything/modalprocessors.py:800` | `BaseModalProcessor._process_chunk_for_extraction` | `full_entities_storage`, `full_relations_storage`, `entity_chunks_storage`, `relation_chunks_storage` |

LightRAG's own text-ingestion code path passes all four storages, so text-only ingestion populated `kv_store_entity_chunks.json` / `kv_store_relation_chunks.json` correctly while multimodal ingestion silently did not, leading to divergent retrieval quality between text and multimodal content.

Fix
Forward the missing kwargs in all three sites. The storages live on `self.lightrag` (`full_entities`, `full_relations`, `entity_chunks`, `relation_chunks`), confirmed in LightRAG `lightrag/lightrag.py` L682-700.

Diff is 8 additions, 0 deletions across 2 files. This is the same pattern as #228 (which added `full_entities_storage` / `full_relations_storage` forwarding to the first two sites); this PR completes the fix by adding the chunk-storage kwargs and extending the pattern to `modalprocessors.py`.

Test plan
- `merge_nodes_and_edges` accepts these kwargs and defaults them to `None`, so the added args are append-only and cannot break existing behavior.
- `self.lightrag.entity_chunks` and `self.lightrag.relation_chunks` exist as `BaseKVStorage` fields on the `LightRAG` class.
- `kv_store_entity_chunks.json` / `kv_store_relation_chunks.json` now include the multimodal entities' chunk mappings.
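The first and third checks in the test plan could be scripted roughly as follows. This is a sketch under stated assumptions, not part of the PR: both helper names are hypothetical, the demo runs against a stand-in function and a throwaway file rather than a real LightRAG install, and the JSON layout (a top-level object keyed by entity name) is assumed rather than confirmed by this PR.

```python
import inspect
import json
import tempfile
from pathlib import Path


def kwargs_not_defaulting_to_none(func, names):
    """Return the names that `func` does not accept as parameters defaulting to None."""
    params = inspect.signature(func).parameters
    return [n for n in names if n not in params or params[n].default is not None]


def keys_missing_from_store(store_path, expected_keys):
    """Return expected keys that are absent from a JSON key-value file or map to an empty entry.

    Assumes the file holds a single JSON object keyed by entity name.
    """
    store = json.loads(Path(store_path).read_text(encoding="utf-8"))
    return [k for k in expected_keys if not store.get(k)]


# Stand-in with the same optional kwargs the PR describes (hypothetical).
def stand_in_merge(nodes, edges, entity_chunks_storage=None, relation_chunks_storage=None):
    return None


signature_problems = kwargs_not_defaulting_to_none(
    stand_in_merge, ["entity_chunks_storage", "relation_chunks_storage"]
)

# Throwaway store file containing one entity; "Table 2" is deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "kv_store_entity_chunks.json"
    store_file.write_text(
        json.dumps({"Figure 3": {"chunk_ids": ["chunk-1"]}}), encoding="utf-8"
    )
    absent = keys_missing_from_store(store_file, ["Figure 3", "Table 2"])

print(signature_problems)  # empty list: both kwargs exist and default to None
print(absent)              # only the entity that was never written
```

In a checkout with LightRAG installed, the first helper could be pointed at `lightrag.operate.merge_nodes_and_edges` instead of the stand-in, and the second at the working directory's `kv_store_entity_chunks.json` after ingesting a multimodal document, with the expected keys taken from the entities that ingestion created.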