fix: pass entity_chunks/relation_chunks storages to merge_nodes_and_edges (fixes #241) by sjhddh · Pull Request #247 · HKUDS/RAG-Anything

sjhddh · 2026-04-21T11:55:02Z

Summary

Fix a silent data-loss bug in multimodal ingestion: three merge_nodes_and_edges call sites in processor.py and modalprocessors.py were not forwarding entity_chunks_storage / relation_chunks_storage kwargs (and in one case also full_entities_storage / full_relations_storage). Since those kwargs default to None, the calls succeeded silently but entity-to-chunk and relation-to-chunk mappings were never persisted for multimodal content.

Fixes #241.

What was wrong

lightrag.operate.merge_nodes_and_edges takes entity_chunks_storage and relation_chunks_storage as optional kwargs (LightRAG lightrag/operate.py L571-572). When they are non-None, it upserts the full per-entity and per-relation chunk-id lists into those stores. When they are None, the call still succeeds but those mappings are never written.
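The failure mode can be reproduced with a toy stand-in. Everything below is hypothetical: `merge_sketch` and `InMemoryKV` are not LightRAG code, and the `{"chunk_ids": [...]}` value shape is an assumption. The sketch only mirrors the behavior described above: an optional storage argument that is skipped when it is None.

```python
import asyncio
from typing import Optional


class InMemoryKV:
    """Hypothetical stand-in for a KV storage; not LightRAG's BaseKVStorage."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    async def upsert(self, items: dict[str, dict]) -> None:
        self.data.update(items)


async def merge_sketch(
    entity_to_chunks: dict[str, list[str]],
    entity_chunks_storage: Optional[InMemoryKV] = None,
) -> int:
    """Hypothetical merge step: it always 'succeeds', but it only persists
    the entity -> chunk-id mapping when a storage is actually passed in."""
    if entity_chunks_storage is not None:
        await entity_chunks_storage.upsert(
            {name: {"chunk_ids": ids} for name, ids in entity_to_chunks.items()}
        )
    return len(entity_to_chunks)  # same return value either way


async def main() -> tuple[int, int, int, int]:
    mapping = {"Figure 1": ["chunk-a", "chunk-b"]}
    # Both storages exist, like self.lightrag.entity_chunks did all along.
    forgotten, forwarded = InMemoryKV(), InMemoryKV()
    n1 = await merge_sketch(mapping)  # kwarg omitted: nothing is written
    n2 = await merge_sketch(mapping, entity_chunks_storage=forwarded)
    return n1, n2, len(forgotten.data), len(forwarded.data)


print(asyncio.run(main()))  # (1, 1, 0, 1)
```

Both calls report the same result, which is why the omission produced no error and went unnoticed.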

Three sites in RAG-Anything were calling merge_nodes_and_edges without forwarding these kwargs:

| File | Site | Missing kwargs |
| --- | --- | --- |
| `raganything/processor.py:745` | `_process_multimodal_content_individual` | `entity_chunks_storage`, `relation_chunks_storage` |
| `raganything/processor.py:1349` | `_batch_merge_lightrag_style_type_aware` | `entity_chunks_storage`, `relation_chunks_storage` |
| `raganything/modalprocessors.py:800` | `BaseModalProcessor._process_chunk_for_extraction` | `full_entities_storage`, `full_relations_storage`, `entity_chunks_storage`, `relation_chunks_storage` |

LightRAG's own text-ingestion code path passes all four storages, so text-only ingestion populated kv_store_entity_chunks.json / kv_store_relation_chunks.json correctly while multimodal ingestion silently did not — leading to divergent retrieval quality between text and multimodal content.

Fix

Forward the missing kwargs in all three sites. The storages live on self.lightrag (full_entities, full_relations, entity_chunks, relation_chunks), confirmed in LightRAG lightrag/lightrag.py L682-700.
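The PR adds the kwargs inline at each site. As an illustration of the same forwarding, here is a hypothetical helper, `storage_kwargs`, that collects all four storages in one place. The attribute names are the ones this PR cites on `self.lightrag`; the helper and the `SimpleNamespace` stand-in are assumptions made so the snippet runs without LightRAG.

```python
from types import SimpleNamespace
from typing import Any

# merge_nodes_and_edges kwarg name -> attribute on the LightRAG instance,
# as described in this PR. The helper built on it is hypothetical.
STORAGE_KWARGS = {
    "full_entities_storage": "full_entities",
    "full_relations_storage": "full_relations",
    "entity_chunks_storage": "entity_chunks",
    "relation_chunks_storage": "relation_chunks",
}


def storage_kwargs(rag: Any) -> dict[str, Any]:
    """Collect the four storages so every call site forwards the same set.

    A missing attribute raises AttributeError instead of silently turning
    into the None default that an omitted kwarg would get.
    """
    return {kwarg: getattr(rag, attr) for kwarg, attr in STORAGE_KWARGS.items()}


# Illustrative call site (other arguments elided):
#   await merge_nodes_and_edges(..., **storage_kwargs(self.lightrag))

fake_rag = SimpleNamespace(
    full_entities="FE",
    full_relations="FR",
    entity_chunks="EC",
    relation_chunks="RC",
)
print(storage_kwargs(fake_rag)["entity_chunks_storage"])  # EC
```

Routing every call site through one mapping like this would make it harder for the sites to drift apart again, which is the kind of gap both #228 and this PR had to close.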

Diff is 8 additions, 0 deletions across 2 files. This is the same pattern as #228 (which added full_entities_storage / full_relations_storage forwarding to the first two sites); this PR completes the fix by adding the chunk-storage kwargs and extending the pattern to modalprocessors.py.

Test plan

Static review: verified merge_nodes_and_edges accepts these kwargs and defaults them to None, so the added args are append-only and cannot break existing behavior.
Verified self.lightrag.entity_chunks and self.lightrag.relation_chunks exist as BaseKVStorage fields on the LightRAG class.
For maintainers: end-to-end check by ingesting a PDF with images/tables and confirming kv_store_entity_chunks.json / kv_store_relation_chunks.json now include the multimodal entities' chunk mappings.
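The end-to-end check could be scripted roughly as follows. This sketch assumes that each `kv_store_*.json` file is a single JSON object whose top-level keys identify the stored entities. That layout is not confirmed by this PR, and the key format for relations is not assumed here. The storage directory and the entity names in the usage comment are placeholders.

```python
import json
import tempfile
from pathlib import Path


def missing_keys(kv_json: Path, expected: list[str]) -> list[str]:
    """Return the expected keys that are absent from a kv_store_*.json file.

    Assumes (not confirmed by the PR) that the file is one JSON object
    keyed by entity name.
    """
    data = json.loads(kv_json.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{kv_json} is not a JSON object")
    return [key for key in expected if key not in data]


# Hypothetical usage after ingesting a PDF with images/tables:
#   workdir = Path("./rag_storage")  # placeholder: your LightRAG working_dir
#   names = ["<an image entity name>", "<a table entity name>"]
#   print(missing_keys(workdir / "kv_store_entity_chunks.json", names))
# An empty list means every listed entity has a chunk mapping.

# Self-check against a throwaway file (stand-in data, not real output):
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "kv_store_entity_chunks.json"
    demo.write_text(json.dumps({"Figure 1": {"chunk_ids": ["chunk-a"]}}), encoding="utf-8")
    print(missing_keys(demo, ["Figure 1", "Table 2"]))  # ['Table 2']
```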

…dges Three call sites in the multimodal ingestion path call lightrag.operate.merge_nodes_and_edges without forwarding the entity_chunks_storage / relation_chunks_storage kwargs (and, in the modalprocessors case, the full_entities_storage / full_relations_storage kwargs as well). Those kwargs default to None, so the calls succeed silently — but entity-to-chunk and relation-to-chunk mappings are never persisted for multimodal content. The result: multimodal entities get created in the knowledge graph and vector DB, but their chunk association metadata is lost. Text-only ingestion (which LightRAG itself drives) populates these mappings correctly, so retrieval quality diverges between text and multimodal content. Forward the missing storages in all three sites so multimodal ingestion matches LightRAG's own usage pattern. Fixes HKUDS#241

Abdeltoto

Hi @sjhddh 👋

I want to call this PR description out as exemplary — the table of fix sites, the line-level links to the LightRAG source, the reference to the related #228, and the explicit test plan made it very fast to verify. That's the kind of PR description maintainers love.

I cross-checked the diff against lightrag/operate.py and lightrag/lightrag.py:

✅ merge_nodes_and_edges accepts entity_chunks_storage and relation_chunks_storage as optional kwargs that default to None, so the additions are append-only and cannot break existing call sites.
✅ LightRAG exposes full_entities, full_relations, entity_chunks, relation_chunks as BaseKVStorage instances (set up in __post_init__), so self.lightrag.<attr> is the correct accessor.
✅ The three sites listed in the table match the call sites in current main.
✅ Same pattern as #228 — extending it consistently is the right move.

LGTM 👍

Disclosure: I opened #260 a few hours after this one with the same fix, before noticing yours. Since yours was opened first and is functionally identical, I'm closing #260 in favor of this one.

LarFii · 2026-04-25T09:10:47Z

Closing this in favor of #250, which covers the same #241 fix and also forwards doc_id for the modal processor merge path. Thanks for the contribution and the detailed write-up!

Abdeltoto approved these changes Apr 22, 2026


This was referenced Apr 22, 2026

fix: pass entity_chunks_storage and relation_chunks_storage to all merge_nodes_and_edges calls #250 · Merged

fix: pass entity_chunks_storage and relation_chunks_storage to merge_nodes_and_edges (closes #241) #260 · Closed

LarFii closed this Apr 25, 2026


sjhddh wants to merge 1 commit into HKUDS:main from sjhddh:fix/multimodal-pass-chunk-storages
