fix: pass entity_chunks/relation_chunks storages to merge_nodes_and_edges (fixes #241) #247

Closed
sjhddh wants to merge 1 commit into HKUDS:main from sjhddh:fix/multimodal-pass-chunk-storages


@sjhddh sjhddh commented Apr 21, 2026

Summary

Fix a silent data-loss bug in multimodal ingestion: three merge_nodes_and_edges call sites in processor.py and modalprocessors.py were not forwarding entity_chunks_storage / relation_chunks_storage kwargs (and in one case also full_entities_storage / full_relations_storage). Since those kwargs default to None, the calls succeeded silently but entity-to-chunk and relation-to-chunk mappings were never persisted for multimodal content.

Fixes #241.

What was wrong

lightrag.operate.merge_nodes_and_edges takes entity_chunks_storage and relation_chunks_storage as optional kwargs (LightRAG lightrag/operate.py L571-572). When they are non-None, it upserts the full per-entity and per-relation chunk-id lists into those stores. When they are None, the call still succeeds but those mappings are never written.
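The silent-skip behavior described above can be illustrated with a small stand-in. This is a minimal sketch, not LightRAG's code: `InMemoryKV` and `merge_sketch` are hypothetical names, and the stored value shape (`{"chunk_ids": [...]}`) is an assumption made for the example.

```python
import asyncio
from typing import Optional


class InMemoryKV:
    """Hypothetical key-value store; NOT LightRAG's BaseKVStorage."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    async def upsert(self, items: dict[str, dict]) -> None:
        self.data.update(items)


async def merge_sketch(
    entity_to_chunks: dict[str, list[str]],
    entity_chunks_storage: Optional[InMemoryKV] = None,
) -> int:
    """Simplified illustration of an optional storage kwarg.

    This is not the real merge_nodes_and_edges; it only mirrors the
    "write if a storage was passed, otherwise skip" pattern.
    """
    if entity_chunks_storage is not None:
        await entity_chunks_storage.upsert(
            {name: {"chunk_ids": ids} for name, ids in entity_to_chunks.items()}
        )
    # The function returns normally either way, so a missing storage
    # never surfaces as an error.
    return len(entity_to_chunks)


store = InMemoryKV()
mapping = {"Figure 3": ["chunk-1", "chunk-7"]}

# Call without the kwarg: succeeds, but nothing is persisted.
merged_without = asyncio.run(merge_sketch(mapping))
stored_after_without = dict(store.data)

# Call with the kwarg: the mapping is written to the store.
merged_with = asyncio.run(merge_sketch(mapping, entity_chunks_storage=store))
```

Both calls return the same value, which is why the missing kwarg went unnoticed: only the contents of the store differ.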

Three sites in RAG-Anything were calling merge_nodes_and_edges without forwarding these kwargs:

| File | Site | Missing kwargs |
| --- | --- | --- |
| raganything/processor.py:745 | _process_multimodal_content_individual | entity_chunks_storage, relation_chunks_storage |
| raganything/processor.py:1349 | _batch_merge_lightrag_style_type_aware | entity_chunks_storage, relation_chunks_storage |
| raganything/modalprocessors.py:800 | BaseModalProcessor._process_chunk_for_extraction | full_entities_storage, full_relations_storage, entity_chunks_storage, relation_chunks_storage |

LightRAG's own text-ingestion code path passes all four storages, so text-only ingestion populated kv_store_entity_chunks.json / kv_store_relation_chunks.json correctly while multimodal ingestion silently did not — leading to divergent retrieval quality between text and multimodal content.
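One way to observe the divergence is to check which expected entities are absent from a store file. The sketch below assumes the file is a JSON object whose top-level keys are entity (or relation) identifiers; that layout is an assumption rather than something confirmed here, and `missing_keys` and the sample entity names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def missing_keys(store_path: Path, expected: list[str]) -> list[str]:
    """Return the expected keys that are absent from a JSON key-value file.

    Assumes the file holds a JSON object keyed by entity/relation
    identifier (an assumed layout, not a documented one).
    """
    if not store_path.exists():
        # No file at all means none of the expected keys were persisted.
        return list(expected)
    with store_path.open(encoding="utf-8") as fh:
        stored = json.load(fh)
    return [key for key in expected if key not in stored]


# Demo with a temporary file standing in for kv_store_entity_chunks.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "kv_store_entity_chunks.json"
    demo_path.write_text(
        json.dumps({"Text Entity": {"chunk_ids": ["chunk-1"]}}),
        encoding="utf-8",
    )
    demo_missing = missing_keys(demo_path, ["Text Entity", "Table 2 (table)"])
    absent_file_missing = missing_keys(Path(tmp) / "does_not_exist.json", ["a"])
```

Pointing `missing_keys` at a real working directory would need the actual entity names produced by the ingestion run.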

Fix

Forward the missing kwargs in all three sites. The storages live on self.lightrag (full_entities, full_relations, entity_chunks, relation_chunks), confirmed in LightRAG lightrag/lightrag.py L682-700.
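The forwarding pattern can be sketched as follows. The attribute and kwarg names come from the description above; `chunk_storage_kwargs` is a hypothetical helper (the actual diff may pass the kwargs inline), and `fake_merge` is a stand-in so the example runs without LightRAG installed.

```python
import asyncio
from types import SimpleNamespace


def chunk_storage_kwargs(lightrag) -> dict:
    """Collect the four storage kwargs from a LightRAG-like object.

    Hypothetical helper for illustration only.
    """
    return {
        "full_entities_storage": lightrag.full_entities,
        "full_relations_storage": lightrag.full_relations,
        "entity_chunks_storage": lightrag.entity_chunks,
        "relation_chunks_storage": lightrag.relation_chunks,
    }


async def fake_merge(
    chunk_results,
    *,
    full_entities_storage=None,
    full_relations_storage=None,
    entity_chunks_storage=None,
    relation_chunks_storage=None,
):
    """Stand-in for merge_nodes_and_edges: reports which storages arrived."""
    received = {
        "full_entities_storage": full_entities_storage,
        "full_relations_storage": full_relations_storage,
        "entity_chunks_storage": entity_chunks_storage,
        "relation_chunks_storage": relation_chunks_storage,
    }
    return sorted(name for name, value in received.items() if value is not None)


# A LightRAG-like object exposing the four storage attributes.
lightrag = SimpleNamespace(
    full_entities=object(),
    full_relations=object(),
    entity_chunks=object(),
    relation_chunks=object(),
)

forwarded = asyncio.run(fake_merge([], **chunk_storage_kwargs(lightrag)))
not_forwarded = asyncio.run(fake_merge([]))
```

Gathering the kwargs in one place is a design choice for the sketch: it keeps the three call sites from drifting apart again if another storage kwarg is added later.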

Diff is 8 additions, 0 deletions across 2 files. This is the same pattern as #228 (which added full_entities_storage / full_relations_storage forwarding to the first two sites); this PR completes the fix by adding the chunk-storage kwargs and extending the pattern to modalprocessors.py.

Test plan

  • Static review: verified merge_nodes_and_edges accepts these kwargs and defaults them to None, so the added args are append-only and cannot break existing behavior.
  • Verified self.lightrag.entity_chunks and self.lightrag.relation_chunks exist as BaseKVStorage fields on the LightRAG class.
  • For maintainers: end-to-end check by ingesting a PDF with images/tables and confirming kv_store_entity_chunks.json / kv_store_relation_chunks.json now include the multimodal entities' chunk mappings.
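The static review in the first item could also be scripted with the standard library's `inspect` module. This is a minimal sketch: `kwargs_default_to_none` is a hypothetical helper and `merge_stub` is a stand-in with an assumed, shortened signature, used so the example runs without LightRAG installed.

```python
import inspect


def kwargs_default_to_none(func, names):
    """Map each name to True if func has that parameter with default None."""
    params = inspect.signature(func).parameters
    return {
        name: name in params and params[name].default is None
        for name in names
    }


async def merge_stub(
    chunk_results,
    entity_chunks_storage=None,
    relation_chunks_storage=None,
):
    """Stand-in with an assumed signature; not the real merge_nodes_and_edges."""
    return None


report = kwargs_default_to_none(
    merge_stub,
    ["entity_chunks_storage", "relation_chunks_storage", "not_a_param"],
)
```

With LightRAG installed, the same helper could be given `lightrag.operate.merge_nodes_and_edges` in place of the stub.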

…dges

Three call sites in the multimodal ingestion path call
lightrag.operate.merge_nodes_and_edges without forwarding the
entity_chunks_storage / relation_chunks_storage kwargs (and, in the
modalprocessors case, the full_entities_storage / full_relations_storage
kwargs as well). Those kwargs default to None, so the calls succeed
silently — but entity-to-chunk and relation-to-chunk mappings are never
persisted for multimodal content.

The result: multimodal entities get created in the knowledge graph and
vector DB, but their chunk association metadata is lost. Text-only
ingestion (which LightRAG itself drives) populates these mappings
correctly, so retrieval quality diverges between text and multimodal
content.

Forward the missing storages in all three sites so multimodal ingestion
matches LightRAG's own usage pattern.

Fixes HKUDS#241
Contributor

@Abdeltoto Abdeltoto left a comment

Hi @sjhddh 👋

I want to call this PR description out as exemplary — the table of fix sites, the line-level links to the LightRAG source, the reference to the related #228, and the explicit test plan made it very fast to verify. That's the kind of PR description maintainers love.

I cross-checked the diff against lightrag/operate.py and lightrag/lightrag.py:

  • merge_nodes_and_edges accepts entity_chunks_storage and relation_chunks_storage as optional kwargs that default to None, so the additions are append-only and cannot break existing call sites.
  • LightRAG exposes full_entities, full_relations, entity_chunks, relation_chunks as BaseKVStorage instances (set up in __post_init__), so self.lightrag.<attr> is the correct accessor.
  • ✅ The three sites listed in the table match the call sites in current main.
  • ✅ Same pattern as #228 — extending it consistently is the right move.

LGTM 👍

Disclosure: I opened #260 a few hours after this one with the same fix, before noticing yours. Since your PR was opened first and is functionally identical, I'm closing #260 in favor of this one.

@LarFii
Collaborator

LarFii commented Apr 25, 2026

Closing this in favor of #250, which covers the same #241 fix and also forwards doc_id for the modal processor merge path. Thanks for the contribution and the detailed write-up!

@LarFii LarFii closed this Apr 25, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]:merge_nodes_and_edges calls missing entity_chunks_storage and relation_chunks_storage during multimodal processing

3 participants