fix: pass entity_chunks_storage and relation_chunks_storage to all merge_nodes_and_edges calls by peterCheng123321 · Pull Request #250 · HKUDS/RAG-Anything

peterCheng123321 · 2026-04-21T15:36:01Z

Summary

Fixes #241.

Three merge_nodes_and_edges call sites were omitting storage parameters that are required to persist entity-to-chunk and relation-to-chunk mappings for multimodal content:

| File | Method | Parameters added |
| --- | --- | --- |
| `processor.py` | `_process_multimodal_content_individual` (~line 756) | `entity_chunks_storage`, `relation_chunks_storage` |
| `processor.py` | `_batch_merge_lightrag_style_type_aware` (~line 1362) | `entity_chunks_storage`, `relation_chunks_storage` |
| `modalprocessors.py` | `BaseModalProcessor._process_chunk_for_extraction` (~line 800) | `full_entities_storage`, `full_relations_storage`, `doc_id`, `entity_chunks_storage`, `relation_chunks_storage` |

Because these parameters default to None, the calls succeeded silently — but kv_store_entity_chunks.json and kv_store_relation_chunks.json were never populated for multimodal content, degrading retrieval quality compared to text-only ingestion.
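The silent-failure mode described above can be illustrated with a minimal sketch. This is not the actual `merge_nodes_and_edges` from LightRAG: `InMemoryKV`, `merge_sketch`, and the shape of the stored records are hypothetical stand-ins, assumed only to show how an optional storage parameter defaulting to `None` lets a call succeed while skipping persistence.

```python
import asyncio
from typing import Optional


class InMemoryKV:
    """Hypothetical stand-in for a key-value storage backend."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    async def upsert(self, items: dict[str, dict]) -> None:
        self.data.update(items)


async def merge_sketch(
    entities: dict[str, list[str]],
    entity_chunks_storage: Optional[InMemoryKV] = None,
) -> int:
    """Toy merge step (assumed shape): returns the number of entities merged.

    The entity-to-chunk mapping is persisted only when a storage is passed.
    """
    if entity_chunks_storage is not None:
        await entity_chunks_storage.upsert(
            {name: {"chunk_ids": ids} for name, ids in entities.items()}
        )
    return len(entities)


async def demo() -> tuple[int, int, int]:
    entities = {"Figure 1": ["chunk-a"], "Table 2": ["chunk-b"]}
    store = InMemoryKV()
    # Storage omitted: the call still succeeds, but nothing is persisted.
    merged_without = await merge_sketch(entities)
    # Storage forwarded: same return value, and the mapping is written.
    merged_with = await merge_sketch(entities, entity_chunks_storage=store)
    return merged_without, merged_with, len(store.data)


result = asyncio.run(demo())
```

Both calls return the same count, so nothing at the call site signals that the first one skipped the write. That is why the omission only showed up as missing entries in the store files.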

For modalprocessors.py, doc_id is derived from chunk_data["full_doc_id"] which is already stored when the chunk is created.

Test plan

- Ingest a PDF with images/tables via process_document_complete()
- Confirm kv_store_entity_chunks.json and kv_store_relation_chunks.json contain entries for multimodal entities
- Compare retrieval quality for multimodal content before and after
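The second step of the test plan could be checked with a small helper along these lines. It is a sketch under assumptions: that both files sit directly in the working directory used for ingestion, and that each is a JSON object whose top-level keys are the stored entries. `count_store_entries` is a hypothetical name, not part of RAG-Anything or LightRAG.

```python
import json
from pathlib import Path

# File names taken from the PR description.
STORE_FILES = ("kv_store_entity_chunks.json", "kv_store_relation_chunks.json")


def count_store_entries(working_dir: str) -> dict[str, int]:
    """Return the number of top-level entries per store file.

    A missing or empty file counts as 0 entries.
    """
    counts: dict[str, int] = {}
    for name in STORE_FILES:
        path = Path(working_dir) / name
        if not path.exists():
            counts[name] = 0
            continue
        text = path.read_text(encoding="utf-8").strip()
        counts[name] = len(json.loads(text)) if text else 0
    return counts
```

A non-zero count after ingesting a document that contains only images or tables would suggest the mappings are now being written. Confirming that specific multimodal entities are present still requires inspecting the keys themselves.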

…rge_nodes_and_edges calls

Three call sites in processor.py and modalprocessors.py were omitting entity_chunks_storage and relation_chunks_storage (and in modalprocessors.py also full_entities_storage, full_relations_storage, and doc_id). Because these default to None, calls succeeded silently but entity-to-chunk and relation-to-chunk mappings were never persisted for multimodal content, degrading retrieval quality.

Fixes HKUDS#241

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Abdeltoto

Hi @peterCheng123321 👋

Nice catch on this fix. I went through the diff carefully against lightrag/operate.py and noticed that this PR is actually slightly more complete than the parallel #247 and my own #260 — both of those forward the four *_storage kwargs to BaseModalProcessor._process_chunk_for_extraction, but only this PR also forwards doc_id (extracted from chunk_data["full_doc_id"]). doc_id is used inside merge_nodes_and_edges for traceability of which document each entity/relation came from, so omitting it leaves a real gap. Thanks for spotting that.

LGTM 👍

One small cohesion nit (non-blocking): in modalprocessors.py, entity_chunks_storage and relation_chunks_storage are placed after llm_response_cache, while full_entities_storage / full_relations_storage are grouped earlier with global_config. #247 groups all four *_storage kwargs together — the call site reads a bit more uniformly that way. Purely cosmetic, feel free to ignore.

Disclosure: I opened #260 a few hours after this one with a partial version of the same fix (without doc_id). Closing #260 in favor of this PR.

peterCheng123321 · 2026-04-22T11:58:07Z

Thanks for the detailed review @Abdeltoto and for approving @ashah1992 (the original reporter — glad it addresses the issue fully)!

Good point on the cosmetic ordering nit — happy to reorder the kwargs to group all four *_storage params together if the maintainers prefer consistency with #247. Just let me know.

Abdeltoto approved these changes Apr 22, 2026

Abdeltoto mentioned this pull request Apr 22, 2026

fix: pass entity_chunks_storage and relation_chunks_storage to merge_nodes_and_edges (closes #241) #260

Closed

2 tasks

ashah1992 approved these changes Apr 22, 2026

LarFii merged commit b63adb2 into HKUDS:main Apr 25, 2026

LarFii mentioned this pull request Apr 25, 2026

fix: pass entity_chunks/relation_chunks storages to merge_nodes_and_edges (fixes #241) #247

Closed

3 tasks

LarFii merged 1 commit into HKUDS:main from peterCheng123321:fix/241-merge-nodes-missing-storage-params
