fix: pass entity_chunks_storage and relation_chunks_storage to all merge_nodes_and_edges calls #250

Merged
LarFii merged 1 commit into HKUDS:main from peterCheng123321:fix/241-merge-nodes-missing-storage-params
Apr 25, 2026
Conversation

@peterCheng123321 peterCheng123321 commented Apr 21, 2026

Summary

Fixes #241.

Three merge_nodes_and_edges call sites were omitting storage parameters that are required to persist entity-to-chunk and relation-to-chunk mappings for multimodal content:

| File | Method | Parameters added |
| --- | --- | --- |
| processor.py | _process_multimodal_content_individual (~line 756) | entity_chunks_storage, relation_chunks_storage |
| processor.py | _batch_merge_lightrag_style_type_aware (~line 1362) | entity_chunks_storage, relation_chunks_storage |
| modalprocessors.py | BaseModalProcessor._process_chunk_for_extraction (~line 800) | full_entities_storage, full_relations_storage, doc_id, entity_chunks_storage, relation_chunks_storage |

Because these parameters default to None, the calls succeeded silently — but kv_store_entity_chunks.json and kv_store_relation_chunks.json were never populated for multimodal content, degrading retrieval quality compared to text-only ingestion.
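To illustrate why the omission went unnoticed, here is a minimal sketch of the failure mode. It is not the LightRAG implementation: `InMemoryKV` and `merge_sketch` are hypothetical stand-ins, and only the pattern (an optional storage argument that defaults to None and is skipped without error) is taken from the description above.

```python
import asyncio
from typing import Optional


class InMemoryKV:
    """Hypothetical stand-in for a key-value storage backend."""

    def __init__(self) -> None:
        self.data: dict[str, list[str]] = {}

    async def upsert(self, items: dict[str, list[str]]) -> None:
        # Append chunk ids under each entity name.
        for key, chunk_ids in items.items():
            self.data.setdefault(key, []).extend(chunk_ids)


async def merge_sketch(
    entities: dict[str, list[str]],
    entity_chunks_storage: Optional[InMemoryKV] = None,
) -> int:
    """Toy merge step (assumption, not the real merge_nodes_and_edges).

    Returns the number of merged entities. When the optional storage is
    None, the entity-to-chunk mapping is skipped and no error is raised.
    """
    if entity_chunks_storage is not None:
        await entity_chunks_storage.upsert(entities)
    return len(entities)


entities = {"Figure 1": ["chunk-a"], "Table 2": ["chunk-b"]}
storage = InMemoryKV()

# Both calls "succeed"; only the second one persists the mapping.
merged_without = asyncio.run(merge_sketch(entities))
merged_with = asyncio.run(merge_sketch(entities, entity_chunks_storage=storage))
```

Both calls return the same value, so nothing at the call site signals that the first one persisted nothing.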

For modalprocessors.py, doc_id is derived from chunk_data["full_doc_id"] which is already stored when the chunk is created.
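As a rough sketch of that derivation, the keyword arguments for the modalprocessors.py call site could be assembled as follows. `build_merge_kwargs` and the placeholder storage values are hypothetical and exist only for illustration; the parameter names and the `chunk_data["full_doc_id"]` lookup are the ones named in this PR.

```python
from typing import Any

STORAGE_NAMES = (
    "full_entities_storage",
    "full_relations_storage",
    "entity_chunks_storage",
    "relation_chunks_storage",
)


def build_merge_kwargs(
    chunk_data: dict[str, Any], storages: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical helper: gather the kwargs that were previously omitted."""
    # A storage that is not configured stays None, matching the defaults.
    kwargs: dict[str, Any] = {name: storages.get(name) for name in STORAGE_NAMES}
    # doc_id is read from the chunk record, where it was stored at creation.
    kwargs["doc_id"] = chunk_data["full_doc_id"]
    return kwargs


chunk_data = {"content": "Table 2: ablation results", "full_doc_id": "doc-123"}
storages = {name: f"<{name}>" for name in STORAGE_NAMES}  # placeholder objects
merge_kwargs = build_merge_kwargs(chunk_data, storages)
```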

Test plan

  • Ingest a PDF with images/tables via process_document_complete()
  • Confirm kv_store_entity_chunks.json and kv_store_relation_chunks.json contain entries for multimodal entities
  • Compare retrieval quality for multimodal content before and after
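The second step of the test plan could be automated with a small check like the one below. This is a sketch under assumptions: it presumes both files are JSON objects with one top-level key per entry, and `count_store_entries` and the working-directory argument are hypothetical names, not part of the project.

```python
import json
from pathlib import Path

STORE_FILES = ("kv_store_entity_chunks.json", "kv_store_relation_chunks.json")


def count_store_entries(working_dir: Path) -> dict[str, int]:
    """Return the number of top-level entries in each chunk-mapping file.

    A missing file is reported as 0 entries, which is the symptom
    described in this PR for multimodal content before the fix.
    """
    counts: dict[str, int] = {}
    for name in STORE_FILES:
        path = working_dir / name
        if not path.exists():
            counts[name] = 0
            continue
        with path.open(encoding="utf-8") as fh:
            counts[name] = len(json.load(fh))
    return counts
```

After ingesting a document, calling `count_store_entries(Path("<your working dir>"))` and checking that both counts are non-zero gives a quick signal. It does not distinguish multimodal entries from text entries, so ingesting a document with mostly images or tables makes the check more meaningful.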

…rge_nodes_and_edges calls

Three call sites in processor.py and modalprocessors.py were omitting
entity_chunks_storage and relation_chunks_storage (and in modalprocessors.py
also full_entities_storage, full_relations_storage, and doc_id). Because
these default to None, calls succeeded silently but entity-to-chunk and
relation-to-chunk mappings were never persisted for multimodal content,
degrading retrieval quality.

Fixes HKUDS#241

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Contributor

@Abdeltoto Abdeltoto left a comment

Hi @peterCheng123321 👋

Nice catch on this fix. I went through the diff carefully against lightrag/operate.py and noticed that this PR is actually slightly more complete than the parallel #247 and my own #260 — both of those forward the four *_storage kwargs to BaseModalProcessor._process_chunk_for_extraction, but only this PR also forwards doc_id (extracted from chunk_data["full_doc_id"]). doc_id is used inside merge_nodes_and_edges for traceability of which document each entity/relation came from, so omitting it leaves a real gap. Thanks for spotting that.

LGTM 👍

One small cohesion nit (non-blocking): in modalprocessors.py, entity_chunks_storage and relation_chunks_storage are placed after llm_response_cache, while full_entities_storage / full_relations_storage are grouped earlier with global_config. #247 groups all four *_storage kwargs together — the call site reads a bit more uniformly that way. Purely cosmetic, feel free to ignore.

Disclosure: I opened #260 a few hours after this one with a partial version of the same fix (without doc_id). Closing #260 in favor of this PR.

@peterCheng123321
Author

Thanks for the detailed review @Abdeltoto and for approving @ashah1992 (the original reporter — glad it addresses the issue fully)!

Good point on the cosmetic ordering nit — happy to reorder the kwargs to group all four *_storage params together if the maintainers prefer consistency with #247. Just let me know.

Development

Successfully merging this pull request may close these issues.

[Bug]:merge_nodes_and_edges calls missing entity_chunks_storage and relation_chunks_storage during multimodal processing

5 participants