
support more default node types #789

Merged
wzh1994 merged 11 commits into LazyAGI:main from wzh1994:wzh/nodes
Oct 15, 2025


wzh1994 commented Oct 10, 2025

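📌 PR Description

  • Support more default node types.
  • Support registering transform rules on the default readers to handle special cases. For example, when a code file is too long, it can be split into chunks automatically after it is read, so that its granularity at the root level lines up with that of other readers that paginate automatically.
  • SimpleDirectoryReader can be initialized once and used multiple times, that is, the file names can be passed in at forward (call) time.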
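
To make the chunking idea in the second bullet concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a character-based simplification written only for illustration: `chunk_text` and its parameters are hypothetical names, not part of lazyllm, and the built-in splitters may operate on sentences or tokens rather than characters.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 128, overlap: int = 16) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters.

    Hypothetical helper for illustration only; not the lazyllm implementation.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap  # how far each window advances
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

A long source file run through a function like this yields several root-level pieces instead of one oversized node, which is the alignment the bullet describes.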

✅ Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Refactor (no functionality change, code structure optimized)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update (changes to docs only)
  • Performance optimization

🧪 How Has This Been Tested?

  1. Unit tests

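⚡ Usage After Update

  1. Register transform rules for the default readers
    You can register a custom function:

import lazyllm
from lazyllm import SentenceSplitter
from lazyllm.tools.rag import DocNode

def action(x):
    # x is treated as text here: append a marker, then wrap the result in a DocNode.
    x += 'here in action'
    return DocNode(text=x)

lazyllm.tools.rag.add_post_action_for_default_reader('*.md', action)

Or register a built-in transform rule directly:

lazyllm.tools.rag.add_post_action_for_default_reader('*.md', SentenceSplitter(128, 16))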
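
For readers unfamiliar with pattern-keyed hooks, the sketch below shows one way a glob pattern such as '*.md' could be mapped to post actions and applied after a file is read. It is an assumption-laden illustration, not the lazyllm implementation: `register_post_action`, `apply_post_actions` and the module-level `_post_actions` table are hypothetical names, and the actions here work on plain strings rather than DocNode objects.

```python
from fnmatch import fnmatch
from typing import Callable, Dict, List

# Hypothetical registry: glob pattern -> actions to run on the text that was read.
_post_actions: Dict[str, List[Callable[[str], str]]] = {}


def register_post_action(pattern: str, action: Callable[[str], str]) -> None:
    """Remember `action` for every file name that matches `pattern`."""
    _post_actions.setdefault(pattern, []).append(action)


def apply_post_actions(filename: str, text: str) -> str:
    """Run every action whose pattern matches `filename`, in registration order."""
    for pattern, actions in _post_actions.items():
        if fnmatch(filename, pattern):
            for action in actions:
                text = action(text)
    return text


register_post_action('*.md', lambda t: t + ' [md]')
print(apply_post_actions('README.md', 'hello'))  # hello [md]
print(apply_post_actions('main.py', 'hello'))    # hello
```

Keying the registry by pattern means only matching files pay the cost of the extra transform, and files with no matching pattern pass through unchanged.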
  2. SimpleDirectoryReader accepts file names at forward (call) time

Before the refactor:

reader = SimpleDirectoryReader(input_files=input_files, file_extractor=file_readers, metadatas=metadatas)
reader()

After the refactor, the previous behavior is still supported, and the following also works:

reader = SimpleDirectoryReader(file_extractor=file_readers)
reader(input_files=input_files, metadatas=metadatas)
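
The "configure once, call many times" pattern can be sketched with a small callable class. This is a hypothetical stand-in, not SimpleDirectoryReader itself: `TinyReader`, its `_pick_loader` helper and the string-returning loaders are assumptions made for illustration, and metadata handling is left out.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

Loader = Callable[[str], str]


class TinyReader:
    """Hypothetical reader: files may be fixed at init time or supplied per call."""

    def __init__(self, input_files: Optional[Iterable[str]] = None,
                 file_extractor: Optional[Dict[str, Loader]] = None):
        self._input_files = list(input_files or [])
        self._file_extractor = dict(file_extractor or {})

    def _pick_loader(self, filename: str) -> Loader:
        # The first matching glob pattern wins; otherwise fall back to reading plain text.
        for pattern, loader in self._file_extractor.items():
            if fnmatch(filename, pattern):
                return loader
        return lambda path: Path(path).read_text(encoding='utf-8')

    def __call__(self, input_files: Optional[Iterable[str]] = None) -> List[str]:
        # Files given at call time take precedence over the ones given at init time.
        files = self._input_files if input_files is None else list(input_files)
        if not files:
            raise ValueError('no input files given at init time or call time')
        return [self._pick_loader(f)(f) for f in files]


extractors = {'*.md': lambda path: f'md:{path}'}
reader = TinyReader(file_extractor=extractors)
print(reader(input_files=['a.md']))           # ['md:a.md']
print(reader(input_files=['b.md', 'c.md']))   # ['md:b.md', 'md:c.md']
```

Because the extractor table lives on the instance, the same reader can serve many batches of files without being rebuilt, while passing `input_files` to the constructor keeps the older calling style working.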

wzh1994 changed the title from "support more default node types [skip ci]" to "support more default node types" Oct 13, 2025
mergify bot added the lint_pass label Oct 13, 2025
lwj-st removed the lint_pass label Oct 14, 2025
mergify bot added the lint_pass label Oct 14, 2025
lwj-st removed the lint_pass label Oct 14, 2025
mergify bot added the lint_pass label Oct 14, 2025
lwj-st removed the lint_pass label Oct 14, 2025
mergify bot added the lint_pass label Oct 14, 2025
lwj-st removed the lint_pass label Oct 14, 2025
mergify bot added the lint_pass label Oct 14, 2025
wzh1994 merged commit dc6461f into LazyAGI:main Oct 15, 2025
33 of 40 checks passed
wzh1994 deleted the wzh/nodes branch October 15, 2025 02:58