feat(KnowledgeBase):Add Word97-2003 (.doc) Binary File parsing module #2544

Merged: fangyinc merged 4 commits into eosphoros-ai:main from ahaeureka:feature/rag-word97-2003-parser on Mar 29, 2025

Conversation

@geebytes (Contributor) commented Mar 27, 2025

Description

  • feat(KnowledgeBase): Implement paragraph-based text extraction for Word 97-2003 .doc files, following Microsoft's Word (.doc) Binary File Format specification, and enable storing parsed .doc content in the knowledge base
  • chore(build): Fix the VIRTUAL_ENV path in /opt/.uv.venv/activate to /opt/.uv.venv in the base image Dockerfile
  • chore(dev): Fix user permission issues in the devcontainer and optimize Dockerfile.dev
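
For readers less familiar with the format, here is a minimal sketch of what paragraph-based splitting can look like once the document text has already been decoded. It assumes the text uses the paragraph mark (U+000D) and the table cell mark (U+0007) described in the Word Binary File Format specification. `split_paragraphs` is a hypothetical helper for illustration only, not the function added in this PR.

```python
from __future__ import annotations

# Assumption: the decoded .doc text stream ends paragraphs with U+000D
# and ends table cells/rows with U+0007, as in the Word Binary File Format.
PARAGRAPH_MARK = "\r"
CELL_MARK = "\x07"


def split_paragraphs(raw_text: str) -> list[str]:
    """Split decoded .doc text into non-empty, trimmed paragraphs.

    Hypothetical helper: treats table cell marks like paragraph marks
    so that each cell becomes its own chunk.
    """
    normalized = raw_text.replace(CELL_MARK, PARAGRAPH_MARK)
    return [part.strip() for part in normalized.split(PARAGRAPH_MARK) if part.strip()]


if __name__ == "__main__":
    sample = "Title\rFirst paragraph.\r\rcell a\x07cell b\x07\x07"
    for paragraph in split_paragraphs(sample):
        print(paragraph)
```

Each returned string could then be handed to the knowledge base as one chunk. Locating and decoding the text inside the binary file is the harder part and is not shown here.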

How Has This Been Tested?

  • Run the tests with pytest: pytest packages/dbgpt-ext/src/dbgpt_ext/rag/knowledge/tests/test_doc.py
  • You can also upload any .doc file to the knowledge base

Snapshots:

Include snapshots for easier review.

Checklist:

  • My code follows the style guidelines of this project
  • I have already rebased the commits and made the commit messages conform to the project standard.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • Any dependent changes have been merged and published in downstream modules

@geebytes changed the title from "Feature(RAG):Add Word97-2003 (.doc) Binary File parsing module and fix devcontainer env issues" to "Feature(KnowledgeBase):Add Word97-2003 (.doc) Binary File parsing module and fix devcontainer env issues" on Mar 27, 2025
@geebytes marked this pull request as ready for review on March 27, 2025 19:05
@geebytes changed the title from "Feature(KnowledgeBase):Add Word97-2003 (.doc) Binary File parsing module and fix devcontainer env issues" to "feat(KnowledgeBase):Add Word97-2003 (.doc) Binary File parsing module and fix devcontainer env issues" on Mar 28, 2025
@github-actions (bot) added the "enhancement" label (New feature or request) on Mar 28, 2025
@geebytes changed the title from "feat(KnowledgeBase):Add Word97-2003 (.doc) Binary File parsing module and fix devcontainer env issues" to "feat(KnowledgeBase):Add Word97-2003 (.doc) Binary File parsing module" on Mar 28, 2025
@Aries-ckt (Collaborator) previously approved these changes on Mar 28, 2025

LGTM. Thanks for your contribution.

@geebytes force-pushed the feature/rag-word97-2003-parser branch from f2268f1 to d016d21 on March 28, 2025 15:56
@Aries-ckt (Collaborator) left a comment

LGTM

@fangyinc (Member) left a comment

LGTM

@fangyinc merged commit c86243a into eosphoros-ai:main on Mar 29, 2025
@geebytes deleted the feature/rag-word97-2003-parser branch on March 29, 2025 03:54

Labels

enhancement New feature or request

3 participants