MultimodalQnA image query, pdf, and dynamic ports #1134

Merged
chensuyue merged 25 commits into opea-project:main from mhbuehler:mmqna-image-query
Jan 19, 2025

@mhbuehler mhbuehler commented Jan 10, 2025

Description

Following the RFC's Phase 2 plan, this PR adds image query support, PDF ingestion support, and dynamic ports to the microservices used by MultimodalQnA. It accompanies a companion PR in GenAIExamples.

Issues

RFC

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

pymupdf is new for the dataprep microservice, but it's not new to GenAIComps.
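
For readers unfamiliar with pymupdf, the following is a minimal sketch of how a dataprep step might pull page text and embedded images out of a PDF. It is not the code in this PR: `extract_pdf_content`, `image_output_name`, and the file naming scheme are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def image_output_name(pdf_path: str, page_idx: int, img_idx: int, ext: str) -> str:
    """Hypothetical naming scheme: <pdf stem>_page<N>_image<M>.<ext>, 1-based."""
    return f"{Path(pdf_path).stem}_page{page_idx + 1}_image{img_idx + 1}.{ext}"


def extract_pdf_content(pdf_path: str, output_dir: str) -> list:
    """Return one record per page with its text and the image files saved from it."""
    import fitz  # PyMuPDF; imported lazily so the naming helper works without it

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    doc = fitz.open(pdf_path)
    try:
        for page_idx, page in enumerate(doc):
            saved = []
            for img_idx, img in enumerate(page.get_images(full=True)):
                xref = img[0]  # first tuple item is the image's cross-reference number
                info = doc.extract_image(xref)  # dict holding raw bytes and extension
                name = image_output_name(pdf_path, page_idx, img_idx, info["ext"])
                (out / name).write_bytes(info["image"])
                saved.append(name)
            records.append(
                {"page": page_idx + 1, "text": page.get_text(), "images": saved}
            )
    finally:
        doc.close()
    return records
```

Each record pairs a page's text with the images found on that page, which is one plausible way to keep captions and images associated before they are embedded.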

Tests

Tests were added to the following scripts:

  • tests/dataprep/test_dataprep_multimodal_redis_langchain.sh
  • tests/embeddings/test_embeddings_multimodal.sh
  • tests/lvms/test_lvms_llava.sh
  • tests/lvms/test_lvms_tgi-llava_on_intel_hpu.sh
  • tests/retrievers/test_retrievers_multimodal_redis_langchain.sh
  • tests/retrievers/test_retrievers_redis.sh

dmsuehir and others added 6 commits December 16, 2024 10:02
* Backend enhancements for image query capabilities for MultimodalQnA

* Fix model name var

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Remove space at end of prompt

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Add env var for the max number of images sent to the LVM

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* README update for the MAX_IMAGES env var

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Remove prints

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Audio query functionality to multimodal backend (#8)

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added in audio dict creation

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* separated audio from prompt

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added ASR endpoint

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* removed ASR endpoints from mm embedding

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* edited return logic, fixed function call

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added megaservice to elif

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* reworked helper func

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Append audio to prompt

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Reworked handle messages, added metadata

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Moved dictionary logic to right place

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* changed logic to rely on message len

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* list --> empty str

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed role bug where i never was > 0

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Fix after merge

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* removed whitespace

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix call to get role labels

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Gateway test updates images within the conversation

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Adds unit test coverage for audio query

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Update test to check the returned b64 types

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Update test since we don't expect images from the assistant

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Port number fix

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Formatting

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed place where port number is set

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Remove old comment and added more accurate description

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* add comment in code about MAX_IMAGES

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Add Gaudi support for image query

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Fix to pass the retrieved image last

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Revert out gateway and gateway test code, due to its move to GenAIExamples

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Fix retriever test for checking for b64_img_str in the result

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Co-authored-by: Omar Khleif <omar.khleif@intel.com>
Co-authored-by: Melanie Hart Buehler <melanie.h.buehler@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
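
The commits above mention a `MAX_IMAGES` environment variable that caps how many images are sent to the LVM, and a fix so that the retrieved image is passed last. A rough sketch of that idea follows. The function name `select_images`, the default of 1, and the choice to drop the oldest images first are all assumptions, not the PR's actual implementation.

```python
import os
from typing import List, Optional


def select_images(
    conversation_images: List[str],
    retrieved_image: Optional[str] = None,
    max_images: Optional[int] = None,
) -> List[str]:
    """Order images with the retrieved one last, then keep at most max_images."""
    if max_images is None:
        # Assumed default; the real default may differ.
        max_images = int(os.getenv("MAX_IMAGES", "1"))
    images = list(conversation_images)
    if retrieved_image is not None:
        images.append(retrieved_image)  # retrieved image always goes last
    if max_images <= 0:
        return []
    # Assumption: when over the limit, drop the oldest images first.
    return images[-max_images:]
```

Slicing from the end keeps the retrieved image in the final position even when earlier conversation images are dropped.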
…nv file (#17)

* changed all hardcoded ports to getenv with defaults instead

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
---------

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
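
As an illustration of the "getenv with defaults" pattern named in this commit, here is a minimal sketch. The helper `get_port` and the variable name `EXAMPLE_LVM_PORT` are hypothetical; the actual variable names and default ports used by the microservices are not shown in this conversation.

```python
import os


def get_port(env_var: str, default: int) -> int:
    """Read a TCP port from the environment, falling back to a default."""
    raw = os.getenv(env_var)
    if raw is None or raw.strip() == "":
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"{env_var}={port} is outside the valid port range")
    return port


# Hypothetical usage: the variable name and default are placeholders.
lvm_port = get_port("EXAMPLE_LVM_PORT", 9399)
```

Validating the value at startup surfaces a bad port setting immediately instead of at the first failed bind or request.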
* Initial implementation of PDF ingestion

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* PDF ingestion fixes

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Adds a test for dataprep microservice

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Improved comments, variable name, and a docstring

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

* Updated for review feedback

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
…ge-query

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ |
| --- | --- |
| comps/cores/proto/docarray.py | 99.43% <100.00%> (ø) |

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
dmsuehir and others added 5 commits January 13, 2025 11:20
* Fixing Multimodal Retriever Redis tests

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Code cleanup

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Remove debug changes

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Formatting

Signed-off-by: dmsuehir <dina.s.jones@intel.com>

---------

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
okhleif-10 and others added 2 commits January 13, 2025 15:12
@ashahba ashahba removed the WIP label Jan 16, 2025
@chensuyue chensuyue merged commit ee0c11e into opea-project:main Jan 19, 2025
chensuyue pushed a commit to opea-project/GenAIExamples that referenced this pull request Jan 20, 2025
Per the proposed changes in this [RFC](https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md)'s Phase 2 plan, this PR adds support for image queries, PDF ingestion and display, and dynamic ports. There are also some bug fixes. This PR goes with [this one in GenAIComps](opea-project/GenAIComps#1134).

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
smguggen pushed a commit to opea-aws-proserve/GenAIComps that referenced this pull request Jan 23, 2025
According to the RFC's Phase 2 plan, this PR adds image query support, PDF ingestion support, and dynamic ports to the microservices used by MultimodalQnA. This PR goes with this one in GenAIExamples.

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
chyundunovDatamonsters pushed a commit to chyundunovDatamonsters/OPEA-GenAIExamples that referenced this pull request Mar 4, 2025
…roject#1381)

Per the proposed changes in this [RFC](https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md)'s Phase 2 plan, this PR adds support for image queries, PDF ingestion and display, and dynamic ports. There are also some bug fixes. This PR goes with [this one in GenAIComps](opea-project/GenAIComps#1134).

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
According to the RFC's Phase 2 plan, this PR adds image query support, PDF ingestion support, and dynamic ports to the microservices used by MultimodalQnA. This PR goes with this one in GenAIExamples.

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
cogniware-devops pushed a commit to Cogniware-Inc/GenAIExamples that referenced this pull request Dec 19, 2025
…roject#1381)

Per the proposed changes in this [RFC](https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md)'s Phase 2 plan, this PR adds support for image queries, PDF ingestion and display, and dynamic ports. There are also some bug fixes. This PR goes with [this one in GenAIComps](opea-project/GenAIComps#1134).

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
Signed-off-by: cogniware-devops <ambarish.desai@cogniware.ai>
9 participants