
Add image query support to the backend microservices#12

Merged
dmsuehir merged 39 commits into mmqna-image-query from dina/image_query
Dec 16, 2024

Conversation

@dmsuehir dmsuehir commented Dec 2, 2024

Description

Updates the following microservices:

  • gateway: Updated to support multiple prompt formats depending on the model being used, to send an image to the embedding service for first queries, and to allow sending multiple images to the LVM.
  • lvm-llava: Updated to allow multiple images to be passed in and added support for other prompt formats depending on the LVM model. The lvm-llava-svc will also limit the number of images that get passed to the LVM using MAX_IMAGES.
  • embedding: This microservice already allowed image input. Updated to pass that image through with the embedding output.
  • retriever: Updated to pass the original image through with the output (along with the image that was found by the similarity search).
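The MAX_IMAGES limiting described for lvm-llava can be sketched roughly like this (a hypothetical helper, not the actual service code; the choice to keep the most recent images is an assumption):

```python
# Hypothetical sketch of the MAX_IMAGES limit -- not the actual
# lvm-llava-svc code. Keeps only the most recent max_images images
# before they are passed to the LVM.
def limit_images(images, max_images=1):
    if len(images) > max_images:
        return images[-max_images:]
    return images
```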

Issues

https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md

Type of change


  • New feature (non-breaking change which adds new functionality)

Dependencies

No new dependencies

Tests

Tests have been updated

okhleif-10 and others added 15 commits December 2, 2024 15:39
Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added in audio dict creation

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* separated audio from prompt

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added ASR endpoint

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* removed ASR endpoints from mm embedding

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* edited return logic, fixed function call

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added megaservice to elif

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* reworked helper func

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Append audio to prompt

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Reworked handle messages, added metadata

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Moved dictionary logic to right place

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* changed logic to rely on message len

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* list --> empty str

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
…ina/image_query

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Fixed role bug where enumeration was wrong
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
…nto dina/image_query

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
else:
    print("request is from user.")
    text = req_dict["prompt"]
    text = f"<image>\nUSER: {text}\nASSISTANT:"
Collaborator Author

It used to be that the LVM microservice would always add <image>\n and USER: to the beginning of the prompt created by the gateway. I had to change that, since images can now be scattered throughout the conversation (we can no longer just attach the single image to the first message in the conversation). I updated this mock of the LVM microservice to do something similar to what the actual LVM service does: it checks how many image tags already exist in the prompt and adds extras if they are needed. We might need extras when the retriever also gets an image from the vector store.
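A minimal sketch of that tag-checking logic (the name pad_image_tags is illustrative, not the actual mock's code):

```python
IMAGE_TAG = "<image>\n"

def pad_image_tags(prompt, num_images):
    # Count the image tags already placed in the prompt and prepend
    # extras so the tag count matches the number of images -- e.g.,
    # when the retriever adds an image from the vector store.
    missing = num_images - prompt.count("<image>")
    if missing > 0:
        prompt = IMAGE_TAG * missing + prompt
    return prompt
```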

)
# print(result_dict)
-        self.assertEqual(result_dict[self.lvm.name]["text"], "<image>\nUSER: chao, \nASSISTANT:")
+        self.assertEqual(result_dict[self.lvm.name]["text"], "USER: <image>\nchao, \nASSISTANT:")
Collaborator Author

Previously, the prompts had <image>\nUSER:, but the prompt format documented by Hugging Face is the other way around (USER: <image>\n). Since we will now have images interleaved in the conversation, I changed it to match the HF docs.

formats than the default LLaVA 1.5 model.
"""

# Models to test and their expected prompts
Collaborator Author

Normally I would've parameterized this test with the pytest decorator, but that won't work here because these tests subclass unittest.IsolatedAsyncioTestCase. There's another library called parameterized that could do it, but the unittest container this runs in doesn't have that dependency, and it's easier to avoid adding a third-party dependency (there's a Stack Overflow reference for the issue).

So instead I have lists of the models and their expected prompt formats, and I loop over those to do the checks.
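The looping approach can be sketched like this (format_prompt and the model names are illustrative stand-ins, not the code under test):

```python
import unittest

def format_prompt(model, text):
    # Stand-in for the code under test: LLaVA-style models get the
    # "USER: <image>\n...\nASSISTANT:" format, others pass through.
    if "llava" in model:
        return f"USER: <image>\n{text}\nASSISTANT:"
    return text

class TestPromptFormats(unittest.IsolatedAsyncioTestCase):
    async def test_prompt_formats(self):
        # Parallel lists instead of pytest.mark.parametrize, which
        # doesn't combine with IsolatedAsyncioTestCase subclasses;
        # subTest reports each model separately on failure.
        models = ["llava-hf/llava-1.5-7b-hf", "some-org/other-model"]
        expected = ["USER: <image>\nhello\nASSISTANT:", "hello"]
        for model, want in zip(models, expected):
            with self.subTest(model=model):
                self.assertEqual(format_prompt(model, "hello"), want)
```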

mhbuehler and others added 4 commits December 6, 2024 09:12
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Adds unit test coverage for audio query
ashahba and others added 3 commits December 6, 2024 09:36
Collaborator

@okhleif-10 okhleif-10 left a comment

LGTM, do all tests pass?

# Multimodal RAG QnA With Videos has not yet accepts image as input during QnA.
num_messages = len(data["messages"]) if isinstance(data["messages"], list) else 1

# Multimodal RAG QnA With Videos has not yet accepts image as input during QnA.
Owner

Can you remove this comment or update it for accuracy?

self.assertEqual(len(b64_types["image"]), 2)
finally:
test_gateway.stop()

Owner

Nice test, it passed for me

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
@dmsuehir dmsuehir changed the base branch from mmqna-audio-query to mmqna-image-query December 12, 2024 18:32
@dmsuehir
Collaborator Author

I rebased and synced this branch with mmqna-image-query since the audio input PR has been merged. I also moved the gateway updates to GenAIExamples, and reran the microservice tests for LVM, embeddings multimodal, and retriever multimodal redis.


if echo "$CONTENT" | grep -q "retrieved_docs"; then
    echo "[ retriever ] Content has retrieved_docs as expected."
    if echo "$CONTENT" | grep -q "retrieved_docs"; then
Owner

Is this grep supposed to look for img_b64_str?

Collaborator Author

Yes, good point. I will fix this.

Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Owner

@mhbuehler mhbuehler left a comment

LGTM!

logflag = os.getenv("LOGFLAG", False)

# The maximum number of images that should be sent to the LVM
max_images = int(os.getenv("MAX_IMAGES", 1))
Collaborator

In this line, is 1 being set manually? Or is it a default?

Collaborator Author

This means that if MAX_IMAGES is unset, it will default to 1
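Right -- that fallback is standard os.getenv behavior: the second argument is returned when the environment variable is unset. A quick demonstration:

```python
import os

# When MAX_IMAGES is unset, os.getenv returns the default (1 here);
# when it is set, the string value is returned and converted with int().
os.environ.pop("MAX_IMAGES", None)
assert int(os.getenv("MAX_IMAGES", 1)) == 1

os.environ["MAX_IMAGES"] = "4"
assert int(os.getenv("MAX_IMAGES", 1)) == 4
```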

@dmsuehir dmsuehir merged commit 2eaf136 into mmqna-image-query Dec 16, 2024