Skip to content

Adds audio querying to MultimodalQ&A gateway#974

Closed
mhbuehler wants to merge 15 commits intoopea-project:mainfrom
mhbuehler:mmqna-audio-query
Closed

Adds audio querying to MultimodalQ&A gateway#974
mhbuehler wants to merge 15 commits intoopea-project:mainfrom
mhbuehler:mmqna-audio-query

Conversation

@mhbuehler
Copy link
Copy Markdown
Contributor

@mhbuehler mhbuehler commented Dec 4, 2024

Description

Adds ASR endpoint, speech audio processing, prompt construction, and return of decoded audio in response metadata. This goes with GenAIExamples PR: opea-project/GenAIExamples#1225.

Issues

Part of the MultimodalQnA Audio & Image Enhancements RFC

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

N/A

Tests

Automated tests were added to GenAIExamples.

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added in audio dict creation

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* separated audio from prompt

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added ASR endpoint

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* removed ASR endpoints from mm embedding

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* edited return logic, fixed function call

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* added megaservice to elif

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* reworked helper func

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Append audio to prompt

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Reworked handle messages, added metadata

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Moved dictionary logic to right place

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* changed logic to rely on message len

Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* list --> empty str

Signed-off-by: okhleif-IL <omar.khleif@intel.com>
---------

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
okhleif-10 and others added 3 commits December 4, 2024 15:14
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Fixed role bug where enumeration was wrong
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 25 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
comps/cores/mega/gateway.py 66.66% 25 Missing ⚠️
Files with missing lines Coverage Δ
comps/cores/mega/gateway.py 31.43% <66.66%> (+3.29%) ⬆️

@ashahba ashahba added this to the v1.2 milestone Dec 4, 2024
Copy link
Copy Markdown
Collaborator

@ashahba ashahba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

mhbuehler and others added 8 commits December 5, 2024 14:45
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Adds unit test coverage for audio query
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
@ashahba ashahba removed the WIP label Dec 6, 2024
Copy link
Copy Markdown
Collaborator

@mkbhanda mkbhanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

return prompt

def convert_audio_to_text(self, audio):
# translate audio to text by passing in dictionary to ASR
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment quirky! dictionary is a data type here but can get mixed with the English word dictionary (word meanings)

else:
input_dict = {"byte_str": audio[0]}

response = requests.post(self.asr_endpoint, data=json.dumps(input_dict), proxies={"http": None})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should proxies be read from some environment variable for a more general solution?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is setting proxies in the first place, shouldn't those be set well before this point?

import requests
from fastapi import Request

os.environ["ASR_SERVICE_PORT"] = "8086"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this overrides environment, instead of taking the value from environment?

@ashahba
Copy link
Copy Markdown
Collaborator

ashahba commented Dec 10, 2024

Labeling this as WIP since we may not need it after all.

@ashahba
Copy link
Copy Markdown
Collaborator

ashahba commented Dec 10, 2024

Looking at new changes in GenAIExamples, it seems like we don't this PR after all since this PR opea-project/GenAIExamples#1225 is self contained.

@ashahba ashahba closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants