Adds audio querying to MultimodalQ&A gateway #974
mhbuehler wants to merge 15 commits into opea-project:main
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* added in audio dict creation
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* separated audio from prompt
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* added ASR endpoint
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* removed ASR endpoints from mm embedding
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* edited return logic, fixed function call
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* added megaservice to elif
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* reworked helper func
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Append audio to prompt
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Reworked handle messages, added metadata
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* Moved dictionary logic to right place
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* changed logic to rely on message len
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
* list --> empty str
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
---------
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: okhleif-IL <omar.khleif@intel.com>
Fixed role bug where enumeration was wrong
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Adds unit test coverage for audio query
Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>
Fix port number placement
        return prompt

    def convert_audio_to_text(self, audio):
        # translate audio to text by passing in dictionary to ASR
This comment is quirky! "dictionary" is a data type here, but it can be confused with the English word "dictionary" (a book of word meanings).
        else:
            input_dict = {"byte_str": audio[0]}

        response = requests.post(self.asr_endpoint, data=json.dumps(input_dict), proxies={"http": None})
Should the proxies be read from an environment variable for a more general solution?
Why is this setting proxies in the first place? Shouldn't those be set well before this point?
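One possible way to address both questions is to build the proxy mapping once from the environment and pass it in, rather than hard-coding it at the call site. The sketch below is only an illustration: `proxies_from_env` is a hypothetical helper that is not part of this PR, and it assumes the conventional `http_proxy`/`https_proxy` variable names. Note that `requests` already consults these variables by default, so an explicit mapping may only be needed when the gateway wants to override that behavior.

```python
import os


def proxies_from_env(environ=None):
    """Build a requests-style proxies mapping from conventional proxy variables.

    Hypothetical helper for illustration; not part of the PR under review.
    """
    environ = os.environ if environ is None else environ
    proxies = {}
    for scheme in ("http", "https"):
        # Prefer the lowercase name, then fall back to the uppercase one.
        value = environ.get(f"{scheme}_proxy") or environ.get(f"{scheme.upper()}_PROXY")
        if value:
            proxies[scheme] = value
    return proxies
```

The resulting dictionary could then be passed as the `proxies` argument of `requests.post`, with the mapping computed once at startup instead of inside `convert_audio_to_text`.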
import requests
from fastapi import Request

os.environ["ASR_SERVICE_PORT"] = "8086"
Why does this override the environment instead of taking the value from the environment?
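A minimal sketch of the alternative this question points at: read `ASR_SERVICE_PORT` from the environment and fall back to a default only when it is unset. The helper name `asr_service_port` and the constant `DEFAULT_ASR_SERVICE_PORT` are hypothetical; the default of `8086` is simply the value hard-coded in the diff.

```python
import os

# Fallback taken from the value hard-coded in the diff; the name is hypothetical.
DEFAULT_ASR_SERVICE_PORT = "8086"


def asr_service_port(environ=None):
    """Return the ASR service port, preferring the environment over the default."""
    environ = os.environ if environ is None else environ
    # Only use the default when the variable is absent; never overwrite it.
    return int(environ.get("ASR_SERVICE_PORT", DEFAULT_ASR_SERVICE_PORT))
```

With this shape, a deployment that exports a different port keeps its value, and the module no longer mutates `os.environ` on import.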
Labeling this as |
Looking at new changes in |
Description
Adds ASR endpoint, speech audio processing, prompt construction, and return of decoded audio in response metadata. This goes with GenAIExamples PR: opea-project/GenAIExamples#1225.
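The audio-to-text step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `{"byte_str": ...}` payload comes from the diff, but the standalone function signature, the injected `post` callable (standing in for `requests.post`), and the `text_key` response field are hypothetical, since the ASR response format is not shown here.

```python
import json
from typing import Callable, Sequence


def convert_audio_to_text(
    audio: Sequence[str],
    asr_endpoint: str,
    post: Callable[..., object],
    text_key: str = "text",  # assumed response field; the real key is not shown in the PR page
) -> str:
    """Send the first audio item to an ASR endpoint and return the transcript."""
    # Payload shape taken from the diff: the audio is passed as "byte_str".
    payload = {"byte_str": audio[0]}
    response = post(asr_endpoint, data=json.dumps(payload))
    # Surface HTTP errors instead of silently parsing an error body.
    response.raise_for_status()
    return response.json()[text_key]
```

Injecting `post` keeps the sketch testable without a running ASR service; in the gateway it would presumably be `requests.post`.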
Issues
Part of the MultimodalQnA Audio & Image Enhancements RFC
Type of change
Dependencies
N/A
Tests
Automated tests were added to GenAIExamples.