27 changes: 24 additions & 3 deletions MultimodalQnA/README.md
Collaborator Author

Please view this README in a separate tab. Let me know if I should fix the screenshot sizes.

@@ -172,8 +172,29 @@ docker compose -f compose.yaml up -d

## MultimodalQnA Demo on Gaudi2

-![MultimodalQnA-upload-waiting-screenshot](./assets/img/upload-gen-trans.png)
+### Multimodal QnA UI
+![MultimodalQnA-ui-screenshot](./assets/img/mmqna-ui.png)

-![MultimodalQnA-upload-done-screenshot](./assets/img/upload-gen-captions.png)
+### Video Ingestion
+![MultimodalQnA-ingest-video-screenshot](./assets/img/video-ingestion.png)

-![MultimodalQnA-query-example-screenshot](./assets/img/example_query.png)
+### Text Query following the ingestion of a Video
+![MultimodalQnA-video-query-screenshot](./assets/img/video-query.png)

+### Image Ingestion
+![MultimodalQnA-ingest-image-screenshot](./assets/img/image-ingestion.png)

+### Text Query following the ingestion of an image
+![MultimodalQnA-video-query-screenshot](./assets/img/image-query.png)

+### Audio Ingestion
+![MultimodalQnA-audio-ingestion-screenshot](./assets/img/audio-ingestion.png)

+### Text Query following the ingestion of an Audio Podcast
+![MultimodalQnA-audio-query-screenshot](./assets/img/audio-query.png)

+### PDF Ingestion
+![MultimodalQnA-upload-pdf-screenshot](./assets/img/ingest_pdf.png)

+### Text query following the ingestion of a PDF
+![MultimodalQnA-pdf-query-example-screenshot](./assets/img/pdf-query.png)
Binary file added MultimodalQnA/assets/img/audio-ingestion.png
Binary file added MultimodalQnA/assets/img/audio-query.png
Binary file added MultimodalQnA/assets/img/image-ingestion.png
Binary file added MultimodalQnA/assets/img/image-query.png
Binary file added MultimodalQnA/assets/img/ingest_pdf.png
Binary file added MultimodalQnA/assets/img/mmqna-ui.png
Binary file added MultimodalQnA/assets/img/pdf-query.png
Binary file added MultimodalQnA/assets/img/video-ingestion.png
Binary file added MultimodalQnA/assets/img/video-query.png
16 changes: 8 additions & 8 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/README.md
@@ -40,6 +40,10 @@ lvm
===
Port 9399 - Open to 0.0.0.0/0

+whisper
+===
+port 7066 - Open to 0.0.0.0/0

dataprep-multimodal-redis
===
Port 6007 - Open to 0.0.0.0/0
@@ -83,10 +87,6 @@ export WHISPER_PORT=7066
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
export WHISPER_MODEL="base"
export MAX_IMAGES=1
-export ASR_ENDPOINT=http://$host_ip:$WHISPER_PORT
-export ASR_PORT=9099
-export ASR_SERVICE_PORT=3001
-export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
@@ -164,7 +164,7 @@ docker build --no-cache -t opea/lvm:latest --build-arg https_proxy=$https_proxy
docker build --no-cache -t opea/dataprep-multimodal-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimodal/redis/langchain/Dockerfile .
```

-### 5. Build asr images
+### 5. Build Whisper Server Image

Build whisper server image

@@ -270,7 +270,7 @@ curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/multimodal_retrieval \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
```

-4. asr
+4. whisper

```bash
curl ${WHISPER_SERVER_ENDPOINT} \
@@ -406,15 +406,15 @@ curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
Test the MegaService with an audio query:

```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
```

Test the MegaService with a text and image query:

```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "Green bananas in a tree"}, {"type": "image_url", "image_url": {"url": "http://images.cocodataset.org/test-stuff2017/000000004248.jpg"}}]}]}'
```
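The audio-query test in this README posts a hard-coded base64 string (the header of a tiny WAV file). To send a real recording instead, the request body can be generated from a local file. A minimal sketch, assuming Bash with the standard `base64` and `tr` utilities; `build_audio_query` and `./question.wav` are hypothetical names, and the JSON shape is copied from the README example rather than from the MegaService source:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of MultimodalQnA: wraps a base64-encoded audio
# file in the same JSON body the README's audio-query curl test sends.
build_audio_query() {
  local wav_file="$1"
  local b64
  # Encode the file and drop line breaks so the value stays a single JSON string
  b64=$(base64 < "$wav_file" | tr -d '\n')
  printf '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "%s"}]}]}' "$b64"
}

# Usage, once the stack is up and set_env.sh has been sourced:
# curl "http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna" \
#   -H "Content-Type: application/json" \
#   -d "$(build_audio_query ./question.wav)"
```

Stripping the newlines matters because GNU `base64` wraps its output at 76 characters by default, which would otherwise put raw line breaks inside the JSON string.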
15 changes: 0 additions & 15 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/compose.yaml
@@ -14,19 +14,6 @@ services:
       https_proxy: ${https_proxy}
       WHISPER_PORT: ${WHISPER_PORT}
     restart: unless-stopped
-  asr:
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
-    container_name: asr-service
-    ports:
-      - "${ASR_SERVICE_PORT}:${ASR_PORT}"
-    ipc: host
-    environment:
-      WHISPER_PORT: ${WHISPER_PORT}
-      MAX_IMAGES: ${MAX_IMAGES}
-      ASR_PORT: ${ASR_PORT}
-      ASR_ENDPOINT: ${ASR_ENDPOINT}
-      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
-      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
   redis-vector-db:
     image: redis/redis-stack:7.2.0-v9
     container_name: redis-vector-db
@@ -162,8 +149,6 @@ services:
       MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
       LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
       LVM_MODEL_ID: ${LVM_MODEL_ID}
-      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
-      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
       WHISPER_PORT: ${WHISPER_PORT}
       WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT}
     ipc: host
4 changes: 0 additions & 4 deletions MultimodalQnA/docker_compose/intel/cpu/xeon/set_env.sh
@@ -21,10 +21,6 @@ export WHISPER_PORT=7066
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
export WHISPER_MODEL="base"
export MAX_IMAGES=1
-export ASR_ENDPOINT=http://$host_ip:$WHISPER_PORT
-export ASR_PORT=9099
-export ASR_SERVICE_PORT=3001
-export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"

export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
12 changes: 4 additions & 8 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/README.md
@@ -37,10 +37,6 @@ export WHISPER_PORT=7066
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
export MAX_IMAGES=1
export WHISPER_MODEL="base"
-export ASR_ENDPOINT=http://$host_ip:$WHISPER_PORT
-export ASR_PORT=9099
-export ASR_SERVICE_PORT=3001
-export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export DATAPREP_MMR_PORT=6007
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/ingest_with_text"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/generate_transcripts"
@@ -116,7 +112,7 @@ docker build --no-cache -t opea/lvm:latest --build-arg https_proxy=$https_proxy
docker build --no-cache -t opea/dataprep-multimodal-redis:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/multimodal/redis/langchain/Dockerfile .
```

-### 5. Build asr images
+### 5. Build Whisper Server Image

Build whisper server image

@@ -220,7 +216,7 @@ curl http://${host_ip}:7000/v1/multimodal_retrieval \
-d "{\"text\":\"test\",\"embedding\":${your_embedding}}"
```

-4. asr
+4. whisper

```bash
curl ${WHISPER_SERVER_ENDPOINT} \
@@ -356,15 +352,15 @@ curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
Test the MegaService with an audio query:

```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'
```

Test the MegaService with a text and image query:

```bash
-curl http://${host_ip}:8888/v1/multimodalqna \
+curl http://${host_ip}:${MEGA_SERVICE_PORT}/v1/multimodalqna \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": [{"type": "text", "text": "Green bananas in a tree"}, {"type": "image_url", "image_url": {"url": "http://images.cocodataset.org/test-stuff2017/000000004248.jpg"}}]}]}'
```
15 changes: 0 additions & 15 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -21,19 +21,6 @@ services:
       WHISPER_PORT: ${WHISPER_PORT}
       WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT}
     restart: unless-stopped
-  asr:
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
-    container_name: asr-service
-    ports:
-      - "${ASR_SERVICE_PORT}:${ASR_PORT}"
-    ipc: host
-    environment:
-      WHISPER_PORT: ${WHISPER_PORT}
-      MAX_IMAGES: ${MAX_IMAGES}
-      ASR_PORT: ${ASR_PORT}
-      ASR_ENDPOINT: ${ASR_ENDPOINT}
-      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
-      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
   dataprep-multimodal-redis:
     image: ${REGISTRY:-opea}/dataprep-multimodal-redis:${TAG:-latest}
     container_name: dataprep-multimodal-redis
@@ -181,8 +168,6 @@ services:
       MM_RETRIEVER_SERVICE_HOST_IP: ${MM_RETRIEVER_SERVICE_HOST_IP}
       LVM_SERVICE_HOST_IP: ${LVM_SERVICE_HOST_IP}
       LVM_MODEL_ID: ${LVM_MODEL_ID}
-      ASR_SERVICE_PORT: ${ASR_SERVICE_PORT}
-      ASR_SERVICE_ENDPOINT: ${ASR_SERVICE_ENDPOINT}
       WHISPER_PORT: ${WHISPER_PORT}
       WHISPER_SERVER_ENDPOINT: ${WHISPER_SERVER_ENDPOINT}
     ipc: host
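With the `asr` service removed from both the Xeon and Gaudi compose files, a checkout of this branch should no longer define it. One hedged way to confirm that locally is to let Compose print the service names it resolves (`docker compose config --services`, one name per line) and filter that list. `assert_no_asr_service` is a hypothetical helper for illustration, not something the repository provides:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: reads service names
# (one per line) on stdin and fails if a service named exactly "asr" remains.
assert_no_asr_service() {
  # -x matches whole lines only, so names such as "whisper-service" never match
  if grep -qx 'asr'; then
    echo "compose file still defines the asr service" >&2
    return 1
  fi
  return 0
}

# Usage, from the directory holding compose.yaml, after sourcing set_env.sh:
# docker compose -f compose.yaml config --services | assert_no_asr_service
```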
6 changes: 1 addition & 5 deletions MultimodalQnA/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -23,14 +23,10 @@ export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"
export REDIS_HOST=${host_ip}
export INDEX_NAME="mm-rag-redis"

+export WHISPER_MODEL="base"
export WHISPER_PORT=7066
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
export MAX_IMAGES=1
-export WHISPER_MODEL="base"
-export ASR_ENDPOINT=http://$host_ip:$WHISPER_PORT
-export ASR_PORT=9099
-export ASR_SERVICE_PORT=3001
-export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"

export DATAPREP_MMR_PORT=6007
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/ingest_with_text"
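Dropping the `ASR_*` exports from `set_env.sh` does not clear them from a shell that already sourced the previous version: exported variables stay in that shell's environment until they are unset or the shell is restarted. A minimal sketch of a check, assuming Bash and `printenv`; `check_asr_env` is a hypothetical helper, and the variable list is taken from the lines this PR deletes:

```bash
#!/usr/bin/env bash
# Hypothetical sanity check, not part of the repository: after this change the
# ASR_* variables should no longer be exported, while WHISPER_SERVER_ENDPOINT must be.
check_asr_env() {
  local status=0 var
  for var in ASR_ENDPOINT ASR_PORT ASR_SERVICE_PORT ASR_SERVICE_ENDPOINT; do
    # printenv succeeds only when the variable is present in the exported environment
    if printenv "$var" >/dev/null; then
      echo "stale variable still exported: $var" >&2
      status=1
    fi
  done
  if [ -z "$(printenv WHISPER_SERVER_ENDPOINT)" ]; then
    echo "WHISPER_SERVER_ENDPOINT is not set" >&2
    status=1
  fi
  return "$status"
}

# Usage: source ./set_env.sh && check_asr_env
```

A non-zero exit points either at leftover variables from the old setup or at a missing Whisper endpoint.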
6 changes: 0 additions & 6 deletions MultimodalQnA/docker_image_build/build.yaml
@@ -59,9 +59,3 @@ services:
       dockerfile: comps/asr/src/integrations/dependency/whisper/Dockerfile
     extends: multimodalqna
     image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
-  asr:
-    build:
-      context: GenAIComps
-      dockerfile: comps/asr/src/Dockerfile
-    extends: multimodalqna
-    image: ${REGISTRY:-opea}/asr:${TAG:-latest}
4 changes: 0 additions & 4 deletions MultimodalQnA/tests/test_compose_on_gaudi.sh
@@ -70,10 +70,6 @@ function setup_env() {
export MAX_IMAGES=1
export WHISPER_MODEL="base"
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
-export ASR_ENDPOINT=http://$host_ip:$WHISPER_PORT
-export ASR_PORT=9099
-export ASR_SERVICE_PORT=3001
-export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export DATAPREP_MMR_PORT=6007
export DATAPREP_INGEST_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/ingest_with_text"
export DATAPREP_GEN_TRANSCRIPT_SERVICE_ENDPOINT="http://${host_ip}:${DATAPREP_MMR_PORT}/v1/generate_transcripts"
4 changes: 0 additions & 4 deletions MultimodalQnA/tests/test_compose_on_xeon.sh
@@ -63,10 +63,6 @@ function setup_env() {
export MAX_IMAGES=1
export WHISPER_MODEL="base"
export WHISPER_SERVER_ENDPOINT="http://${host_ip}:${WHISPER_PORT}/v1/asr"
-export ASR_ENDPOINT=http://$host_ip:$WHISPER_PORT
-export ASR_PORT=9099
-export ASR_SERVICE_PORT=3001
-export ASR_SERVICE_ENDPOINT="http://${host_ip}:${ASR_SERVICE_PORT}/v1/audio/transcriptions"
export REDIS_DB_PORT=6379
export REDIS_INSIGHTS_PORT=8001
export REDIS_URL="redis://${host_ip}:${REDIS_DB_PORT}"