Image ingestion improvements #2

Merged
mhbuehler merged 8 commits into melanie/mm-rag-enhanced from melanie/combined_image_video_ingestion
Oct 24, 2024

@mhbuehler (Owner) commented Oct 21, 2024

Description

Improves the previous image ingestion implementation by reusing the existing /generate_captions endpoint instead of creating a new one, which reduces the amount of code. I also updated the READMEs and tests in both repos. Goes with this PR in GenAIExamples.

Issues

RFC

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

No new dependencies

Tests

  • test_dataprep_multimodal_redis_langchain.sh (updated with new functionality and fixed proxy variables)

Signed-off-by: Melanie Buehler <[email protected]>

@dmsuehir (Collaborator) left a comment

LGTM. The only changes I'm unsure about are the proxy changes in the test file. The tests appear to run with GHA (for example here), and the docker builds and tests seem to be successful, so it's hard to know whether the proxy changes would break anything in GHA runs.

exit 1
else
echo "[ $SERVICE_NAME ] Content is as expected."
fi
Collaborator

Sorry, one comment that I have after reviewing Omar's PR: the test for get_videos (which he's renaming to get_files) currently checks RESPONSE_BODY for the video name only. Can you make it also check for the image name?
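
For illustration only, the requested check could look roughly like the sketch below. The helper name `response_lists_files` and the sample file names are hypothetical and are not taken from the actual test script; the real test would pass in the RESPONSE_BODY returned by the service and the names of the files it ingested.

```shell
#!/bin/sh
# Hypothetical helper (not from the real test script): succeeds only if
# every expected file name appears somewhere in the response body.
response_lists_files() {
    body="$1"
    shift
    for name in "$@"; do
        case "$body" in
            *"$name"*) ;;                            # name found, keep checking
            *) echo "missing: $name"; return 1 ;;    # name absent, fail the check
        esac
    done
    echo "all files listed"
}

# Illustrative values only; the real body would come from the file-listing endpoint.
RESPONSE_BODY='["video_sample.mp4","image_sample.png"]'
response_lists_files "$RESPONSE_BODY" "video_sample.mp4" "image_sample.png"
```

Checking both names in one helper keeps the video and image assertions symmetric, so a later rename of the endpoint would only touch the call site.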

Signed-off-by: Melanie Buehler <[email protected]>

@dmsuehir (Collaborator) left a comment

LGTM, thanks

mhbuehler and others added 3 commits October 24, 2024 09:21
@mhbuehler mhbuehler merged commit a51f0aa into melanie/mm-rag-enhanced Oct 24, 2024
@mhbuehler mhbuehler deleted the melanie/combined_image_video_ingestion branch October 24, 2024 16:36
mhbuehler pushed a commit that referenced this pull request Apr 28, 2025
)

* Fix image build issue (opea-project#1553)

Signed-off-by: chensuyue <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Unified default port number for the same service in text2graph and text2sql (opea-project#1554)

Signed-off-by: Yao, Qing <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* new: `OpeaArangoDataprep` (#2)

* new: `third_parties/arangodb`

* new: `OpeaArangoDataprep`

* cleanup

* fix: `vllm` instead of `tgi`

* fix: dataprep compose

* cleanup

Signed-off-by: Anthony Mahanna <[email protected]>

* new: `OpeaArangoRetriever` (#3)

* new: `OpeaArangoRetriever`

* cleanup

Signed-off-by: Anthony Mahanna <[email protected]>

* new: deps

Signed-off-by: Anthony Mahanna <[email protected]>

* fix typo: `test_retrievers_arango.sh`

Signed-off-by: Anthony Mahanna <[email protected]>

* updated retriever-arango compose file

Signed-off-by: Anthony Mahanna <[email protected]>

* correction

Signed-off-by: Anthony Mahanna <[email protected]>

* add json-repair to dataprep-arango requirements

Signed-off-by: Anthony Mahanna <[email protected]>

* Fix network error, change WORKPATH

Signed-off-by: Anthony Mahanna <[email protected]>

* extra time for health check retriever

Signed-off-by: Anthony Mahanna <[email protected]>

* extended retriever healthcheck 90secs

Signed-off-by: Anthony Mahanna <[email protected]>

* correction

Signed-off-by: Anthony Mahanna <[email protected]>

* Update arangodb.py

Signed-off-by: Anthony Mahanna <[email protected]>

* Removing hugging face token requirement from test file

Signed-off-by: Anthony Mahanna <[email protected]>

* Update test_dataprep_arango with network tests and additional logs

Signed-off-by: Anthony Mahanna <[email protected]>

* Running CI after docker rate limit

Signed-off-by: Anthony Mahanna <[email protected]>

* Base case remove HF_token, no additional tests

Signed-off-by: Anthony Mahanna <[email protected]>

* Adding VLLM check and logs, currently VLLM not working in CI/CD

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup: compose.yaml

Signed-off-by: Anthony Mahanna <[email protected]>

* update: arangodb healthcheck

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup: retriever test

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: typo

Signed-off-by: Anthony Mahanna <[email protected]>

* rem: unused vars

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: indent

Signed-off-by: Anthony Mahanna <[email protected]>

* temp: swap vllm healthcheck with sleep

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: typo

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: component name typo

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: support `EmbedDoc` for retriever

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: `getattr`

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: CURL command

Signed-off-by: Anthony Mahanna <[email protected]>

* revert 6061484

Signed-off-by: Anthony Mahanna <[email protected]>

* Update xtune file and change DDP parameter (opea-project#1552)

Signed-off-by: jilongwa <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* add N/A option (opea-project#1561)

Signed-off-by: ZhangJianyu <[email protected]>
Co-authored-by: ZhangJianyu <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Test latest gaudi docker container (opea-project#1477)

Update base gaudi container into the latest version, docker pull vault.habana.ai/gaudi-docker/1.20.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest, https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Docker_Installation.html#use-intel-gaudi-containers

Signed-off-by: chensuyue <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* fix audioqna male voice setting (opea-project#1559)

Co-authored-by: Letong Han <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* added error handling for lvm (opea-project#1556)

Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* enable mysql db for sql agent (opea-project#1431)

Signed-off-by: cheehook <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Enlarge DocSum prompt buffer (opea-project#1567)

* Enlarge DocSum prompt buffer
Follow PR opea-project#1471

Signed-off-by: XinyaoWa <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Update vLLM parameter max-seq-len-to-capture (opea-project#1565)

Signed-off-by: lvliang-intel <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* fix: lint

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: missing import

Signed-off-by: Anthony Mahanna <[email protected]>

* new: healthcheck for dataprep-arangodb

Signed-off-by: Anthony Mahanna <[email protected]>

* update: arangodb readmes

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup: test_dataprep_arango.sh

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup: test_dataprep_arango.sh (PT2)

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup: test_dataprep_arango.sh (PT3)

Signed-off-by: Anthony Mahanna <[email protected]>

* update: test_dataprep_arango.sh

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: whitespace

Signed-off-by: Anthony Mahanna <[email protected]>

* Remove Transformers versions from requirements.txt file (opea-project#1547)

* Remove Transformers versions from requirements.txt file

Signed-off-by: Abolfazl Shahbazi <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Remove index_names from files for dataprep-get request (opea-project#1569)

* remove index_names from files for get request

Signed-off-by: Mustafa <[email protected]>

* update the tests

Signed-off-by: Mustafa <[email protected]>

* update the tests

Signed-off-by: Mustafa <[email protected]>

* update the tests

Signed-off-by: Mustafa <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add validation check for 'all' as an index_name

Signed-off-by: Mustafa <[email protected]>

* fix for readme file

Signed-off-by: Mustafa <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mustafa <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Upgrade Optimum Habana version to fix security check issue (opea-project#1571)

Signed-off-by: lvliang-intel <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Make llamaguard compatible with both TGI and vLLM (opea-project#1581)

Signed-off-by: lvliang-intel <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Fix Dockerfile error and add CI test for IPEX (opea-project#1585)

* Fix Dockerfile error and add CI test

Signed-off-by: lvliang-intel <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Reduce multilang tts docker image size (opea-project#1587)

* fix audioqna male voice setting

* reduce multilang tts docker image size

Signed-off-by: Anthony Mahanna <[email protected]>

* unset OPENAI_KEY in CI test (opea-project#1586)

Signed-off-by: Rita Brugarolas <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* Add AWS Credentials for CD test (opea-project#1588)

* Fix CD test issue

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* update: shorten ingest_dataprep.txt

Signed-off-by: Anthony Mahanna <[email protected]>

* revert: a4d943e

Signed-off-by: Anthony Mahanna <[email protected]>

* new: `DataprepRequest` model (opea-project#1525)

* new: `DataprepRequest`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: docstrings

* rem: `ingest_from_graphDB`

* new: dep injection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: verbose `input` processing

* attempt: replace `kwargs` with params

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rem: `db_type`

ref: opea-project#1525 (comment)

* attempt: require `base`

* Revert "attempt: require `base`"

This reverts commit 620ca6b.

* Fix dataprep request class issue of Redis (#1)

* new: `DataprepRequest`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: docstrings

* rem: `ingest_from_graphDB`

* new: dep injection

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: verbose `input` processing

* attempt: replace `kwargs` with params

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rem: `db_type`

ref: opea-project#1525 (comment)

* attempt: require `base`

* Revert "attempt: require `base`"

This reverts commit 620ca6b.

* fix dataprep request class of redis

Signed-off-by: letonghan <[email protected]>

* revert change in redis.py

Signed-off-by: letonghan <[email protected]>

---------

Signed-off-by: letonghan <[email protected]>
Co-authored-by: Anthony Mahanna <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Anthony Mahanna <[email protected]>
Co-authored-by: Liang Lv <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert: `DataprepRequest` for multimodal

* revert: `DataprepRequest` for multimodal (PT2)

* fix: conditionally fetch unique `DataprepRequest` attributes

* fix bugs in dataprep util script

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert change of pgvector

Signed-off-by: letonghan <[email protected]>

* fix indices bug for redis

Signed-off-by: letonghan <[email protected]>

* minor fix for redis

Signed-off-by: letonghan <[email protected]>

* ingest file into rag_redis_test

Signed-off-by: letonghan <[email protected]>

* update indice name

Signed-off-by: letonghan <[email protected]>

---------

Signed-off-by: letonghan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Liang Lv <[email protected]>
Co-authored-by: Letong Han <[email protected]>
Co-authored-by: letonghan <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>

* revert: bc4445c

Signed-off-by: Anthony Mahanna <[email protected]>

* revert: d17f6aa

Signed-off-by: Anthony Mahanna <[email protected]>

* Revert "new: `DataprepRequest` model (opea-project#1525)" (opea-project#1592)

This reverts commit 88947ab.

Signed-off-by: Anthony Mahanna <[email protected]>

* add hyperlinks

Signed-off-by: Anthony Mahanna <[email protected]>

* revert: 4eb9ec4f

Signed-off-by: Anthony Mahanna <[email protected]>

* new: ArangoDBDataprepRequest

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: lint

Signed-off-by: Anthony Mahanna <[email protected]>

* cleanup: delete_files

Signed-off-by: Anthony Mahanna <[email protected]>

* remove: env mutation

Signed-off-by: Anthony Mahanna <[email protected]>

* fix: move openai key env var to top of file

Signed-off-by: Anthony Mahanna <[email protected]>

---------

Signed-off-by: chensuyue <[email protected]>
Signed-off-by: Anthony Mahanna <[email protected]>
Signed-off-by: Yao, Qing <[email protected]>
Signed-off-by: jilongwa <[email protected]>
Signed-off-by: ZhangJianyu <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: cheehook <[email protected]>
Signed-off-by: XinyaoWa <[email protected]>
Signed-off-by: lvliang-intel <[email protected]>
Signed-off-by: Abolfazl Shahbazi <[email protected]>
Signed-off-by: Mustafa <[email protected]>
Signed-off-by: Rita Brugarolas <[email protected]>
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: letonghan <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
Co-authored-by: Yao Qing <[email protected]>
Co-authored-by: lasyasn <[email protected]>
Co-authored-by: Ajay Kallepalli <[email protected]>
Co-authored-by: jilongW <[email protected]>
Co-authored-by: Neo Zhang Jianyu <[email protected]>
Co-authored-by: ZhangJianyu <[email protected]>
Co-authored-by: Spycsh <[email protected]>
Co-authored-by: Letong Han <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: cheehook <[email protected]>
Co-authored-by: XinyaoWa <[email protected]>
Co-authored-by: Liang Lv <[email protected]>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
Co-authored-by: Mustafa <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rbrugaro <[email protected]>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: letonghan <[email protected]>