@hiro-v
Contributor

@hiro-v hiro-v commented Dec 7, 2023

For #821

Integration diagram
[Image: integration diagram]

NVIDIA Triton Inference Server and TensorRT-LLM setup

@hiro-v hiro-v added this to the v0.5.0 milestone Dec 7, 2023
@hiro-v hiro-v requested a review from louis-jan December 7, 2023 02:11
@hiro-v hiro-v self-assigned this Dec 7, 2023
@hiro-v
Contributor Author

hiro-v commented Dec 7, 2023

  1. Check that the extension is installed: look for "Inference Triton Trt Llm Extension" (v1.0.0)
    [Screenshot]

  2. Find the model in the Hub
    [Screenshot]

  3. Update ~/jan/engines/triton_trtllm.json, setting base_url to the remote server's public or private IP
    [Screenshot]

  4. Chat with the llama2 7b model served by the remote NVIDIA Triton Inference Server
    [Screenshot]
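
For step 3, the edit to the engine config could also be scripted. The following is a minimal sketch, not the extension's own code: it assumes the file is plain JSON with a top-level `base_url` key (the only key named in this thread), and the helper name `set_base_url` is hypothetical. Any other keys already present in the file are left untouched.

```python
import json
from pathlib import Path


def set_base_url(config_path, base_url):
    """Set `base_url` in an engine config file and return the resulting config.

    Assumption: the file is a JSON object with a top-level `base_url` key.
    Other keys already in the file are preserved as-is.
    """
    path = Path(config_path).expanduser()

    # Start from the existing settings if the file is already there.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["base_url"] = base_url

    # Create the engines directory if needed, then write the file back.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A call might look like `set_base_url("~/jan/engines/triton_trtllm.json", "http://192.0.2.10:8000")`. The address is a placeholder from the documentation IP range, and the port is an assumption; use whatever host and port your Triton server actually listens on.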

@hiro-v hiro-v marked this pull request as draft December 7, 2023 02:16
@hiro-v
Contributor Author

hiro-v commented Dec 7, 2023

The current blocker is an error with Triton Inference Server + TensorRT-LLM involving a missing space character: triton-inference-server/tensorrtllm_backend#34
I will check the answer there and follow up.

@louis-jan louis-jan force-pushed the feat/inference_engines branch from 481abc4 to 774b122 Compare December 7, 2023 03:36
@hiro-v hiro-v force-pushed the feat/inference_engines branch from 774b122 to d7c0d97 Compare December 7, 2023 08:23
@freelerobot
Contributor

What's the rationale for having both inference-extension (which just has the nitro binary) and inference-nitro-extension?

@hiro-v hiro-v force-pushed the feat/inference_engines branch from d29ef17 to f9e73b0 Compare December 8, 2023 16:15
Base automatically changed from feat/inference_engines to main December 8, 2023 18:09
@hiro-v
Contributor Author

hiro-v commented Dec 10, 2023

No, the inference-extension has been removed @0xSage

@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch from 194132d to fc8057b Compare December 10, 2023 13:31
@hiro-v hiro-v requested review from a team and removed request for louis-jan December 10, 2023 13:31
@dan-menlo dan-menlo modified the milestones: 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines Dec 11, 2023
@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch 2 times, most recently from 0cd4106 to 4054d77 Compare December 12, 2023 01:19
@hiro-v hiro-v marked this pull request as ready for review December 12, 2023 01:45
@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch 3 times, most recently from 06d46cb to f26a8d8 Compare December 12, 2023 07:30
Contributor

@tikikun tikikun left a comment

LGTM

@hiro-v hiro-v force-pushed the feat/inference_engine_triton_trtllm branch from f26a8d8 to 587f5ad Compare December 12, 2023 18:28
@hiro-v hiro-v merged commit 9256505 into main Dec 12, 2023
@hiro-v hiro-v deleted the feat/inference_engine_triton_trtllm branch December 12, 2023 18:29
@Van-QA Van-QA added this to the v0.4.9 milestone Mar 11, 2024