Conversation

@iankur iankur commented Dec 9, 2024

Motivation

This PR adds support for eval on a long context benchmark, InfiniteBench. See #1273 for more context.

Modifications

Following the discussion in #1273, this PR currently adds code from the TensorRT-LLM repo (link) to load the data, create prompts, and compute scores. Below are sample outputs for both backends using gradientai/Llama-3-8B-Instruct-Gradient-1048k with a maximum input length of ~130K tokens. Please check the README for more details and instructions on how to run both benchmarks. Currently, the predictions differ (see below), which I will try to fix.

SGLang

{"question_id": 0, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 71432.", "ground_truth": ["71432"]}
{"question_id": 1, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 69079.", "ground_truth": ["69079"]}
{"question_id": 2, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 89415.", "ground_truth": ["89415"]}
{"question_id": 3, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 61734.", "ground_truth": ["61734"]}
{"question_id": 4, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 40204.", "ground_truth": ["40204"]}
{"question_id": 5, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 80723.", "ground_truth": ["80723"]}
{"question_id": 6, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 55058.", "ground_truth": ["55058"]}
{"question_id": 7, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 16783.", "ground_truth": ["16783"]}
{"question_id": 8, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 15951.", "ground_truth": ["15951"]}
{"question_id": 9, "model_id": "gradientai/Llama-3-8B-Instruct-Gradient-1048k", "prediction": " 52933.", "ground_truth": ["52933"]}

TensorRT-LLM

{"id": 0, "prediction": " 71432.", "ground_truth": ["71432"], "input_lengths": [125339]}
{"id": 1, "prediction": " 69079.", "ground_truth": ["69079"], "input_lengths": [125339]}
{"id": 2, "prediction": " 89415.", "ground_truth": ["89415"], "input_lengths": [125339]}
{"id": 3, "prediction": " 61734.", "ground_truth": ["61734"], "input_lengths": [125339]}
{"id": 4, "prediction": " 40204.", "ground_truth": ["40204"], "input_lengths": [125339]}
{"id": 5, "prediction": " 80723.", "ground_truth": ["80723"], "input_lengths": [125339]}
{"id": 6, "prediction": " 55058.", "ground_truth": ["55058"], "input_lengths": [125339]}
{"id": 7, "prediction": " 16783. Remember it", "ground_truth": ["16783"], "input_lengths": [125339]}
{"id": 8, "prediction": " 15951.", "ground_truth": ["15951"], "input_lengths": [125339]}
{"id": 9, "prediction": " 52933.", "ground_truth": ["52933"], "input_lengths": [125339]}
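The scoring for a retrieval-style task like this one reduces to checking whether a ground-truth string appears in the model's prediction (which is why " 16783. Remember it" still counts as correct). A minimal sketch of such a scorer over these JSONL outputs — a hypothetical helper, not the actual scoring code adapted from TensorRT-LLM:

```python
import json


def score_predictions(path: str) -> float:
    """Fraction of samples whose prediction contains a ground-truth answer."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            sample = json.loads(line)
            total += 1
            # A hit if any ground-truth string occurs as a substring
            # of the prediction, ignoring surrounding punctuation/text.
            if any(gt in sample["prediction"] for gt in sample["ground_truth"]):
                correct += 1
    return correct / total if total else 0.0
```

Under this metric, both output files above would score 1.0 despite the extra "Remember it" in the TensorRT-LLM output for id 7.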

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Collaborator

@zhyncs zhyncs left a comment

Nice work! Could we combine these scripts into just one? Something like this

SHAREGPT_URL = "https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json"

Implement the file download inside the script itself to make it more convenient for users.
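A minimal sketch of the suggested download-inside-the-script pattern, in the style of the ShareGPT example above (the URL and filename here are placeholders, not the actual InfiniteBench paths):

```python
import os
import urllib.request

# Placeholder URL; the real InfiniteBench data files live on Hugging Face.
DATA_URL = "https://example.com/passkey.jsonl"


def download_if_missing(url: str, data_dir: str = "./data") -> str:
    """Download the dataset file into data_dir unless it is already cached."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, os.path.basename(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

The eval script can then call `download_if_missing(DATA_URL)` at startup, so users never fetch the data by hand.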

@zhyncs
Collaborator

zhyncs commented Dec 9, 2024

Additionally, the section about TensorRT-LLM is very good! Would you be willing to help improve this custom task script to make it easier to test TensorRT-LLM?
https://github.com/sgl-project/sglang/blob/main/test/srt/experiment_runner.py
ref #2407
If considering doing it, it can be implemented in another PR. Thanks!

@zhyncs
Collaborator

zhyncs commented Dec 9, 2024

close #1273

@zhyncs zhyncs self-assigned this Dec 9, 2024
@zhyncs
Collaborator

zhyncs commented Dec 9, 2024

gradientai/Llama-3-8B-Instruct-Gradient-1048k GradientAI LOL your previous work. cc @michaelfeil

@iankur
Author

iankur commented Dec 9, 2024

@zhyncs

Implement the process of downloading files into a script to make it more convenient for users.

Sounds good. I will merge the download script for SGLang; we can keep the separate download script for TensorRT.

I will also work on the custom task script PR. I am traveling, so it may take some time, but I will try to do it as soon as possible.

)
parser.add_argument("--data-dir", type=str, default="./data")
parser.add_argument("--start-idx", type=int, default=0)
parser.add_argument("--end-idx", type=int, default=None)
Contributor


Can you add more description for the "--start-idx" and "--end-idx" arguments?

Author


I removed these arguments, which were borrowed from the TensorRT-LLM eval script, and added --num-samples with a description.
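The replacement argument described above might look like this (the exact help text is an assumption, not the wording from the PR):

```python
import argparse

parser = argparse.ArgumentParser(description="Run the InfiniteBench evaluation.")
parser.add_argument("--data-dir", type=str, default="./data")
parser.add_argument(
    "--num-samples",
    type=int,
    default=None,
    help="Number of samples to evaluate; defaults to the full dataset.",
)
```

A single count with a sensible default is easier to document and use than a start/end index pair.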

@merrymercy
Contributor

Is this ready to be merged?
We can have this first and then add this to CI in the next PR.

@merrymercy
Contributor

cc @iankur and @zhyncs . Ready to merge this first part?

python convert_checkpoint.py \
--model_dir ./Llama-3-8B-Instruct-Gradient-1048k/ \
--output_dir /tmp/llama-3-8B-1048k/trt_ckpts \
--dtype float16
Collaborator


I see that the dtype specified in the model's config.json is bfloat16. Could you please explain why float16 is being specified here?

@zhyncs zhyncs closed this May 11, 2025