@sahilsuneja1
Contributor

Used benchmark_mixtral_moe.py from #2979 to tune a fused MoE kernel config for Mixtral with TP=2 on A100.

Latency measurements using: python benchmarks/benchmark_latency.py --model=mistralai/Mixtral-8x7B-Instruct-v0.1 --input-len 1000 --output-len 50 -tp 2 --num-iters 100 --batch-size <bs>:

This PR:

BS: 1, Avg latency: 0.7621450612053741 seconds
BS: 2, Avg latency: 1.0328856136795366 seconds
BS: 4, Avg latency: 1.4489717756788014 seconds
BS: 8, Avg latency: 2.0341408042260447 seconds
BS: 16, Avg latency: 2.893355064672651 seconds
BS: 32, Avg latency: 4.530912061399431 seconds
BS: 64, Avg latency: 7.537396909691161 seconds

Compared to master:

BS: 1, Avg latency: 0.8453641083685216 seconds
BS: 2, Avg latency: 1.1280082573764958 seconds
BS: 4, Avg latency: 1.6140852882619947 seconds
BS: 8, Avg latency: 2.348028304380132 seconds
BS: 16, Avg latency: 3.5489811494306194 seconds
BS: 32, Avg latency: 5.627054951939499 seconds
BS: 64, Avg latency: 9.691197272467543 seconds
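
To make the comparison easier to read, here is a minimal sketch that computes the relative latency reduction and speedup per batch size. The latency values are copied from the two lists above; the names `this_pr`, `master`, and `latency_reduction` are illustrative only and are not part of the benchmark scripts.

```python
# Average latencies in seconds, copied from the measurements above.
batch_sizes = [1, 2, 4, 8, 16, 32, 64]
this_pr = [
    0.7621450612053741,
    1.0328856136795366,
    1.4489717756788014,
    2.0341408042260447,
    2.893355064672651,
    4.530912061399431,
    7.537396909691161,
]
master = [
    0.8453641083685216,
    1.1280082573764958,
    1.6140852882619947,
    2.348028304380132,
    3.5489811494306194,
    5.627054951939499,
    9.691197272467543,
]


def latency_reduction(before: float, after: float) -> float:
    """Return the percentage by which latency dropped from `before` to `after`."""
    return (before - after) / before * 100.0


# One entry per batch size, in the same order as `batch_sizes`.
reductions = [latency_reduction(old, new) for old, new in zip(master, this_pr)]

for bs, old, new, pct in zip(batch_sizes, master, this_pr, reductions):
    print(f"BS: {bs}, latency reduction: {pct:.1f}%, speedup: {old / new:.2f}x")
```

The reduction is measured relative to master, so a larger percentage means this PR is faster at that batch size.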

@njhill @pcmoritz

@Yard1 requested a review from pcmoritz on March 7, 2024 18:03
Collaborator

@pcmoritz left a comment

Nice, thanks for adding this!

@simon-mo
Collaborator

Closing as stale. Looks like there's already a JSON in place.

@simon-mo closed this Oct 22, 2024
@njhill
Member

njhill commented Oct 23, 2024

@simon-mo actually, the JSON there is for TP4... I didn't realize that this never got merged 😅. I'll re-open and maybe we can add it.

@njhill reopened this Oct 23, 2024
@njhill
Member

njhill commented Oct 23, 2024

Oh, my bad, I was looking at the wrong fork 🤦

@njhill closed this Oct 23, 2024

4 participants