perplexity: update README FP16 results [no ci] #7413
JohannesGaessler merged 1 commit into ggml-org:master
Conversation
@JohannesGaessler great work, any plans to also add a scoreboard for LLaMA 3 70b? It would be very useful to compare the trend in perplexity loss between LLaMA 2 70b and LLaMA 3 70b.
I'm hesitant to publish anything with LLaMA 3 70b because it turned out that the machine I built with 6x RTX 4090 has stability issues, which means I have to be very careful that the data isn't affected by random bit flips.
:-( How much time does it take to run all the experiments for LLaMA 3 70b (more or less)? Just to understand how much it would cost.
At standard settings a single LLaMA 3 70b run takes ~6 minutes on 6x RTX 4090. |
…alues (#8058) Uses the values computed by @JohannesGaessler in PR #7413
The logits used for comparative runs of `perplexity` are stored as `uint16_t` instead of `float`. The difference from this downcasting can be non-negligible when looking at quants like q8_0 or q6_K. This PR adds a disclaimer and results to estimate the impact of the downcasting.
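To illustrate the effect being estimated here, the following is a minimal NumPy sketch (not the actual `perplexity` implementation): it computes perplexity from the same logits twice, once in float32 and once after a round trip through FP16, emulating logits stored as `uint16_t` bit patterns. The logit distribution and vocabulary size are made-up placeholders for illustration only.

```python
import numpy as np

def perplexity_from_logits(logits, targets):
    # Numerically stable log-softmax over the vocab dimension,
    # then exp of the mean negative log-likelihood over tokens.
    m = logits.max(axis=-1, keepdims=True)
    logprobs = logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    nll = -logprobs[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))

rng = np.random.default_rng(0)
# Hypothetical logits for 1024 tokens over a 32000-entry vocab.
logits32 = rng.normal(size=(1024, 32000)).astype(np.float32)
targets = rng.integers(0, 32000, size=1024)

ppl_fp32 = perplexity_from_logits(logits32, targets)
# Emulate storage as uint16_t: round-trip the logits through FP16.
logits_fp16 = logits32.astype(np.float16).astype(np.float32)
ppl_fp16 = perplexity_from_logits(logits_fp16, targets)
print(ppl_fp32, ppl_fp16, abs(ppl_fp32 - ppl_fp16))
```

The gap between the two values is usually small, but for high-precision quants like q8_0 or q6_K the quantization error itself is of a comparable order of magnitude, which is why the downcasting merits a disclaimer.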