[SYCL] refactor by airMeng · Pull Request #6408 · ggml-org/llama.cpp

airMeng · 2024-03-31T10:54:09Z

Following the discussion in #5277 (reply in thread), this PR does the following:

- separates the dpct-generated headers to ease future maintenance
- separates the GEMM-related operators to prepare for introducing a template-based library (XeTLA)
- ~~let the common backend handle H2D/D2H memcpy~~ dropped to keep the PR as simple as possible

airMeng · 2024-03-31T10:58:21Z

@slaren Since we can now put SYCL-related code under a directory instead of in a single file, I might introduce a header-only library for performance optimization, which would also simplify our work (my job during work hours 😁)

@ggerganov @mingfeima for awareness

github-actions · 2024-03-31T11:14:48Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 504 iterations 🚀

Concurrent users: 8, duration: 10m
HTTP request : avg=9274.74ms p(90)=26479.05ms fails=0, finish reason: stop=504 truncated=0
Prompt processing (pp): avg=241.61tk/s p(90)=732.4tk/s total=200.65tk/s
Token generation (tg): avg=102.96tk/s p(90)=278.78tk/s total=129.75tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sycl-refactor commit=a2e77e60d6d1e208096aae27e24a23ff9821c58b

[Charts: time series over the 10-minute run (504 iterations) on Standard_NC4as_T4_v3 for llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing]

slaren · 2024-03-31T19:56:54Z

> @slaren Since we can now put SYCL-related code under a directory instead of in a single file, I might introduce a header-only library for performance optimization, which would also simplify our work (my job during work hours 😁)

I think that's good, I plan to start using CUTLASS in the CUDA backend as well.

NeoZhangJianyu · 2024-04-01T02:10:12Z

It's great to see the new structure.
The current SYCL backend has bugs that affect IQ3 models, and the unit-test (UT) pass rate has dropped as well.
I'm working on fixing them now.
Would it be possible to wait for my fix?

abhilash1910 · 2024-04-01T04:21:37Z

This is a good refactoring and will be helpful for debugging. I would suggest waiting for some of the IQ quant PRs to land and then resuming work on this.

airMeng · 2024-04-01T04:47:44Z

> This is a good refactoring and will be helpful for debugging. I would suggest waiting for some of the IQ quant PRs to land and then resuming work on this.

> It's great to see the new structure. The current SYCL backend has bugs that affect IQ3 models, and the unit-test (UT) pass rate has dropped as well. I'm working on fixing them now. Would it be possible to wait for my fix?

Yes, drop a note when you have finished.

NeoZhangJianyu · 2024-04-07T03:04:01Z

@airMeng
All IQ types in this PR are supported/fixed by #6521.
You could continue your work now.

Thank you!

airMeng · 2024-05-05T13:56:18Z

@NeoZhangJianyu @abhilash1910

ggml-sycl/common.hpp

ggml-sycl.cpp

NeoZhangJianyu · 2024-05-06T00:41:33Z

The build with fp16 fails; please check and fix it.
Please run ci/run.sh to make sure the quality is not reduced.

NeoZhangJianyu · 2024-05-06T08:55:29Z

Regarding the dpct subfolder:
I suggest not using a folder for "dpct". Save the code as two files in the ggml-sycl folder, e.g. dpct-helper.cpp/hpp.

There won't be more files in the dpct part, so there is no need to add a subfolder for two files.
The dpct files have been manually updated to meet llama.cpp's requirements.
Putting them in a dpct folder would make others think they were copied directly from dpct.

github-actions · 2024-05-22T06:12:37Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 545 iterations 🚀

Expand details for performance related PR only

Concurrent users: 8, duration: 10m
HTTP request : avg=8562.08ms p(95)=21241.45ms fails=, finish reason: stop=483 truncated=62
Prompt processing (pp): avg=100.34tk/s p(95)=436.58tk/s
Token generation (tg): avg=34.62tk/s p(95)=48.49tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sycl-refactor commit=50dffa13d8f947a077a03478aaf26dc70bdc7ecd

[Charts: time series over the 10-minute run (545 iterations) on Standard_NC4as_T4_v3 for llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing]

mofosyne · 2024-05-22T09:03:00Z

There were a lot of changes that caused conflicts, and I can't see how to resolve them easily. @airMeng can you check how much needs to be fixed?

airMeng · 2024-05-22T09:06:42Z

> There were a lot of changes that caused conflicts, and I can't see how to resolve them easily. @airMeng can you check how much needs to be fixed?

Never mind, it just needs more time.

airMeng · 2024-06-18T06:15:45Z

@NeoZhangJianyu @luoyu-intel @AidanBeltonS I have refactored this after #7710; please review.
I have tried to keep the scope of this PR as small as possible to make it easier for reviewers.

luoyu-intel

Why are some kernels still kept in the ggml-sycl.cpp file?

NeoZhangJianyu · 2024-05-06T08:51:12Z

ggml-sycl.h


-#define GGML_SYCL_MAX_DEVICES       48
-#define GGML_SYCL_NAME "SYCL"
-


These two macros would be used by external callers,
so don't move them to presets.hpp.

I suggest that all exported functions, variables, and macros be defined in ggml-sycl.h.
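To illustrate the concern, here is a minimal single-file sketch of why such macros belong in the public header. Only the two macro definitions come from the diff above; the `device_table` type and the `add_device` and `is_sycl_backend` helpers are hypothetical stand-ins for an external caller and are not part of llama.cpp.

```cpp
// Hypothetical sketch: the "public header" and the "external caller" are
// simulated as two sections of a single translation unit.
#include <cassert>
#include <cstring>

// --- Section 1: what the public header (ggml-sycl.h) would export ---
#define GGML_SYCL_MAX_DEVICES 48
#define GGML_SYCL_NAME "SYCL"

// --- Section 2: hypothetical external caller that includes only the public header ---
struct device_table {                  // hypothetical type, for illustration only
    int ids[GGML_SYCL_MAX_DEVICES];    // sized by the exported macro
    int count;
};

// Registers a device id; returns false once the table is full.
inline bool add_device(device_table & t, int id) {
    if (t.count >= GGML_SYCL_MAX_DEVICES) {
        return false;
    }
    t.ids[t.count++] = id;
    return true;
}

// Compares a backend name against the exported name macro.
inline bool is_sycl_backend(const char * name) {
    assert(name != nullptr);
    return std::strcmp(name, GGML_SYCL_NAME) == 0;
}
```

If the two macros were moved into an internal header such as presets.hpp, a caller like the one sketched here would no longer compile unless it also included that internal header.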

abhilash1910

LGTM! ping @joeatodd @ggerganov for a look when available.

airMeng requested review from NeoZhangJianyu, abhilash1910 and slaren March 31, 2024 10:54

airMeng changed the title ~~[SYCL refactor~~ [SYCL] refactor Apr 1, 2024

phymbert mentioned this pull request Apr 1, 2024

server: bench: continuous performance testing #6233

Closed

16 tasks

airMeng force-pushed the sycl-refactor branch from a2e77e6 to de88518 Compare April 25, 2024 14:03

airMeng marked this pull request as draft April 25, 2024 14:04

airMeng force-pushed the sycl-refactor branch from 839cc90 to ee2f923 Compare April 30, 2024 09:23

airMeng marked this pull request as ready for review April 30, 2024 09:37

abhilash1910 reviewed May 5, 2024

ggml-sycl/common.hpp (resolved)

abhilash1910 reviewed May 5, 2024

ggml-sycl.cpp (outdated, resolved)

abhilash1910 reviewed May 5, 2024

ggml-sycl.cpp (outdated, resolved)

airMeng force-pushed the sycl-refactor branch from dc57207 to adc3d54 Compare May 6, 2024 03:47

mofosyne added the "Review Complexity : High" and "enhancement" labels May 10, 2024

Thellton mentioned this pull request May 13, 2024

Native Intel IPEX-LLM Support #7190

Closed

airMeng force-pushed the sycl-refactor branch from 4b561bd to 27c3f29 Compare May 22, 2024 03:55

github-actions bot added the "build", "ggml", and "SYCL" labels May 22, 2024

mofosyne marked this pull request as draft May 22, 2024 09:52

airMeng force-pushed the sycl-refactor branch from a458e6a to 50dffa1 Compare May 27, 2024 06:35

airMeng mentioned this pull request May 27, 2024

[SYCL] Align GEMM dispatch #7566

Merged

4 tasks

seperate lower precision GEMM from the main files

a7614fa

airMeng force-pushed the sycl-refactor branch from 50dffa1 to 167807d Compare June 18, 2024 06:14

airMeng marked this pull request as ready for review June 18, 2024 06:16

luoyu-intel approved these changes Jun 18, 2024

fix workgroup size hardcode

6a4fd2b

airMeng force-pushed the sycl-refactor branch from 167807d to 6a4fd2b Compare June 18, 2024 08:56

NeoZhangJianyu approved these changes Jun 18, 2024

abhilash1910 approved these changes Jun 18, 2024

airMeng merged commit 623494a into master Jun 19, 2024

airMeng deleted the sycl-refactor branch June 19, 2024 01:11

