Commit 4298c74

Add loci-analysis workflow from overlay
1 parent a0ed91a commit 4298c74

2 files changed: 110 additions & 0 deletions

loci-analysis workflow (109 additions & 0 deletions)

@@ -0,0 +1,109 @@
name: LOCI Analysis
on:
  push:
    branches:
      - loci/main-*
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  loci:
    if: vars.UPSTREAM_REPO != ''
    runs-on: ubuntu-latest

    env:
      LOCI_PROJECT: 'Llama CPP'
      LOCI_API_KEY: '${{ secrets.LOCI_API_KEY }}'
      LOCI_BACKEND_URL: '${{ vars.LOCI_BACKEND_URL }}'
      GH_TOKEN: ${{ secrets.MIRROR_REPOS_WRITE_PAT }}

    environment: ${{ vars.LOCI_ENV || 'PROD__AL_DEMO' }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          ref: ${{ (github.event_name == 'pull_request' && github.event.pull_request.head.sha) || github.sha }}

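      # A push to a mirror branch loci/main-<sha> identifies the upstream
      # commit under analysis; strip the prefix to label it main@<sha>.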
      - name: Compute target
        id: target
        if: github.event_name == 'push'
        run: |
          branch="${{ github.ref_name }}"
          sha="${branch#loci/main-}"
          echo "value=main@${sha}" >> "$GITHUB_OUTPUT"

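      # For pull requests, the comparison base is the merge base between
      # the PR head and the upstream default branch, labelled main@<short-sha>.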
      - name: Compute base
        id: base
        if: github.event_name == 'pull_request'
        run: |
          git remote add upstream "https://github.com/${{ vars.UPSTREAM_REPO }}.git" 2>/dev/null || true
          git fetch upstream
          upstream_default=$(gh api "repos/${{ vars.UPSTREAM_REPO }}" --jq .default_branch)
          merge_base=$(git merge-base HEAD "upstream/${upstream_default}")
          short_sha="${merge_base:0:7}"
          echo "value=main@${short_sha}" >> "$GITHUB_OUTPUT"

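      # Build prerequisites, including the gcc/g++ aarch64 cross toolchain
      # used by the CMake configuration below.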
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            cmake \
            build-essential \
            gcc-aarch64-linux-gnu \
            g++-aarch64-linux-gnu \
            libcurl4-openssl-dev

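      # Cross-compile for aarch64 as a Debug build. -Wl,-Bsymbolic binds
      # symbol references locally within each shared object (presumably to
      # give LOCI stable symbols to analyse); the empty CMAKE_OSX_* values
      # explicitly clear any macOS-specific settings.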
      - name: Create build directory and configure with CMake
        run: |
          mkdir build
          cd build
          cmake .. \
            -DCMAKE_SYSTEM_NAME=Linux \
            -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
            -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
            -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
            -DCMAKE_OSX_SYSROOT= \
            -DCMAKE_OSX_DEPLOYMENT_TARGET= \
            -DBUILD_SHARED_LIBS=ON \
            -DLLAMA_BUILD_TESTS=OFF \
            -DLLAMA_BUILD_EXAMPLES=OFF \
            -DLLAMA_BUILD_SERVER=ON \
            -DLLAMA_BUILD_COMMON=ON \
            -DLLAMA_BUILD_TOOLS=ON \
            -DLLAMA_CURL=OFF \
            -DCMAKE_BUILD_TYPE=Debug \
            -DCMAKE_C_FLAGS="-march=armv8-a -Wl,-Bsymbolic" \
            -DCMAKE_CXX_FLAGS="-march=armv8-a -Wl,-Bsymbolic"


      - name: Build project
        run: |
          cd build
          cmake --build . -j4

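      # Upload the built libraries and tools. Only one of the two compute
      # steps runs per event, so either target (push) or base (pull request)
      # is set and the other falls back to ''.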
      - name: LOCI Upload
        uses: auroralabs-loci/loci-action@v1
        with:
          mode: upload
          binaries: |
            build/bin/libggml.so*
            build/bin/libllama.so*
            build/bin/libggml-cpu.so*
            build/bin/libggml-base.so*
            build/bin/libmtmd.so*
            build/bin/llama-bench
            build/bin/llama-cvector-generator
            build/bin/llama-gemma3-cli
            build/bin/llama-gguf-split
            build/bin/llama-llava-cli
            build/bin/llama-minicpmv-cli
            build/bin/llama-quantize
            build/bin/llama-qwen2vl-cli
            build/bin/llama-run
            build/bin/llama-tokenize
            build/bin/llama-tts
          project: '${{ env.LOCI_PROJECT }}'
          target: ${{ steps.target.outputs.value || '' }}
          base: ${{ steps.base.outputs.value || '' }}
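
For reference, the two compute steps above reduce to plain bash string handling. A minimal local dry run, reusing the 4d828bd merge base recorded in pulls.ndjson below (the upstream remote and its default branch name here are assumptions, not part of the workflow):

    # Push event: the mirror branch name encodes the target sha.
    branch="loci/main-4d828bd"
    echo "target=main@${branch#loci/main-}"   # -> target=main@4d828bd

    # Pull request event: the base is the merge base with upstream,
    # shortened to 7 characters.
    git fetch upstream                                  # remote assumed to exist
    merge_base=$(git merge-base HEAD upstream/master)   # default branch assumed
    echo "base=main@${merge_base:0:7}"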

pulls.ndjson (1 addition & 0 deletions)
@@ -0,0 +1 @@
{"pull_number":"15307","title":"Add OpenVINO backend","body":"### Overview\r\n\r\nThis PR introduces an [OpenVINO backend](https://docs.openvino.ai/2025/index.html) for `llama.cpp`, enabling hardware-accelerated inference on **Intel® CPUs, GPUs, and NPUs**. The backend leverages OpenVINO to deliver optimized inference with the existing llama.cpp GGUF model ecosystem. Enables performance improvements via OpenVINO’s graph compilation and kernel fusion.\r\n\r\n* llama.cpp with OpenVINO backend: [Build Instructions](https://github.com/ravi9/llama.cpp/blob/dev_backend_openvino/docs/build.md#openvino)\r\n\r\n### Key Features:\r\n\r\n* **New backend implementation**\r\n * Added OpenVINO backend in `ggml/src/ggml-openvino`.\r\n * Implemented translations for core GGML operations\r\n\r\n* **Supported precisions**\r\n * FP16/BF16 GGUF models supported.\r\n * Q4_0, Q4_1, Q4_K_M, Q6_K models partially supported. (See notes below)\r\n\r\n* **Supported devices**\r\n * Intel CPUs\r\n * Intel integrated and discrete GPUs\r\n * Intel NPUs (requires **UD32+ driver**).\r\n\r\n**For NPU: currently prompt processing is slow, a smaller context size is recommended for better performance, e.g., `-c 512`.**\r\n\r\n**For llama-bench: `-fa 1` is required.**\r\n\r\n### Tested Models\r\n\r\nThe following models are validated for functionality.\r\n\r\nAccuracy and performance are WIP.\r\n\r\n* [`Llama-3.2-1B-Instruct-GGUF`](https://huggingface.co/MaziyarPanahi/Llama-3.2-1B-Instruct-GGUF)\r\n* [`Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) \r\n* [`microsoft/Phi-3-mini-4k-instruct-gguf`](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf)\r\n* [`Qwen/Qwen2.5-1.5B-Instruct-GGUF`](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF)\r\n* [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B)\r\n* [`openbmb/MiniCPM-1B-sft-bf16`](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)\r\n* [`tencent/Hunyuan-7B-Instruct`](https://huggingface.co/tencent/Hunyuan-7B-Instruct)\r\n* [`mistralai/Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)\r\n\r\n### Work in Progress\r\n* Performance and memory optimizations \r\n* Broader quantization coverage.\r\n* Support for additional model architectures. 
\r\n* Extensive accuracy testing.\r\n\r\n### Notes on quantization support\r\n\r\n#### CPU\r\n* **Q4_0, Q4_1, Q4_K_M and Q6_K models are supported.**\r\n* Q6_K tensors (6bit gs16 sym) are converted to int8 gs16 sym.\r\n* Q5_K tensors (5bit gs32 asym) are converted to int8 gs32 asym.\r\n\r\n#### GPU\r\n* **Q4_0, Q4_1, Q4_K_M and Q6_K models are supported.**\r\n* Q6_K tensors (6bit gs16 sym) are requantized to int8 gs32 sym.\r\n* Q5_K tensors (5bit gs32 asym) are converted to int8 gs32 asym.\r\n\r\n#### NPU\r\n* **Main quantization scheme for the supported models in this PR is Q4_0.**\r\n* Q4_0 and Q4_1 tensors are requantized to int4 gs128 sym.\r\n* Q6_K tensors are dequantized to fp16.\r\n\r\nOther notes:\r\n* Both Q4_0 and Q4_1 models use Q6_K for the token_embedding tensor and the weight tensor in the last matmul (in most models it is the same tensor as token_emb).\r\n* Q4_0 models will produce some Q4_1 tensors if imatrix is provided as part of the quantization of the model using llama-quantize utility.\r\n* Q4_K_M models additionally have Q6_K tensors and Q5_K tensors (only in Phi3 in the validated model list of this PR).\r\n\r\nNOTE: Optimum-intel converts the fp16/bf16 token embedding tensor and the weight tensor in the last matmul to int8 asym channel-wise ([config code](https://github.com/huggingface/optimum-intel/blob/b60e4d4866509a1aeea2b7a3f26f2a70bc464354/optimum/commands/export/openvino.py#L183-L191)).\r\n\r\n\r\n","pull_head_sha":"db976265ce4da1c2bc3cf7bb45fc7ec4d1d02c29","loci_pr_branch":"loci/pr-15307-dev_backend_openvino","short_merge_base":"4d828bd","loci_main_branch":"loci/main-4d828bd","use_loci_base":0}
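
Each row in pulls.ndjson ties a mirrored PR to the loci/main-<sha> branch that matches the workflow's push trigger. As a quick schema check (assuming jq is available; the field names come straight from the record above), the relevant identifiers can be listed tab-separated:

    jq -r '[.pull_number, .loci_pr_branch, .loci_main_branch] | @tsv' pulls.ndjson
    # 15307    loci/pr-15307-dev_backend_openvino    loci/main-4d828bd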
