@FIR-1507 - GGML Blob: Unify TMU Codebase Across POSIX and FPGA#96

Open
akapoor3518 wants to merge 1 commit into master from sdk-2.9
@akapoor3518

tsisim is currently broken, so this change cannot be tested on tsisim. Let's proceed with this PR; we will validate on tsisim once it is working again.

Tested on POSIX and compared the generated blob Python file.
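
One way to compare the generated blob Python file between two builds is a plain unified diff. This is a minimal sketch, not part of this PR: it assumes both generated files are available on disk, and the function name and file paths are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def diff_generated_files(reference: Path, candidate: Path) -> List[str]:
    """Return unified-diff lines between two generated files.

    An empty list means the two files are textually identical.
    """
    ref_lines = reference.read_text().splitlines(keepends=True)
    cand_lines = candidate.read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            ref_lines,
            cand_lines,
            fromfile=str(reference),
            tofile=str(candidate),
        )
    )
```

Calling `diff_generated_files(Path("before/blob.py"), Path("after/blob.py"))` with the two (hypothetical) paths and printing the result with `"".join(...)` would show any lines that differ.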

POSIX LOG
[akapoor@wspd0 llama.cpp]$ ./build-posix/bin/llama-cli-original -m /proj/rel/sw/ggml/models/smolVLM-256M.gguf --device tSavorite -p "my cat's name is" -c 2048 -b 128 --n-predict 4 --temp 0.0 --top-k 50 --top-p 0.9 --repeat-penalty 1.5 --repeat-last-n 5 --no-warmup --no-conversation
Failed to generate tool call example: Value is not callable: null at row 1, column 72:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 42:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 42:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 13:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}
at row 1, column 1:
<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '' }}{% endif %}{% endfor %}<end_of_utterance>
^
{% endfor %}{% if add_generation_prompt %}{{ 'Assistant:' }}{% endif %}

my cat's name is not mentioned in the

llama_perf_sampler_print: sampling time = 15.65 ms / 9 runs ( 1.74 ms per token, 575.12 tokens per second)
llama_perf_context_print: load time = 372657.41 ms
llama_perf_context_print: prompt eval time = 370205.55 ms / 5 tokens (74041.11 ms per token, 0.01 tokens per second)
llama_perf_context_print: eval time = 973.80 ms / 3 runs ( 324.60 ms per token, 3.08 tokens per second)
llama_perf_context_print: total time = 373648.27 ms / 8 tokens

=== GGML Perf Summary ===
Op         Target   Runs  TSI_KERNEL-RUN    Total us      Avg us
ADD        OPU       420             652      539845     1285.35
MUL        OPU       427             663      465522     1090.22
RMS_NORM   OPU       427             427      369502      865.34
MUL_MAT    CPU      7020               0     1201116      171.10
MUL_MAT    OPU       117           11442   369371522  3157021.56
CONT       CPU      1579               0       67573       42.79
RESHAPE    CPU      2345               0         746        0.32
VIEW       CPU      3622               0         452        0.12
PERMUTE    CPU      2892               0         372        0.13
TRANSPOSE  CPU       705               0         161        0.23
GET_ROWS   CPU        53               0        3503       66.09
SET_ROWS   CPU      1585               0        1263        0.80
SOFT_MAX   CPU       805               0       28588       35.51
ROPE       CPU      1650               0        6636        4.02
GLU        OPU       210             326      339890     1618.52

OPU Profiling Results:
Calls Total(ms) T/call Self(ms) Function
12874 3.65e+05 28.3746 0.0000 [97.74%] [Thread] tsi::runtime::TsavRT::awaitCommandListCompletion
9780 3.34e+05 34.1148 3.34e+05 └─ [89.27%] [ txe_mul_mat_tile_f32_k128 ]
1662 28431.6348 17.1069 28431.6348 └─ [ 7.61%] [ txe_mul_mat_tile_f32_k64 ]
12874 10868.1452 0.8442 10868.1452 └─ [ 2.91%] TXE 0 Idle
236 78.7403 0.3336 78.7403 └─ [2.11e-02%] [ txe_swiglu ]
472 58.5994 0.1242 58.5994 └─ [1.57e-02%] [ txe_add ]
480 58.0449 0.1209 58.0449 └─ [1.55e-02%] [ txe_mult ]
244 45.5822 0.1868 45.5822 └─ [1.22e-02%] [ txe_rms_norm ]
[Thread] tsi::runtime::TsavRTPosix::initialize (cumulative over all threads)
1 36.4620 36.4620 32.7460 [9.76e-03%] [Thread] tsi::runtime::TsavRTPosix::initialize
1 3.5800 3.5800 3.0830 └─ [9.58e-04%] tsi::runtime::TsavRTPosix::initializeQueues
1 0.3640 0.3640 0.3640 └─ [9.74e-05%] tsi::runtime::TsavRT::awaitCommandListCompletion
1 0.1330 0.1330 0.0880 └─ [3.56e-05%] tsi::runtime::TsavRT::finalizeCommandList
1 0.0450 0.0450 0.0450 └─ [1.20e-05%] tsi::runtime::executeWithTimeout
1 0.1360 0.1360 0.1360 └─ [3.64e-05%] tsi::runtime::TsavRT::initialize
[Thread] tsi::runtime::TsavRTPosix::loadBlob (cumulative over all threads)
12874 1722.2390 0.1338 0.0000 [4.61e-01%] [Thread] tsi::runtime::TsavRTPosix::loadBlob
12874 10867.9667 0.8442 10867.9667 └─ [ 2.91%] TXE 0 Idle
25748 1294.5480 0.0503 1294.5480 └─ [3.46e-01%] tsi::runtime::executeWithTimeout
12864 236.9801 0.0184 236.9801 └─ [6.34e-02%] Command{command=2 (LOAD_BLOB), blob_args=[140065255486080[...
12874 8.6650 6.73e-04 8.6650 └─ [2.32e-03%] LOAD_BLOB Command Execution
6 6.5256 1.0876 6.5256 └─ [1.75e-03%] Command{command=2 (LOAD_BLOB), blob_args=[140065255483008[...
4 3.9426 0.9856 3.9426 └─ [1.05e-03%] Command{command=2 (LOAD_BLOB), blob_args=[140065255484544[...
[Thread] tsi::runtime::TsavRTPosix::unloadBlob (cumulative over all threads)
12874 2158.4290 0.1677 0.0000 [5.78e-01%] [Thread] tsi::runtime::TsavRTPosix::unloadBlob
12874 10868.3515 0.8442 10868.3515 └─ [ 2.91%] TXE 0 Idle
25748 1473.5760 0.0572 1473.5760 └─ [3.94e-01%] tsi::runtime::executeWithTimeout
12864 238.4388 0.0185 238.4388 └─ [6.38e-02%] Command{command=3 (UNLOAD_BLOB), blob_args=[14006525548608...
12874 9.5790 7.44e-04 9.5790 └─ [2.56e-03%] UNLOAD_BLOB Command Execution
4 0.0845 0.0211 0.0845 └─ [2.26e-05%] Command{command=3 (UNLOAD_BLOB), blob_args=[14006525548454...
6 0.0455 0.0076 0.0455 └─ [1.22e-05%] Command{command=3 (UNLOAD_BLOB), blob_args=[14006525548300...
[Thread] tsi::runtime::TsavRT::finalize (cumulative over all threads)
1 8.9640 8.9640 0.0000 [2.40e-03%] [Thread] tsi::runtime::TsavRT::finalize
1 10891.8430 10891.8430 10891.8430 └─ [ 2.91%] TXE 0 Idle
2 0.2630 0.1315 0.2630 └─ [7.04e-05%] tsi::runtime::executeWithTimeout
1 0.0146 0.0146 0.0146 └─ [3.90e-06%] Command{command=4 (RELEASE), blob_args=[0[0], 3[0x3], 4[0x...
2 0.0110 0.0055 0.0110 └─ [2.94e-06%] tsi::runtime::TsavRT::deallocate
1 0.0020 0.0020 0.0020 └─ [5.35e-07%] RELEASE Command Execution
[Thread] tsi::runtime::TsavRT::processResponses (cumulative over all threads)
12875 3.64e+05 28.3058 236.5760 [97.51%] [Thread] tsi::runtime::TsavRT::processResponses
12875 3.64e+05 28.2874 3.64e+05 └─ [97.45%] tsi::runtime::executeWithTimeout
[Thread] OPU (cumulative over all threads)
1 0.0850 0.0850 0.0570 [2.27e-05%] [Thread] OPU
1 0.0280 0.0280 0.0280 └─ [7.49e-06%] tsi::runtime::TsavRT::allocate
[Thread] txe_rms_norm (cumulative over all threads)
732 55.8470 0.0763 54.5820 [1.49e-02%] [Thread] txe_rms_norm
732 1.2650 0.0017 1.2650 └─ [3.38e-04%] tsi::runtime::executeWithTimeout
[Thread] txe_swiglu (cumulative over all threads)
708 89.2650 0.1261 88.0330 [2.39e-02%] [Thread] txe_swiglu
708 1.2320 0.0017 1.2320 └─ [3.30e-04%] tsi::runtime::executeWithTimeout
[Thread] tsi::runtime::TsavRT::finalizeCommandList (cumulative over all threads)
12874 248.7660 0.0193 227.3870 [6.66e-02%] [Thread] tsi::runtime::TsavRT::finalizeCommandList
12874 21.3790 0.0017 21.3790 └─ [5.72e-03%] tsi::runtime::executeWithTimeout
[Thread] txe_add (cumulative over all threads)
1416 76.5070 0.0540 74.2400 [2.05e-02%] [Thread] txe_add
1416 2.2670 0.0016 2.2670 └─ [6.07e-04%] tsi::runtime::executeWithTimeout
[Thread] txe_mult (cumulative over all threads)
1440 75.9970 0.0528 73.7720 [2.03e-02%] [Thread] txe_mult
1440 2.2250 0.0015 2.2250 └─ [5.95e-04%] tsi::runtime::executeWithTimeout
[Thread] txe_mul_mat_tile_f32_k128 (cumulative over all threads)
29340 3.34e+05 11.3937 3.34e+05 [89.44%] [Thread] txe_mul_mat_tile_f32_k128
29340 167.0290 0.0057 167.0290 └─ [4.47e-02%] tsi::runtime::executeWithTimeout
[Thread] txe_mul_mat_tile_f32_k64 (cumulative over all threads)
4986 28529.1790 5.7219 28509.3730 [ 7.63%] [Thread] txe_mul_mat_tile_f32_k64
4986 19.8060 0.0040 19.8060 └─ [5.30e-03%] tsi::runtime::executeWithTimeout
[Thread] TXE 0 Idle Time (cumulative over all threads)
[Thread] tsi::runtime::TsavRT::deallocate (cumulative over all threads)
12874 30.5550 0.0024 30.5550 [8.18e-03%] [Thread] tsi::runtime::TsavRT::deallocate
[Thread] tsi::runtime::TsavRT::addCommandToList (cumulative over all threads)
12874 68.2900 0.0053 68.2900 [1.83e-02%] [Thread] tsi::runtime::TsavRT::addCommandToList
[Thread] tsi::runtime::TsavRT::allocate (cumulative over all threads)
12889 43.0570 0.0033 43.0570 [1.15e-02%] [Thread] tsi::runtime::TsavRT::allocate
[Thread] tsi::runtime::executeWithTimeout (cumulative over all threads)
38626 9798.5210 0.2537 9798.5210 [ 2.62%] [Thread] tsi::runtime::executeWithTimeout

- 3.74e+05 0.0000 3.74e+05 [100.00%] TOTAL
========================================================================================================================

Counter Metrics:
Metric Min Max Avg
Queue_0_Occupancy 0.0000 1.0000 0.9991
[akapoor@wspd0 llama.cpp]$
[akapoor@wspd0 llama.cpp]$ pwd
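
To read the GGML Perf Summary above more easily, the rows can be parsed and each op's share of the total time computed. This is a rough sketch for reviewers, not part of the PR: the `OpRow` type and helper names are hypothetical, and the rows are copied from the log above.

```python
from typing import List, NamedTuple


class OpRow(NamedTuple):
    op: str
    target: str
    runs: int
    kernel_runs: int
    total_us: int
    avg_us: float


# Rows copied from the GGML Perf Summary in the POSIX log.
PERF_SUMMARY = """\
ADD OPU 420 652 539845 1285.35
MUL OPU 427 663 465522 1090.22
RMS_NORM OPU 427 427 369502 865.34
MUL_MAT CPU 7020 0 1201116 171.10
MUL_MAT OPU 117 11442 369371522 3157021.56
CONT CPU 1579 0 67573 42.79
RESHAPE CPU 2345 0 746 0.32
VIEW CPU 3622 0 452 0.12
PERMUTE CPU 2892 0 372 0.13
TRANSPOSE CPU 705 0 161 0.23
GET_ROWS CPU 53 0 3503 66.09
SET_ROWS CPU 1585 0 1263 0.80
SOFT_MAX CPU 805 0 28588 35.51
ROPE CPU 1650 0 6636 4.02
GLU OPU 210 326 339890 1618.52
"""


def parse_rows(text: str) -> List[OpRow]:
    """Parse whitespace-separated summary rows, skipping malformed lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        op, target, runs, kernel_runs, total_us, avg_us = fields
        rows.append(
            OpRow(op, target, int(runs), int(kernel_runs), int(total_us), float(avg_us))
        )
    return rows


def time_share(rows: List[OpRow], op: str, target: str) -> float:
    """Fraction of the summed 'Total us' spent in the given op on the given target."""
    total = sum(r.total_us for r in rows)
    picked = sum(r.total_us for r in rows if r.op == op and r.target == target)
    return picked / total


rows = parse_rows(PERF_SUMMARY)
print(f"MUL_MAT on OPU: {time_share(rows, 'MUL_MAT', 'OPU'):.2%} of summed op time")
```

The same rows also let the `Avg us` column be cross-checked as `Total us / Runs`. On this run, MUL_MAT on the OPU accounts for well over 90 percent of the summed op time, which matches the OPU profiling results where the `txe_mul_mat_tile_f32_*` kernels dominate.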
