Conversation
|
After the merge I've seen these tests failing in our internal CI: Do we know if we are missing something else or should I revert? Edit: I'm not 100% sure if this is the patch that broke things, our CI skipped #14617 |
This seems problem with MUL_MAT so I think this PR is unrelated. Might need to check 65a3ebb I'll test it in a moment. Although our ci seems passing |
|
Yeah sorry to disrupt you, I'm currently checking the other Commit as well, it's most likely the cause |
|
Confirmed to be 65a3ebb, again, sorry for the heads up. I got misled by the CI pointing at your PR. Seems to be specific to the Nvidia platform, so I think we can consider leaving it since Intel is the priority here. What do you think? Though probably the dev of the PR won't want anything merged that broke something. |
|
@Alcpz IMO, our priority should be Intel. If it doesn't effect Intel GPUs then it's up to you whether you want to fix it or not. We need to work on softmax kernel's broadcasting and batching support next and I prefer if you guys do it (I have a working implementation but somehow I had to remove all those templated block sizes in order to get it working, so I did not open a PR yet here). |
* SYCL: Use 1D kernel for set_rows * Remove dangling comment * Refactor and use ceil_div
Continue from #14562
Got it at the same level of performance as the cpy kernel
With LLAMA_SET_ROWS=0
Thanks @AD2605