Skip to content

Conversation

@ddh0
Copy link
Owner

@ddh0 ddh0 commented Oct 14, 2025

No description provided.

anavp-nvidia and others added 9 commits October 14, 2025 11:53
* remove legacy copy-op pointer indirection code

* further removal of copy-op indirection code

* renamed check_node_graph_compatibility_and_refresh_copy_ops function
* CUDA: kernel for larger batch sizes for MoE

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* fixup

* tests

* Move mmq_ids_helper to mmid

* cleanup

* Remove redundant checks
* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code
Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel.

Enable /MP so source files are compiled in parallel.
Signed-off-by: Stefan Savic <[email protected]>
Co-authored-by: Stefan Savic <[email protected]>
* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check
@ddh0 ddh0 merged commit e8831e0 into ddh0:glm45v Oct 14, 2025
101 of 135 checks passed
ddh0 pushed a commit that referenced this pull request Nov 5, 2025
* Add buffer label and enable dawn-specific toggles to turn off some checks

* Minor set_rows optimization (#4)

* updated optimization, fixed errors

* non vectorized version now dispatches one thread per element

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Reese Levine <[email protected]>

* Comment on dawn toggles

* Remove some comments

* Implement overlap binary operators

* Revert "Implement overlap binary operators"

This reverts commit ed710b3.

* Disable support for non-contiguous binary_op tensors and leave note for future support

---------

Co-authored-by: neha-ha <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants