cuda: allow ggml_cuda_cpy to copy contiguous F32 and F16 tensors greater than INT_MAX
#16924
This is a partial fix for #15049 and #14325: it makes sure the specific case where we are using `cudaMemcpyAsync` doesn't fail.

See #15298 and the discussion in #15049 (comment) for details on fixing this for the non-`cudaMemcpyAsync` case.

I'm unsure about `mudnnMemcpyAsync`; the only reference to it I can find is here: https://github.com/SJTU-IPADS/PowerInfer/blob/d3ebd7c5666348cf43c22f0d62dfbc9a763cffb8/smallthinker/ggml/src/ggml-musa/mudnn.cu#L88

It doesn't look like it needs the `<= INT_MAX` assertion, but I'm not 100% sure without any proper API references.
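
To make the distinction above concrete, here is a minimal host-side sketch of the idea as I read it from this description: a plain memory copy only needs a byte count held in a `size_t`, so the `<= INT_MAX` element-count assertion is only relevant on the path that does not use a plain copy. `TensorDesc`, `CopyPath`, `choose_copy_path` and `copy_bytes` are hypothetical names made up for illustration; this is not the actual `ggml_cuda_cpy` code.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a tensor descriptor; NOT the real ggml struct.
struct TensorDesc {
    int64_t n_elements;  // total number of elements
    size_t  type_size;   // bytes per element (4 for F32, 2 for F16)
    bool    contiguous;  // true if the data has no gaps or permutation
    int     type_id;     // hypothetical type tag
};

enum class CopyPath { Memcpy, Kernel };

// Pick a copy path. Assumption: same-type, contiguous tensors can be moved
// with a plain async memcpy, which is sized in bytes rather than indexed by
// int, so no INT_MAX limit applies there.
CopyPath choose_copy_path(const TensorDesc & src, const TensorDesc & dst) {
    const bool plain_copy = src.type_id == dst.type_id &&
                            src.contiguous && dst.contiguous;
    if (plain_copy) {
        return CopyPath::Memcpy;
    }
    // The other path is assumed to index elements with int,
    // so the element count still has to fit.
    assert(src.n_elements <= INT_MAX);
    return CopyPath::Kernel;
}

// Byte count for the memcpy path, computed in size_t to avoid int overflow.
size_t copy_bytes(const TensorDesc & t) {
    return static_cast<size_t>(t.n_elements) * t.type_size;
}
```

With this shape, a contiguous F32 tensor with more than `INT_MAX` elements takes the memcpy path without tripping the assertion, while non-contiguous or type-converting copies keep the existing limit until the non-`cudaMemcpyAsync` case is fixed separately.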