Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1335,51 +1335,63 @@ For the forms that take a USM pointer, the image memory must also have been
allocated within the same context and device of the `queue`. The USM memory
must be accessible on the queue's device.

The `ext_oneapi_copy` function variants that don't take offsets and extents may
fail in the following scenarios:
The `ext_oneapi_copy` function variants that do not take offsets and extents
must ensure that the following conditions are met.:

1. The `Src` and `Dest` memory was not allocated on the same device and
context of the queue.
1. The `Src` and `Dest` memory was allocated on the same device and context.

2. The `Src` and `Dest` memory regions, where `Src` or `Dest` can be either
on the host or device, do not have the same memory capacity, where the capacity
is calculate from the `width`, `height`, `depth`, `channel_order`, and
2. The `Src` and `Dest` memory regions, where `Src` or `Dest` can be either
on the host or device, have the same memory capacity, where the capacity
is calculated from the `width`, `height`, `depth`, `channel_order`, and
`channel_type` members of the `image_descriptor` parameter.

The `ext_oneapi_copy` function variants that do take offsets and extents may
fail in the following scenarios:
The `ext_oneapi_copy` function variants that do take offsets and extents must
ensure that the following conditions are met. If a condition names a specific
parameter, it is only applicable to the function variants that take that
parameter.

1. The `Src` and `Dest` memory was not allocated on the same device and
context of the queue.
1. The `Src` and `Dest` memory was allocated on the same device and context.

2. The image descriptor passed does not match the image descriptor used to
allocate the image on the device.
2. The image descriptors passed match the image descriptors used to allocate
the image's memory on the device.

3. the `CopyExtent` describes a memory region larger than that which was
allocated on either the host or the device.
3. The `CopyExtent` describes a memory region that is not larger than that which
was allocated on either the host or the device.

4. The `HostExtent` describes a memory region larger than that which was
allocated on the host.
4. The `HostExtent` describes a memory region that is not larger than that which
was allocated on the host.

5. The `SrcExtent` describes a memory region larger than that which was
allocated, where `Src` can be either the host or device.
5. The `SrcExtent` describes a memory region that is not larger than that which
was allocated, where `Src` can be either on the host or on the device.

6. The `DestExtent` describes a memory region larger than that which was
allocated, where `Dest` can be either the host or device.
6. The `DestExtent` describes a memory region that is not larger than that which
was allocated, where `Dest` can be either on the host or on the device.

7. If `SrcOffset + CopyExtent` moves the memory sub-region outside the bounds
of the memory described by `Src`, irrespective of whether `Src` is on the host
or the device.
7. The `DeviceRowPitch` adheres to the alignment requirements outlined in the
"Pitch alignment restrictions and queries" section.

8. If `DestOffset + CopyExtent` moves the memory sub-region outside the bounds
of the memory described by `Dest`, irrespective of whether `Dest` is on the
host or the device.
8. The `DeviceRowPitch` is greater than or equal to the width of the image on
the device.

9. The `DeviceRowPitch` does not adhere to the alignment requirements
outlined in section "Pitch alignment restrictions and queries"
9. For the relevant dimensions, `SrcOffset + CopyExtent` does not move the
memory sub-region outside the bounds of the memory described by `Src`,
irrespective of whether `Src` is on the host or the device. The relevant
dimensions are `x` for 1D images; `x` and `y` for 2D images; and `x`, `y`, and
`z` for 3D images. `x`, `y`, and `z` correspond to indices `0`, `1`, and `2` of
the `SrcOffset` and `CopyExtent` parameters, respectively.

10. For the relevant dimensions, `DestOffset + CopyExtent` does not move the
memory sub-region outside the bounds of the memory described by `Dest`,
irrespective of whether `Dest` is on the host or the device. The relevant
dimensions are `x` for 1D images, `x` and `y` for 2D images, and `x`, `y`, and
`z` for 3D images. `x`, `y`, and `z` correspond to indices `0`, `1`, and `2` of
the `SrcOffset` and `CopyExtent` parameters, respectively.

11. The `CopyExtent`'s' `x`, `y`, and `z` dimensions must not be `0`. They must
be greater than or equal to `1`. Even if the image is 1D or 2D, the remaining
non-relevant dimension's values must be set to `1` in the `CopyExtent`
parameter.

10. The value of `DeviceRowPitch` is smaller than the width of the image on
the device.

If copying of an image fails, `ext_oneapi_copy` will throw a `sycl::exception`
with error code `sycl::errc::invalid`, and relay an error message back to the
Expand Down
80 changes: 59 additions & 21 deletions sycl/source/handler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,33 @@ void *getValueFromDynamicParameter(

// Bindless image helpers

constexpr size_t get_channel_size(
const sycl::ext::oneapi::experimental::image_descriptor &Desc) {
switch (Desc.channel_type) {
case sycl::image_channel_type::fp16:
return sizeof(sycl::half);
case sycl::image_channel_type::fp32:
return sizeof(float);
case sycl::image_channel_type::snorm_int8:
case sycl::image_channel_type::unorm_int8:
case sycl::image_channel_type::signed_int8:
case sycl::image_channel_type::unsigned_int8:
return sizeof(uint8_t);
case sycl::image_channel_type::snorm_int16:
case sycl::image_channel_type::unorm_int16:
case sycl::image_channel_type::signed_int16:
case sycl::image_channel_type::unsigned_int16:
return sizeof(uint16_t);
case sycl::image_channel_type::signed_int32:
case sycl::image_channel_type::unsigned_int32:
return sizeof(uint32_t);
default:
throw sycl::exception(make_error_code(errc::invalid),
"Unsupported channel type");
return 0;
}
}

// Fill image type and return depth or array_size
static unsigned int
fill_image_type(const ext::oneapi::experimental::image_descriptor &Desc,
Expand Down Expand Up @@ -255,16 +282,8 @@ fill_copy_args(detail::handler_impl *impl,
impl->MDstImageDesc.depth = DestExtent[2];
}

if (impl->MImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_HOST_TO_DEVICE) {
impl->MSrcImageDesc.rowPitch = 0;
impl->MDstImageDesc.rowPitch = DestPitch;
} else if (impl->MImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_DEVICE_TO_HOST) {
impl->MSrcImageDesc.rowPitch = SrcPitch;
impl->MDstImageDesc.rowPitch = 0;
} else {
impl->MSrcImageDesc.rowPitch = SrcPitch;
impl->MDstImageDesc.rowPitch = DestPitch;
}
impl->MSrcImageDesc.rowPitch = SrcPitch;
impl->MDstImageDesc.rowPitch = DestPitch;
}

static void
Expand All @@ -277,9 +296,11 @@ fill_copy_args(detail::handler_impl *impl,
sycl::range<3> DestExtent = {0, 0, 0},
sycl::range<3> CopyExtent = {0, 0, 0}) {

fill_copy_args(impl, Desc, Desc, ImageCopyFlags, 0 /*SrcPitch*/,
0 /*DestPitch*/, SrcOffset, SrcExtent, DestOffset, DestExtent,
CopyExtent);
size_t SrcPitch = SrcExtent[0] * Desc.num_channels * get_channel_size(Desc);
size_t DestPitch = DestExtent[0] * Desc.num_channels * get_channel_size(Desc);

fill_copy_args(impl, Desc, Desc, ImageCopyFlags, SrcPitch, DestPitch,
SrcOffset, SrcExtent, DestOffset, DestExtent, CopyExtent);
}

static void
Expand Down Expand Up @@ -307,8 +328,13 @@ fill_copy_args(detail::handler_impl *impl,
sycl::range<3> DestExtent = {0, 0, 0},
sycl::range<3> CopyExtent = {0, 0, 0}) {

fill_copy_args(impl, SrcImgDesc, DestImgDesc, ImageCopyFlags, 0 /*SrcPitch*/,
0 /*DestPitch*/, SrcOffset, SrcExtent, DestOffset, DestExtent,
size_t SrcPitch =
SrcExtent[0] * SrcImgDesc.num_channels * get_channel_size(SrcImgDesc);
size_t DestPitch =
DestExtent[0] * DestImgDesc.num_channels * get_channel_size(DestImgDesc);

fill_copy_args(impl, SrcImgDesc, DestImgDesc, ImageCopyFlags, SrcPitch,
DestPitch, SrcOffset, SrcExtent, DestOffset, DestExtent,
CopyExtent);
}

Expand Down Expand Up @@ -1595,10 +1621,17 @@ void handler::ext_oneapi_copy(
get_pointer_type(Dest,
createSyclObjFromImpl<context>(impl->get_context())));

if (ImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_HOST_TO_DEVICE ||
ImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_DEVICE_TO_HOST) {
detail::fill_copy_args(get_impl(), Desc, ImageCopyFlags, DeviceRowPitch,
// Calculate host pitch, where host memory is always assumed to be tightly
// packed.
size_t HostRowPitch =
Desc.width * Desc.num_channels * detail::get_channel_size(Desc);

if (ImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_HOST_TO_DEVICE) {
detail::fill_copy_args(get_impl(), Desc, ImageCopyFlags, HostRowPitch,
DeviceRowPitch);
} else if (ImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_DEVICE_TO_HOST) {
detail::fill_copy_args(get_impl(), Desc, ImageCopyFlags, DeviceRowPitch,
HostRowPitch);
} else {
throw sycl::exception(make_error_code(errc::invalid),
"Copy Error: This copy function only performs host "
Expand Down Expand Up @@ -1627,14 +1660,19 @@ void handler::ext_oneapi_copy(
get_pointer_type(Dest,
createSyclObjFromImpl<context>(impl->get_context())));

// Calculate host pitch, where host memory is always assumed to be tightly
// packed.
size_t HostRowPitch = HostExtent[0] * DeviceImgDesc.num_channels *
detail::get_channel_size(DeviceImgDesc);

// Fill the host extent based on the type of copy.
if (ImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_HOST_TO_DEVICE) {
detail::fill_copy_args(get_impl(), DeviceImgDesc, ImageCopyFlags,
DeviceRowPitch, DeviceRowPitch, SrcOffset,
HostExtent, DestOffset, {0, 0, 0}, CopyExtent);
HostRowPitch, DeviceRowPitch, SrcOffset, HostExtent,
DestOffset, {0, 0, 0}, CopyExtent);
} else if (ImageCopyFlags == UR_EXP_IMAGE_COPY_FLAG_DEVICE_TO_HOST) {
detail::fill_copy_args(get_impl(), DeviceImgDesc, ImageCopyFlags,
DeviceRowPitch, DeviceRowPitch, SrcOffset, {0, 0, 0},
DeviceRowPitch, HostRowPitch, SrcOffset, {0, 0, 0},
DestOffset, HostExtent, CopyExtent);
} else {
throw sycl::exception(make_error_code(errc::invalid),
Expand Down
12 changes: 6 additions & 6 deletions unified-runtime/source/adapters/cuda/image.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -704,7 +704,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.dstY = pCopyRegion->dstOffset.y;
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height = pCopyRegion->copyExtent.height;
cpy_desc.srcPitch = pSrcImageDesc->width * PixelSizeBytes;
cpy_desc.srcPitch = pSrcImageDesc->rowPitch;
if (pDstImageDesc->rowPitch == 0) {
cpy_desc.dstMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_ARRAY;
cpy_desc.dstArray = (CUarray)pDst;
Expand All @@ -725,7 +725,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.dstZ = pCopyRegion->dstOffset.z;
cpy_desc.srcMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_HOST;
cpy_desc.srcHost = pSrc;
cpy_desc.srcPitch = pSrcImageDesc->width * PixelSizeBytes;
cpy_desc.srcPitch = pSrcImageDesc->rowPitch;
cpy_desc.srcHeight = pSrcImageDesc->height;
cpy_desc.dstMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_ARRAY;
cpy_desc.dstArray = (CUarray)pDst;
Expand All @@ -745,7 +745,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.dstZ = pCopyRegion->dstOffset.z;
cpy_desc.srcMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_HOST;
cpy_desc.srcHost = pSrc;
cpy_desc.srcPitch = pSrcImageDesc->width * PixelSizeBytes;
cpy_desc.srcPitch = pSrcImageDesc->rowPitch;
cpy_desc.srcHeight = std::max(uint64_t{1}, pSrcImageDesc->height);
cpy_desc.dstMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_ARRAY;
cpy_desc.dstArray = (CUarray)pDst;
Expand Down Expand Up @@ -793,7 +793,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.dstY = pCopyRegion->dstOffset.y;
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height = pCopyRegion->copyExtent.height;
cpy_desc.dstPitch = pDstImageDesc->width * PixelSizeBytes;
cpy_desc.dstPitch = pDstImageDesc->rowPitch;
cpy_desc.dstMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_HOST;
cpy_desc.dstHost = pDst;
if (pSrcImageDesc->rowPitch == 0) {
Expand All @@ -818,7 +818,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.srcArray = as_CUArray(pSrc);
cpy_desc.dstMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_HOST;
cpy_desc.dstHost = pDst;
cpy_desc.dstPitch = pDstImageDesc->width * PixelSizeBytes;
cpy_desc.dstPitch = pDstImageDesc->rowPitch;
cpy_desc.dstHeight = pDstImageDesc->height;
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height = pCopyRegion->copyExtent.height;
Expand All @@ -838,7 +838,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.srcArray = as_CUArray(pSrc);
cpy_desc.dstMemoryType = CUmemorytype_enum::CU_MEMORYTYPE_HOST;
cpy_desc.dstHost = pDst;
cpy_desc.dstPitch = pDstImageDesc->width * PixelSizeBytes;
cpy_desc.dstPitch = pDstImageDesc->rowPitch;
cpy_desc.dstHeight = std::max(uint64_t{1}, pDstImageDesc->height);
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height = std::max(uint64_t{1}, pCopyRegion->copyExtent.height);
Expand Down
12 changes: 6 additions & 6 deletions unified-runtime/source/adapters/hip/image.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -704,7 +704,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.srcY = pCopyRegion->srcOffset.y;
cpy_desc.dstXInBytes = pCopyRegion->dstOffset.x * PixelSizeBytes;
cpy_desc.dstY = pCopyRegion->dstOffset.y;
cpy_desc.srcPitch = pSrcImageDesc->width * PixelSizeBytes;
cpy_desc.srcPitch = pSrcImageDesc->rowPitch;
if (pDstImageDesc->rowPitch == 0) {
cpy_desc.dstMemoryType = hipMemoryTypeArray;
cpy_desc.dstArray = static_cast<hipArray_t>(pDst);
Expand All @@ -727,7 +727,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.dstZ = pCopyRegion->dstOffset.z;
cpy_desc.srcMemoryType = hipMemoryTypeHost;
cpy_desc.srcHost = pSrc;
cpy_desc.srcPitch = pSrcImageDesc->width * PixelSizeBytes;
cpy_desc.srcPitch = pSrcImageDesc->rowPitch;
cpy_desc.srcHeight = pSrcImageDesc->height;
cpy_desc.dstMemoryType = hipMemoryTypeArray;
cpy_desc.dstArray = static_cast<hipArray_t>(pDst);
Expand All @@ -749,7 +749,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.dstZ = pCopyRegion->dstOffset.z;
cpy_desc.srcMemoryType = hipMemoryTypeHost;
cpy_desc.srcHost = pSrc;
cpy_desc.srcPitch = pSrcImageDesc->width * PixelSizeBytes;
cpy_desc.srcPitch = pSrcImageDesc->rowPitch;
cpy_desc.srcHeight = std::max(MinCopyHeight, pSrcImageDesc->height);
cpy_desc.dstMemoryType = hipMemoryTypeArray;
cpy_desc.dstArray = static_cast<hipArray_t>(pDst);
Expand Down Expand Up @@ -824,7 +824,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
}
cpy_desc.dstMemoryType = hipMemoryTypeHost;
cpy_desc.dstHost = pDst;
cpy_desc.dstPitch = pDstImageDesc->width * PixelSizeBytes;
cpy_desc.dstPitch = pDstImageDesc->rowPitch;
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height = pCopyRegion->copyExtent.height;
UR_CHECK_ERROR(hipMemcpyParam2DAsync(&cpy_desc, Stream));
Expand All @@ -840,7 +840,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.srcArray = static_cast<hipArray_t>(const_cast<void *>(pSrc));
cpy_desc.dstMemoryType = hipMemoryTypeHost;
cpy_desc.dstHost = pDst;
cpy_desc.dstPitch = pDstImageDesc->width * PixelSizeBytes;
cpy_desc.dstPitch = pDstImageDesc->rowPitch;
cpy_desc.dstHeight = pDstImageDesc->height;
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height = pCopyRegion->copyExtent.height;
Expand All @@ -863,7 +863,7 @@ UR_APIEXPORT ur_result_t UR_APICALL urBindlessImagesImageCopyExp(
cpy_desc.srcArray = static_cast<hipArray_t>(const_cast<void *>(pSrc));
cpy_desc.dstMemoryType = hipMemoryTypeHost;
cpy_desc.dstHost = pDst;
cpy_desc.dstPitch = pDstImageDesc->width * PixelSizeBytes;
cpy_desc.dstPitch = pDstImageDesc->rowPitch;
cpy_desc.dstHeight = std::max(MinCopyHeight, pDstImageDesc->height);
cpy_desc.WidthInBytes = PixelSizeBytes * pCopyRegion->copyExtent.width;
cpy_desc.Height =
Expand Down
6 changes: 2 additions & 4 deletions unified-runtime/source/adapters/level_zero/image_common.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -779,8 +779,7 @@ ur_result_t bindlessImagesHandleCopyFlags(

switch (imageCopyFlags) {
case UR_EXP_IMAGE_COPY_FLAG_HOST_TO_DEVICE: {
uint32_t SrcRowPitch =
pSrcImageDesc->width * getPixelSizeBytes(pSrcImageFormat);
uint32_t SrcRowPitch = pSrcImageDesc->rowPitch;
uint32_t SrcSlicePitch = SrcRowPitch * pSrcImageDesc->height;
if (pDstImageDesc->rowPitch == 0) {
// Copy to Non-USM memory
Expand Down Expand Up @@ -824,8 +823,7 @@ ur_result_t bindlessImagesHandleCopyFlags(
return UR_RESULT_SUCCESS;
};
case UR_EXP_IMAGE_COPY_FLAG_DEVICE_TO_HOST: {
uint32_t DstRowPitch =
pDstImageDesc->width * getPixelSizeBytes(pDstImageFormat);
uint32_t DstRowPitch = pDstImageDesc->rowPitch;
uint32_t DstSlicePitch = DstRowPitch * pDstImageDesc->height;
if (pSrcImageDesc->rowPitch == 0) {
// Copy from Non-USM memory to host
Expand Down