OpenCL: Support devices with cl_ext_buffer_device_address #830
Switched to draft mode: need to reflect changes in pocl/pocl#1449.
conflicts
pvelesko
left a comment
conflicts
docs/Using.md
Outdated
#### CHIP\_OCL\_USE\_ALLOC\_STRATEGY

Defines the allocation strategy the OpenCL backend uses for managing
HIP allocation. The valid case-insensitive choises and their meaning
choices
Fixed now.
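As a rough illustration of what "case-insensitive choices" could mean for CHIP\_OCL\_USE\_ALLOC\_STRATEGY, the sketch below lower-cases the value before matching it. The enum, the function names, and the accepted strings (`usm`, `svm`, `bufferdevaddr`) are assumptions made for this example; the documented values in docs/Using.md may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical enum; the real chipStar type and its members may differ.
enum class AllocStrategy { USM, SVM, BufferDevAddr };

// Match a strategy name case-insensitively. Unknown names yield nullopt.
// The accepted strings are placeholders, not the documented values.
std::optional<AllocStrategy> parseAllocStrategy(std::string Value) {
  std::transform(Value.begin(), Value.end(), Value.begin(),
                 [](unsigned char C) {
                   return static_cast<char>(std::tolower(C));
                 });
  if (Value == "usm")
    return AllocStrategy::USM;
  if (Value == "svm")
    return AllocStrategy::SVM;
  if (Value == "bufferdevaddr")
    return AllocStrategy::BufferDevAddr;
  return std::nullopt;
}

// Read the environment variable. An unset variable yields nullopt, which a
// caller could treat as "let the backend choose".
std::optional<AllocStrategy> allocStrategyFromEnv() {
  const char *Raw = std::getenv("CHIP_OCL_USE_ALLOC_STRATEGY");
  if (!Raw)
    return std::nullopt;
  return parseAllocStrategy(Raw);
}
```

Returning an empty optional for both "unset" and "unrecognized" keeps the sketch short; a real implementation would likely report an unrecognized value as an error.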
Use cl_ext_buffer_device_address extension (experimental at the time of this commit) for managing HIP device allocations when neither USM nor SVM is available. The latest main of POCL supports this extension and, hopefully soon, so will Mesa's rusticl. Briefly, the extension enables the OpenCL backend to pin cl_mem buffers to device memory and obtain their fixed addresses.

The extension has a downside: the addresses of device and host allocations may alias, so we can't automatically determine the copy direction of `hipMemcpy*(..., hipMemcpyDefault)` calls. For this reason the unified [virtual] addressing feature is turned off and, consequently, hipMemcpyDefault is unsupported and allocations from hipHostMalloc() are not implicitly mapped and portable. Also, `hipHostMalloc(..., hipHostMallocMapped)` calls are not supported yet (unimplemented).

Other changes:

* Use hipMemoryTypeDevice type for the shadow buffers of the global-scope `__device__` variables.
* Remove redundant hipDeviceProp_t structure copy in chipStar::Device::getAttr().
* Define missing unifiedAddressing device property.
* Define missing hipDeviceAttributeUnifiedAddressing attribute.
* On devices with `unifiedAddressing == 1`, the hipHostMallocMapped and hipHostMallocPortable flags are set implicitly when calling hipHostMalloc().
* Map/unmap only device-accessible host allocations (ones with hipHostMallocMapped).
* On devices with `unifiedAddressing == 0`, hipHostMalloc() called with default flags allocates plain host memory instead of device (accessible) memory.
* Add hipMemcpyKind parameter to chipstar::Queue::hipMemcpy*() methods. The OpenCL backend needs it for calling the right driver copy API function under `unifiedAddressing == 0`.
* Add CHIP_OCL_USE_ALLOC_STRATEGY environment variable for instructing the OpenCL backend to use either USM, SVM or the cl_ext_buffer_device_address.
* Add a test for cl_ext_buffer_device_address that also checks the expected API behaviors for `unifiedAddressing == 0`. The test is run if the device supports the extension.
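The reasoning about aliasing addresses and hipMemcpyDefault can be sketched as a small decision function. This is a standalone model, not chipStar code: `CopyKind` only mirrors the names of hipMemcpyKind, and `resolveCopyKind` with its `InferredFromPointers` parameter is a hypothetical helper standing in for whatever pointer lookup a runtime with unified addressing would perform.

```cpp
#include <optional>

// Stand-in for hipMemcpyKind; this is not the HIP header.
enum class CopyKind {
  HostToHost,
  HostToDevice,
  DeviceToHost,
  DeviceToDevice,
  Default
};

// Decide which direction a copy should use.
// - An explicit kind is always honored.
// - Default can be resolved only with unified addressing, where host and
//   device addresses never alias and the direction can be inferred.
// - Without unified addressing, Default is rejected (nullopt), since a
//   pointer value alone does not say which address space it belongs to.
std::optional<CopyKind> resolveCopyKind(CopyKind Requested,
                                        bool UnifiedAddressing,
                                        CopyKind InferredFromPointers) {
  if (Requested != CopyKind::Default)
    return Requested;
  if (UnifiedAddressing)
    return InferredFromPointers;
  return std::nullopt;
}
```

Under this model, callers targeting a device with `unifiedAddressing == 0` would pass an explicit direction such as host-to-device rather than relying on the default.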
... expressed as global-scope variables, which rusticl doesn't support yet. Along the way, refactor SPIR-V processing:

* Remove redundant member variables.
* Remove redundant SPIR-V binary copies.
* Move SPIR-V analysis from the backend to SPVRegister to unify the SPIR-V processing.
* Rename filterSPIRV -> preprocessSPIRV.
* CHIP_DUMP_SPIRV=1 now dumps the SPIR-V binary when a SPIR-V processing stage fails.
An issue discovered while running HIP programs on OpenCL-BE->rusticl (with #830). Linking of __chip_atomic_add_f* symbols failed because the caller's and callee's function signatures differed in their pointer parameters (the pointee types didn't match). The mismatch was caused by a feature of LLVM-SPIRV-Translator that attempts to recover the original pointee types in LLVM bitcode that uses opaque pointers. The way it infers the types may produce SPIR-V functions whose pointee types differ across SPIR-V modules. The issue is worked around by passing pointers as integers for functions whose definitions are linked in at runtime.
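The workaround can be pictured in plain C++ as follows. This is only an analogy for the idea of an integer-typed parameter at a link boundary: `sketchAddF32` and `addTo` are hypothetical names, the add is not atomic, and the real fix operates on device code and SPIR-V rather than host functions.

```cpp
#include <cstdint>

// Hypothetical callee, imagined to live in a separately linked module.
// Because the address arrives as an integer, the parameter type is the same
// in every module regardless of which pointee type a translator would infer.
// Note: this is a plain read-modify-write, not an atomic operation.
float sketchAddF32(std::uintptr_t Addr, float Value) {
  float *Ptr = reinterpret_cast<float *>(Addr);
  float Old = *Ptr;
  *Ptr = Old + Value;
  return Old;
}

// Caller side: convert the pointer to an integer at the call boundary.
float addTo(float *Target, float Value) {
  return sketchAddF32(reinterpret_cast<std::uintptr_t>(Target), Value);
}
```

The trade-off is that the signature no longer documents what the address points to, so the cast back to a typed pointer has to be kept consistent by hand.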
pvelesko
left a comment
one question, other than that LGTM
    // UNIMPLEMENTED(nullptr);
    if (Flags.isNumaUser())
      UNIMPLEMENTED(nullptr);
    if (Flags.isPortable())
      UNIMPLEMENTED(nullptr);
    if (Flags.isWriteCombined())
      logWarn("hipHostAllocWriteCombined is not supported. Ignoring.");
Why get rid of the isPortable() check?
I added this in hipHostMallocInternal():
    if (ActiveDev->hasUnifiedVirtualAddressing()) {
      // UVA implies hipHostMallocMapped and hipHostMallocPortable.
      // [https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html]
      Flags |= hipHostMallocMapped | hipHostMallocPortable;
    }
And removed the mentioned code so the hipHostMalloc() calls don't error out.
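The effect of that change on the flags can be modelled in isolation. The constants below are stand-ins chosen for this sketch (the real values come from the HIP headers), and `effectiveHostMallocFlags` and `isDeviceAccessible` are hypothetical helper names, not chipStar functions.

```cpp
#include <cstdint>

// Stand-in flag bits for illustration only; use the HIP headers for the
// real hipHostMallocPortable / hipHostMallocMapped values.
constexpr std::uint32_t HostMallocPortable = 0x1;
constexpr std::uint32_t HostMallocMapped = 0x2;

// With unified virtual addressing, every host allocation is treated as
// mapped and portable. Without it, the caller's flags are kept as given.
std::uint32_t effectiveHostMallocFlags(std::uint32_t Flags, bool HasUVA) {
  if (HasUVA)
    Flags |= HostMallocMapped | HostMallocPortable;
  return Flags;
}

// Only mapped host allocations are device accessible, so only those would
// need map/unmap handling.
bool isDeviceAccessible(std::uint32_t Flags) {
  return (Flags & HostMallocMapped) != 0;
}
```

With default flags, this model yields a device-accessible allocation when UVA is present and a plain host allocation when it is not, which matches the behavior described for `unifiedAddressing == 1` and `unifiedAddressing == 0`.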
Use cl_ext_buffer_device_address extension (experimental at the time
of this commit) for managing HIP device allocations on devices where neither USM
nor SVM is available. The latest main of POCL supports this extension and,
hopefully soon, so will Mesa's rusticl. Briefly, the extension enables
the OpenCL backend to pin cl_mem buffers to device memory and obtain their
fixed addresses.

The extension has a downside: the addresses of device and
host allocations may alias, so we can't automatically determine the
copy direction in `hipMemcpy*(..., hipMemcpyDefault)` calls. For this
reason the unified [virtual] addressing feature is turned off and,
consequently, hipMemcpyDefault is unsupported and allocations from
hipHostMalloc() are not implicitly mapped and portable. Also,
`hipHostMalloc(..., hipHostMallocMapped)` calls are not supported yet
(unimplemented).

Other changes:

* Use hipMemoryTypeDevice type for the shadow buffers of the
  global-scope `__device__` variables.
* Remove redundant hipDeviceProp_t structure copy in
  chipStar::Device::getAttr().
* Define missing unifiedAddressing device property.
* Define missing hipDeviceAttributeUnifiedAddressing attribute.
* On devices with `unifiedAddressing == 1`, the hipHostMallocMapped and
  hipHostMallocPortable flags are set implicitly when calling hipHostMalloc().
* Map/unmap only device-accessible host allocations (ones with
  hipHostMallocMapped).
* On devices with `unifiedAddressing == 0`, hipHostMalloc() called with
  default flags allocates plain host memory instead of device
  (accessible) memory.
* Add hipMemcpyKind parameter to chipstar::Queue::hipMemcpy*() methods.
  The OpenCL backend needs it for calling the right driver
  copy API function under `unifiedAddressing == 0`.
* Add CHIP_OCL_USE_ALLOC_STRATEGY environment variable for instructing
  the OpenCL backend to use either USM, SVM or the
  cl_ext_buffer_device_address.
* Add a test for cl_ext_buffer_device_address that also checks the
  expected API behaviors for `unifiedAddressing == 0`. The test is
  run if the device supports the extension.
Refactor SPIR-V processing. Filter out chipStar metadata expressed as global-scope variables.
CHIP_DUMP_SPIRV=1 dumps the SPIR-V binary when a SPIR-V processing step fails.