Skip to content

Conversation

@linehill
Copy link
Collaborator

@linehill linehill commented Apr 15, 2024

Use cl_ext_buffer_device_address extension (experimental at the time
of this commit) for managing HIP device allocations on devices where neither USM
nor SVM is available. The latest main of POCL and, hopefully soon,
Mesa's rusticl supports this extension. Briefly, the extension enables
the OpenCL backend to pin cl_mem buffers to device memory and obtain their
fixed addresses.

The extension has a down side that the addresses of the device and
host allocations may alias, thus, we can't automatically determine
copy direction in hipMemcpy*(, hipMemcpyDefault) calls. For this
reason the unified [virtual] addressing feature is set off and,
consequently, hipMemcpyDefault is unsupported and allocations from
hipHostMalloc() are not implicitly mapped and portable. Also,
hipHostMalloc(..., hipHostMallocMapped) calls are not supported yet
(unimplemented).

Other changes:

  • Use hipMemoryTypeDevice type for the shadow buffers of the
    global-scope __device__ variables.

  • Remove redundant hipDeviceProp_t structure copy in
    chipStar::Device::getAttr().

  • Define missing unifiedAddressing device property.

  • Define missing hipDeviceAttributeUnifiedAddressing attribute.

  • On devices with unifiedAddressing == 1 hipHostMallocMapped and
    hipHostMallocPortable flags are set on when calling hipHostMalloc().

  • Map/unmap only device accessible host allocations (ones with
    hipHostMallocMapped).

  • On devices with unifiedAddressing == 0 hipHostMalloc() called with
    default flags allocates plain host memory instead of device
    (accessible) memory.

  • Add hipMemcpyKind parameter to chipstar::Queue::hipMemcpy*() methods
    which is needed by the OpenCL backend for calling the right driver
    copy API function under unifiedAddressing == 0.

  • Add CHIP_OCL_USE_ALLOC_STRATEGY environment variable for instructing
    OpenCL the backend to use either USM, SVM or the
    cl_ext_buffer_device_address.

  • Added a test for cl_ext_buffer_device_address and also check
    excepted API behaviors for unifiedAddressing == 0. The test is
    run if the device supports the extension.

  • Refactor SPIR-V processing. Filter out chipStar metadata expressed as global-scope variables.

  • CHIP_DUMP_SPIRV=1 dumps SPIR-V on failing SPIR-V processing step.

@linehill linehill marked this pull request as draft April 17, 2024 14:03
@linehill
Copy link
Collaborator Author

Switched to draft mode: need to reflect changes in pocl/pocl#1449.

@linehill linehill marked this pull request as ready for review April 23, 2024 11:02
@linehill linehill force-pushed the cl_ext_buffer_device_address branch 2 times, most recently from 7f1cacd to 3a0a187 Compare April 26, 2024 11:21
@linehill linehill force-pushed the cl_ext_buffer_device_address branch 4 times, most recently from c2cc3af to f02051b Compare May 2, 2024 08:43
@pvelesko
Copy link
Collaborator

conflicts

Copy link
Collaborator

@pvelesko pvelesko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conflicts

docs/Using.md Outdated
#### CHIP\_OCL\_USE\_ALLOC\_STRATEGY

Defines the allocation strategy the OpenCL backend uses for managing
HIP allocation. The valid case-insensitive choises and their meaning
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

choices

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now.

@linehill linehill force-pushed the cl_ext_buffer_device_address branch from f02051b to 41f13fc Compare May 13, 2024 10:41
Henry Linjamäki added 6 commits May 13, 2024 16:13
Use cl_ext_buffer_device_address extension (experimental at the time
of this commit) for managing HIP device allocations when neither USM
nor SVM is available. The latest main of POCL and hopefully soon
Mesa's rusticl supports this extension. Briefly the extension enables
the OpenCL backend to pin cl_mem buffers to device memory and obtain their
fixed addresses.

The extension has a down side that the addresses of the device and
host allocations may alias, thus, we can't automatically determine
copy direction of `hipMemcpy*(, hipMemcpyDefault)` calls. For this
reason the unified [virtual] addressing feature is set off and,
consequently, hipMemcpyDefault is unsupported and allocations from
hipHostMalloc() are not implicitly mapped and portable. Also,
`hipHostMalloc(..., hipHostMallocMapped)` calls are not supported yet
(unimplemented).

Other changes:

* Use hipMemoryTypeDevice type for the shadow buffers of the
  global-scope `__device__` variables.

* Remove redundant hipDeviceProp_t structure copy in
  chipStar::Device::getAttr().

* Define missing unifiedAddressing device property.

* Define missing hipDeviceAttributeUnifiedAddressing attribute.

* On devices with `unifiedAddressing == 1` hipHostMallocMapped and
  hipHostMallocPortable flags are set on when calling hipHostMalloc().

* Map/unmap only device accessible host allocations (ones with
  hipHostMallocMapped).

* On devices with `unifiedAddressing == 0` hipHostMalloc() called with
  default flags allocates plain host memory instead of device
  (accessible) memory.

* Add hipMemcpyKind parameter to chipstar::Queue::hipMemcpy*() methods
  which is needed by the OpenCL backend for calling the right driver
  copy API function under `unifiedAddressing == 0`.

* Add CHIP_OCL_USE_ALLOC_STRATEGY environment variable for instructing
  OpenCL the backend to use either USM, SVM or the
  cl_ext_buffer_device_address.

* Added a test for cl_ext_buffer_device_address and also check
  excepted API behaviors for `unifiedAddressing == 0`. The test is
  run if the device supports the extension.
... expressed as global-scope variables which rusticl currently
doesn't support yet. Along the way, refactor SPIR-V processing:

* Remove redundant member variables.

* Remove redundant SPIR-V binary copies.

* Move SPIR-V analysis from the backend to SPVRegister to unify the
  SPIR-V processing.

* Rename filterSPIRV -> preprocessSPIRV.

* CHIP_DUMP_SPIRV=1 now dumps the SPIR-V binary on failing SPIR-V
  processing stage.
@linehill linehill force-pushed the cl_ext_buffer_device_address branch from 41f13fc to 82a437e Compare May 13, 2024 13:13
linehill pushed a commit that referenced this pull request May 15, 2024
A issue discovered while running HIP programs on OpenCL-BE->rusticl
(with #830). Linking of __chip_atomic_add_f* symbols failed because
the caller's and callee's function signature differed by their pointer
parameters (pointee type didn't match).

The mismatch was caused by LLVM-SPIRV-Translator's feature that
attempts to recover original pointee types in LLVM bitcodes that use
opaque pointers. But the way it attempts to infer the types may end up
with SPIR-V functions with different pointee type across SPIR-V
modules.

The issue is worked around by passing pointers as integers for
functions whose definitions are linked in at runtime.
Copy link
Collaborator

@pvelesko pvelesko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question, other than that LGTM

Comment on lines 1080 to 1084
// UNIMPLEMENTED(nullptr);
if (Flags.isNumaUser())
UNIMPLEMENTED(nullptr);
if (Flags.isPortable())
UNIMPLEMENTED(nullptr);
if (Flags.isWriteCombined())
logWarn("hipHostAllocWriteCombined is not supported. Ignoring.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why get rid if isPortable()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this in hipHostMallocInternal():

if (ActiveDev->hasUnifiedVirtualAddressing()) {
    // UVA implies hipHostMallocMapped and hipHostMallocPortable.
    // [https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html]
    Flags |= hipHostMallocMapped | hipHostMallocPortable;
  }

And removed the mentioned code so the hipHostMalloc() calls don't error out.

Copy link
Collaborator

@pvelesko pvelesko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question, other than that LGTM

@pvelesko pvelesko merged commit a1c4fc3 into main May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants