@linehill
Collaborator

Enabling queue profiling by default slows down kernel enqueue API calls according to VTune, at least on Intel OpenCL targeting the Intel Arc A750. Disabling profiling improved some HeCBench cases on that device:

  • overlay-hip: ~1.80x speedup.

  • floydwarshall-hip: ~1.47x speedup.

  • tqs-hip: ~1.13x speedup.

This patch creates two queues, one with profiling and one without, and starts with the non-profiling one. The backend switches to the profiling queue when profiling is needed. Note that the transition is one-way: from the non-profiling queue to the profiling one, never back.
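The one-way switch described above can be sketched roughly as follows. This is a minimal model, not the code from this patch: `Queue`, `QueuePair`, `active()`, and `requireProfiling()` are hypothetical names, and `Queue` is a plain struct standing in for a real backend handle such as a `cl_command_queue`. How pending work on the old queue is ordered before work on the new one is not modeled here.

```cpp
#include <string>

// Hypothetical stand-in for a backend queue handle (e.g. a cl_command_queue).
struct Queue {
  std::string name;
  bool profiling; // true if the queue was created with profiling enabled
};

// Owns both queues and hands out whichever one is currently active.
class QueuePair {
public:
  QueuePair() : regular_{"regular", false}, profiling_{"profiling", true} {}

  // Queue that new work should be enqueued into right now.
  Queue &active() { return useProfiling_ ? profiling_ : regular_; }

  // Called when an operation needs profiling information.
  // The flag is only ever set, never cleared, so the switch is one-way.
  void requireProfiling() { useProfiling_ = true; }

  bool isProfiling() const { return useProfiling_; }

private:
  Queue regular_;
  Queue profiling_;
  bool useProfiling_ = false; // start on the cheaper non-profiling queue
};
```

In this sketch, callers always enqueue through `active()`, so work that never asks for profiling stays on the non-profiling queue, and the first call to `requireProfiling()` redirects all later work to the profiling queue.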

This patch also adds an environment variable that forces queue profiling to be disabled.

Henry Linjamäki added 2 commits April 4, 2024 16:55
@pvelesko pvelesko merged commit 4b7a300 into main Apr 24, 2024
@pvelesko pvelesko deleted the dyn-q-prof branch April 24, 2024 21:17