OpenCL: Use non-profiling queue, switch to profiling when needed #814
Enabling queue profiling by default slows down kernel enqueue API calls according to VTune, at least on Intel OpenCL targeting the Intel Arc A750. Disabling profiling improved some HeCBench cases on that device:
overlay-hip: ~1.80x speedup.
floydwarshall-hip: ~1.47x speedup.
tqs-hip: ~1.13x speedup.
This patch creates two queues, one with profiling enabled and one without, and uses the non-profiling queue at start. The backend (BE) switches to the profiling queue when it is needed. Note that the transition is one-way: from the non-profiling queue to the profiling one, never back.
Also, add an environment variable that forces queue profiling to be disabled.
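The switching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `FakeQueue`, `QueuePair`, `requestProfiling`, and the environment variable name `EXAMPLE_DISABLE_QUEUE_PROFILING` are all hypothetical, and the queues are plain counters rather than real `cl_command_queue` handles, so the sketch runs without an OpenCL runtime.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for an OpenCL command queue. A real backend would
// hold a cl_command_queue created with or without CL_QUEUE_PROFILING_ENABLE.
struct FakeQueue {
  explicit FakeQueue(bool P) : Profiling(P), Enqueued(0) {}
  bool Profiling; // whether this queue records event timestamps
  int Enqueued;   // number of commands submitted to this queue
};

// Hypothetical helper: treats the variable as "on" when it is set to
// anything other than "0". The variable name is chosen by the caller.
inline bool profilingDisabledByEnv(const char *VarName) {
  const char *Val = std::getenv(VarName);
  return Val != nullptr && std::strcmp(Val, "0") != 0;
}

// Owns both queues and starts on the non-profiling one.
class QueuePair {
public:
  explicit QueuePair(bool ForceDisableProfiling)
      : Regular_(false), Profiling_(true), UseProfiling_(false),
        ProfilingAllowed_(!ForceDisableProfiling) {}

  // Submits one command to whichever queue is currently active.
  void enqueue() {
    if (UseProfiling_)
      ++Profiling_.Enqueued;
    else
      ++Regular_.Enqueued;
  }

  // Called when timing information is needed. Switches to the profiling
  // queue unless profiling was force-disabled. There is no switch back.
  // Returns true if the profiling queue is active afterwards.
  bool requestProfiling() {
    if (!ProfilingAllowed_)
      return false;
    // Assumption: a real backend would also need to order the new queue
    // after work still pending on the old one before switching.
    UseProfiling_ = true;
    return true;
  }

  bool usingProfilingQueue() const { return UseProfiling_; }
  int regularCount() const { return Regular_.Enqueued; }
  int profilingCount() const { return Profiling_.Enqueued; }

private:
  FakeQueue Regular_;
  FakeQueue Profiling_;
  bool UseProfiling_;
  bool ProfilingAllowed_;
};
```

A caller might construct the pair with `QueuePair Q(profilingDisabledByEnv("EXAMPLE_DISABLE_QUEUE_PROFILING"));`, enqueue work as usual, and call `Q.requestProfiling()` only at the point where event timestamps are first required. Keeping the transition one-way keeps the fast path a single flag check.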