[rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd #1016
Merged
dgaliffiAMD merged 35 commits into ROCm:develop on Oct 7, 2025
757d9da to d3a9b15
1712c1a to 9926233
projects/rocprofiler-systems/source/lib/core/trace_cache/sample_type.hpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/rocpd_post_processing.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk.cpp (resolved)
f8b1c31 to 9c13e25
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
f6e0a14 to ace8715
mradosav-amd approved these changes Sep 29, 2025
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.hpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
a1eb46c to 4cca350
mradosav-amd approved these changes Sep 29, 2025
dgaliffiAMD reviewed Oct 3, 2025
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (outdated, resolved)
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.cpp (resolved)
projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk.cpp (resolved)
projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk.cpp (resolved)
dgaliffiAMD requested changes Oct 3, 2025
dgaliffiAMD (Contributor) left a comment:
Thanks. Please see comments and suggestions.
dgaliffiAMD reviewed Oct 6, 2025
projects/rocprofiler-systems/source/lib/core/trace_cache/metadata_registry.hpp (outdated, resolved)
dgaliffiAMD approved these changes Oct 6, 2025
Contributor
Reviewed "ubuntu 22.04 * gfx950" failure. It is not related and is being investigated in parallel.
systems-assistant bot pushed a commit to ROCm/rocprofiler-systems that referenced this pull request Oct 7, 2025
(#1016)
* Add OMPT to ROCpd
* Use correct category
* Added wrapper functions for future control
* Formatting
* Fix naming
* Comment change
* Remove ompt_get_cb_args
* Switched to using region_sample for OMPT
* Remove relic function
* Remove get_use_rocpd that was used in this pr (one still remains)
* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop
* Make lock init and destroy cb instant
* [Prototype] ROCPD Name fix
* [Prototype] ROCPD Name fix P1
* [Prototype] ROCPD Name fix P2
* ROCPD Name fix
* Var name changes
* Rewrite cb overwrite to single function
* [Important] Use parallel_data as key for parallel callback map
* Fix workflow failure
* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0
* Add missing ROCPROFILER_VERSION check
* Improve readability
* Make ompt storage maps thread local
* Part 1: Variable name fix, memory cleanup, and fixed asserts
* Part 2: Add comments
* Part 3: Add CI_THROW
* Part 4: Formatting
* Part 5: Move #include to cpp
[rocm-systems] ROCm/rocm-systems#1016 (commit 0c53a12)
dgaliffiAMD pushed a commit that referenced this pull request Oct 17, 2025
shahamed pushed a commit that referenced this pull request Oct 22, 2025
dgaliffiAMD pushed a commit that referenced this pull request Nov 5, 2025
JeniferC99 pushed a commit that referenced this pull request Nov 6, 2025
* [rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd (#1016)
  * Add OMPT to ROCpd
  * Use correct category
  * Added wrapper functions for future control
  * Formatting
  * Fix naming
  * Comment change
  * Remove ompt_get_cb_args
  * Switched to using region_sample for OMPT
  * Remove relic function
  * Remove get_use_rocpd that was used in this pr (one still remains)
  * Rename ompt_get_args_string and reuse in tool_tracing_callback_stop
  * Make lock init and destroy cb instant
  * [Prototype] ROCPD Name fix
  * [Prototype] ROCPD Name fix P1
  * [Prototype] ROCPD Name fix P2
  * ROCPD Name fix
  * Var name changes
  * Rewrite cb overwrite to single function
  * [Important] Use parallel_data as key for parallel callback map
  * Fix workflow failure
  * Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0
  * Add missing ROCPROFILER_VERSION check
  * Improve readability
  * Make ompt storage maps thread local
  * Part 1: Variable name fix, memory cleanup, and fixed asserts
  * Part 2: Add comments
  * Part 3: Add CI_THROW
  * Part 4: Formatting
  * Part 5: Move #include to cpp
* Add missing counter events handling for ROCPD (#1305)
  * Add missing counter events handling for ROCPD
  * Update projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp
  * Update projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp
  * Fixed formatting
    Signed-off-by: David Galiffi <David.Galiffi@amd.com>
  ---------
  Signed-off-by: Marjan Antic <Marjan.Antic@amd.com>
  Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Update VERSION to 1.2.1
* Update CHANGELOG.md
* Add caching of category region for rocpd (#1420)
  * Add caching of category region Fix vaapi traces Remove region_with_name
  * Applied suggestions from code review
  * Add some simple rocpd testing
    Signed-off-by: David Galiffi <David.Galiffi@amd.com>
  * Adjust `rocpd_string` validation parameters
  * Adjust `rocm_marker_api` validation parameters
  * Update projects/rocprofiler-systems/CHANGELOG.md
    Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
  * Update projects/rocprofiler-systems/CHANGELOG.md
    Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
  * Update projects/rocprofiler-systems/CHANGELOG.md
    Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
---------
Signed-off-by: Marjan Antic <Marjan.Antic@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Motivation
Purpose: Record OMPT callbacks in the ROCpd database.
Technical Details
Main Changes
* `ompt_parallel_cb_storage`: Exclusively used to handle `parallel_begin` and `parallel_end` callbacks. Their `parallel_data` value is used as their unique key.
* [...] `thread_begin` and `thread_end` callbacks, but we do not support these as of now due to a finalization issue.
* [...] `ompt_cache_instant_event`.
* [...] `ompt_push...` function to insert it into the map. Once the end is received, the `ompt_pop...` function is used to extract the data and then cache it using `cache_ompt_region`.
* [...] `ompt_finalize_orphan_events`. This function is called in `tool_fini`.
Other changes
* [...] `tool_tracing_callback` and updated its comment to be more precise.
* [...] `ompt_get_unified_name` as a function that will return the correct track name.
Test Plan
Tested using my own `cpu.f90`, `gpu.f90` and `mutex.f90`.
Also tested by spawning nested parallel regions for `N = [10000, 100000]`.
Test Result
Using the `rocpd2perfetto` tool, the callbacks appear as expected. The same holds for the visualizer.
For the callback renaming, logs show that operations have been renamed and that all categories (and their operations) that follow the overwritten category are present.
Category 15:
[0] omp_thread
[1] omp_thread
[2] omp_parallel
[3] omp_parallel
...
[29] omp_error
[30] omp_callback_functions
Category 16:
[0] MEMORY_ALLOCATION_NONE
[1] MEMORY_ALLOCATION_ALLOCATE
[2] MEMORY_ALLOCATION_VMEM_ALLOCATE
...
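As a rough illustration of the begin/end pairing described under Main Changes, the sketch below keeps pending begin events in a per-thread map, moves them to a cache when the matching end arrives, and closes whatever is left at finalization. It is not the code from this PR: `pending_region`, `cached_region`, `push_region`, `pop_region` and `finalize_orphans` are hypothetical names, a plain `std::uint64_t` stands in for the `parallel_data` key, and closing orphans with the finalization timestamp is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record for a begin event that is still waiting for its end.
struct pending_region
{
    std::string   name;
    std::uint64_t begin_ns;
};

// Hypothetical completed region, standing in for whatever the real cache stores.
struct cached_region
{
    std::string   name;
    std::uint64_t begin_ns;
    std::uint64_t end_ns;
};

// One map and one output buffer per thread, so callbacks that run on
// different threads never touch the same container.
thread_local std::unordered_map<std::uint64_t, pending_region> pending_regions;
thread_local std::vector<cached_region>                        region_cache;

// Called on a begin callback: remember the event under its unique key.
void push_region(std::uint64_t key, std::string name, std::uint64_t begin_ns)
{
    pending_regions[key] = pending_region{ std::move(name), begin_ns };
}

// Called on an end callback: pair it with the stored begin and cache the region.
// Returns false when no matching begin was recorded on this thread.
bool pop_region(std::uint64_t key, std::uint64_t end_ns)
{
    auto it = pending_regions.find(key);
    if(it == pending_regions.end()) return false;
    region_cache.push_back(
        cached_region{ std::move(it->second.name), it->second.begin_ns, end_ns });
    pending_regions.erase(it);
    return true;
}

// Called at tool finalization: close every begin that never saw its end.
// Using now_ns as the end timestamp is an assumption made for this sketch.
std::size_t finalize_orphans(std::uint64_t now_ns)
{
    const std::size_t count = pending_regions.size();
    for(auto& entry : pending_regions)
    {
        region_cache.push_back(cached_region{ std::move(entry.second.name),
                                              entry.second.begin_ns, now_ns });
    }
    pending_regions.clear();
    return count;
}
```

In this sketch a begin callback would call `push_region`, the matching end callback would call `pop_region` with the same key, and the finalization hook would call `finalize_orphans` once on each thread that recorded events. Because the containers are `thread_local`, an end callback delivered on a different thread than its begin would not find its entry, which is one reason orphan handling is needed at all.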
The nested parallel regions test showed that the maps had to be made `thread_local`.
Submission Checklist