Skip to content

[rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd#1016

Merged
dgaliffiAMD merged 35 commits intoROCm:developfrom
kcossett-amd:users/kcossett-amd/rocpd-ompt
Oct 7, 2025
Merged

[rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd#1016
dgaliffiAMD merged 35 commits intoROCm:developfrom
kcossett-amd:users/kcossett-amd/rocpd-ompt

Conversation

@kcossett-amd
Copy link
Contributor

@kcossett-amd kcossett-amd commented Sep 16, 2025

Motivation

Purpose: Enter OMPT callbacks to the database.

Technical Details

Main Changes

  • Callbacks to be put inside ROCpd are inserted in three maps:
    • ompt_parallel_cb_storage: Exclusively used to handle parallel_begin and parallel_end callbacks. Their parallel_data value is used as their unique key.
    • Theoretically, there should also be one for thread_begin and thread_end callbacks, but we do not support these as of now due to finalization issue.
    • Instant events are handled using ompt_cache_instant_event.
  • Once a "start" part of a callback is received, we use the respective ompt_push... function to insert it into the map. Once the end is received, the ompt_pop... function is used to exrtact the data then cache it using cache_ompt_region.
  • Any entries left in the map (as their "end" is not received) will be treated as orphan events and is handled by ompt_finalize_orphan_events. This function is called in tool_fini.
  • An "orphan event" is treated the same as an "instant" event.

Other changes

  • Moved the "is_first_implicit_task" check into tool_tracing_callback and updated its comment to be more precise.
  • Added ompt_get_unified_name as a function that will return the correct track name.
  • Added name checking in post processing.

Test Plan

Tested using my own cpu.f90, gpu.f90 and mutex.f90.

Also tested by spawning nested parallel regions for N = [10000, 100000]

Test Result

Using the rocpd2perfetto tool, the callbacks appear as expected. The same holds for the visualizer.

For the callback renaming, logs show that operations have been renamed and all categories (and their operations) that follow overwritten cat are present.

Category 15:
[0] omp_thread
[1] omp_thread
[2] omp_parallel
[3] omp_parallel
...
[29] omp_error
[30] omp_callback_functions

Category 16:
[0] MEMORY_ALLOCATION_NONE
[1] MEMORY_ALLOCATION_ALLOCATE
[2] MEMORY_ALLOCATION_VMEM_ALLOCATE
...

Result of nested parallel regions showed that the maps had to be made thread_local.

Submission Checklist

@kcossett-amd kcossett-amd changed the title [rocprofiler-systems] Add OMPT callbacks to ROCpd [rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd Sep 16, 2025
@kcossett-amd kcossett-amd force-pushed the users/kcossett-amd/rocpd-ompt branch from 757d9da to d3a9b15 Compare September 16, 2025 18:23
@kcossett-amd kcossett-amd force-pushed the users/kcossett-amd/rocpd-ompt branch from 1712c1a to 9926233 Compare September 17, 2025 12:58
@kcossett-amd kcossett-amd marked this pull request as ready for review September 17, 2025 13:21
@kcossett-amd kcossett-amd requested review from a team and jrmadsen as code owners September 17, 2025 13:21
@kcossett-amd kcossett-amd removed the WIP label Sep 17, 2025
@kcossett-amd kcossett-amd marked this pull request as draft September 17, 2025 14:56
@kcossett-amd kcossett-amd marked this pull request as ready for review September 17, 2025 15:13
@kcossett-amd kcossett-amd force-pushed the users/kcossett-amd/rocpd-ompt branch from f8b1c31 to 9c13e25 Compare September 21, 2025 22:49
@kcossett-amd kcossett-amd force-pushed the users/kcossett-amd/rocpd-ompt branch from f6e0a14 to ace8715 Compare September 22, 2025 12:44
@kcossett-amd kcossett-amd removed the WIP label Sep 25, 2025
@kcossett-amd kcossett-amd force-pushed the users/kcossett-amd/rocpd-ompt branch from a1eb46c to 4cca350 Compare September 29, 2025 13:35
Copy link
Contributor

@dgaliffiAMD dgaliffiAMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Please see comments and suggestions.

@dgaliffiAMD
Copy link
Contributor

Reviewed "ubuntu 22.04 * gfx950" failure. It is not related and is being investigated in parallel.

@dgaliffiAMD dgaliffiAMD merged commit 0c53a12 into ROCm:develop Oct 7, 2025
48 of 50 checks passed
systems-assistant bot pushed a commit to ROCm/rocprofiler-systems that referenced this pull request Oct 7, 2025
 (#1016)

* Add OMPT to ROCpd

* Use correct category

* Added wrapper functions for future control

* Formatting

* Fix naming

* Comment change

* Remove ompt_get_cb_args

* Switched to using region_sample for OMPT

* Remove relic function

* Remove get_use_rocpd that was used in this pr (one still remains)

* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop

* Make lock init and destroy cb instant

* [Prototype] ROCPD Name fix

* [Prototype] ROCPD Name fix P1

* [Prototype] ROCPD Name fix P2

* ROCPD Name fix

* Var name changes

* Rewrite cb overwrite to single function

* [Important] Use parallel_data as key for parallel callback map

* Fix workflow failure

* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0

* Add missing ROCPROFILER_VERSION check

* Improve readability

* Make ompt storage maps thread local

* Part 1: Variable name fix, memory cleanup, and fixed asserts

* Part 2: Add comments

* Part 3: Add CI_THROW

* Part 4: Formatting

* Part 5: Move #include to cpp
[rocm-systems] ROCm/rocm-systems#1016 (commit 0c53a12)
dgaliffiAMD pushed a commit that referenced this pull request Oct 17, 2025
* Add OMPT to ROCpd

* Use correct category

* Added wrapper functions for future control

* Formatting

* Fix naming

* Comment change

* Remove ompt_get_cb_args

* Switched to using region_sample for OMPT

* Remove relic function

* Remove get_use_rocpd that was used in this pr (one still remains)

* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop

* Make lock init and destroy cb instant

* [Prototype] ROCPD Name fix

* [Prototype] ROCPD Name fix P1

* [Prototype] ROCPD Name fix P2

* ROCPD Name fix

* Var name changes

* Rewrite cb overwrite to single function

* [Important] Use parallel_data as key for parallel callback map

* Fix workflow failure

* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0

* Add missing ROCPROFILER_VERSION check

* Improve readability

* Make ompt storage maps thread local

* Part 1: Variable name fix, memory cleanup, and fixed asserts

* Part 2: Add comments

* Part 3: Add CI_THROW

* Part 4: Formatting

* Part 5: Move #include to cpp
shahamed pushed a commit that referenced this pull request Oct 22, 2025
* Add OMPT to ROCpd

* Use correct category

* Added wrapper functions for future control

* Formatting

* Fix naming

* Comment change

* Remove ompt_get_cb_args

* Switched to using region_sample for OMPT

* Remove relic function

* Remove get_use_rocpd that was used in this pr (one still remains)

* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop

* Make lock init and destroy cb instant

* [Prototype] ROCPD Name fix

* [Prototype] ROCPD Name fix P1

* [Prototype] ROCPD Name fix P2

* ROCPD Name fix

* Var name changes

* Rewrite cb overwrite to single function

* [Important] Use parallel_data as key for parallel callback map

* Fix workflow failure

* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0

* Add missing ROCPROFILER_VERSION check

* Improve readability

* Make ompt storage maps thread local

* Part 1: Variable name fix, memory cleanup, and fixed asserts

* Part 2: Add comments

* Part 3: Add CI_THROW

* Part 4: Formatting

* Part 5: Move #include to cpp
dgaliffiAMD pushed a commit that referenced this pull request Nov 5, 2025
* Add OMPT to ROCpd

* Use correct category

* Added wrapper functions for future control

* Formatting

* Fix naming

* Comment change

* Remove ompt_get_cb_args

* Switched to using region_sample for OMPT

* Remove relic function

* Remove get_use_rocpd that was used in this pr (one still remains)

* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop

* Make lock init and destroy cb instant

* [Prototype] ROCPD Name fix

* [Prototype] ROCPD Name fix P1

* [Prototype] ROCPD Name fix P2

* ROCPD Name fix

* Var name changes

* Rewrite cb overwrite to single function

* [Important] Use parallel_data as key for parallel callback map

* Fix workflow failure

* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0

* Add missing ROCPROFILER_VERSION check

* Improve readability

* Make ompt storage maps thread local

* Part 1: Variable name fix, memory cleanup, and fixed asserts

* Part 2: Add comments

* Part 3: Add CI_THROW

* Part 4: Formatting

* Part 5: Move #include to cpp
JeniferC99 pushed a commit that referenced this pull request Nov 6, 2025
* [rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd (#1016)

* Add OMPT to ROCpd

* Use correct category

* Added wrapper functions for future control

* Formatting

* Fix naming

* Comment change

* Remove ompt_get_cb_args

* Switched to using region_sample for OMPT

* Remove relic function

* Remove get_use_rocpd that was used in this pr (one still remains)

* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop

* Make lock init and destroy cb instant

* [Prototype] ROCPD Name fix

* [Prototype] ROCPD Name fix P1

* [Prototype] ROCPD Name fix P2

* ROCPD Name fix

* Var name changes

* Rewrite cb overwrite to single function

* [Important] Use parallel_data as key for parallel callback map

* Fix workflow failure

* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0

* Add missing ROCPROFILER_VERSION check

* Improve readability

* Make ompt storage maps thread local

* Part 1: Variable name fix, memory cleanup, and fixed asserts

* Part 2: Add comments

* Part 3: Add CI_THROW

* Part 4: Formatting

* Part 5: Move #include to cpp

* Add missing counter events handling for ROCPD (#1305)

* Add missing counter events handling for ROCPD

* Update projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp

* Update projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp

* Fixed formatting

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

---------

Signed-off-by: Marjan Antic <Marjan.Antic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* Update VERSION to 1.2.1

* Update CHANGELOG.md

* Add caching of category region for rocpd (#1420)

* Add caching of category region

Fix vaapi traces

Remove region_with_name

* Applied suggestions from code review

* Add some simple rocpd testing

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Adjust `rocpd_string` validation parameters

* Adjust `rocm_marker_api` validation parameters

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Signed-off-by: Marjan Antic <Marjan.Antic@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants