Change how cache manager handles child process trace cache for rocpd #1033

Merged
dgaliffiAMD merged 36 commits into develop from users/mradosav-amd/add-vllm-v1-support on Oct 24, 2025
@mradosav-amd mradosav-amd commented Sep 17, 2025

Motivation

This PR adds support to rocprofiler-systems for profiling AI workloads that run on vLLM v1.
vLLM v1 handles processes differently from vLLM v0; this PR covers both approaches.

Resolves Tickets:

  • SWDEV-561488
  • SWDEV-549478

Technical Details

Each child process creates a trace cache file and a metadata file in the tmp directory. The root process gathers the files from all processes under it and creates the rocpd database.
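
The flow described above can be illustrated with a minimal sketch. This is not the rocprofiler-systems implementation: the file names, the JSON-lines cache layout, the metadata fields, and the `events` table are all hypothetical, and SQLite is used only as a stand-in output format. The sketch shows the general shape: each child writes a cache file plus a metadata file into a shared temporary directory, and the root later discovers the metadata files and builds one database per process.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def write_child_cache(tmp_dir, pid, ppid, events):
    """Child side: write a trace cache file and a metadata file.

    File names and contents are hypothetical, chosen for illustration only.
    """
    cache_path = Path(tmp_dir) / f"trace_cache-{pid}.jsonl"
    meta_path = Path(tmp_dir) / f"metadata-{pid}.json"
    with open(cache_path, "w") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    # The metadata file tells the root which cache file belongs to which process.
    meta_path.write_text(
        json.dumps({"pid": pid, "ppid": ppid, "cache": cache_path.name})
    )
    return cache_path, meta_path


def gather_and_build(tmp_dir, out_dir):
    """Root side: find every metadata file and build one database per process."""
    databases = []
    for meta_path in sorted(Path(tmp_dir).glob("metadata-*.json")):
        meta = json.loads(meta_path.read_text())
        with open(Path(tmp_dir) / meta["cache"]) as f:
            rows = [json.loads(line) for line in f if line.strip()]
        db_path = Path(out_dir) / f"process-{meta['pid']}.db"
        con = sqlite3.connect(str(db_path))
        # Hypothetical schema: one row per cached trace event.
        con.execute(
            "CREATE TABLE events (name TEXT, start_ns INTEGER, end_ns INTEGER)"
        )
        con.executemany(
            "INSERT INTO events VALUES (:name, :start_ns, :end_ns)", rows
        )
        con.commit()
        con.close()
        databases.append(db_path)
    return databases
```

In this sketch the root never needs to talk to the children directly; it only needs the shared directory to be readable once they have finished writing, which is one plausible reason to prefer a file-based handoff when the process model differs between engine versions.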

Test Plan

Profile an AI workload with both the vLLM v0 and v1 engines.

Test Result

rocprofiler-systems should generate a rocpd database for each process, with both the v0 and v1 engines.

Submission Checklist

@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch from 691f804 to 23d87d8 Compare September 17, 2025 11:31
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 2 times, most recently from be37764 to c438aaf Compare September 18, 2025 14:05
@github-actions github-actions bot added the github actions Pull requests that update GitHub Actions code label Sep 19, 2025
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 6 times, most recently from 6f4b2bb to f1b6e47 Compare September 22, 2025 11:53
@mradosav-amd mradosav-amd marked this pull request as ready for review September 22, 2025 12:03
@mradosav-amd mradosav-amd requested review from a team and jrmadsen as code owners September 22, 2025 12:03
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 2 times, most recently from 8b435a7 to 6e28312 Compare September 25, 2025 06:57
@dgaliffiAMD dgaliffiAMD left a comment

Thanks @mradosav-amd, I'm still reviewing but here are some comments on my first pass.

@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 2 times, most recently from a063b8e to 7ae9221 Compare October 1, 2025 07:01
@dgaliffiAMD dgaliffiAMD mentioned this pull request Oct 1, 2025
1 task
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 4 times, most recently from a69b671 to b67b0ee Compare October 3, 2025 06:54
@mradosav-amd mradosav-amd removed the github actions Pull requests that update GitHub Actions code label Oct 3, 2025
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 4 times, most recently from 52c0ac8 to 6e96292 Compare October 23, 2025 13:31
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch 3 times, most recently from 2a58ec7 to 2590eef Compare October 23, 2025 15:44
@mradosav-amd mradosav-amd force-pushed the users/mradosav-amd/add-vllm-v1-support branch from 2590eef to 1d046b3 Compare October 23, 2025 17:48
@dgaliffiAMD dgaliffiAMD force-pushed the users/mradosav-amd/add-vllm-v1-support branch from 9300c69 to c7110bc Compare October 24, 2025 03:59
@dgaliffiAMD dgaliffiAMD force-pushed the users/mradosav-amd/add-vllm-v1-support branch from 72c2865 to 74411a9 Compare October 24, 2025 13:28
@dgaliffiAMD dgaliffiAMD merged commit 8806be1 into develop Oct 24, 2025
50 of 54 checks passed
@dgaliffiAMD dgaliffiAMD deleted the users/mradosav-amd/add-vllm-v1-support branch October 24, 2025 15:47
systems-assistant bot pushed a commit to ROCm/rocprofiler-systems that referenced this pull request Oct 24, 2025
 for rocpd (#1033)

* Change how cache manager handles child process trace cache

* Sampling and backtrace metrics to cache

* Apply cmake formatting

* Fix parsing of metadata json

* Code clean up

* Fix build nlohmann json from source

* Fix storage parsed finished callback

* Revert sampling for child process

* Change cache file name generating

* Fix thread start stop

* Fix process start end timestamp

* Applied suggestions from code review

* Try with late start of flushing task thread

* Change dockerfiles for ci

* Revert changes on github workflows

* Remove json_fwd.hpp include

* fix dump

* Build nlohmann/json by default

Signed-off-by: David Galiffi <[email protected]>

* Update location of build artifacts for nlohmann/json

Signed-off-by: David Galiffi <[email protected]>

* Revert use_output_suffix

* Remove unused logs

* Fix cache store inside counter due to structure change

* Remove decode tests from debian ci

* Fix issue where all databases have the same UUID (#1499)

Co-authored-by: Aleksandar Djordjevic <[email protected]>

* Removing the cpack and install steps to save space

* Revert "Remove decode tests from debian ci"

This reverts commit ddabf6dd142dcf438e6b8997b8abe86f2c868468.

* Revert "Removing the cpack and install steps to save space"

This reverts commit 973da3a1ba99d99d529af5269d30e177092f9bfa.

* Add prepare-runner job as dependency to clean up the space

* Fix formatting

* Free up even more space

* Remove verbose for workflows

* remove hw_counters from ext_data

* move space clean up inside container

* try to remove external folder to free up space

* Check space

* Refactor Cleanup to it's own step
[rocm-systems] ROCm/rocm-systems#1033 (commit 8806be1)
Labels

  • github actions (Pull requests that update GitHub Actions code)
  • organization: ROCm
  • project: rocprofiler-systems

4 participants