
Import and refactor trace_link.py #47

Merged
srinivas212 merged 24 commits into main from import-trace-link
May 9, 2024

Conversation

Contributor

@TaekyungHeo TaekyungHeo commented May 8, 2024

Summary

Import and refactor trace_link.py

Test Plan

1. Run trace_link

chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_0.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_0.json --output-file ~/megatron_0.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_1.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_1.json --output-file ~/megatron_1.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_2.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_2.json --output-file ~/megatron_2.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_3.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_3.json --output-file ~/megatron_3.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_4.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_4.json --output-file ~/megatron_4.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_5.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_5.json --output-file ~/megatron_5.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_6.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_6.json --output-file ~/megatron_6.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_7.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_7.json --output-file ~/megatron_7.json &

2. Run et_converter

chakra_converter --input_filename ~/megatron_0.json --output_filename megatron_0.chakra --input_type PyTorch > /tmp/rank_0 &
chakra_converter --input_filename ~/megatron_1.json --output_filename megatron_1.chakra --input_type PyTorch > /tmp/rank_1 &
chakra_converter --input_filename ~/megatron_2.json --output_filename megatron_2.chakra --input_type PyTorch > /tmp/rank_2 &
chakra_converter --input_filename ~/megatron_3.json --output_filename megatron_3.chakra --input_type PyTorch > /tmp/rank_3 &
chakra_converter --input_filename ~/megatron_4.json --output_filename megatron_4.chakra --input_type PyTorch > /tmp/rank_4 &
chakra_converter --input_filename ~/megatron_5.json --output_filename megatron_5.chakra --input_type PyTorch > /tmp/rank_5 &
chakra_converter --input_filename ~/megatron_6.json --output_filename megatron_6.chakra --input_type PyTorch > /tmp/rank_6 &
chakra_converter --input_filename ~/megatron_7.json --output_filename megatron_7.chakra --input_type PyTorch > /tmp/rank_7 &

3. Results
[Screenshot of results, 2024-05-08 at 7:42:27 PM]
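
The per-rank commands in steps 1 and 2 follow a single pattern, so they can be generated rather than typed out. The following is a minimal sketch, not part of the PR: the helper name `build_rank_commands` and the directory arguments are hypothetical, while the CLI names and flags are copied from the commands above. It only prints the commands; running them (and redirecting the converter's output to a per-rank log, as the test plan does) is left to the caller.

```python
import shlex
from typing import List


def build_rank_commands(trace_dir: str, out_dir: str, rank: int) -> List[List[str]]:
    """Return the [trace_link, converter] argument lists for one rank.

    Hypothetical helper: file naming mirrors the test plan above
    (megatron_et_N.json, megatron_kineto_N.json, megatron_N.json).
    """
    linked = f"{out_dir}/megatron_{rank}.json"
    link_cmd = [
        "chakra_trace_link",
        "--pytorch-et-file", f"{trace_dir}/megatron_et_{rank}.json",
        "--kineto-file", f"{trace_dir}/megatron_kineto_{rank}.json",
        "--output-file", linked,
    ]
    convert_cmd = [
        "chakra_converter",
        "--input_filename", linked,
        "--output_filename", f"megatron_{rank}.chakra",
        "--input_type", "PyTorch",
    ]
    return [link_cmd, convert_cmd]


if __name__ == "__main__":
    # Placeholder directories; substitute the real trace and output locations.
    for rank in range(8):
        for cmd in build_rank_commands("/path/to/traces", "/path/to/out", rank):
            print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.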

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner May 8, 2024 21:35

github-actions bot commented May 8, 2024

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

TaekyungHeo added 18 commits May 8, 2024 18:51
The `handle_kineto_segmentation` function is intended to support kineto traces
across multiple iterations by splitting a trace into several segments according
to the provided annotations. Unfortunately, this function is not operating as
expected, leading to errors. It is advisable to remove it.
The multi-iteration support feature for PyTorch execution traces is designed to
facilitate the handling of traces over multiple iterations. Unfortunately, this
feature is not functioning as expected and is leading to errors. It is advisable
to remove it.
This commit introduces support for inter-thread dependencies within the Chakra
framework. By examining Kineto traces via chrome://tracing, one can observe
multiple CPU threads and their implicit dependencies. This update explicitly
encodes these dependencies in the output trace, enabling accurate handling by
subsequent tools.
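
To illustrate what making an implicit inter-thread dependency explicit could look like, here is a minimal sketch. It is not the code from this PR: the `CpuOp` type, its fields, and the heuristic (depend on the most recently finished operator on another thread) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CpuOp:
    """Hypothetical CPU operator record; field names are illustrative."""
    op_id: int
    tid: int      # CPU thread ID
    start: int    # start timestamp (microseconds)
    end: int      # end timestamp (microseconds)
    deps: List[int] = field(default_factory=list)


def add_inter_thread_deps(ops: List[CpuOp]) -> None:
    """For each op, record a dependency on the op from a different thread
    that finished most recently before this op started (assumed heuristic)."""
    for op in ops:
        best: Optional[CpuOp] = None
        for other in ops:
            # Skip ops on the same thread and ops still running at op.start.
            if other.tid == op.tid or other.end > op.start:
                continue
            if best is None or other.end > best.end:
                best = other
        if best is not None and best.op_id not in op.deps:
            op.deps.append(best.op_id)


ops = [
    CpuOp(op_id=1, tid=0, start=0, end=10),
    CpuOp(op_id=2, tid=1, start=12, end=20),
    CpuOp(op_id=3, tid=0, start=11, end=15),
]
add_inter_thread_deps(ops)
```

The quadratic scan keeps the sketch short; a real trace with many operators would want the ops sorted by end time and searched with bisection.
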
This commit adds stream ID encoding to GPU operators. This ensures that all
operators within the same stream are executed in the correct order, supporting
intra-stream dependencies.
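
As a rough illustration of why a stream ID is enough to recover intra-stream ordering, the sketch below groups GPU operators by stream and chains consecutive operators by start time. The `GpuOp` type and the `intra_stream_edges` function are hypothetical and are not taken from this PR.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GpuOp:
    """Hypothetical GPU operator record; field names are illustrative."""
    op_id: int
    stream: int   # stream ID encoded on the operator
    start: int    # start timestamp (microseconds)


def intra_stream_edges(ops: List[GpuOp]) -> List[Tuple[int, int]]:
    """Return (predecessor, successor) op_id pairs, one chain per stream."""
    by_stream: Dict[int, List[GpuOp]] = defaultdict(list)
    for op in ops:
        by_stream[op.stream].append(op)

    edges: List[Tuple[int, int]] = []
    for stream in sorted(by_stream):
        ordered = sorted(by_stream[stream], key=lambda o: o.start)
        # Each op depends on the op launched just before it on the same stream.
        for prev, nxt in zip(ordered, ordered[1:]):
            edges.append((prev.op_id, nxt.op_id))
    return edges


edges = intra_stream_edges([
    GpuOp(op_id=1, stream=7, start=0),
    GpuOp(op_id=2, stream=20, start=5),
    GpuOp(op_id=3, stream=7, start=10),
    GpuOp(op_id=4, stream=20, start=1),
])
```

Operators on different streams get no edge from this step, which matches the idea that only same-stream operators are ordered by the stream itself.
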
Introduced exclusive duration calculation for Kineto operators in the TraceLinker
class. This update differentiates between inclusive and exclusive durations,
providing a clearer distinction in the profiling data. Exclusive durations are
now calculated to identify the actual time spent in individual operations,
excluding overlaps with child operators.
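
The distinction between inclusive and exclusive duration can be sketched as follows. This is an illustrative calculation, not the TraceLinker implementation: the function name and the `(start, duration)` tuple representation are assumptions. The exclusive duration is taken to be the inclusive duration minus the time covered by the union of child intervals, clipped to the parent.

```python
from typing import List, Optional, Tuple


def exclusive_duration(start: int, dur: int, children: List[Tuple[int, int]]) -> int:
    """Inclusive duration minus the time covered by child operators.

    `children` holds (start, duration) pairs; overlapping children are
    merged so shared time is subtracted only once.
    """
    end = start + dur
    # Clip each child to the parent's interval and drop non-overlapping ones.
    clipped = sorted(
        (max(s, start), min(s + d, end))
        for s, d in children
        if s < end and s + d > start
    )

    covered = 0
    cur_s: Optional[int] = None
    cur_e: Optional[int] = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Close the previous merged interval before starting a new one.
            if cur_s is not None and cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_s is not None and cur_e is not None:
        covered += cur_e - cur_s

    return dur - covered


# Parent spans [0, 100); children cover [10, 50) and [80, 100) after merging.
parent_exclusive = exclusive_duration(0, 100, [(10, 20), (20, 30), (80, 40)])
```

In this example the children cover 60 of the parent's 100 time units, so `parent_exclusive` is 40.
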
@TaekyungHeo TaekyungHeo force-pushed the import-trace-link branch from 7bd7924 to f4026c6 Compare May 8, 2024 22:52
@TaekyungHeo TaekyungHeo changed the title Remove test_trace_link.py Import and refactor trace_link.py May 8, 2024
@TaekyungHeo TaekyungHeo force-pushed the import-trace-link branch 2 times, most recently from 5d33bf9 to 6b6dc29 Compare May 8, 2024 23:03
@TaekyungHeo TaekyungHeo force-pushed the import-trace-link branch from 6b6dc29 to 6a213c9 Compare May 8, 2024 23:04
@srinivas212 srinivas212 merged commit ee34486 into main May 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 9, 2024
@TaekyungHeo TaekyungHeo deleted the import-trace-link branch May 9, 2024 18:28

2 participants