torchtune/training/_profiler.py (5 changes: 4 additions, 1 deletion)
@@ -6,6 +6,7 @@


 import os
+import socket
 import time
 from functools import partial
 from pathlib import Path
@@ -98,7 +99,9 @@ def trace_handler(
     # Use tensorboard trace handler rather than directly exporting chrome traces since
     # tensorboard doesn't seem to be able to parse traces with prof.export_chrome_trace
     exporter = tensorboard_trace_handler(
-        curr_trace_dir, worker_name=f"rank{rank}", use_gzip=True
+        curr_trace_dir,
+        worker_name=f"rank{rank}_" + f"{socket.gethostname()}_{os.getpid()}",
Contributor

Sorry, noob question on this choice of worker_name: if I launch a bunch of runs with profiling on the same host and don't keep track of the PID of each launch, does this actually solve the problem? Why not instead allow manually specifying an output filename or something similar?

Collaborator Author

@ebsmothers We could probably add a named argument. But I was aiming for a solution that works "out of the box". If we add something like experiment_name: str = "", it probably won't be set in most cases unless we actually require it. Let me update the PR and see if we can do better.

+        use_gzip=True,
     )
     exporter(prof)

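To make the naming trade-off discussed in the thread concrete, here is a minimal sketch. make_worker_name reproduces the worker_name expression from the diff. Its experiment_name parameter is a hypothetical version of the named argument the author mentions and is not an existing torchtune option. trace_file_name is also illustrative: it assumes the "<worker_name>.<time_ns>.pt.trace.json[.gz]" pattern that PyTorch's tensorboard_trace_handler uses in recent versions, so check it against your installed version.

```python
import os
import socket
import time
from typing import Optional


def make_worker_name(rank: int, experiment_name: Optional[str] = None) -> str:
    """Build the per-process worker name passed to the trace exporter.

    The default branch mirrors the expression in the diff (rank + hostname + PID).
    `experiment_name` is a HYPOTHETICAL override, not an existing torchtune option.
    """
    if experiment_name:
        # Explicit, human-readable name: easy to match a trace to a run,
        # but only helps if the user actually sets it.
        return f"rank{rank}_{experiment_name}"
    # Out-of-the-box default: needs no user input, but the PID must be
    # known to tell concurrent runs on the same host apart.
    return f"rank{rank}_" + f"{socket.gethostname()}_{os.getpid()}"


def trace_file_name(worker_name: str, use_gzip: bool = True) -> str:
    """Approximate the file name written by the tensorboard trace handler.

    ASSUMPTION: the handler uses "<worker_name>.<time_ns>.pt.trace.json",
    with ".gz" appended when use_gzip=True.
    """
    name = f"{worker_name}.{time.time_ns()}.pt.trace.json"
    return name + ".gz" if use_gzip else name


if __name__ == "__main__":
    default_name = make_worker_name(rank=0)
    named = make_worker_name(rank=0, experiment_name="lora_sweep_a")
    print(default_name)
    print(named)
    print(trace_file_name(default_name))
```

Concurrently running processes on one host normally have distinct PIDs, so the default avoids name clashes between simultaneous runs. As the reviewer points out, though, the name alone does not say which run wrote which trace unless the PID was recorded at launch. An explicit name fixes that, but only when users actually supply one.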