-
Notifications
You must be signed in to change notification settings - Fork 3k
Description
Feature request
Would be great to add a code example of how to do multi-GPU processing with 🤗 Datasets in the documentation. cc @stevhliu
Currently the docs has a small section on this saying "your big GPU call goes here", however it didn't work for me out-of-the-box.
Let's say you have a PyTorch model that can do translation, and you have multiple GPUs. In that case, you'd like to duplicate the model on each GPU, each processing (translating) a chunk of the data in parallel.
Here's how I tried to do that:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from multiprocess import set_start_method
import torch
import os
dataset = load_dataset("mlfoundations/datacomp_small")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
# put model on each available GPU
# also, should I do it like this or use nn.DataParallel?
model.to("cuda:0")
model.to("cuda:1")
set_start_method("spawn")
def translate_captions(batch, rank):
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % torch.cuda.device_count())
texts = batch["text"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)
translated_tokens = model.generate(
**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"], max_length=30
)
translated_texts = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
batch["translated_text"] = translated_texts
return batch
updated_dataset = dataset.map(translate_captions, with_rank=True, num_proc=2, batched=True, batch_size=256)
I've personally tried running this script on a machine with 2 A100 GPUs.
Error 1
Running the code snippet above from the terminal (python script.py) resulted in the following error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/spawn.py", line 125, in _main
prepare(preparation_data)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/niels/python_projects/datacomp/datasets_multi_gpu.py", line 16, in <module>
set_start_method("spawn")
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/context.py", line 247, in set_start_method
raise RuntimeError('context has already been set')
RuntimeError: context has already been set
Error 2
Then, based on this Stackoverflow answer, I put the set_start_method("spawn") section in a try: catch block. This resulted in the following error:
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/datasets/dataset_dict.py", line 817, in <dictcomp>
k: dataset.map(
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2926, in map
with Pool(nb_of_missing_shards, initargs=initargs, initializer=initializer) as pool:
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/pool.py", line 215, in __init__
self._repopulate_pool()
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/pool.py", line 306, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/pool.py", line 329, in _repopulate_pool_static
w.start()
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/context.py", line 288, in _Popen
return Popen(process_obj)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/home/niels/anaconda3/envs/datacomp/lib/python3.10/site-packages/multiprocess/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
So then I put the last line under a if __name__ == '__main__': block. Then the code snippet seemed to work, but it seemed that it's only leveraging a single GPU (based on monitoring nvidia-smi):
Mon Aug 28 12:19:24 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... On | 00000000:01:00.0 Off | 0 |
| N/A 55C P0 76W / 275W | 8747MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... On | 00000000:47:00.0 Off | 0 |
| N/A 67C P0 274W / 275W | 59835MiB / 81920MiB | 100% Default |
| | | Disabled |
Both GPUs should have equal GPU usage, but I've always noticed that the last GPU has way more usage than the other ones. This made me think that os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % torch.cuda.device_count()) might not work inside a Python script, especially if done after importing PyTorch?
Motivation
Would be great to clarify how to do multi-GPU data processing.
Your contribution
If my code snippet can be fixed, I can contribute it to the docs :)