Description
Environment info
- `transformers` version:
- Platform:
- Python version: 3.7
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: sharded_ddp (Fairscale)
Who can help
Information
Model I am using (Bert, XLNet ...): Longformer
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)

The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
To reproduce
When I use the same configuration to train with model type `bert` it works, but it fails with `longformer`.
Steps to reproduce the behavior:
```
/opt/conda/bin/python -m torch.distributed.launch \
    --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --nproc_per_node=1 $SCRIPT \
    --output_dir=$OUT_DIR \
    --logging_dir=$OUT_DIR \
    --tokenizer_name=$TOKENIZER \
    --model_type=longformer --do_train --do_eval \
    --cache_dir=$CACHE_DIR \
    --overwrite_cache \
    --validation_file=$EVAL_DATA \
    --overwrite_output_dir \
    --train_file=$TRAIN_DATA_FOLDER \
    --dataset_name=$DATASET_NAME \
    --line_by_line \
    --learning_rate=${INIT_LR} \
    --save_steps=${SAVE_STEPS} \
    --max_seq_length=${BLOCK_SIZE} \
    --gradient_accumulation_steps=${GRAD_ACCUM_STEPS} \
    --fp16 \
    --num_train_epochs=$EPOCHS \
    --per_device_train_batch_size=$BATCH_SIZE_PER_GPU \
    --local_rank=$LOCAL_RANK \
    --train_dataset_info_path=$TRAIN_DATASET_INFO \
    --test_dataset_info_path=$TEST_DATASET_INFO \
    --sharded_ddp
```
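For reference, a minimal CPU-only check of the same tokenizer/model pair, outside `torch.distributed.launch`, runs fine for me. This is only a sketch: the checkpoint name and the sample text are placeholders, not my actual setup, and on CPU an out-of-range index raises a readable IndexError instead of a deferred device-side assert.

```python
# Hedged sanity check, independent of the Trainer/launcher: one Longformer forward
# pass on CPU. "allenai/longformer-base-4096" and the sample text are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "allenai/longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

inputs = tokenizer(
    "one line taken from the training data",
    return_tensors="pt",
    truncation=True,
    max_length=tokenizer.model_max_length,
)

# Two conditions that commonly end in "device-side assert triggered" on GPU:
assert inputs["input_ids"].max().item() < model.config.vocab_size, "token id outside vocab"
assert inputs["input_ids"].shape[1] <= tokenizer.model_max_length, "sequence longer than the model allows"

with torch.no_grad():
    out = model(**inputs)  # Longformer pads to a multiple of attention_window internally
print(out.logits.shape)
```

The distributed launch above, however, fails with the traceback below.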
```
Traceback (most recent call last):
  File "/data/atc_tenant/bert_data/smancha5/run_mlm.py", line 661, in <module>
    main()
  File "/data/atc_tenant/bert_data/smancha5/run_mlm.py", line 465, in main
    train_result = trainer.train(resume_from_checkpoint=model_path)
  File "/opt/conda/lib/python3.6/site-packages/transformers/trainer.py", line 1003, in train
    tr_loss += self.training_step(model, inputs)
  File "/opt/conda/lib/python3.6/site-packages/transformers/trainer.py", line 1443, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.6/site-packages/transformers/trainer.py", line 1477, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/fairscale/nn/data_parallel/sharded_ddp.py", line 218, in forward
    return self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/transformers/models/longformer/modeling_longformer.py", line 1765, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/transformers/models/longformer/modeling_longformer.py", line 1669, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/transformers/models/longformer/modeling_longformer.py", line 1245, in forward
    is_global_attn = is_index_global_attn.flatten().any().item()
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7fc78c43d99b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7fc78c680280 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fc78c425dfd in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5414e2 (0x7fc7c549d4e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x19aaae (0x5603f8975aae in /opt/conda/bin/python)
frame #5: <unknown function> + 0xf2868 (0x5603f88cd868 in /opt/conda/bin/python)
frame #6: <unknown function> + 0x1f0d91 (0x5603f89cbd91 in /opt/conda/bin/python)
frame #7: <unknown function> + 0xf270d (0x5603f88cd70d in /opt/conda/bin/python)
frame #8: <unknown function> + 0x19aa90 (0x5603f8975a90 in /opt/conda/bin/python)
frame #9: <unknown function> + 0xf2868 (0x5603f88cd868 in /opt/conda/bin/python)
frame #10: <unknown function> + 0x1f0d91 (0x5603f89cbd91 in /opt/conda/bin/python)
frame #11: <unknown function> + 0xf2828 (0x5603f88cd828 in /opt/conda/bin/python)
frame #12: <unknown function> + 0x19aa90 (0x5603f8975a90 in /opt/conda/bin/python)
frame #13: <unknown function> + 0xf2868 (0x5603f88cd868 in /opt/conda/bin/python)
frame #14: <unknown function> + 0x1f0d91 (0x5603f89cbd91 in /opt/conda/bin/python)
frame #15: <unknown function> + 0x1688cb (0x5603f89438cb in /opt/conda/bin/python)
frame #16: _PyGC_CollectNoFail + 0x2a (0x5603f89cb79a in /opt/conda/bin/python)
frame #17: PyImport_Cleanup + 0x278 (0x5603f897ffa8 in /opt/conda/bin/python)
frame #18: Py_FinalizeEx + 0x61 (0x5603f89ea961 in /opt/conda/bin/python)
frame #19: Py_Main + 0x35e (0x5603f89f4cae in /opt/conda/bin/python)
frame #20: main + 0xee (0x5603f88bef2e in /opt/conda/bin/python)
frame #21: __libc_start_main + 0xe7 (0x7fc7f2cf3b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #22: <unknown function> + 0x1c327f (0x5603f899e27f in /opt/conda/bin/python)
```
Each of the other worker processes aborts with the same Python traceback and the same c10::Error backtrace; the interleaved duplicates are omitted here.
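My reading of why the error is reported at `modeling_longformer.py:1245` (an illustration only, not transformers code): the `.item()` call is the first point that synchronizes with the GPU, so an assert fired earlier by an asynchronous kernel, for example an out-of-range embedding lookup, typically only surfaces there. The toy snippet below, with made-up sizes, reproduces that deferral; rerunning the real job with `CUDA_LAUNCH_BLOCKING=1` should instead point at the kernel that actually asserted.

```python
# Toy illustration (not transformers code): an out-of-range embedding lookup on CUDA
# fires a device-side assert inside an asynchronous kernel, and the Python-level
# RuntimeError usually only appears at a later CUDA call or host synchronization,
# such as the .item() in is_index_global_attn.flatten().any().item().
import torch

assert torch.cuda.is_available()

ids = torch.randint(0, 100, (2, 16), device="cuda")  # pretend token ids up to 99
emb = torch.nn.Embedding(50, 8).cuda()                # vocab of 50 -> ids 50..99 are invalid

hidden = emb(ids)                   # kernel launch returns immediately; assert fires on the device
flag = (ids == 0).flatten().any()   # still no Python error at this point (usually)
print(flag.item())                  # synchronization -> "CUDA error: device-side assert triggered"
```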