This repository was archived by the owner on Dec 16, 2022. It is now read-only.
I discovered this issue while using the new MultiProcessDataLoader with num_workers > 0 and max_instances_in_memory set to a high number (1000 in my case) to load batches built from instances that contain TensorFields.
...
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 236, in __iter__
    yield from self._iter_batches()
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 421, in _iter_batches
    raise e
RuntimeError: received 0 items of ancdata
The issue stems from the fact that tensors are passed between processes through shared memory, but some systems (like the one I was on) may have strict limits on shared memory by default. So if you pile too many tensors into shared memory by setting max_instances_in_memory too high, you're going to run into this error. See pytorch/pytorch#973 (comment).
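To get a feel for why max_instances_in_memory matters, here is a rough back-of-the-envelope estimate of how much shared memory the buffered tensors could occupy. This is a minimal sketch, not AllenNLP code: it assumes every buffered instance holds tensors of fixed, known shapes and that all of them sit in shared memory at the same time. The helper name estimate_shared_bytes and the 512 x 768 float32 shape are hypothetical, chosen only for illustration.

```python
from math import prod
from typing import Sequence, Tuple


def estimate_shared_bytes(
    num_instances: int,
    tensor_shapes: Sequence[Tuple[int, ...]],
    itemsize: int = 4,
) -> int:
    """Rough upper bound on bytes held in shared memory.

    Hypothetical helper: assumes each instance carries one tensor per entry in
    `tensor_shapes`, each element takes `itemsize` bytes (4 for float32),
    and all `num_instances` instances are buffered at once.
    """
    per_instance = sum(prod(shape) for shape in tensor_shapes) * itemsize
    return num_instances * per_instance


# Hypothetical instance with a single 512 x 768 float32 TensorField.
shapes = [(512, 768)]
for n in (1000, 100):
    mib = estimate_shared_bytes(n, shapes) / 2**20
    print(f"max_instances_in_memory={n}: ~{mib:.0f} MiB")
```

Under these assumptions the estimate scales linearly with the number of buffered instances (about 1500 MiB for 1000 instances versus 150 MiB for 100), so lowering max_instances_in_memory by 10x cuts the shared-memory footprint by roughly the same factor.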
Luckily, the solution is simple: either decrease max_instances_in_memory (bringing it down to 100 worked in my case) or increase the shared memory available to your training process.
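Before picking one of the two options, it can help to check how much shared memory the machine actually offers. The sketch below assumes a Linux host where POSIX shared memory is backed by the tmpfs mount at /dev/shm (the usual default, but not guaranteed). shm_free_bytes and suggest_max_instances are hypothetical helpers, and the 50% safety margin is an arbitrary choice, not an AllenNLP or PyTorch recommendation.

```python
import shutil
from typing import Optional


def shm_free_bytes(path: str = "/dev/shm") -> Optional[int]:
    """Free bytes on the shared-memory mount, or None if it can't be read.

    Assumes a Linux-style tmpfs mounted at `path`; on other systems the
    path may simply not exist.
    """
    try:
        return shutil.disk_usage(path).free
    except OSError:
        return None


def suggest_max_instances(
    free_bytes: int, bytes_per_instance: int, safety_fraction: float = 0.5
) -> int:
    """Hypothetical heuristic: cap buffered instances so their tensors use at
    most `safety_fraction` of the free shared memory (never below 1)."""
    if bytes_per_instance <= 0:
        raise ValueError("bytes_per_instance must be positive")
    budget = int(free_bytes * safety_fraction)
    return max(1, budget // bytes_per_instance)


if __name__ == "__main__":
    free = shm_free_bytes()
    if free is None:
        print("/dev/shm is not available; shared memory limits are system-specific.")
    else:
        # Hypothetical per-instance size: one 512 x 768 float32 tensor (1.5 MiB).
        per_instance = 512 * 768 * 4
        print(f"free shared memory: {free / 2**20:.0f} MiB")
        print("suggested max_instances_in_memory:",
              suggest_max_instances(free, per_instance))
```

If the check shows only a small amount of free space, increasing shared memory is the other route. For example, Docker containers get a 64 MB /dev/shm unless started with a larger --shm-size, so the check is most informative when run inside the same environment as training.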