This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Multi-process data loader bug with TensorField (RuntimeError: received 0 items of ancdata) #4847

@epwalsh

Description

I discovered this issue while using the new MultiProcessDataLoader with num_workers > 0 and max_instances_in_memory set to some high number (1000 in my case) to load batches built from instances that contain TensorFields.

  ...
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 236, in __iter__
    yield from self._iter_batches()
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 421, in _iter_batches
    raise e
RuntimeError: received 0 items of ancdata

The issue stems from the fact that tensors are passed between processes using shared memory, but some systems (like the one I was on) may have strict limits on shared memory by default. So if you pile too many tensors into shared memory by setting max_instances_in_memory too high, you're going to run into this. See pytorch/pytorch#973 (comment).
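On Linux, the handles for these shared-memory tensors are passed to worker processes as file descriptors over a Unix socket, so the per-process open-file limit is one of the limits you can hit here (which specific limit bites on a given system varies; see the linked PyTorch issue). A minimal stdlib sketch for inspecting and raising that limit:

```python
import resource

# Tensors in shared memory are handed between processes as file
# descriptors, so a low RLIMIT_NOFILE is one way to end up with
# "received 0 items of ancdata" (see pytorch/pytorch#973).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit to the hard limit for this process only.
# (This does not require root; raising the hard limit does.)
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

PyTorch also offers `torch.multiprocessing.set_sharing_strategy("file_system")` as a commonly cited workaround, which shares tensors via the filesystem instead of passing descriptors.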

Luckily the solution is simple: either decrease max_instances_in_memory (bringing it down to 100 worked in my case), or increase the shared memory available to your training process.
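For reference, the first fix amounts to a change in the data_loader section of a training config. A sketch, assuming the loader is registered under its usual "multiprocess" name (the batch size and worker count below are placeholder values, not from this issue):

```jsonnet
{
  "data_loader": {
    "type": "multiprocess",
    "batch_size": 32,
    "num_workers": 2,
    // Lowering this reduces how many tensors sit in shared
    // memory at once; 100 worked in my case.
    "max_instances_in_memory": 100
  }
}
```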
