This repository was archived by the owner on Dec 16, 2022. It is now read-only.

Multi-process data loader bug with TensorField (RuntimeError: received 0 items of ancdata) #4847

@epwalsh

Description

I discovered this issue while using the new MultiProcessDataLoader with num_workers > 0 and max_instances_in_memory set to some high number (1000 in my case) to load batches built from instances that contain TensorFields.

  ...
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 236, in __iter__
    yield from self._iter_batches()
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 421, in _iter_batches
    raise e
RuntimeError: received 0 items of ancdata

The issue stems from the fact that tensors are passed between processes using shared memory, but some systems (like the one I was on) may have strict limits on shared memory by default. So if you pile too many tensors into shared memory by setting max_instances_in_memory too high, you're going to run into this. See pytorch/pytorch#973 (comment).
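On Linux, the handles for these shared-memory tensors are passed to worker processes as file descriptors over a Unix socket, so the per-process open-file limit is one of the limits you can hit here (which specific limit bites on a given system varies; see the linked PyTorch issue). A minimal stdlib sketch for inspecting and raising that limit:

```python
import resource

# Tensors in shared memory are handed between processes as file
# descriptors, so a low RLIMIT_NOFILE is one way to end up with
# "received 0 items of ancdata" (see pytorch/pytorch#973).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit to the hard limit for this process only.
# (This does not require root; raising the hard limit does.)
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

PyTorch also offers `torch.multiprocessing.set_sharing_strategy("file_system")` as a commonly cited workaround, which shares tensors via the filesystem instead of passing descriptors.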

Luckily the solution is simple: either decrease max_instances_in_memory (bringing it down to 100 worked in my case), or increase the shared memory available to your training process.
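For reference, the first fix amounts to a change in the data_loader section of a training config. A sketch, assuming the loader is registered under its usual "multiprocess" name (the batch size and worker count below are placeholder values, not from this issue):

```jsonnet
{
  "data_loader": {
    "type": "multiprocess",
    "batch_size": 32,
    "num_workers": 2,
    // Lowering this reduces how many tensors sit in shared
    // memory at once; 100 worked in my case.
    "max_instances_in_memory": 100
  }
}
```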
