Use a CUDAGuard when running Torch models #340

VivekPanyam · 2020-04-23T02:00:11Z

This PR ensures that we're running on the correct device even if something else calls cudaSetDevice before running inference.

This fixes a class of issues where another piece of code changes the current CUDA device for the current thread. For example, this can happen if TF and Torch run together on the same threadpool: TF calls cudaSetDevice on a thread, and Torch breaks if it later runs on that same thread.

The resulting failures show up as obscure cuDNN errors and other hard-to-debug issues.
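To illustrate the failure mode and the fix, here is a minimal sketch in plain C++ with no CUDA or libtorch dependency. Everything in it is a hypothetical stand-in: `g_current_device`, `fake_set_device`, and `fake_get_device` simulate the per-thread current device that cudaSetDevice controls, and `FakeDeviceGuard` mimics the set-on-construction, restore-on-destruction behavior of at::cuda::CUDAGuard. It is not the code in this PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA runtime's per-thread "current device".
thread_local int g_current_device = 0;

// Simulated cudaSetDevice / cudaGetDevice (illustration only, not the real API).
void fake_set_device(int device) { g_current_device = device; }
int  fake_get_device() { return g_current_device; }

// RAII guard in the spirit of at::cuda::CUDAGuard (hypothetical class):
// switches to `device` on construction and restores the previous device
// when it goes out of scope.
class FakeDeviceGuard
{
public:
    explicit FakeDeviceGuard(int device) : previous_(fake_get_device()) { fake_set_device(device); }
    ~FakeDeviceGuard() { fake_set_device(previous_); }

    FakeDeviceGuard(const FakeDeviceGuard &) = delete;
    FakeDeviceGuard &operator=(const FakeDeviceGuard &) = delete;

private:
    int previous_;
};

// "Inference" without a guard: it sees whatever device the thread was left on.
int infer_without_guard() { return fake_get_device(); }

// "Inference" with a guard: it always sees the model's device.
int infer_with_guard(int model_device)
{
    FakeDeviceGuard guard(model_device);
    return fake_get_device();
}

// Small demonstration of the scenario described above.
void demo()
{
    fake_set_device(1);                 // another framework switches this thread to device 1
    assert(infer_without_guard() == 1); // a model loaded on device 0 would run on the wrong device
    assert(infer_with_guard(0) == 0);   // the guard pins execution to the model's device
    assert(fake_get_device() == 1);     // and the caller's device is restored afterwards
}
```

Because the guard restores the previous device when it is destroyed, other code sharing the thread is left undisturbed.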

vkuzmin-uber · 2020-08-31T20:08:31Z

source/neuropod/backends/torchscript/torch_backend.cc


+#ifndef __APPLE__
+    // Make sure we're running on the correct device
+    std::unique_ptr<at::cuda::CUDAGuard> device_guard;


#include <memory>

vkuzmin-uber · 2020-08-31T20:15:09Z

source/neuropod/backends/torchscript/torch_backend.cc

+    const auto                           model_device = get_torch_device(DeviceType::GPU);
+    if (model_device.is_cuda())
+    {
+        device_guard = stdx::make_unique<at::cuda::CUDAGuard>(model_device);


I guess we can use std::make_unique here instead of stdx::make_unique, since Neuropod recently moved to C++14, right?
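For reference, the suggestion can be sketched as follows. This is a simulation under stated assumptions, not the backend code: `g_active_device` and `ScopedDevice` are hypothetical stand-ins for the CUDA current device and at::cuda::CUDAGuard, and `run_inference` is an illustrative function. The only point is that std::make_unique (available since C++14) can build the conditionally created guard.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the per-thread current CUDA device.
thread_local int g_active_device = 0;

// Hypothetical RAII guard: sets the device, restores the old one on destruction.
struct ScopedDevice
{
    explicit ScopedDevice(int device) : previous(g_active_device) { g_active_device = device; }
    ~ScopedDevice() { g_active_device = previous; }

    ScopedDevice(const ScopedDevice &) = delete;
    ScopedDevice &operator=(const ScopedDevice &) = delete;

    int previous;
};

// Returns the device that the simulated inference call observes.
int run_inference(bool model_is_cuda, int model_device)
{
    // Empty by default, so CPU models do not touch the device at all.
    std::unique_ptr<ScopedDevice> device_guard;
    if (model_is_cuda)
    {
        // C++14: std::make_unique instead of a backported helper.
        device_guard = std::make_unique<ScopedDevice>(model_device);
    }

    return g_active_device;
}
```

Holding the guard in a std::unique_ptr lets it be created only on the CUDA path while still being released automatically at the end of the function.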

VivekPanyam requested a review from selitvin April 23, 2020 02:00

Use a CUDAGuard when running Torch models

c23111d

VivekPanyam force-pushed the cuda_guard branch from bd5263b to c23111d Compare April 23, 2020 02:03

vkuzmin-uber reviewed Aug 31, 2020
