Initialize NVML on demand. #1969
Conversation
accelerators/nvidia.go (Outdated)
```go
if !nm.nvmlInitialized {
	initializeNVML(nm)
	nm.Unlock()
} else {
```
You don't need the else here. Just unlock outside the if statement.
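For illustration, a minimal sketch of that suggestion, assuming a manager type guarded by an embedded sync.Mutex with an nvmlInitialized flag; the type and helper below are simplified stand-ins, not the exact cAdvisor code:

```go
package accelerators

import "sync"

// nvidiaManager is a simplified stand-in for cAdvisor's manager type, with
// only the fields relevant to the locking discussion.
type nvidiaManager struct {
	sync.Mutex
	nvmlInitialized bool
}

// initializeNVML stands in for the real NVML initialization; on success it
// marks the manager as initialized (the real version can fail if the NVIDIA
// driver is not installed, leaving the flag false).
func initializeNVML(nm *nvidiaManager) {
	// ... call into NVML here ...
	nm.nvmlInitialized = true
}

// ensureInitialized reflects the reviewer's suggestion: no else branch, and
// a single Unlock outside the if instead of one per branch.
func (nm *nvidiaManager) ensureInitialized() {
	nm.Lock()
	if !nm.nvmlInitialized {
		initializeNVML(nm)
	}
	nm.Unlock()
}
```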
Done.
Actually, this made me realize that I can simplify taking the lock multiple times down to taking it just once. PTAL.
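A sketch of what taking the lock just once could look like, reusing the stand-in types from the snippet above; deferring the unlock holds the lock across the whole check-and-initialize path and releases it exactly once on return (again an illustration, not the exact final code):

```go
// ensureInitializedOnce acquires the lock a single time for the entire
// check-and-initialize sequence and releases it when the method returns.
func (nm *nvidiaManager) ensureInitializedOnce() {
	nm.Lock()
	defer nm.Unlock()
	if !nm.nvmlInitialized {
		initializeNVML(nm)
	}
}
```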
dashpole left a comment
One nit. Otherwise looks good.
/lgtm
Force-pushed from 4c45a2e to 844caff.
Earlier, if the NVIDIA driver was not installed when cAdvisor was started, we would start a goroutine to try to initialize NVML every minute. This resulted in a race. We can have a situation where:
- the goroutine tries to initialize NVML but fails, so it sleeps for a minute;
- the driver is installed;
- a container that uses NVIDIA devices is started.

This container would not get GPU stats because a minute has not passed since the last failed initialization attempt, and so NVML is not initialized.
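To make the "on demand" part concrete, here is a hedged sketch of the overall shape, reusing the stand-in types above. The method name and return value are hypothetical; the point is that initialization is attempted at the moment a container with NVIDIA devices is seen, instead of by a once-a-minute retry goroutine, so a freshly installed driver is picked up immediately:

```go
// gpuStatsAvailable is a hypothetical hook called when a container that uses
// NVIDIA devices starts. NVML is initialized on demand here rather than in a
// background retry loop, closing the window described above.
func (nm *nvidiaManager) gpuStatsAvailable() bool {
	nm.Lock()
	defer nm.Unlock()
	if !nm.nvmlInitialized {
		initializeNVML(nm)
	}
	// If the driver is still missing, initialization fails and the container
	// simply gets no GPU stats; no timer has to expire first.
	return nm.nvmlInitialized
}
```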
Force-pushed from 844caff to 2ce4161.
dashpole left a comment
LGTM
Automatic merge from submit-queue (batch tested with PRs 65377, 63837, 65370, 65294, 65376). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Remove unneeded sleep from test.

The race condition that required this sleep was fixed in google/cadvisor#1969. That was vendored in #65334.

```release-note
NONE
```

/assign @jiayingz @vishh