Skip to content

Conversation

@dashpole
Copy link
Collaborator

MaximilianMeister and others added 4 commits June 20, 2018 16:50
Earlier if the NVIDIA driver was not installed when cAdvisor was started
we would start a goroutine to try to initialize NVML every minute.

This resulted in a race. We can have a situation where:
- goroutine tries to initialize NVML but fails. So, it sleeps for a minute.
- the driver is installed.
- a container that uses NVIDIA devices is started.
This container would not get GPU stats because a minute has not passed
since the last failed initialization attempt and so NVML is not
initialized.
GetSpec() can be called concurrently in
manager/container.go.updateSpec()
results into a concurrent map access on the labels map because we're
directly updating the map inside GetSpec(). The labels map from the
container handler is not a copy of the map itself, just a reference,
that's why we're getting the concurrent map access.
Fix this by moving the label update with restartcount to the handler's
initialization method which is not called concurrently.

Signed-off-by: Antonio Murdaca <[email protected]>
@googlebot
Copy link
Collaborator

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@k8s-ci-robot
Copy link
Collaborator

@dashpole: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-cadvisor-e2e 9ab811d link /test pull-cadvisor-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dashpole
Copy link
Collaborator Author

this branch doesn't have the e2e testing changes, so tests wont pass...

@dashpole dashpole merged commit bc41996 into google:release-v0.28 Jun 21, 2018
@dashpole dashpole deleted the cherrypick_to_v0.28 branch June 21, 2018 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants