This repository was archived by the owner on Oct 22, 2024. It is now read-only.


@pohly
Contributor

@pohly pohly commented Oct 14, 2019

"docker push" has been seen to fail temporarily with "error creating
overlay mount to /var/lib/docker/overlay2/xxx/merged: device or
resource busy". To increase the chance of the CI builds completing,
we simply try three times before giving up.
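
As a rough illustration of the "try three times before giving up" idea, here is a minimal POSIX shell sketch. It is not the code from this PR: the `retry` helper and the `RETRY_DELAY` variable are hypothetical names made up for this example.

```shell
#!/bin/sh
# retry ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to ATTEMPTS times and stops at the
# first success. RETRY_DELAY (seconds to wait between attempts, default 1)
# is an assumed knob for this sketch, not something taken from the PR.
retry() {
    attempts=$1
    shift
    n=1
    while true; do
        # Run the command exactly as given; success ends the loop.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report and propagate the failure.
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying: $*" >&2
        n=$((n + 1))
        sleep "${RETRY_DELAY:-1}"
    done
}
```

A push step could then be written as `retry 3 docker push "$IMAGE"`, where `IMAGE` stands in for whatever image reference the build uses. A loop like this only helps if the failure is transient, which is the open question discussed in this thread.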

@pohly
Contributor Author

pohly commented Oct 14, 2019

Example of the failure: https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/job/pmem-csi/job/devel/73/execution/node/161/log/

Somehow retrying the job didn't help, so it remains an open question whether this more localized retry loop is really going to solve the issue.

@pohly
Contributor Author

pohly commented Oct 14, 2019

I only caught one of two places where "docker push" is used. Do not merge yet...

@okartau
Contributor

okartau commented Oct 14, 2019

Just curious what the root cause might be. I haven't seen such a "busy, failed to push" error in local builds,
but I usually push to a remote registry.
Is it significant that we push to a local registry in the CI build?
Could adding "journalctl -e" after a failed push show potentially useful information?

@pohly
Contributor Author

pohly commented Oct 14, 2019

I've also seen it on my local build machine, with the Docker registry running in Docker.

Not sure how to debug it, and this is one that I don't care about if the retry mechanism turns out to get us reliable CI builds.

@okartau
Contributor

okartau commented Oct 15, 2019

docker/for-linux#711 suggests that this is fixed in Docker 19.03.3.
Our current CI runs use 19.03.2, so we are close to getting a fixed version where (hopefully) the retry is not needed.
We have no proof yet that retrying improves this case.
Should we wait with this, tolerate occasional failures in the meantime, and see whether the newer Docker eliminates the issue?

@pohly
Contributor Author

pohly commented Oct 15, 2019

Do you know how to update Docker in the CI to 19.03.3? That would be the preferable solution.

@okartau
Contributor

okartau commented Oct 15, 2019

> Do you know how to update Docker in the CI to 19.03.3? That would be the preferable solution.

The CI host is Ubuntu 18.04 LTS and gets Docker via "apt install -y docker-ce",
so we get whatever is latest in the repo unless we specify some other (older) version.
19.03.3 was released 2019-10-08, so it's likely just too new to be present in the repo yet.
BTW, the Docker release notes describe the problem we see like this:
"Fix overlay2: busy error on mount when using kernel >= 5.2" (there is also a link to the Docker ticket).
Does our CI host really run on such a new kernel even though our distro is 18.04 LTS?

@okartau
Contributor

okartau commented Oct 15, 2019

Meanwhile I see more push errors, seemingly more than before (?).
Should we merge this anyway and monitor for further failures?
We can try reverting later, after the new Docker version arrives.

@pohly
Contributor Author

pohly commented Oct 15, 2019 via email

@pohly
Contributor Author

pohly commented Oct 15, 2019

I'm fine with merging it and reverting later.

@okartau okartau merged commit 0c4267e into intel:devel Oct 15, 2019
@okartau
Contributor

okartau commented Oct 15, 2019

> Does our CI host really run on such a new kernel

I added "uname -a" to the Jenkinsfile, and the CI host appears to run this:
ubuntu-jenkins-worker6826f0 5.0.0-1018-azure #19~18.04.1-Ubuntu SMP

So it is a relatively new 5.0 kernel, but still older than the 5.2 that the Docker "busy fix" mentions.
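
The comparison above can be scripted rather than read off by eye. The following is a minimal sketch, assuming GNU `sort -V` is available on the host; the `version_at_least` helper is a hypothetical name for this example and was not part of the Jenkinsfile change.

```shell
#!/bin/sh
# version_at_least VERSION MINIMUM
# Hypothetical helper: succeeds if VERSION >= MINIMUM. Only the leading
# dotted numeric part of VERSION is compared, so a kernel release such as
# "5.0.0-1018-azure" is treated as "5.0.0". Assumes GNU coreutils "sort -V".
version_at_least() {
    # Drop everything from the first character that is not a digit or a dot.
    version=$(printf '%s\n' "$1" | sed 's/[^0-9.].*$//')
    minimum=$2
    # Version-sort both values; if MINIMUM sorts first (or they are equal),
    # then VERSION is at least MINIMUM.
    lowest=$(printf '%s\n%s\n' "$version" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}
```

For example, `version_at_least "$(uname -r)" 5.2` would fail on the CI host shown above, since its release string starts with 5.0.0. That matches the observation that the host kernel is older than the one named in the Docker release note.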

@okartau
Contributor

okartau commented Oct 15, 2019

> Each time the machine boots?

Yes, this host has no life beyond the build; it is dynamically created from an empty state for each CI job.

@okartau
Contributor

okartau commented Oct 16, 2019

Is it possible that this change accidentally changed the make semantics?
I noticed today that 'make push-images' no longer builds implicitly; I have to run 'make build-images' separately. There have been no other changes near the make rules recently, right?
Does 'make push-images' still build for you, or have I misunderstood something?

@pohly
Contributor Author

pohly commented Oct 16, 2019

Commenting out $(PUSH_IMAGE_DEP) was a mistake. I used that during debugging; it shouldn't have been committed.

pohly added a commit to pohly/pmem-CSI that referenced this pull request Oct 16, 2019
intel#430 unintentionally disabled
the make dependencies. That was used during debugging and shouldn't
have been included in the PR.
@pohly
Contributor Author

pohly commented Oct 16, 2019

Fix is in #436
