@cyphar
Member

@cyphar cyphar commented Jan 25, 2016

Rather than using '/' to denote hierarchy in slice names, systemd uses
'-' in an odd way. This results in runC incorrectly assuming that
certain kernel features are missing (and using inconsistent paths for
the cgroups not supported by systemd), because the "subsystem path" used
is not the one that systemd has created. Fix all of this by properly
expanding slice names.

Signed-off-by: Aleksa Sarai [email protected]

/cc @LK4D4 @mrunalp @dqminh @crosbymichael @hqhq

@cyphar
Member Author

cyphar commented Jan 25, 2016

You can see the issue by running something like this:

Before this patch:

% docker run --cgroup-parent=test-a-b-c.slice alpine cat /proc/self/cgroup
11:memory:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
10:blkio:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
9:perf_event:/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
8:pids:/
7:cpuset:/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
6:devices:/test.slice
5:net_cls,net_prio:/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
4:freezer:/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
3:cpu,cpuacct:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
2:hugetlb:/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope
1:name=systemd:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope

After this patch:

% docker run --cgroup-parent=test-a-b-c.slice alpine cat /proc/self/cgroup
11:memory:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
10:blkio:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
9:perf_event:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
8:pids:/
7:cpuset:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
6:devices:/test.slice
5:net_cls,net_prio:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
4:freezer:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
3:cpu,cpuacct:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
2:hugetlb:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope
1:name=systemd:/test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice/docker-9916d763e97c6f6ae45412cb06353b226a5ec1386dfd1506e0bd357a19246a3c.scope

This matters because runC assumes that the cgroup path under systemd is the same as it would be with cgroupfs, which it clearly isn't. As a result, things like OOM notifications are not set up: Docker uses the cgroup path that runC reports as correct, and operations on that path don't actually affect the container. Note that the devices slice isn't /wrong/; systemd is just being clever about which cgroup it attaches transient units to.

You'll notice this by observing the warnings in the Docker daemon:

WARN[0004] Your kernel does not support OOM notifications: open /sys/fs/cgroup/memory/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope/memory.oom_control: no such file or directory 
WARN[0004] Your kernel does not support OOM notifications: open /sys/fs/cgroup/memory/test-a-b-c.slice/docker-b8c2d0aebc73e7683ae31f14037ed647d60591fe79d6026fe35ff78d2d991d02.scope/memory.oom_control: no such file or directory 
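
For illustration, the expansion described above can be sketched roughly as follows. This is a minimal sketch, not the code from this patch: the function name `expandSlice` and its validation rules are assumptions made for the example, and it does not handle systemd's root slice (`-.slice`).

```go
package main

import (
	"fmt"
	"strings"
)

// expandSlice is a hypothetical helper (not necessarily the patch's code).
// It turns a systemd slice name such as "test-a-b-c.slice" into the nested
// path systemd creates in the cgroup hierarchy:
// "test.slice/test-a.slice/test-a-b.slice/test-a-b-c.slice".
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"

	// The name must end in ".slice", must not be just ".slice", and must
	// not contain path separators.
	name := strings.TrimSuffix(slice, suffix)
	if name == slice || name == "" || strings.Contains(slice, "/") {
		return "", fmt.Errorf("invalid slice name: %s", slice)
	}

	var parts []string
	prefix := ""
	for _, component := range strings.Split(name, "-") {
		// Empty components ("test--a.slice", "-test.slice") are rejected.
		if component == "" {
			return "", fmt.Errorf("invalid slice name: %s", slice)
		}
		// Each level carries the full dash-joined prefix of its ancestors.
		prefix += component
		parts = append(parts, prefix+suffix)
		prefix += "-"
	}
	return strings.Join(parts, "/"), nil
}

func main() {
	expanded, err := expandSlice("test-a-b-c.slice")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(expanded)
}
```

With an expansion like this, the path used for files such as memory.oom_control would sit under the nested slice directories rather than directly under /sys/fs/cgroup/memory/test-a-b-c.slice.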

@mrunalp
Contributor

mrunalp commented Jan 25, 2016

LGTM

@hqhq
Contributor

hqhq commented Jan 26, 2016

LGTM

mrunalp pushed a commit that referenced this pull request Jan 26, 2016
cgroup: systemd: properly expand systemd slice names
@mrunalp mrunalp merged commit 80c2473 into opencontainers:master Jan 26, 2016
@cyphar cyphar deleted the fix-systemd-slice-expansion branch January 27, 2016 00:57
@hqhq hqhq mentioned this pull request Jan 27, 2016
runcom pushed a commit to runcom/docker that referenced this pull request Jun 8, 2016
Upstream reference: opencontainers/runc#511

Signed-off-by: Mrunal Patel <[email protected]>
Signed-off-by: Antonio Murdaca <[email protected]>