Skip to content

Conversation

@hqhq
Copy link
Contributor

@hqhq hqhq commented Jan 21, 2016

We are now setting all cgroup entries in fs way, no need to
use dbus for memory, cpu shares etc.

Signed-off-by: Qiang Huang [email protected]

@dqminh
Copy link
Contributor

dqminh commented Jan 21, 2016

LGTM

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 21, 2016

Looks ok, but I don't know what original idea was behind this. Seems like we don't need systemd cgroup driver at all. We just need a support of systemd cgroup in fs driver, right?

@mrunalp
Copy link
Contributor

mrunalp commented Jan 21, 2016

We still need to use the API to actually create the scope. Setting the values could be done by fs for now (till unified lands, at which point we might have to again use systemd APIs).

@hqhq
Copy link
Contributor Author

hqhq commented Jan 22, 2016

@LK4D4 The plan is we share code as much as we can between fs cgroup and systemd cgroup, final goal is we don't use any systemd dbus APIs, use only fs cgroup to support both fs cgroup and systemd cgroup behavior. This PR is a first move.

@cyphar
Copy link
Member

cyphar commented Jan 23, 2016

There are issues with removing DBus support for systemd. Granted, it is quite buggy (and it actually helped find some bugs in systemd, oddly enough). And yes, we end up using cgroupfs for most of the cgroups anyway.

The main issue is that systemd's transient units aren't just another implementation of d.join -- they actually allow people who want to use systemd's tooling (god help them) to see what services are running on the system to be able to see Docker containers (because systemctl knows about transient units).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a good idea, it might cause systemd to change the cgroup paths the container joins (which might cause further issues).

@cyphar
Copy link
Member

cyphar commented Jan 27, 2016

@hqhq Also, why don't we just iterate over the subsystemSet? With the PIDs cgroup changes, Apply doesn't do anything magical anymore -- it was all moved to Set.

@hqhq
Copy link
Contributor Author

hqhq commented Jan 27, 2016

There are issues with removing DBus support for systemd. Granted, it is quite buggy (and it actually helped find some bugs in systemd, oddly enough). And yes, we end up using cgroupfs for most of the cgroups anyway.

The main issue is that systemd's transient units aren't just another implementation of d.join -- they actually allow people who want to use systemd's tooling (god help them) to see what services are running on the system to be able to see Docker containers (because systemctl knows about transient units).

@cyphar Yeah I know transient units aren't just another implementation of d.join, removing all DBus support for systemd could be difficult and I haven't figured out how exactly to do it and if it's worth doing. But for now, I think the goal is to share as much code with fs cgroup as we can, without impacting systemd users.

I'm not sure if this is a good idea, it might cause systemd to change the cgroup paths the container joins (which might cause further issues).

Not after #511 right? The only downside I see of removing this is we won't have configuration files like 50-MemoryLimit.conf in /run/systemd/system/docker-<id>.scope.d, I don't know if that matter because we have supported lots of properties which don't have configuration files under /runc/systemd/system/.

Also, why don't we just iterate over the subsystemSet? With the PIDs cgroup changes, Apply doesn't do anything magical anymore -- it was all moved to Set.

Yeah we can do that, I just want to be step by step, see if we agreed on this goal and then we can clean up more. :)

@hqhq
Copy link
Contributor Author

hqhq commented Jan 27, 2016

@cyphar And also, I'd like to enable systemd support for runc, Don't know if there are any users but it'll be much easier to test systemd. I'm working on the panic as described in #325 but haven't got any clue on that.

@cyphar
Copy link
Member

cyphar commented Jan 27, 2016

@hqhq

I'm not sure if this is a good idea, it might cause systemd to change the cgroup paths the container joins (which might cause further issues).

Not after #511 right? The only downside I see of removing this is we won't have configuration files like 50-MemoryLimit.conf in /run/systemd/system/docker-.scope.d, I don't know if that matter because we have supported lots of properties which don't have configuration files under /runc/systemd/system/.

As far as I can tell, it might be an issue (#511 was concerned with making sure the cgroupfs stuff matches systemd). systemd makes weird decisions about which cgroups a unit it manages should be attached to. Among other things, it's based on what limits are enabled and set for the unit as well as the state of other units running on the system (which sounds like a bug to me, but the systemd guys appear to disagree with me about that).

So, I'm not entirely sure. I'd prefer leaving the properties in until we can identify if any weird things happen when we remove them.

And also, I'd like to enable systemd support for runc, Don't know if there are any users but it'll be much easier to test systemd. I'm working on the panic as described in #325 but haven't got any clue on that.

I'm +1 on anything that improves our testability (our test suite in runc is quite pitiful -- cgroup tests don't actually check that the limits are enforced). I'll take a look at the issue you referenced, FWIW.

@hqhq
Copy link
Contributor Author

hqhq commented Jan 27, 2016

systemd makes weird decisions about which cgroups a unit it manages should be attached to. Among other things, it's based on what limits are enabled and set for the unit as well as the state of other units running on the system (which sounds like a bug to me, but the systemd guys appear to disagree with me about that).

Do you mean that what limit should a process be set is uncertain? Like the shell:

$ cat /proc/self/cgroup
10:devices:/user.slice
9:hugetlb:/
8:freezer:/
7:memory:/user.slice
6:net_cls,net_prio:/
5:blkio:/user.slice
4:perf_event:/
3:cpuset:/
2:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-11.scope

It's in different cgroups, but if we set the limit, it'll join the cgroup:

$ sudo systemctl set-property session-11.scope MemoryLimit=300M
$ cat /proc/self/cgroup
10:devices:/user.slice/user-1000.slice
9:hugetlb:/
8:freezer:/
7:memory:/user.slice/user-1000.slice/session-11.scope
6:net_cls,net_prio:/
5:blkio:/user.slice/user-1000.slice
4:perf_event:/
3:cpuset:/
2:cpu,cpuacct:/user.slice/user-1000.slice
1:name=systemd:/user.slice/user-1000.slice/session-11.scope

Or do you mean something else that you think is a systemd bug?

@cyphar
Copy link
Member

cyphar commented Jan 27, 2016

@hqhq Yes, that's what I meant. Although, after the PIDs cgroup changes, we explicitly join everything (which means that systemd is really only being used for systemd-specific signalling). So my earlier comment might be a red herring, I'm not sure. Something I just noticed is that the devices cgroup path is /<first slice path>.slice when using systemd...

@crosbymichael crosbymichael modified the milestones: 0.0.8, 0.0.9 Feb 4, 2016
Signed-off-by: Qiang Huang <[email protected]>
@hqhq hqhq force-pushed the hq_cleanup_systemd_apply branch from 13e2b4d to ffbc347 Compare February 15, 2016 08:35
@hqhq
Copy link
Contributor Author

hqhq commented Feb 15, 2016

I've reverted systemd dbus api change, and removed all joinXXX functions.
@mrunalp @dqminh @LK4D4 @cyphar please review again.

func joinCgroups(c *configs.Cgroup, pid int) error {
for _, sys := range subsystems {
name := sys.Name()
if name == "name=systemd" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a switch IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks.

@hqhq hqhq force-pushed the hq_cleanup_systemd_apply branch from ffbc347 to bda7742 Compare February 15, 2016 08:56
@LK4D4
Copy link
Contributor

LK4D4 commented Feb 27, 2016

Now looks less breaking :)
LGTM

@cyphar
Copy link
Member

cyphar commented Feb 27, 2016

IANAM, but LGTM. I like this refactor. :D

@hqhq
Copy link
Contributor Author

hqhq commented Mar 8, 2016

ping @dqminh @mrunalp @crosbymichael

@mrunalp
Copy link
Contributor

mrunalp commented Mar 8, 2016

LGTM

mrunalp pushed a commit that referenced this pull request Mar 8, 2016
@mrunalp mrunalp merged commit 5b439d8 into opencontainers:master Mar 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants