Cannot run any container if storage owned by group not mapped into container with --userns=keep-id #1777

@Maetveis

Description


This issue is very similar to #1483. If it seems familiar, please bear with me: I think there is an important difference. In #1483 the storage folder is inaccessible to the user running podman without their secondary groups, but in this issue it should still be accessible because the user in question owns the folder.

Running a container with podman in rootless mode with --userns=keep-id fails if a parent directory of the storage is owned by a group that is not mapped into the container's user namespace and that directory also has no world execute permission.

For example, if the username is user with primary group user, then a directory with permissions drwx------, owned by user with owning group userdata, anywhere on the parent path of the storage location leads to a failure to run any container.

Steps to reproduce:

  1. Use the following storage.conf (in ~/.config/containers/):
[storage]
  driver = "overlay"
  graphroot = "/tmp/podman-test/storage"
[storage.options]
  mount_program = "/usr/bin/fuse-overlayfs"
  2. Create a group userdata (it's not required to add user to this group)
  3. Set up the storage folder:
mkdir /tmp/podman-test
chown user:userdata /tmp/podman-test/
chmod u+rwx /tmp/podman-test/
chmod og-rwx /tmp/podman-test/
  4. Run a container with a non-root user: podman run --userns=keep-id alpine ls

Expected result:

The container is run successfully.

Actual result:

The following error is raised:

Error: crun: open `/tmp/graphroot/overlay/5fba3a9a250150294dcb692656d165ec6bd26c9c6be2c692183a70e24083b29c/merged`: Permission denied: OCI permission denied

After running chgrp user /tmp/podman-test/ or chmod o+x /tmp/podman-test, the container runs successfully.
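
To find which directory on the storage path triggers the condition described above, a small diagnostic along the following lines may help. This is only a sketch: the function name `unmapped_group_dirs` and the `mapped_gids` parameter are hypothetical, and the caller is assumed to already know which host GIDs end up mapped into the container's user namespace (for example the primary GID plus any subordinate GID ranges).

```python
import stat
from pathlib import Path


def unmapped_group_dirs(path, mapped_gids):
    """Return ancestors of `path` (and `path` itself) that have no world
    execute bit and whose owning group is not in `mapped_gids`.

    `mapped_gids` is an assumption supplied by the caller: the set of host
    GIDs that are mapped into the container's user namespace.
    """
    target = Path(path).resolve()
    hits = []
    for directory in [target, *target.parents]:
        st = directory.stat()
        if not stat.S_ISDIR(st.st_mode):
            continue
        no_world_execute = not (st.st_mode & stat.S_IXOTH)
        if no_world_execute and st.st_gid not in mapped_gids:
            hits.append(str(directory))
    return hits
```

For the reproduction above, calling it on the graphroot with only the primary GID of user in `mapped_gids` would be expected to report /tmp/podman-test.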

Analysis

Based on digging through the code and experimenting with strace, I think the diagram below describes what's going on:

Assuming the following user and group ids, and /etc/subuid:

> id user
uid=1000(user) gid=1000(user) groups=1000(user)
> grep user /etc/subuid
user:100000:65536

Then the simplified view of events is:

sequenceDiagram
    participant Podman
    participant Crun
    participant Kernel

    Podman->>Podman: Set up user namespace:<br> root -> user (Table 1)
    
    Podman->>Crun: start container with:<br>user -> root (Table 2)
    activate Crun
    
    Crun->>Crun: re-invoke in user namespace
    
    Crun->>Kernel: open(root.path)
    deactivate Crun
    activate Kernel
    
    Note over Kernel: process uid: 0<br>process gid: 0<br>file perms: drwx------<br>file uid:1000<br>file gid:[invalid]
    
    Kernel--xCrun: return: -1, errno: EACCES
    deactivate Kernel
    activate Crun

open(root.path) fails because the kernel requires both the UID and the GID of the accessed file to be mapped in the user namespace for the root capabilities to take effect (see footnote 1). The normal access checks fail as well, because at this point crun's UID is still 0, so the owner permission bits don't apply.
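
The decision described here can be modelled roughly as follows. This is a simplified sketch of the rule quoted from man 7 user_namespaces, not the kernel's actual code path: `can_search_dir` is a hypothetical name, IDs are expressed as seen from inside the namespace, and `None` stands for an ID with no valid mapping.

```python
from typing import Optional, Set


def can_search_dir(proc_uid: int, proc_gids: Set[int], has_dac_override: bool,
                   file_uid: Optional[int], file_gid: Optional[int],
                   mode: int) -> bool:
    """Simplified model of the search (execute) check on a directory.

    `file_uid` / `file_gid` are None when the ID is not mapped in the
    process's user namespace.
    """
    # The capability only helps if BOTH file IDs have valid mappings.
    if has_dac_override and file_uid is not None and file_gid is not None:
        return True
    # Otherwise fall back to the ordinary owner / group / other bits.
    if file_uid is not None and proc_uid == file_uid:
        return bool(mode & 0o100)
    if file_gid is not None and file_gid in proc_gids:
        return bool(mode & 0o010)
    return bool(mode & 0o001)


# The failing case: UID 0 in the namespace, drwx------, group unmapped.
failing = can_search_dir(0, {0}, True, 1000, None, 0o700)
# After `chgrp user`: the group is mapped, so the capability applies.
after_chgrp = can_search_dir(0, {0}, True, 1000, 1000, 0o700)
# After `chmod o+x`: the "other" execute bit grants access.
after_chmod = can_search_dir(0, {0}, True, 1000, None, 0o701)
```

Under this model the first call is denied and the other two are allowed, which matches the two workarounds mentioned above.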

Table 1: User Namespace Setup (Podman)

| UID in NS | UID in host |
| --- | --- |
| [0] | [1000] |
| [1, 65536] | [100000, 165535] |

Table 2: Container Spec linux.uidMappings

| UID in container | UID in outer NS |
| --- | --- |
| [0, 999] | [1, 1000] |
| [1000] | [0] |
| [1001, 65536] | [1001, 65536] |
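
To see how the two tables compose, the sketch below translates a host ID through Table 1 and then Table 2. The mappings are written as (inside start, outside start, length) triples, the same layout as /proc/PID/uid_map. The function names are hypothetical, and the host ID 2000 used in the last call is an assumed example of an ID that appears in neither table, standing in for the unmapped group (assuming the GID mappings mirror the UID ones, which the issue does not state).

```python
# (inside_start, outside_start, length), as in /proc/<pid>/uid_map
PODMAN_NS = [(0, 1000, 1), (1, 100000, 65536)]                    # Table 1
CONTAINER_NS = [(0, 1, 1000), (1000, 0, 1), (1001, 1001, 64536)]  # Table 2


def to_inside(mapping, outside_id):
    """Translate an ID from the outer namespace to the inner one, or None."""
    for inside, outside, length in mapping:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return None


def host_to_container(host_id):
    """Apply Table 1, then Table 2; None means the ID is unmapped."""
    in_podman_ns = to_inside(PODMAN_NS, host_id)
    if in_podman_ns is None:
        return None
    return to_inside(CONTAINER_NS, in_podman_ns)


invoking_user = host_to_container(1000)      # the user who ran podman
first_subuid = host_to_container(100000)     # becomes root in the container
unmapped = host_to_container(2000)           # assumed ID outside both ranges
```

With these tables, host ID 1000 ends up as 1000 in the container, host ID 100000 ends up as 0, and an ID outside both ranges has no mapping at all.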

Possible fixes

  1. I think crun could open the storage root path before entering the user namespace, in the same way as it's done for mount paths. There, the root user in the namespace is still mapped to the user that invoked podman in the host.
  2. Or open it after setuid.
  3. podman could open and pass a file descriptor to crun for the graph storage root.

Option 3 would be ideal, because if the user in the host can access the storage root, then IMO containers should be able to start from it, regardless of which permissions enable that access in the host (file ownership, primary or secondary group, or ACLs). It's probably not simple to implement, however.
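
The idea shared by options 1 and 3 is that a directory file descriptor opened while the path is still reachable stays usable afterwards, because later lookups relative to it don't re-walk the parent directories. Below is a minimal sketch of that mechanism using only the Python standard library on Linux. It is not how crun or podman implement anything, and the helper names `hold_directory` and `open_inside` are hypothetical.

```python
import os


def hold_directory(path: str) -> int:
    """Open `path` now, while the caller can still traverse its parents."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def open_inside(dir_fd: int, name: str) -> int:
    """Open `name` relative to the held directory (openat-style lookup),
    so permissions on the parents of the held directory are not rechecked."""
    return os.open(name, os.O_RDONLY, dir_fd=dir_fd)
```

A process would call `hold_directory` on the storage root before its view of the parent directories changes, then use `open_inside` (or pass the descriptor to a child process) for everything underneath it.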

Footnotes

  1. Quoting from man 7 user_namespaces:

    Certain capabilities allow a process to bypass various kernel-enforced restrictions when performing operations on files owned by other users or groups. These capabilities are: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, and CAP_FSETID.

    Within a user namespace, these capabilities allow a process to bypass the rules if the process has the relevant capability over the file, meaning that:

    • the process has the relevant effective capability in its user namespace; and
    • the file's user ID and group ID both have valid mappings in the user namespace.

    (Emphasis mine)
