[docker-pmon] limit privileged flag for pmon container #23457
qiluo-msft merged 9 commits into sonic-net:master
Conversation
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@ds952811 pls help review |
|
@adyeung Could you help review and verify the pmon container hardening work on Broadcom ASIC platforms? |
qiluo-msft
left a comment
LGTM, but need other platform vendors to test
Why I did it
On platforms that use sysfs, the current pmon is broken: sysfs files can no longer be opened for writing inside the container.
admin@sonic:~$ docker exec -it pmon bash
root@sonic:/# python3
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> open('/sys/class/i2c-adapter/i2c-71/71-0050/eeprom', mode='r+b', buffering=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 30] Read-only file system: '/sys/class/i2c-adapter/i2c-71/71-0050/eeprom'
After this fix
admin@sonic:~$ docker exec -it pmon bash
root@sonic:/# python3
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> open('/sys/class/i2c-adapter/i2c-71/71-0050/eeprom', mode='r+b', buffering=0)
<_io.FileIO name='/sys/class/i2c-adapter/i2c-71/71-0050/eeprom' mode='rb+' closefd=True>
This regression was introduced by #23457
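The read-only failure above can also be detected without opening a device file. The following is a minimal sketch, assuming the symptom comes from sysfs being mounted with the `ro` option inside the container. The helper name `sysfs_mount_is_readonly` is hypothetical and is not part of pmon; it only parses text in the `/proc/mounts` format.

```python
def sysfs_mount_is_readonly(mounts_text, mount_point="/sys"):
    """Report whether mount_point is mounted read-only.

    mounts_text uses the /proc/mounts layout:
    <device> <mount point> <fs type> <options> <dump> <pass>
    Returns True or False, or None if mount_point is not listed.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = fields[3].split(",")
        # Later entries shadow earlier ones, so keep the last match.
        result = "ro" in options
    return result
```

Inside the container, passing the contents of `/proc/mounts` to this function should return `True` when the behaviour shown in the traceback is present, and `False` after the fix.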
|
Hi @DavidZagury and @qiluo-msft, |
Feel free to raise PRs if it helps. I guess @DavidZagury has no such platforms to test on. |
Why I did it
On platforms using the IPMI interface, the current pmon is broken.
root@sonic:~# show logging "Could not open device"
2025 Sep 18 02:18:30.578482 sonic INFO pmon#supervisord: psud Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
2025 Sep 18 02:18:31.689267 sonic INFO pmon#supervisord: psud Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
2025 Sep 18 02:21:55.434346 sonic INFO pmon#supervisord: psud Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
2025 Sep 18 02:21:57.355865 sonic INFO pmon#supervisord: psud Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
2025 Sep 18 02:21:57.355865 sonic INFO pmon#supervisord: thermalctld Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
2025 Sep 18 02:21:58.653110 sonic INFO pmon#supervisord: psud Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
root@sonic:~# docker exec -it pmon bash -c "ls /dev/ipmi*"
ls: cannot access '/dev/ipmi*': No such file or directory
root@sonic:~# show platform psustatus
Error: Failed to get PSU status
Error: failed to get PSU status from state DB
root@sonic:~# show platform temperature
Thermal Not detected
root@sonic:~#
After this fix
root@sonic:~# show logging "Could not open device"
root@sonic:~# docker exec -it pmon bash -c "ls /dev/ipmi*"
/dev/ipmi0
root@sonic:~# show platform psustatus
PSU    Model       Serial               HW Rev    Voltage (V)    Current (A)    Power (W)    Status    LED
-----  ----------  -------------------  --------  -------------  -------------  -----------  --------  -----
PSU1   YNEE0750EM  F7510AS90580X020027  N/A       0.00           0.00           0.00         NOT OK    green
PSU2   YNEE0750EM  F7510AS90580X020024  N/A       12.12          10.00          121.20       OK        green
root@sonic:~# show platform temperature
Sensor            Temperature    High TH    Low TH    Crit High TH    Crit Low TH    Warning    Timestamp
----------------  -------------  ---------  --------  --------------  -------------  ---------  -----------------
PSU1_TEMP1        32             N/A        N/A       N/A             N/A            False      20250918 02:07:24
PSU2_TEMP1        32             N/A        N/A       N/A             N/A            False      20250918 02:07:24
TEMP_ENV_BMC      37             75.0       N/A       80.0            N/A            False      20250918 02:07:23
TEMP_ENV_MACCASE  40             75.0       N/A       80.0            N/A            False      20250918 02:07:23
TEMP_ENV_PSUCASE  32             57.0       N/A       62.0            N/A            False      20250918 02:07:23
TEMP_ENV_SSDCASE  42             75.0       N/A       80.0            N/A            False      20250918 02:07:23
TEMP_MAC          41             95.0       N/A       105.0           N/A            False      20250918 02:07:22
TEMP_PSU0_TEMP1   32             N/A        N/A       70.0            N/A            False      20250918 02:07:23
TEMP_PSU1_TEMP1   32             N/A        N/A       70.0            N/A            False      20250918 02:07:24
root@sonic:~#
This regression was introduced by #23457
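The `ls /dev/ipmi*` check above can be scripted. This is a small sketch that probes the three device paths named in the "Could not open device" log message; the function name `find_ipmi_device` and the injectable `exists` parameter are assumptions made for illustration and testing, not part of psud or thermalctld.

```python
import os

# Paths taken from the "Could not open device" log message above.
IPMI_DEVICE_CANDIDATES = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def find_ipmi_device(candidates=IPMI_DEVICE_CANDIDATES, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

Run inside the pmon container, a `None` result corresponds to the broken state shown before the fix, where no IPMI device node is visible to the daemons.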
|
@DavidZagury @qiluo-msft @prgeor With this PR included, we see that sysfs has been made read-only in the pmon container. The thermalctld daemon running inside pmon cannot control the fan speed because it cannot write to the PWM register, and the devices are overheating. |
Could you share the mount part in |
I had not included #24017 in my build. That PR may address the issue. |
…-net#23457)" This reverts commit a13685f.
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)
How I did it
Reduce linux capabilities in privileged flag
How to verify it
Run platform tests.
Check container's settings: Privileged is false and container only has default Linux caps, and SYS_RAWIO/SYS_ADMIN cap.
Signed-off-by: Feng Pan <fenpan@microsoft.com>
-e SX_API_SOCKET_FILE=/var/run/sx_sdk/sx_api.sock \
{%- elif docker_container_name == "pmon" %}
-v /sys/devices/platform/mlxplat:/sys/devices/platform/mlxplat:rw \
-v /sys/module/sx_core:/sys/module/sx_core:rw \
This path may not exist on Mellanox platforms, in which case the container fails to restart:
admin@sonic:~$ docker restart pmon
Error response from daemon: Cannot restart container pmon: error while creating mount source path '/sys/module/sx_core': mkdir /sys/module/sx_core: operation not permitted
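One way to avoid this kind of failure is to add a bind mount only when its source path exists on the host. The sketch below illustrates that idea in Python; the helper `existing_bind_mounts` is hypothetical and is not how the templated container script in this PR works.

```python
import os


def existing_bind_mounts(paths, mode="rw", exists=os.path.exists):
    """Build `-v src:dst:mode` arguments only for paths present on the host."""
    args = []
    for path in paths:
        if exists(path):
            # Mount the path at the same location inside the container.
            args += ["-v", f"{path}:{path}:{mode}"]
    return args
```

With the two paths from the snippet above, a host that lacks `/sys/module/sx_core` would get a single `-v` argument for `/sys/devices/platform/mlxplat`, so Docker would never be asked to create the missing mount source.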
|
@qiluo-msft @DavidZagury This change has caused issues with pmon's ability to monitor and interact with devices under /dev/ (#25142). We are currently investigating ways to handle this. It was discussed in the SONiC Chassis weekly call, and we would like further insight into the background and requirements of this change, as well as the impact of potential solutions. @rlhui to help set up a call to discuss this further, thanks! |
Signed-off-by: Boyang Yu <byu@arista.com>
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)
How I did it
Replace the privileged flag with a reduced set of Linux capabilities.
How to verify it
Run platform tests.
Check the container's settings: Privileged is false, and the container has only the default Linux capabilities plus SYS_RAWIO and SYS_ADMIN.
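The settings check can be automated against the JSON printed by `docker inspect pmon`. This is a rough sketch, assuming the added capabilities appear under `HostConfig.CapAdd` and the privileged flag under `HostConfig.Privileged`; the function name `check_hardening` is hypothetical, and the check does not cover the default capability set.

```python
import json


def _strip_cap_prefix(cap):
    """Normalize a capability name, with or without a leading CAP_."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap


def check_hardening(inspect_json, expected_caps=("SYS_ADMIN", "SYS_RAWIO")):
    """Return (privileged, missing_caps) from `docker inspect <container>` JSON."""
    host_config = json.loads(inspect_json)[0]["HostConfig"]
    # CapAdd is null when no capabilities were added.
    cap_add = host_config.get("CapAdd") or []
    added = {_strip_cap_prefix(cap) for cap in cap_add}
    missing = sorted(set(expected_caps) - added)
    return bool(host_config.get("Privileged")), missing
```

A hardened container as described here would be expected to yield `(False, [])`; any other result points at either a still-privileged container or a missing capability.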