Skip to content

Conversation

@kazmsk
Copy link
Contributor

@kazmsk kazmsk commented Dec 4, 2024

This implements the process_capabilities_fail validation in #361

@kazmsk kazmsk force-pushed the add-test-process_capabilities_fail branch from 52dfb64 to 841a685 Compare December 8, 2024 23:49
Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, Thanks for the PR! There is some change needed - basically the current test validates the capabilities call itself fails, i.e. it will error out from line 9. However, we want to check that the container creation itself fails. For this we will need to create a spec with invalid value, and then assert that the container creation has failed.

For this, what we can do is this -

  • create a valid spec, with a single valid capacity
  • then use this spec with test_inside_container
    • here you'll need to update the code of test_inside_container in test_utils:204 to return error instead of calling unwrap.
    • In the function call, we get to provide a callback function which runs right before the container is created. This fn gets the rootfs path as a param. The spec (named config.json) it right outside the rootfs files for our tests, so you can edit the path to get the spec string, replace the cap name with an invalid name and re-write it there. Now we will have an invalid spec.
  • With this, you can check that the test_inside_container fails because of the capabilities issue.

Please let me know if the above explanation is not clear/ you need any help.

@kazmsk kazmsk force-pushed the add-test-process_capabilities_fail branch from beac1c7 to 2b72e74 Compare December 26, 2024 02:51
@kazmsk kazmsk requested a review from YJDoc2 December 26, 2024 03:20
@kazmsk
Copy link
Contributor Author

kazmsk commented Dec 26, 2024

@YJDoc2
I have modified it to test capability modification in the callback function.
Please review.

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Dec 26, 2024

Hey @kazmsk , thanks for the change, I'll try to take a look at this by the weekend.

@kazmsk
Copy link
Contributor Author

kazmsk commented Jan 17, 2025

Hi @YJDoc2 , how is the status of the review?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Feb 11, 2025

Hey @kazmsk , sincere apologies from my side. I got busy with things, and I missed following up with this. May I ask you to sync with current main and resolve the conflicts?

Again, apologies for not keeping up with this properly 🙏

@kazmsk kazmsk force-pushed the add-test-process_capabilities_fail branch from 2b72e74 to 9950c8f Compare March 3, 2025 01:51
@kazmsk
Copy link
Contributor Author

kazmsk commented Mar 3, 2025

Hi @YJDoc2 , I apologize for not being able to respond earlier.
I have resolved the conflict, so could you please review it again?

Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, the test itself looks good, but one comment to fix. Also -

  1. Can you add the comment on why we are loading the spec and then re-writing it ; basically what I mentioned in my comment before your recent changes.
  2. Can you run cargo fmt and check just lint is passing? That is way the CI is failing.

After these are fixed, I think I'll go ahead and merge. Thanks :)

Comment on lines 18 to 19
"runtimetest".to_string(),
"process_capabilities_fail".to_string(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this here, the default command should be ok. We don't actually intend this to reach the runtimetest stage, but instead fail in the creation itself. Right now, if we add this, and somehow the container is created successfully ; then we would still get error from the runtime test, which would result in test_inside_container returning Failed and thus our test incorrectly passing. You can simply set the args to sleep,1m or such (check what we set by default in utils .

Copy link
Contributor Author

@kazmsk kazmsk Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have set "sleep, 1m," but I'm not sure where the default is specifically referring to. It would be helpful if you could provide a link.

@kazmsk
Copy link
Contributor Author

kazmsk commented Mar 4, 2025

Hi @YJDoc2,
I have made the changes. Could you please check?

@kazmsk kazmsk requested a review from YJDoc2 March 4, 2025 01:00
@kazmsk
Copy link
Contributor Author

kazmsk commented Mar 13, 2025

Hi @YJDoc2,
Could you kindly confirm if you've had a chance to review it?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Mar 14, 2025

Hey, so I checked this and code wise it seems ok ; but I'm getting one issue here -
When I try to run this locally, the test passes ; but if I try to comment out the part where we modify the config json, it still fails, with error

container stderr was not empty, found : ERROR libcontainer::process::container_init_process: failed to drop capabilities err=SetCaps(CapsError("capset failure: Operation not permitted (os error 1)"))
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
ERROR youki: error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
error in executing command: failed to create container: exec process failed with error error in executing process : failed syscall
Error: failed to create container: exec process failed with error error in executing process : failed syscall

I suspect that this is related to my system not being configured correctly, and also some changes done in latest ubuntu bases systems , as seen in #3097 (comment)

Can you check if the test is passing and failing correctly on your system, and if so also add a check to the error we get to verify it is because the cap is wrong, and not because any other problem?

Apart from that, this is good, Thanks :)

@kazmsk
Copy link
Contributor Author

kazmsk commented Apr 15, 2025

Hi @YJDoc2,
I apologize for the delay in replying.

Is this the correct way to run tests in the local environment?
https://github.com/youki-dev/youki/blob/main/tests/contest/contest/README.md#usage

Do the commands seem correct?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Apr 17, 2025

Hey,

  1. to run contest, run just test-contest (after installing just) and just validate-contest-runc to validate it on runc.
  2. There is some conflict in main.rs , please take a look and fix it.

Kazuki Massaki and others added 8 commits May 1, 2025 09:47
Signed-off-by: Kazuki Massaki <[email protected]>
Signed-off-by: kazuki.massaki <[email protected]>
Signed-off-by: Kazuki Massaki <[email protected]>
Signed-off-by: kazuki.massaki <[email protected]>
Signed-off-by: Kazuki Massaki <[email protected]>
Signed-off-by: kazuki.massaki <[email protected]>
Signed-off-by: kazuki.massaki <[email protected]>
Signed-off-by: kazuki.massaki <[email protected]>
@kazmsk kazmsk force-pushed the add-test-process_capabilities_fail branch from a117926 to 75af8b1 Compare May 1, 2025 00:47
@kazmsk kazmsk force-pushed the add-test-process_capabilities_fail branch from 78f0200 to 2d479a0 Compare May 2, 2025 08:11
@kazmsk
Copy link
Contributor Author

kazmsk commented May 2, 2025

Hi @YJDoc2,

Thank you, I was able to reproduce the issue in my environment as well.
I made some changes to the capability rewriting, but it seems like the container still starts even though an invalid capability is set.
Would it be possible to get some advice on this...?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented May 7, 2025

Hey @kazmsk , can you elaborate more on that? Why is it able to start when the spec has invalid cap? Do you have a way to re-create this locally so I can try it out? Thanks :)

@kazmsk
Copy link
Contributor Author

kazmsk commented May 8, 2025

Hi @YJDoc2,

When I ran just test-contest locally, the process_capabilities_fail_test result was not ok, and the message container creation succeeded unexpectedly. was printed.

# Start group process_capabilities_fail
1 / 1 : process_capabilities_fail_test : not ok
	container creation succeeded unexpectedly.
# End group process_capabilities_fail

This is because test_inside_container passes even though the capability is being overwritten with an invalid one, which led me to conclude that the container is actually starting up.

Please let me know if my understanding is incorrect.

@saku3
Copy link
Member

saku3 commented May 8, 2025

Sorry to interrupt,

It seems that the spec is being overwritten after TEST_CAP is set.

Let's take a look at the test_inside_container function:

  • At line 184 in test_inside_container, the closure(&|bundle| {...}) that sets TEST_CAP is executed:

    test_result!(setup_for_test(
        &bundle.as_ref().join("bundle").join("rootfs")
    ));
  • Then at line 189, the following line overwrites the config.json using the spec passed as an argument:

    set_config(&bundle, spec).unwrap();

As a result, the modified config.json is overwritten with the original spec.

To confirm this, it would be helpful to print the contents of config.json right before the create_container.

kazmsk added 2 commits May 9, 2025 12:54
Signed-off-by: kazuki.massaki <[email protected]>
@YJDoc2
Copy link
Collaborator

YJDoc2 commented May 12, 2025

Hey @saku3 thanks for help!

Hey @kazmsk the change for config looks fine, but in test can we also validate in the test that the error we are getting when creation fails is specifically invalid cap error, and not some other error? Apart from that the PR looks fine.

@kazmsk
Copy link
Contributor Author

kazmsk commented May 12, 2025

Hi @YJDoc2,

I’ve made some additional changes, could you please take a look?

@YJDoc2
Copy link
Collaborator

YJDoc2 commented May 12, 2025

Hey @kazmsk , yes this seems correct. However this fails on runc because the message is not matching - https://github.com/youki-dev/youki/actions/runs/14965059316/job/42053543183?pr=3010

Can we add corresponding error string check as well? (also comment regarding which string check if for which runtime) Thanks :)

@saku3
Copy link
Member

saku3 commented May 12, 2025

to @kazmsk

@kazmsk
Copy link
Contributor Author

kazmsk commented May 14, 2025

Hi @YJDoc2,

Thank you for reviewing it multiple times!

I've updated the implementation to capture the error from the following message:

unexpected error: container stderr was not empty, found : 
time="2025-05-12T11:58:59Z" level=warning msg="ignoring unknown or unavailable capabilities: [TEST_CAP]"
time="2025-05-12T11:58:59Z" level=error msg="runc create failed: unable to start container process: unable to apply caps: operation not permitted"

Currently, if only a warning is detected (and no error), the test is treated as Failed.
However, if treating warning-only cases as Passed is acceptable, I'm happy to update the behavior—please let me know.

Additionally, I’ve added comments indicating which runtime each error message corresponds to, for clarity.

@YJDoc2
Copy link
Collaborator

YJDoc2 commented May 14, 2025

Hey @kazmsk , so I suspect that the unable to apply error in runc might be because kernel support, so we should check for warning only, with the TEST_CAP word in it. Also in the invalid_variant error, can you update the comment to mention it is for youki? Apart from that LGTM.

@kazmsk
Copy link
Contributor Author

kazmsk commented May 14, 2025

Hi @YJDoc2,
I've updated the code to check for the warning message and also modified the comments. Please review again.

Signed-off-by: kazuki.massaki <[email protected]>
Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Thanks for opening the PR and patiently making the changes requests :)

@YJDoc2 YJDoc2 enabled auto-merge (squash) May 14, 2025 05:56
@YJDoc2 YJDoc2 merged commit 2e4e0db into youki-dev:main May 14, 2025
27 checks passed
@github-actions github-actions bot mentioned this pull request May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants