[Fix 20284]: Enhance smartswitch environment variables parsing #21209
lguohan merged 1 commit into sonic-net:master from
Signed-off-by: Ze Gan <ganze718@gmail.com>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
/azpw ms_conflict
```cpp
env_vars["IS_DPU_DEVICE"] = (smart_switch_dpu ? "true" : "false");
env_vars["NUM_DPU"] = std::to_string(num_dpus);

for (const auto& [key, value] : env_vars) {
```
I think service files such as database, swss and syncd are enabled at build time.
Are we certain that these will be started only after systemd-sonic-generator has run at least once?
Please also verify the changes on a smartswitch platform and make sure that multiple database instances are created.
Thanks for your comments.
Yes, the generator runs only once, before any of the services are started.
I did confirm the smartswitch scenario and pasted the results in the PR description.
```cpp
std::unordered_map<std::string, std::string> env_vars;
env_vars["IS_DPU_DEVICE"] = (smart_switch_dpu ? "true" : "false");
env_vars["NUM_DPU"] = std::to_string(num_dpus);
```
This is a good approach to solving this issue, but I see one drawback: if we need to add new environment variables, the SSG code has to be updated, which IMO is not very flexible. I have an idea for a generic solution; let me know what you think.
- Add a oneshot service that runs very early in the boot. It reads the static environment variables (e.g. $PLATFORM, $NUM_ASIC, $NUM_DPU, $SONIC_BOOT_TYPE) and writes them to a common file, /etc/sonic/static-env-variables.
- We can leverage SSG to add the EnvironmentFile=/etc/sonic/static-env-variables option to all the services, keeping the SSG code minimal and flexible.
- We can potentially clean up a lot of code under docker_image_ctl with this approach.
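The oneshot service proposed above mostly boils down to serializing a handful of values into a file that EnvironmentFile= can read. A minimal C++ sketch, assuming the KEY='value' line format used by the shell prototype later in this thread and values that never contain a single quote; render_env_file and write_env_file are hypothetical names, not part of SSG.

```cpp
#include <fstream>
#include <map>
#include <string>

// Hypothetical helper: one KEY='value' line per variable.
// std::map keeps the keys sorted, so the output is stable across runs.
// Assumes values never contain a single quote.
std::string render_env_file(const std::map<std::string, std::string>& vars) {
    std::string out;
    for (const auto& [key, value] : vars) {
        out += key + "='" + value + "'\n";
    }
    return out;
}

// Hypothetical helper: rewrite the whole file. Truncating instead of
// appending keeps a rerun of the oneshot from accumulating duplicate lines.
bool write_env_file(const std::string& path,
                    const std::map<std::string, std::string>& vars) {
    std::ofstream f(path, std::ios::out | std::ios::trunc);
    if (!f) return false;
    f << render_env_file(vars);
    return static_cast<bool>(f);
}
```

Truncating on every run is a deliberate choice here: the shell prototype appends with >>, which would grow the file if the service were ever started twice.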
Good idea, I'll give it a try.
I tried it, but I don't think it's easy yet.
- The oneshot service script you mentioned might look like the following. However, at that point the database service isn't ready, so sonic-cfggen would not work, and to get the variables I would have to parse /host/machine.conf the same way systemd-sonic-generator does. SSG already does this with an efficient C function; why should I do it again in shell?
```bash
#!/bin/bash
SYSTEMD_ENV_FILE="/etc/sonic/static_env"

# Load platform from sonic-cfggen
PLATFORM=${PLATFORM:-$(sonic-cfggen -H -v DEVICE_METADATA.localhost.platform)}
echo "PLATFORM='${PLATFORM}'" >> "$SYSTEMD_ENV_FILE"

# Parse environment from platform.json
PLATFORM_JSON="/usr/share/sonic/device/$PLATFORM/platform.json"
if [ -f "$PLATFORM_JSON" ]; then
    # Environment variables for Smart Switch
    NUM_DPU=$(jq -r '.DPUS | length' "$PLATFORM_JSON" 2>/dev/null)
    if [[ -z "$NUM_DPU" ]]; then
        NUM_DPU=0
    fi
    # jq -e exits non-zero when .DPU is missing (null) or false
    if jq -e '.DPU' "$PLATFORM_JSON" >/dev/null 2>&1; then
        IS_DPU_DEVICE="true"
    else
        IS_DPU_DEVICE="false"
    fi
    echo "NUM_DPU='${NUM_DPU}'" >> "$SYSTEMD_ENV_FILE"
    echo "IS_DPU_DEVICE='${IS_DPU_DEVICE}'" >> "$SYSTEMD_ENV_FILE"
fi
```
- With your proposal, if we want to add new env variables, we still have to update the shell script, so I don't see any difference or extra challenge in doing this via SSG. If you feel that C code isn't flexible, I should mention that we previously had an SSG written in Python, but we discarded it and rewrote it in C due to efficiency issues.
Yes, the script can't use the DB either, just like SSG.
Hmm, the difference is that we only need to run the script once. Since the oneshot service runs before other services are even started, the CPU should be fairly idle and the script should execute quickly. We do need to benchmark this solution to measure the impact.
The advantage is that the load on SSG is smaller: all it does is add EnvironmentFile=, and with some optimization we don't even need to edit and write the .service file after the first
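The optimization mentioned above (not rewriting the .service file once it already has the line) amounts to an idempotent insert. A minimal sketch, assuming the unit text is available as a string and the [Service] header is followed by a newline; has_line and ensure_environment_file are hypothetical names.

```cpp
#include <sstream>
#include <string>

// True if unit_text already contains `line` as a whole line.
bool has_line(const std::string& unit_text, const std::string& line) {
    std::istringstream in(unit_text);
    std::string cur;
    while (std::getline(in, cur)) {
        if (!cur.empty() && cur.back() == '\r') cur.pop_back();
        if (cur == line) return true;
    }
    return false;
}

// Hypothetical sketch: put "EnvironmentFile=<path>" right after the
// [Service] header, but only if it is not there yet, so later runs leave
// the file untouched. Returns true only if unit_text was modified.
bool ensure_environment_file(std::string& unit_text, const std::string& env_path) {
    const std::string line = "EnvironmentFile=" + env_path;
    if (has_line(unit_text, line)) return false;  // already present: nothing to write
    const std::string header = "[Service]\n";
    const auto pos = unit_text.find(header);
    if (pos == std::string::npos) return false;   // no [Service] section: leave as-is
    unit_text.insert(pos + header.size(), line + "\n");
    return true;
}
```

The boolean return lets the caller skip the file write entirely when nothing changed.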
In any case, I'm okay with the current solution. We might move to the generic oneshot service in the future if required.
If this PR looks good to you, could you please approve it?
```cpp
env_vars["NUM_DPU"] = std::to_string(num_dpus);

for (const auto& [key, value] : env_vars) {
    tmp_file << "Environment=\"" << key << "=" << value << "\"" << std::endl;
```
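The loop above emits one Environment="KEY=value" line per entry. The sketch below renders the same lines but, as an assumption that differs from the PR, uses std::map instead of std::unordered_map: the iteration order of an unordered_map is not guaranteed by the standard, whereas a sorted map makes the generated text reproducible. smartswitch_env and render_environment_lines are hypothetical names.

```cpp
#include <map>
#include <sstream>
#include <string>

// Same two variables the PR sets, keyed in sorted order.
std::map<std::string, std::string> smartswitch_env(bool smart_switch_dpu, int num_dpus) {
    return {
        {"IS_DPU_DEVICE", smart_switch_dpu ? "true" : "false"},
        {"NUM_DPU", std::to_string(num_dpus)},
    };
}

// Render each pair as a quoted systemd assignment: Environment="KEY=value".
std::string render_environment_lines(const std::map<std::string, std::string>& env_vars) {
    std::ostringstream out;
    for (const auto& [key, value] : env_vars) {
        out << "Environment=\"" << key << "=" << value << "\"\n";
    }
    return out.str();
}
```

For example, smartswitch_env(false, 4) renders to an IS_DPU_DEVICE=false line followed by a NUM_DPU=4 line.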
Do we need to add the Environment= option if the file doesn't have a [Service] section?
I worry that a [Service] section might be introduced by other functions or modules in the future, and I don't see any side effects from defining some environment variables at this point.
Please make sure systemd won't throw an error if we add Environment= values after the last section when there is no [Service] section.
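Lines appended at the end of a unit file belong to whichever section header appears last, so one way to address this question is to check that before appending. A minimal sketch with hypothetical helper names; what SSG should do when the check fails (skip the file or open a new [Service] section) is left open, and systemd's actual behavior should still be verified as requested.

```cpp
#include <sstream>
#include <string>

// Name of the last "[Section]" header in the unit text, i.e. the section
// that lines appended at end-of-file would belong to. Empty if none.
std::string last_section(const std::string& unit_text) {
    std::istringstream in(unit_text);
    std::string line, current;
    while (std::getline(in, line)) {
        const auto end = line.find_last_not_of(" \t\r");
        if (end == std::string::npos) continue;          // blank line
        const auto start = line.find_first_not_of(" \t"); // non-npos here
        if (line[start] == '[' && line[end] == ']') {
            current = line.substr(start + 1, end - start - 1);
        }
    }
    return current;
}

// True if Environment= lines appended at the end would land in [Service].
bool appended_lines_land_in_service(const std::string& unit_text) {
    return last_section(unit_text) == "Service";
}
```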
Hi @lguohan, please help review or merge this PR.
@Pterosaur, can you add the delay numbers to the description?
Fix sonic-net#20284
In 202405 and above, two extra steps that check NUM_DPU and IS_DPU_DEVICE by parsing the platform.json file with the jq tool are added before the start of every container. This is only relevant for Smartswitch. However, this is adding some delay during the reconciliation phase of WR/FR resulting
How I did it
Set the environment variables for systemd by systemd-sonic-generator.
Signed-off-by: Ze Gan <ganze718@gmail.com>

Why I did it
Fix #20284
In 202405 and above, two extra steps that check NUM_DPU and IS_DPU_DEVICE by parsing the platform.json file with the jq tool are added before the start of every container. This is only relevant for Smartswitch. However, this is adding some delay during the reconciliation phase of WR/FR resulting
When there is load on the CPU, the two jq calls together add more than 1 second to the start of swss and almost 0.5 seconds to the start of syncd. They are also present in the teamd and bgp container start flows, which may cause extra contention on the CPU.
How I did it
Set the environment variables for systemd by systemd-sonic-generator.
How to verify it
jq command under the swss.sh start and syncd.sh start from the sonic-bootchart