Fix risky unwrap(), expect(), and casting#1113
Conversation
Signed-off-by: Qi Luo <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR improves Rust code safety by removing risky operations (unwrap(), expect()) and unnecessary type casting. The changes span both Rust FFI bindings and C API implementations, changing file descriptor types from uint32_t to int32_t to match the underlying C++ int getFd() return type, and replacing Duration parameters with u32 timeout_ms for more direct control.
- Changed file descriptor types from
uint32_ttoint32_tin C API to match underlying C++ implementation - Removed
unwrap()calls when creatingBorrowedFdsince type casting is no longer needed - Replaced
expect()calls in Drop implementations witheprintln!to prevent panics during cleanup - Changed
read_data()API from acceptingDurationtou32 timeout_ms(breaking change) - Removed Clone trait implementation for DbConnector to prevent potential panics
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/c_api_ut.cpp | Updated test file descriptors from uint32_t to int32_t |
| common/c-api/consumerstatetable.h | Changed getFd signature to return int32_t instead of uint32_t |
| common/c-api/consumerstatetable.cpp | Removed numeric_cast from getFd implementation |
| common/c-api/subscriberstatetable.h | Changed getFd signature to return int32_t instead of uint32_t |
| common/c-api/subscriberstatetable.cpp | Removed numeric_cast from getFd implementation |
| common/c-api/zmqconsumerstatetable.h | Changed getFd signature to return int32_t instead of uint32_t |
| common/c-api/zmqconsumerstatetable.cpp | Removed numeric_cast from getFd implementation |
| crates/swss-common/src/types/consumerstatetable.rs | Removed Duration import, unwrap() call, and updated Drop to use eprintln! |
| crates/swss-common/src/types/subscriberstatetable.rs | Removed Duration import, unwrap() call, and updated Drop to use eprintln! |
| crates/swss-common/src/types/zmqconsumerstatetable.rs | Removed Duration import, unwrap() call, updated Drop to use eprintln!, and improved comment capitalization |
| crates/swss-common/src/types/table.rs | Updated Drop implementation to use eprintln! instead of expect() |
| crates/swss-common/src/types/producerstatetable.rs | Updated Drop implementation to use eprintln! instead of expect() |
| crates/swss-common/src/types/zmqclient.rs | Updated Drop implementation to use eprintln! instead of expect() |
| crates/swss-common/src/types/dbconnector.rs | Removed Clone trait implementation and updated Drop to use eprintln! |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
|
||
| pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> Result<SelectResult> { | ||
| let timeout_ms = timeout.as_millis().try_into().unwrap(); | ||
| pub fn read_data(&self, timeout_ms: u32, interrupt_on_signal: bool) -> Result<SelectResult> { |
There was a problem hiding this comment.
The reason that Duration is used is to signify to the caller that this argument is a time argument, and so something time-based should be passed in. C language not having a discrete time type shouldn't be a reason not to use it in Rust code. #Resolved
There was a problem hiding this comment.
I would instead recommend handling a duration that is too large gracefully: return an error saying the duration was too big (or similar).
There was a problem hiding this comment.
resolved. and also change return type.
| unsafe { | ||
| if let Err(e) = swss_try!(SWSSSubscriberStateTable_free(self.ptr)) { | ||
| eprintln!("Error dropping SubscriberStateTable: {}", e); | ||
| } |
There was a problem hiding this comment.
What does it mean for a *_free function to fail? The object got corrupted? I'm not sure if just printing on stderr and continuing the application is safe at that point. #Resolved
There was a problem hiding this comment.
It may be possible that a programmer mistake of throwing in dtor in this class or any base class or any member class. Crashing a critical daemon is not good for online service, so use eprintln as defensive programming programming.
There was a problem hiding this comment.
Running after an exception is thrown in dtor is not good either; the runtime state of the program is no longer known or consistent.
There was a problem hiding this comment.
I plan to fix exception is thrown in dtor issue in another future PR.
| unsafe { | ||
| let fd = swss_try!(p_fd => SWSSConsumerStateTable_getFd(self.ptr, p_fd))?; | ||
| let fd = BorrowedFd::borrow_raw(fd.try_into().unwrap()); | ||
| let fd = BorrowedFd::borrow_raw(fd); |
There was a problem hiding this comment.
borrow_raw document says "The resource pointed to by fd must remain open for the duration of the returned BorrowedFd, and it must not have the value -1.". We should check the value is not negative. #Resolved
There was a problem hiding this comment.
checked for -1, not for negative. Since BorrowedFd and hiredis cares only about -1.
|
|
||
| pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> Result<SelectResult> { | ||
| let timeout_ms = timeout.as_millis().try_into().unwrap(); | ||
| pub fn read_data(&self, timeout_ms: u32, interrupt_on_signal: bool) -> Result<SelectResult> { |
There was a problem hiding this comment.
Instead of changing the argument to u32, you can timeout.as_millis().try_into().map_err(|_| Error::new("Timeout value too large to convert"))? #Resolved
There was a problem hiding this comment.
resolved. and also change return type.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
crates/swss-common/tests/sync.rs
Outdated
| let redis = Redis::start(); | ||
| let pst = ProducerStateTable::new(redis.db_connector(), "table_a")?; | ||
| let cst = ConsumerStateTable::new(redis.db_connector(), "table_a", None, None)?; | ||
| let pst = ProducerStateTable::new(redis.db_connector(), "table_a").unwrap(); |
There was a problem hiding this comment.
Why unwrap() on test code? The old code will already return an error, and the test will be gracefully marked as failed. #Resolved
There was a problem hiding this comment.
The challenge is that read_data, and other function? will return different Result. So it is easy to unwrap and let testcase fail. The failure message is also kind of graceful
thread 'consumer_producer_state_tables_sync_api_basic_test' panicked at crates/swss-common/tests/sync.rs:76:12:
called `Option::unwrap()` on a `None` value
test result: FAILED. 7 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.43s
error: test failed, to rerun pass `-p swss-common --test sync`
There was a problem hiding this comment.
what's the reason to change return type of read_data? Is it really necessary. It seems a lot of churn due to that
There was a problem hiding this comment.
The reason is that we want to use Duration instead of u32, and the casting error handling introduce new Result type. Similar change also happens on get_fd().
pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> std::io::Result<SelectResult>
Another option is to keep both function as thin wrapper of c-api and keep the most native data types, then we can keep the Result type.
There was a problem hiding this comment.
We can keep the Result type, which is
. You can add a constructor to Exception then you can use it in map_err.|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@qiluo-msft , It seems this PR breaks official build. |
This reverts commit d6ccff7.
I think this is because sonic-dash-ha and all rust components are pulling swss-common master. See https://github.com/sonic-net/sonic-dash-ha/blob/b38d8fb68c7e1a1d3c866fdb65e840c7da1021d8/Cargo.toml#L100. sonic-dash-ha has been updated to accommodate the change in swss-common. We need to update its ref in sonic-buildimage. |
I saw the error in sonic-buildimage. The issue is that libswsscommon is built from src/sonic-swss-common mounted by sonic-buildimage but rust is pulled directly from swss-common master branch. libswsscommon lags behind rust swss-common crate and causes the compilation error. We need to update ref of swss-common in sonic-buildimage to the latest. Going forward, probably we should change the way rust components pulling swss-common. Instead of always from the latest of the branch, they can use relative path to find swss-common in sonic-buildimage. When we cut a new branch, we don't need to update cargo.toml to the new branch (otherwise, they always use swss-common master branch). |
|
@qiluo-msft , I submitted PR sonic-net/sonic-buildimage#24663 to fix the issue. And let's discuss the long term fix I proposed above. |
This reverts commit d6ccff7.
Reverts #1113 https://github.com/sonic-net/sonic-swss/blob/eae91a23c01bcc0b1e915ab46c5228a854a6e42e/Cargo.toml#L67 sonic-swss refers to sonic-swss-common's latest commit. So changes in swss-common breaks SONiC's build.
…onic-net#1118) This reverts commit 7682785.
|
Cherry-pick PR to 202511: #1133 |
…1124) This reverts commit 7682785. Following the proper fix sonic-swss (sonic-net/sonic-swss#4028). Now we could bring back the commit. What I did in the PR: Fix several potential crash issues: unwrap() function may crash in runtime. Drop should not crash. Clone should not crash, so remove it. Protect casting Duration
…onic-net#1118) Following the proper fix sonic-swss (sonic-net/sonic-swss#4028). Now we could bring back the commit. What I did in the PR: Fix several potential crash issues: 1. unwrap() function may crash in runtime. 2. Drop should not crash. 3. Clone should not crash, so remove it. 4. Protect casting Duration
…1136) Following the proper fix sonic-swss (sonic-net/sonic-swss#4028). Now we could bring back the commit. What I did in the PR: Fix several potential crash issues: 1. unwrap() function may crash in runtime. 2. Drop should not crash. 3. Clone should not crash, so remove it. 4. Protect casting Duration Co-authored-by: Vineet Mittal <[email protected]>
unwrap() function may crash in runtime.
Drop should not crash.
Clone should not crash, so remove it.
Protect casting Duration
Depends on sonic-net/sonic-dash-ha#132