[cherry-pick] [202205] Fix memory leak issue in ConfigDBConnector. #706
Fix memory leak issue in ConfigDBConnector: [chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

The cause of this issue is that DBConnector::pubsub() returns a pointer, and the following code calls this method but never releases the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub();  // <== this pointer is never deleted later
```

Also mark DBConnector::pubsub() as deprecated for non-SWIG scenarios.

Change DBConnector::pubsub() to return a smart pointer.

All test cases pass.

Run the following code in Python and validate that there is no epoll or socket leak:

```
import gc
from swsscommon import swsscommon

config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()
gc.collect()
```

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resource leak issue: [chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
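The validation step above says to check for epoll and socket leaks but does not show how to count them. One possible way, sketched here as an assumption and not as the method used in this PR, is to compare the number of open file descriptors in the process before and after the repeated `connect()` calls. The helpers `count_open_fds`, `fd_delta`, and `connect_repeatedly` are hypothetical names introduced only for illustration; only `ConfigDBConnector_Native()` and `connect()` come from the description.

```python
import gc
import os
import socket


def count_open_fds(max_fd=4096):
    """Count descriptors in [0, max_fd) that are currently open in this process."""
    count = 0
    for fd in range(max_fd):
        try:
            os.fstat(fd)  # raises OSError if fd is not an open descriptor
        except OSError:
            continue
        count += 1
    return count


def fd_delta(action):
    """Run action() and return how many more descriptors are open afterwards."""
    gc.collect()
    before = count_open_fds()
    action()
    gc.collect()  # let unreferenced Python wrapper objects be destroyed first
    return count_open_fds() - before


def connect_repeatedly(times=3):
    """Hypothetical wrapper around the validation snippet from the description.

    Assumes the swsscommon bindings are installed and redis is reachable.
    """
    from swsscommon import swsscommon

    config_db = swsscommon.ConfigDBConnector_Native()
    for _ in range(times):
        config_db.connect()
```

On a device with swsscommon available, `fd_delta(connect_repeatedly)` could be compared against `fd_delta(lambda: connect_repeatedly(30))`; a delta that grows with the number of `connect()` calls would point to leaked epoll or socket descriptors. The scan is limited to descriptors below `max_fd`, which is an arbitrary bound chosen for this sketch.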

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

This PR is blocked by the following UT failure; however, the failure does not seem to be related to the code change in this PR:
2022-11-10T15:51:38.7247858Z test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED [ 91%]
2022-11-09T04:43:35.3530022Z test_vlan.py::TestVlan::test_VlanMemberLinkDown FAILED [ 91%]

Another PR of mine on 202205 has also failed on the same UT a few times:

I will create another test PR to validate that the UT issue is not caused by the code change in this PR:

@qiluo-msft, could you please help do a force merge? I have confirmed that the UT issue is not caused by the change in this PR, using another PR that changes nothing: Validation also failed on the same UT.

The UT issue is caused by a UT added by this PR: sonic-net/sonic-swss#2469. The UT seems to depend on a kernel patch that is not ready on 202205: