Skip to content

Transaction state not propagated properly to on_call_tsx_state() cb#3614

Merged
sauwming merged 10 commits intomasterfrom
call-tsx-trying
Jul 7, 2023
Merged

Transaction state not propagated properly to on_call_tsx_state() cb#3614
sauwming merged 10 commits intomasterfrom
call-tsx-trying

Conversation

@sauwming
Copy link
Copy Markdown
Member

@sauwming sauwming commented Jun 28, 2023

Potential backward incompatibility note:

  • Applies to application that implements sip_inv on_tsx_state_changed() in pjsip level, on_call_tsx_state() in pjsua-lib, or Call::onCallTsxState() in pjsua2.
  • Only affects incoming/UAS transactions.
  • Application needs to make sure that it's ready to receive UAS tsx state PJSIP_TSX_STATE_TRYING in the callback (previously, this event might not be reported).
  • Application now receives the callback in the correct order, i.e. e->body.tsx_state.type == PJSIP_EVENT_RX_MSG first for the request, followed by PJSIP_EVENT_TX_MSG for the response. Previously, the order was reversed. Thus, app that relies on the callback ordering needs to make sure that its implementation still works as expected.

PR Description:
It is reported In #3605 that transaction state event may not be properly called for certain requests within a call, such as re-INVITE and BYE.

This is because in sip_inv.c mod_inv_on_tsx_state(), we call on_tsx_state_changed() after calling the state handler, which can potentially cause a few issues:

  • lost tsx state PJSIP_TSX_STATE_TRYING. This is because the state handler will send response to the request and then modify the tsx state. So, by the time we call the callback, the state has changed.
  • duplicate callback called. After the state handler modifies the tsx state, it calls the callback, and then we call the callback again with the same tsx state. So PJSIP_TSX_STATE_COMPLETED may be reported twice.
Processing incoming message: Request msg BYE/cseq=7289 (rdata0x62500007f928)
Response msg 200/BYE/cseq=7289 (tdta0x6210001465a8) created
... onCallTsxState() PJSIP_TSX_STATE_COMPLETED
... onCallTsxState() PJSIP_TSX_STATE_COMPLETED
  • wrong callback order. The callback for the response will be sent first, and then comes the callback for the request itself.
... onCallTsxState() called with TX_MSG: 100 Trying ...
on_rx_reinvite() returns 0
... onCallTsxState() called with RX_MSG: SIP INVITE !...

Bug fix:
This PR also seems to fix TRICKLE ICE behavior upon receiving PRACK:
In pjsua_call.c pjsua_call_on_tsx_state_changed():

    } else if (tsx->role == PJSIP_ROLE_UAS &&
               pjsip_method_cmp(&tsx->method, pjsip_get_prack_method())==0 &&
               tsx->state==PJSIP_TSX_STATE_TRYING)
    {
        /* Trickle ICE tasks:
         * - UAS receiving PRACK, start trickling
         */
        pjsua_ice_check_start_trickling(call, PJ_FALSE, e);
    }

Without the patch, the state TRYING seems to be lost and trickle ICE never starts:

16:10:17.398           pjsua_core.c  .RX 568 bytes Request msg PRACK/cseq=29901 (rdata0x623000017008) from TCP 139.162.62.29:5060:
16:10:17.399           pjsua_core.c  .......TX 506 bytes Response msg 200/PRACK/cseq=29901 (tdta0x6210000ce5a8) to TCP 139.162.62.29:5060:
pjsua_call_on_tsx_state_changed Completed
pjsua_call_on_tsx_state_changed Completed
16:10:18.285           pjsua_call.c  Call 0: ICE trickle retransmitting 18x (retrans #2)

With the patch:

16:23:55.762           pjsua_core.c  .RX 564 bytes Request msg PRACK/cseq=30781 (rdata0x106813808) from TCP 139.162.62.29:5060:
pjsua_call_on_tsx_state_changed Trying
16:06:35.221           pjsua_call.c  ......Call 0: ICE trickle stop retransmitting 18x after receiving PRACK
16:06:35.221           pjsua_call.c  ......Call 0: ICE trickle start trickling

@wosrediinanatour
Copy link
Copy Markdown
Contributor

wosrediinanatour commented Jun 28, 2023

Most looks good so far, when applying the patch on 2.13.

Note that when receiving SIP BYE:

Received Request msg BYE/cseq=25997 (rdata0x625000073128)
Transaction created for Request msg BYE/cseq=25997 (rdata0x625000073128)
Transaction tsx0x61d00003b728 state changed to Calling
Incoming Request msg BYE/cseq=25997 (rdata0x625000073128) in state Null
 State changed from Null to Trying, event=RX_MSG
 Transaction tsx0x61d000046b28 state changed to Trying
.. callback called with TRYING, RX_MSG...
 Response msg 200/BYE/cseq=25997 (tdta0x6210001465a8) created
Sending Response msg 200/BYE/cseq=25997 (tdta0x6210001465a8)
Sending Response msg 200/BYE/cseq=25997 (tdta0x6210001465a8) in state Trying 
... BYE sent...
State changed from Trying to Completed, event=TX_MSG
Transaction tsx0x61d000046b28 state changed to Completed
...callback called with COMPLETED, TX_MSG
Call 3: deinitializing media.. 
 Media stream call03:0 is destroyed
...callback called with COMPLETED, RX_MSG

The last one is not necessary (maybe this is different in the master branch?). And after the call object is destroyed and PJSUA2 is shutting down, I get the following:

Destroying...
Request to terminate transaction
State changed from Completed to Terminated, event=USER
Transaction tsx0x61d000046b28 state changed to Terminated
Destroying txdata Response msg 200/BYE/cseq=25997 (tdta0x6210001465a8)
Transaction destroyed!

Is this OK or a "leak"?

@sauwming
Copy link
Copy Markdown
Member Author

Thanks for the feedback. I have fixed the issue in the latest commit:

As for the leak, I need more info. Is the transaction still ongoing? (i.e. is the BYE sending still in progress?) If yes, then it's okay, since you may have shutdown the library prior to the termination. But if the transaction should actually have completed&terminated prior to the shutdown but for some reason, it did not, then it's a leak. Perhaps you can attach a complete log so we can check. Also, it's recommended to put the timestamp in the log so we can tell how long each step took place.

Copy link
Copy Markdown
Member

@nanangizz nanangizz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this change introduce a rather major behavior change for existing apps (e.g: IIRC some users reported/complained about wrong order of tsx-change callbacks, so they may have applied some special "tricks" to deal with this, but now this is "fixed" and their tricks may no longer work or even harmful), perhaps we should put a 'big' note and/or add this into milestone's backward-incompatibility-changes.

Also, please make sure this passes unit-tests and Python/pjsua tests.

@sauwming sauwming added the note:api-break Causing backward incompatibility label Jun 30, 2023
@sauwming
Copy link
Copy Markdown
Member Author

As this change introduce a rather major behavior change for existing apps (e.g: IIRC some users reported/complained about wrong order of tsx-change callbacks, so they may have applied some special "tricks" to deal with this, but now this is "fixed" and their tricks may no longer work or even harmful), perhaps we should put a 'big' note and/or add this into milestone's backward-incompatibility-changes.

Okay, already put a note above.
Checking our own callback pjsua_call_on_tsx_state_changed(), it seems to be safe. In fact, I believe it even fixes the trickle ICE behavior upon receiving PRACK (also added this in the PR desc):

    } else if (tsx->role == PJSIP_ROLE_UAS &&
               pjsip_method_cmp(&tsx->method, pjsip_get_prack_method())==0 &&
               tsx->state==PJSIP_TSX_STATE_TRYING)
    {
        /* Trickle ICE tasks:
         * - UAS receiving PRACK, start trickling
         */
        pjsua_ice_check_start_trickling(call, PJ_FALSE, e);
    }

Also, please make sure this passes unit-tests and Python/pjsua tests.

Okay, will wait for #3612 before merging.

@wosrediinanatour
Copy link
Copy Markdown
Contributor

Thanks for the feedback. I have fixed the issue in the latest commit:

Works!

As for the leak, I need more info. Is the transaction still ongoing?

Might be an issue of my testing framework. I need to do more testing (shall it be part of this PR, or would it be better to have it in a new issue?).

@sauwming
Copy link
Copy Markdown
Member Author

As for the leak, I need more info. Is the transaction still ongoing?
Might be an issue of my testing framework. I need to do more testing (shall it be part of this PR, or would it be better to have it in a new issue?).

If it's not related, better put it separate.

- On windows, error EACCESS when binding/opening socket
- Fix UnicodeEncodeError/UnicodeDecodeError
@sauwming sauwming linked an issue Jul 5, 2023 that may be closed by this pull request
@sauwming sauwming merged commit bb09b94 into master Jul 7, 2023
@sauwming sauwming deleted the call-tsx-trying branch July 7, 2023 02:01
BarryYin pushed a commit to BarryYin/pjproject that referenced this pull request Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: pjsip note:api-break Causing backward incompatibility type: bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PJSUA2: inconsistence and missing calls of method onCallTsxState

4 participants