
Implement audio video synchronization #4325

Merged
nanangizz merged 21 commits into master from avsync on Mar 27, 2025
Conversation

@nanangizz
Member

@nanangizz nanangizz commented Feb 26, 2025

How to use

For app using PJSUA-LIB

By default it is enabled for audio & video calls. It can be disabled on a per-call basis by setting PJSUA_CALL_NO_MEDIA_SYNC in pjsua_call_setting.flag.

For app using PJMEDIA

  1. App creates the AV sync using pjmedia_av_sync_create().
  2. App adds all media to be synchronized using pjmedia_av_sync_add_media(). If the media is an audio/video stream, use pjmedia_stream_common_set_avsync() instead.
  3. Each time a media receives an RTCP-SR, update the AV sync using pjmedia_av_sync_update_ref().
  4. Each time a media returns a frame to be rendered, e.g: via port.get_frame(), update the AV sync using pjmedia_av_sync_update_pts(); the function may request a delay adjustment (increase or decrease the delay).
  5. Remove media from the AV sync using pjmedia_av_sync_del_media().
  6. Destroy the AV sync using pjmedia_av_sync_destroy().

For app using PJMEDIA AVI File Player

By default it is enabled. It can be disabled by setting the PJMEDIA_AVI_FILE_NO_SYNC flag when creating the AVI file player.

Global macro settings

Configurable via config_site.h.

/**
 * Maximum tolerable presentation lag from the earliest to the latest media,
 * in milliseconds, in inter-media synchronization. When the delay is
 * higher than this setting, the media synchronizer will request the slower
 * media to speed up. And if after a number of speed up requests the delay
 * is still beyond this setting, the fastest media will be requested to
 * slow down.
 *
 * Default: 45 ms
 */
#ifndef PJMEDIA_AVSYNC_MAX_TOLERABLE_LAG_MSEC
#   define PJMEDIA_AVSYNC_MAX_TOLERABLE_LAG_MSEC    45
#endif


/**
 * Maximum number of speed up requests to synchronize presentation time,
 * before a slow down request to the fastest media is issued.
 *
 * Default: 10
 */
#ifndef PJMEDIA_AVSYNC_MAX_SPEEDUP_REQ_CNT
#   define PJMEDIA_AVSYNC_MAX_SPEEDUP_REQ_CNT       10
#endif
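For example, both defaults can be overridden from config_site.h before building (the values below are illustrative only, not recommendations):

```c
/* config_site.h: tighten the tolerable inter-media lag and allow more
 * speed-up attempts before slowing down the fastest media.
 * (Illustrative values, not recommendations.) */
#define PJMEDIA_AVSYNC_MAX_TOLERABLE_LAG_MSEC   30
#define PJMEDIA_AVSYNC_MAX_SPEEDUP_REQ_CNT      15
```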

Logic in pjmedia_av_sync_update_pts()

  • Calculate the absolute timestamp (or NTP timestamp) of the frame playback based on the reference NTP+RTP timestamp from RTCP-SR packet and the frame RTP timestamp from RTP packet. This timestamp is usually called presentation time (PTS).
  • If the PTS is the largest seen so far, store it as the AV sync's maximum PTS. Otherwise, calculate the difference, or delay, of this PTS from the maximum PTS, then request the media to speed up by as much as the delay.
  • If, after a specific number of speed up requests (configurable via PJMEDIA_AVSYNC_MAX_SPEEDUP_REQ_CNT), the lag is still beyond the tolerable value (configurable via PJMEDIA_AVSYNC_MAX_TOLERABLE_LAG_MSEC), the function will issue a slow down request to the fastest media (the one that set the AV sync's maximum PTS).
  • To prevent an unnecessarily large delay from being applied to all media for the synchronization, a mechanism is implemented where slow-down requests are marked down and speed-up requests are marked up.

@sauwming
Member

An early comment since it's still a draft.
The term 'av' seems to imply the sync will apply only to audio and video, even though the description looks like it can cover general cases.
Especially with the upcoming text media, which may also need to be synced (the sync may not be available in the early version, but perhaps in the future).

Perhaps removing the 'av' and calling it pjmedia_sync would be sufficient?

@nanangizz
Member Author

An early comment since it's still a draft.

Sure; actually, early feedback is one of the goals of creating this draft.

The term 'av' seems to imply the sync will apply only to audio and video, even though the description looks like it can cover general cases. Especially with the upcoming text media, which may also need to be synced (the sync may not be available in the early version, but perhaps in the future).

Perhaps removing the 'av' and calling it pjmedia_sync would be sufficient?

Yes, the sync module is generic for any media, especially delivered via RTP/RTCP and using a shared/synchronized NTP source.

IMO the 'av_sync' prefix enhances intuitiveness, i.e: it sounds related to presentation time synchronization, while the 'sync' prefix may be somewhat ambiguous (e.g: what to sync?). I recall how the 'ssl_sock' term was chosen over tls_sock for catchiness/intuitiveness reasons; when it was being developed, TLS was already around replacing SSL (even the PJSIP transport was already using 'TLS', see also #957). What do you think?

@sauwming
Member

The term should be inter-media synchronization, but yes, it's not quite as catchy, so I suppose it's okay if the 'av' stays.

The jitter buffer is designed for audio, while in video it is just a normal/dummy buffer. Hence managing delay in an audio stream is much easier than in a video stream. As video presentation is normally later than audio, it should be fine to adjust the delay in the audio stream only for the synchronization.
ms_diff = ntp_to_ms(&ntp_diff);

/* Smoothen and round down the delay */
ms_diff = ((ms_diff + 19 * media->smooth_diff) / 200) * 10;
Contributor


Should an upper bound be considered? Where do the magic numbers come from?

Member Author


Just added the upper bound of 60s in the last commit; actually not really sure if it is needed.

The magic numbers are weights for the smoothing: the last delay uses a weight of 19, the newly received delay uses a weight of 1, and the sum is divided by 20 for the new delay. Instead of dividing by 20, it divides by 200 and then multiplies by 10, rounding down to the nearest 10.

Thanks for the feedback.

@nanangizz
Member Author

Ready to review perhaps :)

Currently the delay adjustment is implemented in audio stream only, because:

  • video playback is essentially play-as-soon-as-possible (with a built-in minimum delay for stability); the delay/burst/optimal-delay calculations in the jitter buffer do not really apply (due to multiple RTP packets per video frame)
  • it is easier to manage delay in audio as the jitter buffer already has info about the delay & optimal delay
  • it should be sufficient for synchronization.

@nanangizz nanangizz marked this pull request as ready for review March 5, 2025 08:13
@andreas-wehrmann
Contributor

Especially with the upcoming text media, which may also need to be synced (the sync may not be available in the early version, but perhaps in the future).

Sorry, quick question about that: are there any specific plans to implement this (more specifically: RTT with T.140)?
I'm asking because I will need this later (probably this year) and was thinking about implementing it myself and maybe bringing it upstream, unless there are any efforts underway already which I could support?

@sauwming
Member

sauwming commented Mar 5, 2025

Yes, I'm currently implementing it. The basic functionality is already working (can send and receive text), but there's still a lot more to do, such as redundancy, docs, etc.

@andreas-wehrmann
Contributor

Yes, I'm currently implementing it.

Is this work available publicly? From a quick glance I couldn't find a proper branch.
I'd like to take a look at it if that's alright with you (even if it's in its very early stages).

@sauwming
Member

sauwming commented Mar 5, 2025

Not yet public. Will create a branch early next week.

Member

@sauwming sauwming left a comment


First stage of the CR; I haven't checked av_sync.c yet.

sizeof(call->media_prov[0]) * call->med_prov_cnt);

/* Create synchronizer */
if ((maudcnt+mvidcnt) > 1 && !call->av_sync) {
Member


Do we need a pjsua config to enable/disable the sync, as well as to choose which media will be synced?

Member Author

@nanangizz nanangizz Mar 6, 2025


Yes, enable/disable should be needed. As this is a new feature, I guess we can postpone fine-tuning settings such as selecting which media to sync or allowing two or more synchronizers in a call; we can add those settings later based on user feedback?

Member


The thing is, I'm not sure if the text stream should be synchronized by default, so that's why I proposed the setting to let the user decide.

Having said that, I'm not sure whether we should also implement text sync, so I suppose enable/disable is sufficient for now, with the docs noting that it currently only applies to audio and video.

Member Author


In the future, text sync may be useful for streaming a video with subtitles (it may require something like a text player, similar to the wav/video player). However, for now I think text sync is not so urgent.

Just to add a sample scenario of two or more synchronizers in a call: in a conference call, there can be two sets of audio+video streams (so in total there will be 4 streams). Rather than synchronizing all 4 streams using one synchronizer, perhaps it is better to employ a separate synchronizer for each set (due to different original sources, they may have different codec delays, network delays, NTP sources, etc). So in the future, when designing fine-tune settings per media, perhaps we need to also consider this scenario.

@sauwming
Member

sauwming commented Mar 6, 2025

Currently the delay adjustment is implemented in audio stream only, because:

  • video playback is essentially play-as-soon-as-possible (with a built-in minimum delay for stability); the delay/burst/optimal-delay calculations in the jitter buffer do not really apply (due to multiple RTP packets per video frame)
  • it is easier to manage delay in audio as the jitter buffer already has info about the delay & optimal delay
  • it should be sufficient for synchronization.

Actually I was thinking the other way around. Because while it's true that video is initially played asap without buffering/prefetching, eventually it will be video that will be lagging due to its high bandwidth, and overall processing power required.
And when it happens (the video lags behind), then I imagine the mechanism would be to start dropping frames.

So perhaps it should be tested in such scenario, for example using HD video and/or minimal connection speed such as mobile data.

@nanangizz
Member Author

Currently the delay adjustment is implemented in audio stream only, because:

  • video playback is essentially play-as-soon-as-possible (with a built-in minimum delay for stability); the delay/burst/optimal-delay calculations in the jitter buffer do not really apply (due to multiple RTP packets per video frame)
  • it is easier to manage delay in audio as the jitter buffer already has info about the delay & optimal delay
  • it should be sufficient for synchronization.

Actually I was thinking the other way around. Because while it's true that video is initially played asap without buffering/prefetching, eventually it will be video that will be lagging due to its high bandwidth, and overall processing power required. And when it happens (the video lags behind), then I imagine the mechanism would be to start dropping frames.

So perhaps it should be tested in such scenario, for example using HD video and/or minimal connection speed such as mobile data.

Actually the description above is a bit misleading; it was supposed to explain that the video stream does not have delay management, and implementing it is not so simple for now because the jitter buffer does not really help, so it would need to be implemented in the video stream itself from scratch. In audio, the optimal delay calculation is already done by the jbuf, and an increased delay can easily be inserted into the jbuf.

Yes, in the real world the video tends to lag behind the audio (from the codec & delivery aspects). That's why it is sufficient to have delay adjustment implemented only in audio, as mostly we need to add delay to the audio (beyond its optimal delay).

Btw, adding delay to video is perhaps not so complex or risky; will try to implement it.

@sauwming
Member

I'm not sure whether the video "delay" adjustment is complete, but it seems that the current patch can only add/decrease delay to the video? I believe a more realistic usage would be to speed up the video, i.e. by dropping or skipping frames.

In other words, instead of adding delay to audio, which will increase the latency/lag of the entire av streams, there can be another option for the user, which is to speed up the video, so the streams still feel like real time.

Also move speed-up markup by 4/3 to the synchronizer (was in each stream)
@nanangizz
Member Author

I'm not sure whether the video "delay" adjustment is complete, but it seems that the current patch can only add/decrease delay to the video? I believe a more realistic usage would be to speed up the video, i.e. by dropping or skipping frames.

In other words, instead of adding delay to audio, which will increase the latency/lag of the entire av streams, there can be another option for the user, which is to speed up the video, so the streams still feel like real time.

Previously I assumed that video decoding is always done as fast as possible, with no room for decreasing the delay. However, after implementing your idea here, the log showed some frames being skipped.

@nanangizz
Member Author

Next, I will try to integrate the AV sync into the AVI player on the source side (for streaming & local playback). Currently the existing AVI player synchronization seems to be done on the rendering side (for local playback only?).

@sauwming
Member

Right, currently sync is only done in aviplay.

@nanangizz nanangizz merged commit 078bfea into master Mar 27, 2025
42 checks passed
@nanangizz nanangizz deleted the avsync branch March 27, 2025 08:28
@nanangizz nanangizz requested a review from Copilot April 10, 2025 09:08
Contributor

Copilot AI left a comment


Copilot reviewed 19 out of 24 changed files in this pull request and generated 1 comment.

Files not reviewed (5)
  • pjmedia/build/Makefile: Language not supported
  • pjmedia/build/pjmedia.vcproj: Language not supported
  • pjmedia/build/pjmedia.vcxproj: Language not supported
  • pjmedia/build/pjmedia.vcxproj.filters: Language not supported
  • pjsip-apps/src/swig/symbols.i: Language not supported

pjmedia_port_destroy(&fport[i]->base);
}

if (*p_streams && (*p_streams)->avsync) {

Copilot AI Apr 10, 2025


In the cleanup block where pjmedia_av_sync_del_media is called, a NULL is passed as the synchronizer parameter. To ensure proper removal of media from the synchronizer, pass the actual AV sync instance (e.g., (*p_streams)->avsync) instead of NULL.

@nanangizz nanangizz added this to the release-2.16 milestone Apr 16, 2025
BarryYin pushed a commit to BarryYin/pjproject that referenced this pull request Feb 3, 2026