Refactor test apps to use unit-test framework #4014
I think this is done. TODO probably as follow up PRs:

For CI Windows, should we also add 64-bit version?

Rather than double the number of tests, perhaps we can just make 64-bit as the default for testing (i.e. make every test config build for win64) instead? Since most machines should be 64-bit nowadays.

Wholeheartedly agree on Win64 (I already wrote it as todo in my comment above). Unfortunately the pjsip/third_party_libs only has x86 as the target for most libs (some do have x64 as target). So rather than just building and testing basic exe, I was thinking to do the work as separate PR to create more elaborate CI.

Win CI sometimes failed this week (it was very stable last week):
Will keep watching for these issues.

* Initial work on unittest framework, tested
* Finished reorganizing pjlib-test to use unit-test framework
* Fix big problem where performance improvement is not observed with the new framework (reason: because parallel flag is not set, doh!)
* Tidying up global vars in pjlib-test. Add test options: stop on error, skip essential tests, list tests
* Change the PJ_TEST_XX() signature to be more generic and does not force returning value
* Modifications to some existing tests to use unit-test test macros
* Updated VS projects with argparse.h and unittest.h
* Fix warnings on Windows
* Disable parallel unit-testing for ioqueue stress test on Windows because of errors when running on Windows virtual machine
* Non-parallel test case will now run exclusively (previously it did not wait for the previous test to complete)
* Add pjlib-test/test_util.h for common utilities for parsing, configuring, and running unit test. This can be used by all test apps. pjlib-util-test has been ported to use these utilities
* Dirty hack to fix error message being displayed when user selected essential test because it does not exist in features test (in pjlib-test)
* Replace PJ_TEST_PARALLEL with PJ_TEST_EXCLUSIVE
* Ported pjnath-test to use unit-testing framework, with limited speed-up from 45m originally to 15m using 10 worker threads
* Large modifications in pjnath-test to speed-up test. It is fast now (4:30 minutes with 10 worker threads, from 45:42m originally)
* Ported pjmedia-test to use unit-test
* Porting of pjsip test to unit testing framework. Not much of speed improvements due to exclusive tests
* Modifications in pj_argparse API (changed get() to get_bool() and add automatic error reporting) hopefully make it easier to use
* Refactor tsx_uas_test() to allow parallel testing
* Parallelize tsx_uac_test()
* Further effort to parallelize tests in pjsip-test. Discovered major issue with unit-test logging (see unittest.md)
* Showing PJLIB config is optional with cmd line option
* Bug fixing failed pjsip tests when running in parallel mode
* Finished parallelizing all pjsip tests except one
* Continuing correcting errors
* Add test shuffle feature
* Modify CI workflows to use standard arguments: -w 3 --shuffle. These args are set in GitHub action variables
* Updated due to change in argparse API signature (swap arg order)
* Removed unittest.md (it was draft for the PR)
* Fix the use of GitHub repository vars
* Tidying up minor conflicting merge in VC projects and pjlib.h
* Attempt to fix assertion failure when tdata is being accessed after reference counter goes to zero in regc_test
* Fixing failure in transport_loop_test(), first in loop_resolve_error(), presumably because there are other loop transport around when the test is run. Solutions are: 1) make the test exclusive for now, 2) fix the cleanup of loop transport in other tests. Then there is failure in transport_rt_test(), because it gets loop transport from other test that is being shutdown. Sneakily add pjsip_tpmgr_get_transport_count_by_type() that's useful for debugging.
* Attempt to fix unresolved winmm.lib API (e.g. timeKillEvent, timeGetTime) called from BaseClasses
* Split exclusive part of transport_loop_test
* CI mods: split steps, envs, shorten job names
* Fix failed tsx_basic_test
* Restore previously shortened job names
* Replace assertion with PJ_TEST_XX in resolver_test
* Add --stdout-buf and --stderr-buf options in test apps to control stdout/stderr buffering
* Fix missing sipp exe on Windows CI
* Fix wrong CI args on Mac. Use -j for faster make
* Replace pj_rand() with own rand in unittest because rand() yields different sequence on different platform even with the same srand, making it hard to reproduce test sequence
* Fix swig make error on Linux and runall.py error reading log file on Windows
* Attempt to fix test repeatability by 1) delete all calls to pj_srand() except in pj_test_suite_shuffle(), 2) unit test PRNG explicitly uses pj_uint32_t instead of int. Also disable windows python tests since it is unreliable
* Fixed nonexistent function
* Relaxing the strictness of the test since sometimes it raises error
* Set regc_test() exclusive because it crashes sometimes, probably concurrency issue
* Modified thread counter to unsigned long (from pj_uint32_t) since the counter value is 2*1e9 and overflows during diff calculation
* Fixed port double destruction in mips_test() and include benchmark tests in pj/config_site_test.h
* Use any port since sometimes test fails with address in use error
* Fix conflicted return value in udp_ioqueue_test() and let it immediately exit on error so that we can see correlated error log
* Restore sleep(0) in thread test since without it the test may occasionally fail on Linux
* Various attempts to fix fluke error in resolve_test.c: 1) servers use random port numbers, 2) increase delay waiting for various DNS timers, 3) reset global vars to zero because test may be repeated for IPv6
* More relaxed packet count tests in resolver_test
* Use any port instead of hardcoded one in udp ioqueue unregister_test since binding fails occasionally
* Rollback previous changes in resolver_test that relaxed packet count test
* Protect access to pool from worker thread with mutex in resolver_test
* Faster resolver_test time by reducing timeout
* Use high number port to make it less prone to bind error
* Remove hardcoded port number, replace with bind to any
* Fix SSL to continue decrypting data after renego completes
* Fixed race condition when registering SIP module in transport test
* Disable loading TLS cert in TURN sock test if SChannel is used
* Add (missing!) pjnath-test in Windows CI
* Fix syntax error in ci-win.yml and renamed lin->ubuntu in ci-linux
* Temporary workaround for MacOS rwmutex deadlock issue
* More jobs/tests in CI Mac, and changed names
* Fix silly typo in config_site_test.h
* Prettify CI job names, added more test steps
* Fix CI: ffmpeg lib path, faster git clone
* CI: simplify job names, add some audio checking
* CI: Disable SDL check because vid renderer is not available on CI machine
* Fixed missing ffmpeg shared lib when running pjmedia test
* CI Mac: install gnutls
* Lookup renderer in video codec test
* CI MacOS: attempt to fix failed OpenSSL test
* CI Mac: better test split to make duration more uniform across jobs
* Replace deprecated egrep with grep -E
---------
Co-authored-by: Nanang Izzuddin <[email protected]>
Co-authored-by: sauwming <[email protected]>
This PR contains modifications to PJSIP test apps (pjlib-test, pjlib-util-test, pjnath-test, pjmedia-test, and pjsip-test) to use the new unit test framework (#4007) with the main objective to make them complete faster.
Timing Results
Let's get straight into it. Below are the test time improvements from the original with the new framework using several worker thread settings (1-10).
GitHub CI timings:
Notes on timing
Settings with three worker threads (four threads in total, counting the main thread) are significant because GitHub Ubuntu and Windows runners have 4 vCPUs (see this article), while Mac-latest has 3 vCPUs.
Some test apps cannot be made faster beyond a certain limit by adding worker threads, because the total run time cannot drop below the duration of the longest test case in that app.
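The limit described above can be made concrete with a small calculation. The sketch below is not part of PJSIP: `min_parallel_time()` is a hypothetical helper, and the durations in the usage note are made up for illustration. It assumes independent test cases and ignores exclusive tests, which would only raise the bound.

```c
#include <stddef.h>

/* Lower bound (in seconds) on the wall-clock time needed to run `count`
 * independent test cases on `threads` threads in total.
 * Hypothetical helper for illustration; not part of PJSIP. */
double min_parallel_time(const double *durations, size_t count,
                         unsigned threads)
{
    double total = 0.0, longest = 0.0, balanced;
    size_t i;

    if (threads == 0)
        threads = 1;    /* treat "no threads" as a single thread */

    for (i = 0; i < count; ++i) {
        total += durations[i];
        if (durations[i] > longest)
            longest = durations[i];
    }

    /* Perfect load balancing would give total/threads, but no schedule
     * can finish before the longest single test case does. */
    balanced = total / threads;
    return (balanced > longest) ? balanced : longest;
}
```

For example, with four test cases lasting 180, 60, 30 and 30 seconds, the bound is 300 s on one thread but stays at 180 s for two or more threads, because the longest test case alone takes that long.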
General look and feel
All test apps have a common look and feel with uniform command line options, which look something like this:
The test outputs are also uniform and look something like this:
Running the tests
With the Makefile build system, it is easier to run the tests with the make command. The Makefile accepts two environment variables: CI_ARGS contains arguments for the test apps, and CI_MODE indicates that we're running under GitHub CI (#3374). Sample invocation:
Otherwise (e.g. on Windows) run each of the apps directly. Use -h to get help.
GitHub CI modifications
CI_UBUNTU_ARGS, CI_WIN_ARGS, CI_MAC_ARGS, and CI_MODE.
I think the CI should incorporate more elaborate tests and cover more features/scenarios, but that is outside the scope of this PR.
Tips on troubleshooting errors
When the logging does not convey sufficient info about the error, use --log-no-cache to display logs as they are written, most likely with -w 0 to disable worker threads and avoid cluttering the output.
But sometimes a problem only arises with a specific number of worker threads and a specific test order. In this case, troubleshooting will be challenging indeed. :) Use -v, --verbose to display when tests are started/ended. This way you can know what tests were started when the failed test was running. After that, you can try running only these tests rather than all tests to reproduce the problem.
Test shuffling (--shuffle arg) is used by default on GitHub CI via repository variables (see above). To reproduce the error, make note of the seed value used when running the (failed) test (it is printed in the output), and re-run the test (locally) using --shuffle --seed N args.
The --stop-err option is useful to avoid waiting for all tests to complete when debugging an error.
Open issues
Reproducibility
As mentioned above, we're supposed to be able to reproduce the test sequence by using --shuffle and a specific --seed value. But this is not always the case. Even with the same seed, the test sequence can be different on a different machine. We already use our own pseudo-random number generator in unittest.c, but sometimes this does not fix the problem.
Test app modifications
General
There is a new utility file in pjlib/src/pjlib-test/test_util.h that is shared by all test apps to parse command line arguments, show usage, register tests, and control the unit testing process.
The main front-end files (main.c) were modified to behave better as command line apps.
The main modification in the test body (test.c) is to use the unit-test framework.
Some test code was changed, replacing manual checks with PJ_TEST_XXX() macros, mainly to exercise these macros and to make the tests nicer. But since this made the PR very big, I didn't continue the effort except where it was necessary for debugging some problems.
In general, large tests needed to be split into smaller ones to make them run in parallel. But major problems arose, mainly because the tests share global states or manipulate common objects.
More specific changes are discussed below.
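To illustrate what replacing a manual check with a test macro buys, here is a minimal sketch. `SKETCH_TEST_EQ` is a hypothetical macro written for this example; its name, argument order and message format are assumptions and should not be read as the actual PJ_TEST_XXX() definitions. It only shows the general idea of a check that reports the failure automatically and lets the caller decide what happens next instead of forcing a return value.

```c
#include <stdio.h>

/* Hypothetical macro modelled on the idea behind PJ_TEST_XXX().
 * The name, arguments and output format are assumptions, not PJLIB API. */
#define SKETCH_TEST_EQ(lval, rval, err_action) \
    do { \
        long l_ = (long)(lval), r_ = (long)(rval); \
        if (l_ != r_) { \
            fprintf(stderr, "%s:%d: test failed: %s == %s (%ld vs %ld)\n", \
                    __FILE__, __LINE__, #lval, #rval, l_, r_); \
            err_action; \
        } \
    } while (0)

/* Manual check: the message and the return code are written by hand. */
int check_manual(int got, int expected)
{
    if (got != expected) {
        fprintf(stderr, "error: unexpected value %d\n", got);
        return -10;
    }
    return 0;
}

/* Same check through the macro: the failure report is generated
 * automatically, and the caller chooses what happens on error. */
int check_with_macro(int got, int expected)
{
    int rc = 0;

    SKETCH_TEST_EQ(got, expected, { rc = -10; goto on_return; });
    /* ... further checks would go here ... */

on_return:
    /* common cleanup would go here */
    return rc;
}
```

Both functions return 0 on success and -10 on mismatch; the macro version additionally logs the file, line and the compared expressions without extra code at each call site.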
pjlib-test notes
pjlib-test has "special" arrangements in test.c, because it needs to test the unit-test (UT) framework first, before running the rest of the tests using the UT framework. But before testing the UT framework, it needs to test the components needed by the UT framework such as list, fifobuf, and OS. And so on. That's why the test output is different from the rest of the test apps.
Other than that, the modifications to the test functions are not too major, at least compared to pjnath-test and pjsip-test, and I think the test time is quite satisfactory.
pjlib-util-test notes
We couldn't speed it up more because tests such as resolver_test() and http_client_test() take about three minutes to complete and they couldn't be split up without major effort due to the use of global states. Since the test time is already quite satisfactory, I didn't pursue further optimizations.
pjnath-test notes
pjnath-test requires large modifications to make the tests run in parallel as follows:
* mempool factory since many tests validate the memory leak in the pool factory, therefore having a single pool factory will not work
* server.c so that server can be instantiated multiple times simultaneously (this was the motivation behind API to get DNS server's bound address to allow specifying zero as port number #3999).
* ice_test, turn_sock_test, concur_test) into individual test for each configuration, making them parallelizable.
As a result, there are 70 smaller test items in pjnath-test, and with 7 worker threads, we can save 40 minutes of test time!
pjmedia-test notes
pjmedia-test has the least modifications because it has very few tests. The original duration was 4m18.691s, and has come down a little to 2m8.363s with 1 worker thread.
Having said that, some minor modifications were done:
* pjmedia_endpt_create() with pjmedia_endpt_create2() (similarly ..destroy() with ..destroy2()) in mips_test() and codec_test_vectors(), to avoid inadvertently initializing pjmedia_aud_subsys which on Ubuntu emits lots of debugging messages during initialization (although the messages should have been suppressed in the code).
* printf with log in jbuf test to make the output tidy, and renamed jbuf_main function name to jbuf_test to be consistent.
pjsip-test notes
pjsip-test has also gone through the biggest and most difficult modifications to make the tests parallelizable, which involves:
* pjsip_tpselector to bind transaction (and tdata in case of stateless request) with specific loop transport, otherwise the transaction/tdata may find other instance of loop transport
* tsx_uac_test failed because UA layer has now been registered before the test)
* tsx_basic_test, tsx_uac_test, tsx_uas_test to take the index to parameters rather than the parameter itself to make the test output more informative.
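One plausible reading of the last item is sketched below: when a test receives a small integer index instead of a pointer to its parameters, the index can appear in the test name or log and identifies the configuration directly. Everything here is hypothetical (the `sketch_param` type, its fields and values, `sketch_test()` and `make_test_name()`); the real tsx tests use their own parameter types and registration helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parameter set. The fields and values are invented for
 * illustration; the real tsx tests have their own parameter types. */
struct sketch_param {
    const char *label;
    int         timeout_msec;
};

static const struct sketch_param params[] = {
    { "short timeout", 500 },
    { "long timeout", 4000 }
};

#define PARAM_COUNT (sizeof(params) / sizeof(params[0]))

/* Test body that receives an index into params[] (packed into the
 * void* argument) instead of a pointer to the parameter itself. */
int sketch_test(void *arg)
{
    size_t idx = (size_t)(uintptr_t)arg;

    if (idx >= PARAM_COUNT)
        return -1;
    printf("sketch_test(%lu): %s, %d ms\n", (unsigned long)idx,
           params[idx].label, params[idx].timeout_msec);
    return 0;
}

/* Build a display name such as "sketch_test(1)". With an index the name
 * identifies the configuration, whereas a pointer argument would only
 * yield an address. Returns the name length, or -1 if it does not fit. */
int make_test_name(char *buf, size_t size, const char *func, size_t idx)
{
    int len = snprintf(buf, size, "%s(%lu)", func, (unsigned long)idx);

    return (len < 0 || (size_t)len >= size) ? -1 : len;
}
```

A test runner could then register one entry per index and label each with the generated name, so a failure report points at a specific configuration.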