Skip to content

[PROPOSAL] Add support for dynamic code analysis (Sanitizers)#18303

Merged
luotao1 merged 3 commits intoPaddlePaddle:developfrom
kbinias:kbinias/add-sanitizers
Aug 1, 2019
Merged

[PROPOSAL] Add support for dynamic code analysis (Sanitizers)#18303
luotao1 merged 3 commits intoPaddlePaddle:developfrom
kbinias:kbinias/add-sanitizers

Conversation

@kbinias
Copy link
Contributor

@kbinias kbinias commented Jun 24, 2019

This PR adds support for Sanitizers:

  • AddressSanitizer (for a fast memory error detector)
  • LeakSanitizer (for a run-time memory leak detector)
  • MemorySanitizer (for a detector of uninitialized reads)
  • ThreadSanitizer (for detect data races)

These Sanitizers are based on compiler instrumentation, therefore a rebuild is required in order to use these tools. Sanitizers are implemented in Clang starting 3.1 and GCC starting 4.8 and supported on Linux x86_64 machines. AddressSanitizer and UndefinedBehaviorSanitizer are also available on macOS.

Usage

You can enable the Sanitizers with SANITIZE_ADDRESS, SANITIZE_LEAK, SANITIZE_MEMORY, SANITIZE_THREAD options in your CMake configuration. You can do this by passing e.g. -DSANITIZE_ADDRESS=ON on your command line. The Sanitizers check your program, while it's running.
Only one Sanitizer is allowed in compile time.

Usefully option is halt_on_error e.g. for ThreadSanitizer export TSAN_OPTIONS="halt_on_error=1" to exit after first reported error.

IMO: maybe it would be nice to connect PaddlePaddle CI with these Sanitizers in future.

Documentation

Comprehensive documentation is here https://github.com/google/sanitizers

@kbinias kbinias added the Intel label Jun 24, 2019
@kbinias kbinias requested a review from luotao1 June 24, 2019 13:26
@luotao1
Copy link
Contributor

luotao1 commented Jul 3, 2019

What's the difference between Sanitizers and gperftools, we use pprof to detect memory leak. For example, #18372 (comment).

@kbinias
Copy link
Contributor Author

kbinias commented Jul 4, 2019

LeakSanitizer was designed to replace the gperftools Heap Leak Checker and to bring leak detection to Sanitizers users. It is easy to deploy it in PaddlePaddle.
gperftools is a collection of a high-performance multi-threaded malloc() implementation, plus some pretty nifty performance analysis tools. pprof is a tool for visualization and analysis of profiling data.
We will still use both of these tools. Sanitizers are becoming more and more popular tools and give us more possibilities to detect problems in our source code.
With Sanitizers we detected problems with PaddlePaddle memory leak and ngraph : fix for multithreading test_analyzer_image_classification.

"Sanitizers" is a family of dynamic testing tools built into C++ compilers (Clang and GCC):

  • AddressSanitizer (ASan): finds memory errors, such as use-after-free, buffer overflows, and leaks.
  • MemorySanitizer (MSan): finds uses of uninitialized memory.
  • ThreadSanitizer (TSan): finds data races, deadlocks, and other threading bugs.
  • We also have UndefinedBehaviorSanitizer: finds other kinds of undefined behavior, such as use of incorrect dynamic type, shift by illegal amount and many others. I skip it in this PR.

Sanitizers are fast, e.g. Valgrind introduced 20x and more slowdown; *San is ~2x slowdown, 1.5x-3x memory overhead.

In paddle/build/third_party there are some projects which use Sanitizers, e.g. gtest, protobuf, mkldnn.

@luotao1
Copy link
Contributor

luotao1 commented Jul 4, 2019

Thanks very much for your detail explanation!
Could you give an example for how to use Sanitizers? Or is -DSANITIZE_ADDRESS=ON on your command line enough?

@kbinias
Copy link
Contributor Author

kbinias commented Jul 5, 2019

Yes, it is enough to add only one flag to CMake: -DSANITIZE_ADDRESS=ON or -DSANITIZE_LEAK=ON or -DSANITIZE_MEMORY=ON or -DSANITIZE_THREAD=ON.

Important: only one Sanitizer is allowed in compile time.

  • Example for AddressSanitizer:
    cmake .. -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_NGRAPH=OFF -DWITH_GPU=OFF -DSANITIZE_ADDRESS=ON

  • Example for ThreadSanitizer:
    cmake .. -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_NGRAPH=ON -DWITH_GPU=OFF -DSANITIZE_THREAD=ON

  • Example for LeakSanitizer:
    cmake .. -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON -DWITH_PYTHON=ON -DWITH_NGRAPH=OFF -DWITH_GPU=OFF -DSANITIZE_LEAK=ON

Hint: export an additional environment variable to limit the number of error messages before you run PaddlePaddle, e.g. for AddressSanitizer export ASAN_OPTIONS="halt_on_error=1", for ThreadSanitizer export TSAN_OPTIONS="halt_on_error=1"

@kbinias
Copy link
Contributor Author

kbinias commented Jul 5, 2019

@luotao1 Please don't merge this PR yet, we're still testing Sanitizers in PaddlePaddle.

Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. But do we install Sanitizers in latest-dev images by update Dockerfile or do we need a Sanitizers.cmake to install it?

CMakeLists.txt Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove the option, and write like CMAKE_BUILD_TYPE

Paddle/CMakeLists.txt

Lines 83 to 87 in 047bba8

# CMAKE_BUILD_TYPE
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE STRING
"Choose the type of build, options are: Debug Release RelWithDebInfo MinSizeRel"
FORCE)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need Sanitizers.cmake file to include Sanitizers support in PaddlePaddle, it is built in gcc compiler.
I moved many options to one option SANITIZER_TYPE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is built in gcc compiler.

Does gcc48 support it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a part of GCC starting from version 4.8

@kbinias kbinias force-pushed the kbinias/add-sanitizers branch from a7b79b6 to 9064e57 Compare July 8, 2019 15:16
@kbinias
Copy link
Contributor Author

kbinias commented Jul 8, 2019

Now, to use sanitizer we have only one CMake option SANITIZER_TYPE with different values (Address, Leak, Memory, Thread, Undefined), e.g.:
cmake .. -DWITH_TESTING=ON -DWITH_PYTHON=ON -DWITH_GPU=OFF -DSANITIZER_TYPE="Thread"

Notice: Undefined is undefined behavior detector (UndefinedBehaviorSanitizer).

@luotao1
Copy link
Contributor

luotao1 commented Jul 31, 2019

@zhupengyang Could you verify the build command in #18303 (comment)?

cmake .. -DWITH_TESTING=ON -DWITH_PYTHON=ON -DWITH_GPU=OFF -DSANITIZER_TYPE="Thread"

@zhupengyang
Copy link
Contributor

@zhupengyang Could you verify the build command in #18303 (comment)?

cmake .. -DWITH_TESTING=ON -DWITH_PYTHON=ON -DWITH_GPU=OFF -DSANITIZER_TYPE="Thread"

commit id: 3ac4715
build shell:
b039f1277de83bf745cf038bd
build result:
1

Verification conclusion: the build command is workable.
@luotao1

Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants