LLVMSQLite was built for my MSc. Thesis at Imperial College London.
If you want to read more about it and the design choices I made, the thesis PDF is available in the repository.
LLVMSQLite does not use the canonical build process used by SQLite, but rather a mix between their configure script and Makefiles and CMake, for everything LLVM-related.
Since the Makefile does some serious things that I'm not sure I'd be able to do in a CMake project, the CMake project calls the SQLite configure script and Makefiles to generate the resources it needs.
More precisely: traditionally, SQLite uses what they call the amalgamation which is essentially a big C file containing everything SQLite needs to be compiled (it's a pretty big C file), called sqlite3.c.
Generating sqlite3.c takes a bit of time, that is why it's not desirable to do so when working on the SQLite code.
Though, the amalgamation can be very useful for loading the definition of certain functions at run-time, for performing inlining in the JIT compiler.
That is why the CMake project uses both build methods:
sqlite3.cfor generatingsqlite3.ll, a huge IR module loaded at JIT-compile-time, but never used in the AOT build process ;- The C files in
srcthat are compiled together to formlibsqlite3, against which LLVMSQLite is linked.
That allows me to get the best of both worlds.
Now let's get to the serious stuff, how to build LLVMSQLite. Assuming you have cloned the LLVMSQLite repository in the current working directory:
cd LLVMSQLite
mkdir build
cd build
cmake .. -G Ninja -DCLANG11=path/to/clang11 -DCMAKE_BUILD_TYPE=RelWithDebInfo
ninja
The CLANG11 option allows you to give the path to clang, which is not necessarily the clang used to compile the project. The LLVM version of the clang at CLANG11 should be the same as the version you're linking against in the project. Otherwise, loading the IR module (sqlite3.ll)
may cause crashes due to LLVM not being able to verify the loaded module.
Other options exist:
DXRAY=ONcompiles LLVMSQLite with LLVM-XRay (for profiling purposes) ;DVTUNE=ONcompiles LLVMSQLite with calls to the Intel VTune API (libittnotify), to only profile the parts of the process that are of interest to us ;DSANITIZE=[san](wheresancan beaddress,undefinedormemory) compiles LLVMSQLite with one of the *San tools for checking for issues.
Note: Should you have installed the custom LLVM libraries to another directory than ~/llvm, you need to update the definitions in src/mlir/CMakeLists.txt. They should be fairly easy to adapt once you've built LLVM (see Prerequisites). The affected line is the following:
# The directory where LLVM libraries are installed
set(PREFIX "$ENV{HOME}/llvm/usr/local/")
Clone my version of LLVM from the repo.
The difference between my LLVM and upstream llvm/llvm-project is that:
- Some
assertshave been made clearer to help when debugging IR generation (only ifNDEBUGis not #define'd); - Some MLIR operations have been added to the LLVM dialect:
- Memory intrinsics (
memcmp,memcpy,memset,memcpyinline) ; stackSaveandstackRestorecallcan now call values (and not onlyLLVMFuncOp's)switchinstructions can now be emitted
- Memory intrinsics (
As explained earlier, you'll need to compile and install MLIR and Clang:
mkdir ~/llvm/
mkdir build && cd build
export CC=clang
export CXX=clang++
cmake -G Ninja ../llvm \
-DLLVM_ENABLE_PROJECTS="mlir;clang" \
-DLLVM_TARGETS_TO_BUILD="X86" \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_ENABLE_ASSERTIONS=ON
DESTDIR=~/llvm ninja install
LLVMSQLite works in the exact same way as the SQLite shell. The build process produces three binaries:
shell_defaultis the exact same thing as the classical SQLite shell;shell_jitworks the same way asshell_defaultbut uses the JIT compiler instead of the embedded interpreter;profilable_shellworks the same way as the others but allows you to switch JIT compilation on or off using-jitor-nojit.
I have modified the shell so that if you pass an initialisation SQL file with -init, SQLite will exit after executing it. That is useful when executing benchmarking scripts.
To carry out performance tests, I use the TPC-H benchmark. To do that, you'll need (1) a TPC-H database and (2) TPC-H compliant queries.
Getting TPC-H and SSBM databases
I use a modified version of the very useful TPC-H SQLite repository. It can generate database files of various scales for both TPC-H and SSBM benchmarks.
git clone https://github.com/kowalskithomas/llvmsqlite_dbgen dbgen
cd dbgen
make -f MakefileTPCH
This will generate several TPC-H-[Size].db files that you can use with the SQLite shell:
./shell_default /home/me/tpch/TPC-H-big.db
Getting SQL queries
TPC-H is a set of 22 SQL queries that are randomly generated given some constraints. I have implemented a generator that you will find in llvmsqlite_util/benchmarking.
To use it, simply do the following:
cd llvmsqlite_util/benchmarking
python3 generator_tpch.py --query N > q.sql
Where N is any number between 1 and 22. (The script is compatible with Python >= 3.7.)
To use the SQL statement directly in the shell, do the following:
./shell /path/to/the.db -init /path/to/q.sql
The custom LLVMSQLite shell will exit immediately once the statement is processed.