kLLaMa-jvm is a Kotlin/JVM wrapper for llama.cpp that provides a seamless bridge between the Java Virtual Machine and the native C++ inference engine through a robust JNI (Java Native Interface) adapter.
The adapter lets JVM-based applications (Kotlin/Java) run LLaMA models efficiently through llama.cpp. The JNI layer hides native memory management and cross-platform details behind a clean, idiomatic Kotlin API.
```
┌─────────────────┐    JNI Bridge    ┌─────────────────────┐
│   Kotlin/Java   │ ───────────────→ │ Native C++ (LLaMA)  │
│                 │                  │                     │
│ ┌───────────────┤                  │ ┌───────────────────┤
│ │ Inference API │ ←─────────────── │ │ llama.cpp         │
│ └───────────────┤                  │ └───────────────────┤
│                 │                  │                     │
└─────────────────┘                  └─────────────────────┘
```
- JNI Adapter Layer: Located in `llama-library/src/main/cpp/inference-adapter.cpp`
- Native Backend: llama.cpp inference engine compiled to a shared library
- Kotlin Wrapper: High-level API that handles coroutines and async operations
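At the Kotlin/JVM boundary, a JNI adapter like this is typically exposed through `external` function declarations that the JVM resolves against the loaded shared library. A minimal sketch of what such a binding might look like — the names below are illustrative assumptions, not the project's actual JNI signatures:

```kotlin
// Hypothetical JNI boundary: each `external fun` would be resolved against a
// matching C++ symbol in the shared library built from inference-adapter.cpp.
// Calling any of them before System.loadLibrary("llama") throws UnsatisfiedLinkError.
object LlamaNative {
    external fun nativeLoadModel(path: String, temperature: Float, contextSize: Long): Long
    external fun nativeAsk(handle: Long, prompt: String): String
    external fun nativeFree(handle: Long)
}

fun main() {
    // Declaring externals is safe without the native library; only calling them requires it.
    println(LlamaNative::class.simpleName)
}
```

The `Long` return of the hypothetical `nativeLoadModel` stands for an opaque native pointer, a common pattern for carrying C++ object handles across the JNI boundary.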
- JDK 25
- Kotlin 2.3.20
- CMake 3.31
- C++ compiler (Clang++/GCC/MSVC)
- Git (for initializing submodules)
```
git submodule update --init --recursive
```
The JNI adapter compiles llama.cpp into a platform-specific shared library:
```
gradle :llama-library:compileNative
```
Note: Currently configured for Mac M-series processors. Modify `llama-library/CMakeLists.txt` for other platforms.
The build process creates a native shared library (`libllama.dylib` on macOS, `libllama.so` on Linux) that the JVM loads via `System.loadLibrary("llama")`.
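The most common failure mode when loading is the library directory missing from `java.library.path`. A small defensive-loading sketch (not part of the project's API) makes that easier to diagnose:

```kotlin
fun main() {
    val status = try {
        // Resolves libllama.dylib (macOS) / libllama.so (Linux) / llama.dll (Windows)
        // from the directories listed in java.library.path.
        System.loadLibrary("llama")
        "loaded"
    } catch (e: UnsatisfiedLinkError) {
        // Typical fix: run the JVM with -Djava.library.path=<dir containing the built library>
        "not found (${e.message})"
    }
    println("llama native library: $status")
}
```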
```kotlin
import kotlinx.coroutines.runBlocking
import pro.tabakov.kllama.InferenceFactory

fun main() {
    // Load the JNI native library
    System.loadLibrary("llama")

    runBlocking {
        val kLLaMa = InferenceFactory.loadModel(
            "/path/to/model.gguf", // Path to model
            0.7f,                  // Temperature
            0L                     // Context size
        )

        val requests = listOf("HI!", "How are you?", "What is your name?")
        requests.forEach { request ->
            println("You: $request")
            print("AI: ")
            kLLaMa.ask(request).collect { message ->
                print(message)
            }
            println()
            println("----")
        }

        println("Context Size Used: ${kLLaMa.getContextSizeUsed()}")
    }
}
```
Update the model path in `examples/kotlin-jvm-app/src/main/kotlin/pro/atabakov/App.kt` and run:
```
gradle examples:kotlin-jvm-app:run
```

```
kLLaMa-jvm/
├── llama-library/                 # JNI adapter & native build
│   ├── CMakeLists.txt             # CMake configuration for JNI
│   ├── build.gradle.kts           # Gradle plugin for CMake integration
│   └── src/main/cpp/              # JNI adapter implementation
│       └── inference-adapter.cpp  # Core JNI functions
├── kLLaMa/                        # Kotlin wrapper API
│   └── src/main/kotlin/           # High-level Kotlin interfaces
├── examples/                      # Usage examples
│   └── kotlin-jvm-app/            # Sample application
└── llama.cpp/                     # Git submodule - native LLaMA engine
```
The JNI adapter provides:
- Memory Management: Automatic cleanup of native resources
- Exception Handling: Proper mapping between native errors and Kotlin exceptions
- Async Operations: Coroutines support for non-blocking inference
- Type Conversion: Seamless conversion between Kotlin and C++ types
- Resource Management: RAII-style resource handling in native code
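The RAII-style handling on the native side can be mirrored on the Kotlin side with `AutoCloseable` and `use`, so the native handle is freed deterministically even if inference throws. A sketch with a stubbed handle — the class and method names are hypothetical, not the project's real wrapper:

```kotlin
// Stub wrapper around a native handle; a real implementation would call
// JNI free/query functions where this one prints.
class NativeModel(private val handle: Long) : AutoCloseable {
    fun contextSizeUsed(): Long = 0L                        // stub for a JNI query
    override fun close() = println("freed handle $handle")  // stub for a native free call
}

fun main() {
    // `use` guarantees close() runs even if the block throws
    NativeModel(42L).use { model ->
        println("context used: ${model.contextSizeUsed()}")
    }
}
```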
- `loadModel`: Creates a native model instance and returns a Kotlin wrapper
- `ask`: Performs streaming inference with Kotlin Flow integration
- `getContextSizeUsed`: Retrieves memory usage statistics
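Because `ask` streams tokens as a Kotlin Flow, callers can transform or accumulate the stream with ordinary Flow operators. A sketch using a stub flow in place of a real model, so it runs without the native library:

```kotlin
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

// Stub standing in for kLLaMa.ask(prompt): a cold Flow that emits token chunks.
fun askStub(prompt: String): Flow<String> = flowOf("Hello", ", ", "world")

fun main() = runBlocking {
    val reply = StringBuilder()
    askStub("Hi").collect { token ->
        print(token)          // stream to the console as tokens arrive
        reply.append(token)   // and accumulate the full reply
    }
    println()
    println("full reply: $reply")
}
```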
| Platform | Status | Notes |
|---|---|---|
| macOS (M-series) | ✅ Working | Default configuration |
| Linux x86_64 | ⚠️ Untested | Requires CMake adjustments |
| Windows | ⚠️ Untested | Requires MSVC toolchain |
To support other platforms, modify:
- `llama-library/CMakeLists.txt` (compiler flags and dependencies)
- The JNI library loading path in your application
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request