kLLaMa-jvm


kLLaMa-jvm is a Kotlin/JVM wrapper for llama.cpp that bridges the Java Virtual Machine and the native C++ inference engine through a JNI (Java Native Interface) adapter.

🎯 Project Purpose

This project serves as a JNI adapter that enables JVM-based applications (Kotlin/Java) to leverage the power of llama.cpp for running LLaMA models efficiently. The JNI layer abstracts the complexity of native memory management and cross-platform compatibility, providing a clean, idiomatic Kotlin API.

🔧 Architecture

┌─────────────────┐     JNI Bridge     ┌─────────────────────┐
│   Kotlin/Java   │ ─────────────────→ │  Native C++ (LLaMA) │
│                 │                    │                     │
│  Inference API  │ ←───────────────── │  llama.cpp          │
│                 │                    │                     │
└─────────────────┘                    └─────────────────────┘
  • JNI Adapter Layer: Located in llama-library/src/main/cpp/inference-adapter.cpp
  • Native Backend: llama.cpp inference engine compiled to a shared library
  • Kotlin Wrapper: High-level API that handles coroutines and async operations
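To illustrate the Kotlin wrapper's role, here is a minimal sketch of how a blocking, pull-based native token loop can be exposed as a cold Flow. The function names (`nextTokenBlocking`, `streamTokens`, `collectAll`) are illustrative stand-ins, not the project's actual API; the stub simply drains a list where the real adapter would call into llama.cpp.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Stand-in for the blocking JNI call that pulls the next token from
// native llama.cpp; here it just drains a list.
fun nextTokenBlocking(source: Iterator<String>): String? =
    if (source.hasNext()) source.next() else null

// Wrap the pull-based native loop in a cold Flow, and move the blocking
// work onto Dispatchers.IO so callers' coroutines are never blocked.
fun streamTokens(tokens: List<String>): Flow<String> = flow {
    val source = tokens.iterator()
    while (true) {
        emit(nextTokenBlocking(source) ?: break)
    }
}.flowOn(Dispatchers.IO)

// Convenience for demos: collect the whole stream into one string.
fun collectAll(tokens: List<String>): String = runBlocking {
    streamTokens(tokens).toList().joinToString("")
}
```

Because the Flow is cold, no native work happens until a collector attaches, which matches the streaming `ask(...).collect { }` style shown in the usage example below.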

📋 Prerequisites

  • JDK 25
  • Kotlin 2.3.20
  • CMake 3.31
  • C++ compiler (Clang++/GCC/MSVC)
  • Git (for initializing submodules)

🛠️ Installation

1. Initialize Submodules

git submodule update --init --recursive

2. Build the JNI Native Library

The JNI adapter compiles llama.cpp into a platform-specific shared library:

gradle :llama-library:compileNative

Note: Currently configured for Mac M-series processors. Modify llama-library/CMakeLists.txt for other platforms.

3. Verify JNI Integration

The build process creates a native shared library (libllama.dylib on macOS, libllama.so on Linux) that the JVM loads via System.loadLibrary("llama").
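One quick way to verify the build output is to ask the JVM what file name it will search for. `System.mapLibraryName` applies the platform convention (`libllama.dylib` on macOS, `libllama.so` on Linux, `llama.dll` on Windows); the `explicitLoadPath` helper and its directory argument are illustrative, not part of the project.

```kotlin
import java.io.File

// The exact file name System.loadLibrary("llama") will look for on
// java.library.path for the current platform.
fun expectedLibraryFileName(): String = System.mapLibraryName("llama")

// The absolute path you would pass to System.load(...) if you prefer an
// explicit path over the java.library.path search. The output directory
// here is a placeholder, not the project's actual build layout.
fun explicitLoadPath(outputDir: String): String =
    File(outputDir, expectedLibraryFileName()).absolutePath
```

If `loadLibrary` throws `UnsatisfiedLinkError`, check that the file named by `expectedLibraryFileName()` exists in a directory listed in `java.library.path`.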

🚀 Usage Example

import kotlinx.coroutines.runBlocking
import pro.tabakov.kllama.InferenceFactory

fun main() {
    // Load the JNI native library
    System.loadLibrary("llama")

    runBlocking {
        val kLLaMa = InferenceFactory.loadModel(
            "/path/to/model.gguf", // Path to model
            0.7f, // Temperature
            0L // Context Size
        )

        val requests = listOf("HI!", "How are you?", "What is your name?")
        requests.forEach { request ->
            println("You: $request")
            print("AI: ")
            kLLaMa.ask(request).collect { message ->
                print(message)
            }
            println()
            println("----")
        }
        println("Context Size Used: ${kLLaMa.getContextSizeUsed()}")
    }
}

Running the Example

Update the model path in examples/kotlin-jvm-app/src/main/kotlin/pro/atabakov/App.kt and run:

gradle examples:kotlin-jvm-app:run

🏗️ Project Structure

kLLaMa-jvm/
├── llama-library/          # JNI adapter & native build
│   ├── CMakeLists.txt      # CMake configuration for JNI
│   ├── build.gradle.kts    # Gradle plugin for CMake integration
│   └── src/main/cpp/       # JNI adapter implementation
│       └── inference-adapter.cpp  # Core JNI functions
├── kLLaMa/                 # Kotlin wrapper API
│   └── src/main/kotlin/    # High-level Kotlin interfaces
├── examples/               # Usage examples
│   └── kotlin-jvm-app/     # Sample application
└── llama.cpp/              # Git submodule - native LLaMA engine

📚 JNI Implementation Details

The JNI adapter provides:

  • Memory Management: Automatic cleanup of native resources
  • Exception Handling: Proper mapping between native errors and Kotlin exceptions
  • Async Operations: Coroutines support for non-blocking inference
  • Type Conversion: Seamless conversion between Kotlin and C++ types
  • Resource Management: RAII-style resource handling in native code
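The memory- and resource-management points above can be sketched with Kotlin's `AutoCloseable` plus `use { }`, which guarantees the native handle is freed even if inference throws. All names here (`NativeModelHandle`, `infer`, `inferOnce`) are hypothetical; the real adapter would call an `external` free function where the stub flips a flag.

```kotlin
// Illustrative sketch: tie the native handle's lifetime to AutoCloseable
// so `use { }` releases native memory deterministically, mirroring the
// RAII style used on the C++ side.
class NativeModelHandle : AutoCloseable {
    var closed = false
        private set

    fun infer(prompt: String): String {
        check(!closed) { "model already freed" }
        return "echo: $prompt" // stand-in for the real native call
    }

    override fun close() {
        // The real adapter would invoke a native free function here.
        closed = true
    }
}

// `use { }` closes the handle on both normal return and exception.
fun inferOnce(prompt: String): Pair<String, Boolean> {
    val model = NativeModelHandle()
    val reply = model.use { it.infer(prompt) }
    return reply to model.closed
}
```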

Key JNI Functions:

  • loadModel: Creates a native model instance and returns a Kotlin wrapper
  • ask: Performs streaming inference with Kotlin Flow integration
  • getContextSizeUsed: Reports how much of the model's context window has been consumed

🌐 Platform Support

Platform          Status           Notes
macOS (M-series)  ✅ Working       Default configuration
Linux x86_64      ⚠️ Configurable  Requires CMake adjustments
Windows           ⚠️ Configurable  Requires MSVC toolchain

To support other platforms, modify:

  • llama-library/CMakeLists.txt - Compiler flags and dependencies
  • JNI library loading path in your application
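As a sketch of the "library loading path" point, an application targeting several platforms can pick the expected shared-library file name from `os.name`. The helper below is illustrative (not part of kLLaMa-jvm); the mapping follows standard JVM/OS conventions.

```kotlin
// Select the platform-specific shared-library file name for a given
// OS name string, e.g. when constructing an explicit System.load path.
fun libraryFileFor(osName: String, base: String = "llama"): String {
    val os = osName.lowercase()
    return when {
        "mac" in os || "darwin" in os -> "lib$base.dylib" // macOS
        "win" in os -> "$base.dll"                        // Windows
        else -> "lib$base.so"                             // Linux & others
    }
}
```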

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
