Marcel Timm, RhinoDevel, 2025
mt_llm is a C++ library for Linux and Windows that offers a pure C interface to the awesome large-language model inference engine called llama.cpp by Georgi Gerganov.
mt_llm is intended to be used for single-user LLM inference.
mt_llm supports:
- Simplified/reduced configuration parameters.
- Simple init./query/reset/deinit. functions.
- A callback that receives each inferred token (and more) and that decides when to stop inference.
- Snapshot interface to store/update/reset the current LLM state (using RAM).
- Letting the callback retrieve the probabilities of the digits 0 to 9 being the next inferred token, ignoring sampling (e.g. for categorization; see the sketch below this list).
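For illustration, here is a minimal sketch of such a categorization callback. Only the callback signature is taken from the usage example further below; the callback name is made up, and it is an assumption that dig_probs is NULL when no digit probabilities were requested and that returning true stops inference (check mt_llm.h for the exact semantics).

#include <stdbool.h>
#include <stdio.h>

// Sketch only: Pick the most probable digit as the category and stop.
static bool my_categorize_callback(
    int tok,
    char const * piece,
    int type,
    float const * dig_probs)
{
    (void)tok; (void)piece; (void)type; // Not used in this sketch.

    if(dig_probs != NULL) // Assumption: NULL, if digit probabilities were not requested.
    {
        int best = 0;

        for(int i = 1; i < 10; ++i)
        {
            if(dig_probs[i] > dig_probs[best])
            {
                best = i; // Index i holds the probability of digit i.
            }
        }
        printf("Chosen category: %d\n", best);
        return true; // Assumption: Returning true stops inference.
    }
    return false; // Let inference continue.
}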
Take a look at the example showing a simple Speech-To-Text, Large-Language-Model, Text-To-Speech pipeline via mt_stt, mt_llm and mt_tts!
Clone the mt_llm repository:
git clone https://github.com/RhinoDevel/mt_llm.git
Enter the created folder:
cd mt_llm
Get the llama.cpp submodule content:
git submodule update --init --recursive
No details for Linux here yet, but you can take a look at the Windows instructions below and at the Makefile.
All the following examples build static libraries; there may be use cases where dynamically linked libraries are sufficient, too.
Build llama.cpp
Compile the necessary llama.lib, ggml.lib and common.lib libraries via
Visual Studio and mt_llm/llama.cpp/CMakeLists.txt as static libraries.
To do that, select x64-windows-msvc-debug or x64-windows-msvc-release as the
configuration in Visual Studio.
Also modify the file mt_llm/llama.cpp/CMakePresets.json which is created by
Visual Studio:
Add
, "BUILD_SHARED_LIBS": "OFF"
, "LLAMA_CURL": "OFF"
to the properties of configurePresets.cacheVariables, where the "name" is
"base".
To enable CUDA (GPU) support, additionally modify the file mt_llm/llama.cpp/CMakePresets.json:
Add
, "GGML_CUDA": "ON"
to the properties of configurePresets.cacheVariables, where the "name" is
"base".
Additionally link mt_llm with this (from the llama.cpp build result folder):
ggml\src\ggml-cuda\ggml-cuda.lib
Additionally link mt_llm with these (from the CUDA folder):
lib\x64\cublas.lib
lib\x64\cuda.lib
lib\x64\cudart.lib
Test llama.cpp (without mt_llm)
The llama.cpp binaries are also created by the build described above.
E.g. for a release build, you can find them at
mt_llm\llama.cpp\build-x64-windows-msvc-release\bin.
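For a quick test from that folder (assuming a recent llama.cpp build that produces llama-cli.exe, and that a GGUF model file, e.g. the one used in the usage example below, lies next to it), you could run something like:

llama-cli.exe -m gemma-3-1b-it-Q5_K_M.gguf -p "Hello!"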
- Open solution mt_llm.sln with Visual Studio (tested with 2022).
- Compile in release or debug mode.
- Get the DLL and LIB files resulting from the build, e.g. for release mode x64\Release\mt_llm.dll and x64\Release\mt_llm.lib, and copy them to a new folder.
- Copy the following header files to that new folder, too:
  mt_llm\mt_llm.h
  mt_llm\mt_llm_lib.h
  mt_llm\mt_llm_p.h
  mt_llm\mt_llm_tok_type.h
  mt_llm\mt_llm_snapshot.h
- Also copy a supported GGUF model file to the same new folder.
- Go to the new folder and create a file main.c with the following code:
#include "mt_llm.h"
#include "mt_llm_p.h"
#include "mt_llm_tok_type.h"
#include <string.h>
#include <stdio.h>
static bool my_callback(
int tok,
char const * piece,
int type,
float const * dig_probs)
{
if(type == MT_TOK_TYPE_SAMPLED_NON_EOG_NON_CONTROL)
{
// This example may not display all characters correctly..
printf("%s", piece);
}
else
{
if(type == MT_TOK_TYPE_SAMPLED_EOG)
{
printf("\n\n");
}
}
return false;
}
int main(void)
{
struct mt_llm_p p;
// *****************************
// *** Setup the parameters: ***
// *****************************
p.n_gpu_layers = 0;
p.seed = -1;
p.n_ctx = 2048;
p.threads = 0;
p.top_k = 40;
p.top_p = 0.95;
p.min_p = 0.05;
p.temp = 0.8;
p.grammar[0] = '\0';
strncpy(
p.model_file_path,
"gemma-3-1b-it-Q5_K_M.gguf",
MT_LLM_P_LEN_MODEL_FILE_PATH);
strncpy(
p.sys_prompt,
"You are a helpful AI assistant.",
MT_LLM_P_LEN_SYS_PROMPT);
p.prompt_beg_delim[0] = '\0';
p.prompt_end_delim[0] = '\0';
p.sys_prompt_beg_delim[0] = '\0';
p.sys_prompt_mid_delim[0] = '\0';
p.sys_prompt_end_delim[0] = '\0';
p.rev_prompt[0] = '\0';
p.think_beg_delim[0] = '\0';
p.think_end_delim[0] = '\0';
p.try_prompts_by_model = true;
p.callback = my_callback;
// **************************
// *** Initialize mt_llm: ***
// **************************
mt_llm_reinit(&p); // Ignoring return value, here..
// **********************
// *** Query the LLM: ***
// **********************
mt_llm_query("Please tell me a very short story about a dog!");
// (inference is running here, and will call the callback for each token)
// ****************************
// *** Deinitialize mt_llm: ***
// ****************************
mt_llm_deinit();
return 0;
}
- Open an x64 Native Tools Command Prompt for VS 2022 command line.
- Compile via cl main.c mt_llm.lib.
- Run main.exe.
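If everything worked, main.exe loads the GGUF model, prints the inferred answer piece by piece via the callback and exits after deinitializing mt_llm.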