This project demonstrates how to fine-tune the TinyLlama-1.1B-Chat-v1.0 model with LoRA (Low-Rank Adaptation) for parameter-efficient training. The setup is optimized for CPU environments with limited RAM (e.g., 16 GB).
Project structure:
- `src/TrainTinyLlama.py`: Main script for fine-tuning TinyLlama with LoRA.
- `dataset/dataset.json`: Training data in JSON format.
The dataset should be a JSON file containing a list of objects, each with `input` and `output` fields. Example:
```json
{
  "input": "Generate a form with a panel with color white",
  "output": {
    "Type": "TForm",
    "Name": "FrmMainForm",
    "Caption": "Sample Form",
    "Width": 800,
    "Height": 600,
    "Children": [
      {
        "Type": "TPanel",
        "Name": "Panel1",
        "Left": 10,
        "Top": 10,
        "Width": 200,
        "Height": 100,
        "Color": "#FFFFFF"
      }
    ]
  }
}
```
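During preprocessing, each record is flattened into a single training string. Below is a minimal sketch of how such a file can be loaded with the `datasets` library; the prompt template is an assumption, and the script's actual preprocessing lives in `src/TrainTinyLlama.py`.

```python
# Sketch: load dataset.json and flatten each record into prompt/response
# text for causal LM training. The "### Input/### Output" template is an
# assumption; see src/TrainTinyLlama.py for the real preprocessing.
import json
from datasets import Dataset

with open("dataset/dataset.json", "r", encoding="utf-8") as f:
    records = json.load(f)

def to_text(example):
    # Serialize the structured output back to JSON so the model learns
    # to emit it verbatim.
    return {"text": f"### Input:\n{example['input']}\n### Output:\n{json.dumps(example['output'])}"}

ds = Dataset.from_list(records).map(to_text)
print(ds[0]["text"])
```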
Training configuration:
- Model: TinyLlama-1.1B-Chat-v1.0
- Adapter: LoRA (Low-Rank Adaptation)
- Target Modules: `q_proj`, `v_proj`
- LoRA Config: `r=8`, `alpha=16`, `dropout=0.05` (see the sketch after this list)
- Batch Size: 1 (adjustable)
- Epochs: 1 (increase for better results)
- Device: CPU only (`use_cpu=True`)
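These settings map directly onto `peft`'s `LoraConfig`. A minimal sketch, assuming the Hugging Face model ID `TinyLlama/TinyLlama-1.1B-Chat-v1.0`:

```python
# Sketch: wrap the base model with LoRA adapters using the values above.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,                    # dropout on the LoRA layers
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```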
To start fine-tuning, run:

```bash
python src/TrainTinyLlama.py
```

The script will:
- Load and preprocess the dataset.
- Apply LoRA adapters to the model.
- Train using Hugging Face's `Trainer` API (see the sketch after this list).
- Save the fine-tuned model and tokenizer to the `TinyLlama-lora-out` directory.
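A minimal sketch of that `Trainer` setup, continuing from the dataset sketch (`ds`) and the `LoraConfig` sketch (`model`) above. The tokenization and collator details here are assumptions; the authoritative version is `src/TrainTinyLlama.py`.

```python
# Sketch: tokenize the flattened dataset and train on CPU with the
# hyperparameters listed in this README.
from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = ds.map(tokenize, batched=True, remove_columns=ds.column_names)

training_args = TrainingArguments(
    output_dir="TinyLlama-lora-out",
    logging_dir="logs",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    use_cpu=True,              # CPU-only training, per the config above
)

trainer = Trainer(
    model=model,               # LoRA-wrapped model from the earlier sketch
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("TinyLlama-lora-out")
tokenizer.save_pretrained("TinyLlama-lora-out")
```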
Output:
- Fine-tuned Model: Saved in `TinyLlama-lora-out/`
- Logs: Saved in `logs/`
Requirements:
- Python 3.8+
- transformers
- datasets
- peft
- torch
Install dependencies:
```bash
pip install -r src/requirements.txt
```

To merge the LoRA adapter into the base model, run:

```bash
python src\Merge_lora.py
```

Then convert the merged model to GGUF format with llama.cpp's conversion script:

```bash
python convert_hf_to_gguf.py ../TinyLlama-merged --outfile ./tinyllama-custom.gguf
```
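For reference, here is a minimal sketch of what a merge step like `src\Merge_lora.py` typically does (the actual script may differ): apply the saved adapter to the base model, fold the LoRA weights in, and save a standalone checkpoint to `TinyLlama-merged` for GGUF conversion.

```python
# Sketch: merge the LoRA adapter from TinyLlama-lora-out into the base
# model and save a standalone checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = PeftModel.from_pretrained(base, "TinyLlama-lora-out")

merged = model.merge_and_unload()  # fold LoRA weights into the base model
merged.save_pretrained("TinyLlama-merged")

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("TinyLlama-merged")
```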
On Windows, Ollama stores models in `%USERPROFILE%\.ollama\models`.

Create a `Modelfile` containing:

```
FROM ./tinyllama-custom.gguf
```

Then register the model with Ollama:

```bash
ollama create tinyllama-custom -f Modelfile
```

- The `.gguf` format is compatible with llama.cpp, a C++ project for efficient execution of Llama models on CPU and GPU.
- To use your custom model with llama.cpp, simply copy the `.gguf` file to the models folder and follow the instructions in the repository.
- Documentation: llama.cpp README
- Model conversion: convert.py
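As an alternative to the llama.cpp CLI shown below, the `llama-cpp-python` bindings (an assumption: they are not listed in `src/requirements.txt`) can load the GGUF directly from Python:

```python
# Sketch: run the converted GGUF on CPU via llama-cpp-python (assumed
# extra dependency, installed separately with `pip install llama-cpp-python`).
from llama_cpp import Llama

llm = Llama(model_path="./tinyllama-custom.gguf", n_ctx=2048)
result = llm("Generate a form with a panel with color white", max_tokens=256)
print(result["choices"][0]["text"])
```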
Example usage:
```bash
./main -m ./tinyllama-custom.gguf -p "Your prompt here"
```

- The script is optimized for CPU training. For GPU, set `no_cuda=False` and adjust `fp16` as needed (see the sketch after this list).
- LoRA enables efficient fine-tuning with minimal memory usage.
- Adjust hyperparameters (epochs, batch size) based on your hardware and dataset size.
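A hypothetical GPU variant of the training arguments, per the note above. Newer `transformers` releases use `use_cpu` in place of the deprecated `no_cuda`; the exact flag depends on your installed version.

```python
# Sketch: training arguments adjusted for GPU (hypothetical values).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="TinyLlama-lora-out",
    num_train_epochs=3,             # more epochs are feasible on GPU
    per_device_train_batch_size=4,  # larger batches, hardware permitting
    use_cpu=False,                  # allow CUDA
    fp16=True,                      # mixed precision on supported GPUs
)
```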
References:
- TinyLlama Model Card
- LoRA: Low-Rank Adaptation of Large Language Models
- Hugging Face Transformers Documentation
- llama.cpp (GitHub)
Example output from a full training run (5 epochs on CPU):

```
{'train_runtime': 105974.4997, 'train_samples_per_second': 0.016, 'train_steps_per_second': 0.008, 'train_loss': 0.3685814903443118, 'epoch': 5.0}
100%|████████████████████████████████████████████████████████████████████████████████| 830/830 [29:26:14<00:00, 127.68s/it]
```