Merged
Commits
63 commits
92db582
Wan 2.2 A14B
Oct 17, 2025
8e26f86
wip: wan 2.2 a14b + examples + CLIP img embed support
Oct 17, 2025
50f8870
group offload support + fixes for tests
Oct 18, 2025
613ec14
fix test errors
Oct 18, 2025
57e5031
add FSDP2 and group offload to the training wizard
Oct 18, 2025
c354e09
rename i2v-480p-14b-2.2-low|high -> i2v-14b-2.2-low|high
Oct 18, 2025
8a84a9e
fix test for data backend
Oct 18, 2025
24733c6
port concept of musubi-tuner wan_force_2_1_time_embedding
Oct 18, 2025
33355b9
fix for test
Oct 18, 2025
ddb87bf
fix qwen example config lr scheduler warmup
Oct 18, 2025
c6cd875
error handling improvement
Oct 19, 2025
e58f489
Merge branch 'bugfix/qwen-unpack' of https://github.com/bghira/Simple…
Oct 19, 2025
2089654
refactor qwen fix
Oct 19, 2025
67beeae
bring more inline with upstream
Oct 19, 2025
0a016ab
revert qwen code to pre-backports
Oct 19, 2025
c772cdf
fix TREAD impl
Oct 19, 2025
82b37b2
registry missing
Oct 19, 2025
b38a92f
fix search path to look in notebooks and workspace first
Oct 19, 2025
2e20210
search for /workspace and /notebooks first before suggesting xdg-home
Oct 19, 2025
031358c
multi-gpu fix attempt for accelerate kwargs
Oct 19, 2025
3b4ff55
fix some lingering bugs and test failures
Oct 19, 2025
d38c93a
better auto-stripping of non-trainer args
Oct 19, 2025
d0a329e
prefer nvidia-ml-py instead of abandoned pynvml
Oct 19, 2025
0f68a35
add dependency
Oct 19, 2025
5fb118a
attempt to resolve num_processes houdini act
Oct 19, 2025
2caa7e4
fix model path
Oct 19, 2025
359cc61
use tmp file to load video using tsr
Oct 19, 2025
5d1614a
update tsr dependency version
Oct 19, 2025
1aa508c
fix redeclaration
Oct 19, 2025
c38892c
relax video model detection
Oct 19, 2025
8dcc97d
refactor cond image embed abstraction
Oct 19, 2025
6e216e0
update wan example model path
Oct 19, 2025
9fdc26e
fix tread config handler
Oct 19, 2025
0813cba
add pipelinetype for img2video
Oct 20, 2025
fe89254
add missing i2v logic
Oct 20, 2025
dbfd3dd
update tsr
Oct 20, 2025
32efdb1
allow alt image embed provider to return whatever it needs to
Oct 20, 2025
5ccc226
make more robust data loading for videos in huggingface data backend
Oct 20, 2025
0b3586f
log specific path when loading
Oct 20, 2025
3434ace
better contract for image embedder producers and more direct retrieva…
Oct 20, 2025
c1bc36e
do not create /tmp entries where we cannot
Oct 20, 2025
625ceec
wip: flf2v 2.1, ti2v 2.2, i2v 2.1
Oct 20, 2025
eaba160
update paths
Oct 20, 2025
88fbbd5
relocate inputs to CPU if encoder is there (group offload)
Oct 20, 2025
199010c
Merge branch 'main' into feature/wan-2.2
bghira Oct 20, 2025
de7f9b5
add missing chroma value to legacy list of model families
Oct 20, 2025
d92d81d
do not try and apply group offload to text encoder(s)
Oct 20, 2025
e477935
do as transformers requests and use PreTrainedTokenizer
Oct 20, 2025
b9b47bd
fix for rocm error
Oct 20, 2025
bfa7a35
chroma: more fixes for tokeniser seq len and autotokenizer usage
Oct 20, 2025
2df5855
chroma: more fixes for attn masking with image tokens
Oct 20, 2025
e5b6dc5
chroma: fixes for text encoder inputs to pipeline
Oct 20, 2025
7c1b4cc
(#1780) check for user prompt library validity
Oct 20, 2025
c342f70
tighten check for which models need img conditioning embed
Oct 20, 2025
138151c
vae hook for transforming the vae or its samples before the encode, h…
Oct 20, 2025
383b306
round progress when displaying percentage
Oct 20, 2025
326faf3
update docs for wan 2.x broad compatibility
Oct 20, 2025
5e8e5a7
fix tests and adjust docs
Oct 20, 2025
a94a8bf
fix more qwen tests
Oct 20, 2025
90973ed
fix more qwen stuff
Oct 21, 2025
bd33a27
store test progress in mock
Oct 21, 2025
63e8d3c
fix more qwen stuff
Oct 21, 2025
ffa8481
rocm test failure fixes
Oct 21, 2025
2 changes: 2 additions & 0 deletions README.md
@@ -89,6 +89,7 @@ SimpleTuner provides comprehensive training support across multiple diffusion mo
- **Gradient checkpointing** - Configurable intervals for memory/speed optimization
- **Loss functions** - L2, Huber, Smooth L1 with scheduling support
- **SNR weighting** - Min-SNR gamma weighting for improved training dynamics
- **Group offloading** - Diffusers v0.33+ module-group CPU/disk staging with optional CUDA streams

### Model-Specific Features

@@ -99,6 +100,7 @@ SimpleTuner provides comprehensive training support across multiple diffusion mo
- **T5 masked training** - Enhanced fine details for Flux and compatible models
- **QKV fusion** - Memory and speed optimizations (Flux, Lumina2)
- **TREAD integration** - Selective token routing for Wan and Flux models
- **Wan 2.x I2V** - High/low stage presets plus a 2.1 time-embedding fallback (see Wan quickstart)
- **Classifier-free guidance** - Optional CFG reintroduction for distilled models

### Quickstart Guides
29 changes: 26 additions & 3 deletions documentation/DATALOADER.md
@@ -49,8 +49,8 @@ Here is the most basic example of a dataloader configuration file, as `multidata

### `dataset_type`

- **Values:** `image` | `video` | `text_embeds` | `image_embeds` | `conditioning`
- **Description:** `image` and `video` datasets contain your training data. `text_embeds` contain the outputs of the text encoder cache, and `image_embeds` contain the VAE outputs, if the model uses one. When a dataset is marked as `conditioning`, it is possible to pair it to your `image` dataset via [the conditioning_data option](#conditioning_data)
- **Values:** `image` | `video` | `text_embeds` | `image_embeds` | `conditioning_image_embeds` | `conditioning`
- **Description:** `image` and `video` datasets contain your training data. `text_embeds` contain the outputs of the text encoder cache, `image_embeds` contain the VAE latents (when a model uses one), and `conditioning_image_embeds` store cached conditioning image embeddings (such as CLIP vision features). When a dataset is marked as `conditioning`, it is possible to pair it to your `image` dataset via [the conditioning_data option](#conditioning_data)
- **Note:** Text and image embed datasets are defined differently than image datasets are. A text embed dataset stores ONLY the text embed objects. An image dataset stores the training data.
- **Note:** Don't combine images and video in a **single** dataset. Split them out.

@@ -69,6 +69,22 @@ Here is the most basic example of a dataloader configuration file, as `multidata
- **Only applies to `dataset_type=image`**
- If unset, the VAE outputs will be stored on the image backend. Otherwise, you may set this to the `id` of an `image_embeds` dataset, and the VAE outputs will be stored there instead. Allows associating the image_embed dataset to the image data.

### `conditioning_image_embeds`

- **Applies to `dataset_type=image` and `dataset_type=video`**
- When a model reports `requires_conditioning_image_embeds`, set this to the `id` of a `conditioning_image_embeds` dataset to store cached conditioning image embeddings (for example, CLIP vision features for Wan 2.2 I2V). If unset, SimpleTuner writes the cache to `cache/conditioning_image_embeds/<dataset_id>` by default, guaranteeing it no longer collides with the VAE cache.
- Models that need these embeds must expose an image encoder through their primary pipeline. If the model cannot supply the encoder, preprocessing will fail early instead of silently generating empty files.
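
For reference, a minimal pairing might look like the sketch below (the `id` values and cache path are illustrative, not required names); the full `multidatabackend.json` example later in this document shows the same pairing in context:

```json
[
  {
    "id": "my-video-data",
    "type": "local",
    "dataset_type": "video",
    "conditioning_image_embeds": "my-clip-embeds"
  },
  {
    "id": "my-clip-embeds",
    "type": "local",
    "dataset_type": "conditioning_image_embeds",
    "cache_dir": "cache/conditioning_image_embeds/my-video-data",
    "disabled": false
  }
]
```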

#### `cache_dir_conditioning_image_embeds`

- **Optional override for the conditioning image embed cache destination.**
- Set this when you want to pin the cache to a specific filesystem location or have a dedicated remote backend (`dataset_type=conditioning_image_embeds`). When omitted, the cache path described above is used automatically.

#### `conditioning_image_embed_batch_size`

- **Optional override for the batch size used while generating conditioning image embeds.**
- Defaults to the `conditioning_image_embed_batch_size` trainer argument or the VAE batch size when not explicitly provided.

### `type`

- **Values:** `aws` | `local` | `csv` | `huggingface`
@@ -430,7 +446,8 @@ In order, the lines behave as follows:
"probability": 1.0,
"repeats": 0,
"text_embeds": "alt-embed-cache",
"image_embeds": "vae-embeds-example"
"image_embeds": "vae-embeds-example",
"conditioning_image_embeds": "conditioning-embeds-example"
},
{
"id": "another-special-name-for-another-backend",
@@ -451,6 +468,12 @@ In order, the lines behave as follows:
"dataset_type": "image_embeds",
"disabled": false,
},
{
"id": "conditioning-embeds-example",
"type": "local",
"dataset_type": "conditioning_image_embeds",
"disabled": false
},
{
"id": "an example backend for text embeds.",
"dataset_type": "text_embeds",
34 changes: 34 additions & 0 deletions documentation/OPTIONS.md
@@ -52,6 +52,40 @@ Where `foo` is your config environment - or just use `config/config.json` if you

- **What**: Offloads text encoder weights to CPU while VAE caching is running.
- **Why**: This is useful for large models like HiDream and Wan 2.1, which can OOM when loading the VAE cache. This option does not impact training quality, but for very large text encoders or slow CPUs it can extend startup time substantially with many datasets, which is why it is disabled by default.
- **Tip**: Complements the group offloading feature below for especially memory-constrained systems.

### `--enable_group_offload`

- **What**: Enables diffusers' grouped module offloading so model blocks can be staged on CPU (or disk) between forward passes.
- **Why**: Dramatically reduces peak VRAM usage on large transformers (Flux, Wan, Auraflow, LTXVideo, Cosmos2Image) with minimal performance impact when used with CUDA streams.
- **Notes**:
- Mutually exclusive with `--enable_model_cpu_offload` — pick one strategy per run.
- Requires diffusers **v0.33.0** or newer.

### `--group_offload_type`

- **Choices**: `block_level` (default), `leaf_level`
- **What**: Controls how layers are grouped. `block_level` balances VRAM savings with throughput, while `leaf_level` maximises savings at the cost of more CPU transfers.

### `--group_offload_blocks_per_group`

- **What**: When using `block_level`, the number of transformer blocks to bundle into a single offload group.
- **Default**: `1`
- **Why**: Increasing this number reduces transfer frequency (faster) but keeps more parameters resident on the accelerator (uses more VRAM).

### `--group_offload_use_stream`

- **What**: Uses a dedicated CUDA stream to overlap host/device transfers with compute.
- **Default**: `False`
- **Notes**:
- Automatically falls back to CPU-style transfers on non-CUDA backends (Apple MPS, ROCm, CPU).
- Recommended when training on NVIDIA GPUs with spare copy engine capacity.

### `--group_offload_to_disk_path`

- **What**: Directory path used to spill grouped parameters to disk instead of RAM.
- **Why**: Useful for extremely tight CPU RAM budgets (e.g., a workstation with a large NVMe drive).
- **Tip**: Use a fast local SSD; network filesystems will significantly slow training.
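
Put together, a run on a memory-constrained NVIDIA system might look like the sketch below. This assumes the trainer flags are passed directly on the CLI; they can equally be placed in `TRAINER_EXTRA_ARGS` or your config, and the disk path is only an example:

```bash
# Sketch: grouped offloading flags combined into one invocation.
# Do not pair these with --enable_model_cpu_offload.
simpletuner train \
  --enable_group_offload \
  --group_offload_type block_level \
  --group_offload_blocks_per_group 1 \
  --group_offload_use_stream
  # optional: spill offloaded weights to disk instead of RAM
  # --group_offload_to_disk_path /fast-ssd/simpletuner-offload
```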

### `--pretrained_model_name_or_path`

4 changes: 3 additions & 1 deletion documentation/QUICKSTART.md
@@ -23,9 +23,11 @@ For the complete and most accurate feature matrix, please see the [main README.m
| [Lumina2](/documentation/quickstart/LUMINA2.md) | 2B | ✓ | ✓ | ✓ | optional (int8) | bf16 | ✓ | ✓ | |
| [Cosmos2](/documentation/quickstart/COSMOS2IMAGE.md) | 2B | ✓ | ✓ | ✓ | not recommended | bf16 | ✓ | ✓ | |
| [LTX Video](/documentation/quickstart/LTXVIDEO.md) | ~2.5B | ✓ | ✓ | ✓ | optional (int8, fp8) | bf16 | ✓ | ✓ | |
| [Wan 2.1](/documentation/quickstart/WAN.md) | 1.3B-14B | ✓ | ✓ | ✓* | optional (int8) | bf16 | ✓ | ✓ | |
| [Wan 2.x](/documentation/quickstart/WAN.md) | 1.3B-14B | ✓ | ✓ | ✓* | optional (int8) | bf16 | ✓ | ✓ | |
| [Qwen Image](/documentation/quickstart/QWEN_IMAGE.md) | 20B | ✓ | ✓ | ✓* | required (int8, nf4) | bf16 | ✓ (required) | ✓ | |

**Note:** The above table provides a simplified overview. For the complete and most accurate feature matrix with detailed specifications, please see the [main README.md](../README.md#model-architecture-support).

> ℹ️ The Wan quickstart covers 2.1 training plus the 2.2 high/low stage presets and the new time-embedding compatibility toggle.

> ⚠️ These tutorials are a work-in-progress. They contain full end-to-end instructions for a basic training session.
17 changes: 17 additions & 0 deletions documentation/quickstart/AURAFLOW.md
@@ -10,6 +10,23 @@ Auraflow v0.3 was released as a 6B parameter MMDiT that uses Pile T5 for its enc

This model is somewhat slow for inference, but trains at a decent speed.

### Memory offloading (optional)

Auraflow benefits greatly from the new grouped offloading path. Add the following to your training flags if you are limited to a single 24G (or smaller) GPU:

```bash
--enable_group_offload \
--group_offload_type block_level \
--group_offload_blocks_per_group 1 \
--group_offload_use_stream \
# optional: spill offloaded weights to disk instead of RAM
# --group_offload_to_disk_path /fast-ssd/simpletuner-offload
```

- Streams are automatically disabled on non-CUDA backends, so the command is safe to reuse on ROCm and MPS.
- Do not combine this with `--enable_model_cpu_offload`.
- Disk offloading trades throughput for lower host RAM pressure; keep it on a local SSD for best results.

### Prerequisites

Make sure that you have python installed; SimpleTuner does well with 3.10 through 3.12.
17 changes: 17 additions & 0 deletions documentation/quickstart/COSMOS2IMAGE.md
@@ -10,6 +10,23 @@ Cosmos2 Predict (Image) is a vision transformer-based model that uses flow match

A 24GB GPU is recommended as the minimum for comfortable training without extensive optimizations.

### Memory offloading (optional)

To squeeze Cosmos2 into smaller GPUs, enable grouped offloading:

```bash
--enable_group_offload \
--group_offload_type block_level \
--group_offload_blocks_per_group 1 \
--group_offload_use_stream \
# optional: spill offloaded weights to disk instead of RAM
# --group_offload_to_disk_path /fast-ssd/simpletuner-offload
```

- Streams are only honoured on CUDA; other devices fall back automatically.
- Do not combine this with `--enable_model_cpu_offload`.
- Disk staging is optional and helps when system RAM is the bottleneck.

### Prerequisites

Make sure that you have python installed; SimpleTuner does well with 3.10 through 3.12.
17 changes: 17 additions & 0 deletions documentation/quickstart/FLUX.md
@@ -26,6 +26,23 @@ Luckily, these are readily available through providers such as [LambdaLabs](http

**Unlike other models, Apple GPUs do not currently work for training Flux.**

### Memory offloading (optional)

Flux supports grouped module offloading via diffusers v0.33+. This dramatically reduces VRAM pressure when you are bottlenecked by the transformer weights. You can enable it by adding the following flags to `TRAINER_EXTRA_ARGS` (or the WebUI Hardware page):

```bash
--enable_group_offload \
--group_offload_type block_level \
--group_offload_blocks_per_group 1 \
--group_offload_use_stream \
# optional: spill offloaded weights to disk instead of RAM
# --group_offload_to_disk_path /fast-ssd/simpletuner-offload
```

- `--group_offload_use_stream` is only effective on CUDA devices; SimpleTuner automatically disables streams on ROCm, MPS and CPU backends.
- Do **not** combine this with `--enable_model_cpu_offload` — the two strategies are mutually exclusive.
- When using `--group_offload_to_disk_path`, prefer a fast local SSD/NVMe target.

## Prerequisites

Make sure that you have python installed; SimpleTuner does well with 3.10 through 3.12.
17 changes: 17 additions & 0 deletions documentation/quickstart/LTXVIDEO.md
@@ -14,6 +14,23 @@ You'll need:

Apple silicon systems work great with LTX so far, albeit at a lower resolution due to limits inside the MPS backend used by Pytorch.

### Memory offloading (optional)

If you are close to the VRAM limit, enable grouped offloading in your config:

```bash
--enable_group_offload \
--group_offload_type block_level \
--group_offload_blocks_per_group 1 \
--group_offload_use_stream \
# optional: spill offloaded weights to disk instead of RAM
# --group_offload_to_disk_path /fast-ssd/simpletuner-offload
```

- CUDA users benefit from `--group_offload_use_stream`; other backends ignore it automatically.
- Skip `--group_offload_to_disk_path` unless system RAM is <64 GB — disk staging is slower but keeps runs stable.
- Disable `--enable_model_cpu_offload` when using group offloading.

### Prerequisites

Make sure that you have python installed; SimpleTuner does well with 3.10 through 3.12.
63 changes: 63 additions & 0 deletions documentation/quickstart/WAN.md
@@ -29,10 +29,30 @@ Currently, image-to-video training is not supported for Wan, but T2V LoRA and Ly
- Resolution: 1280x720
-->

#### Image to Video (Wan 2.2)

Recent Wan 2.2 I2V checkpoints work with the same training flow:

- High stage: https://huggingface.co/Wan-AI/Wan2.2-I2V-14B-Diffusers/tree/main/high_noise_model
- Low stage: https://huggingface.co/Wan-AI/Wan2.2-I2V-14B-Diffusers/tree/main/low_noise_model

You can target the stage you want with the `model_flavour` and `wan_validation_load_other_stage` settings outlined later in this guide.

You'll need:
- **a realistic minimum** is 16GB, e.g. a single 3090 or V100 GPU
- **ideally** multiple 4090, A6000, L40S, or better

If you encounter shape mismatches in the time embedding layers when running Wan 2.2 checkpoints, enable the new
`wan_force_2_1_time_embedding` flag. This forces the transformer to fall back to Wan 2.1 style time embeddings and
resolves the compatibility issue.

#### Stage presets & validation

- `model_flavour=i2v-14b-2.2-high` targets the Wan 2.2 high-noise stage.
- `model_flavour=i2v-14b-2.2-low` targets the low-noise stage (same checkpoints, different subfolder).
- Toggle `wan_validation_load_other_stage=true` to load the opposite stage alongside the one you train for validation renders.
- Leave the flavour unset (or use `t2v-480p-1.3b-2.1`) for the standard Wan 2.1 text-to-video run.
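
As a concrete sketch (flags can go on the CLI as shown, or in `TRAINER_EXTRA_ARGS`), a Wan 2.2 high-stage I2V run could be launched like this:

```bash
# Sketch: train the Wan 2.2 I2V high-noise stage and pull in the low stage
# for validation renders.
simpletuner train \
  --model_flavour i2v-14b-2.2-high \
  --wan_validation_load_other_stage
  # add --wan_force_2_1_time_embedding only if the checkpoint reports a
  # time-embedding shape mismatch
```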

Apple silicon systems do not work well with Wan 2.1 so far; something like 10 minutes for a single training step can be expected.

### Prerequisites
@@ -112,6 +132,23 @@ simpletuner configure

> ⚠️ For users located in countries where Hugging Face Hub is not readily accessible, you should add `HF_ENDPOINT=https://hf-mirror.com` to your `~/.bashrc` or `~/.zshrc` depending on which `$SHELL` your system uses.

### Memory offloading (optional)

Wan is one of the heaviest models SimpleTuner supports. Enable grouped offloading if you are close to the VRAM ceiling:

```bash
--enable_group_offload \
--group_offload_type block_level \
--group_offload_blocks_per_group 1 \
--group_offload_use_stream \
# optional: spill offloaded weights to disk instead of RAM
# --group_offload_to_disk_path /fast-ssd/simpletuner-offload
```

- Only CUDA devices honour `--group_offload_use_stream`; ROCm/MPS fall back automatically.
- Leave disk staging commented out unless CPU memory is the bottleneck.
- `--enable_model_cpu_offload` is mutually exclusive with group offload.


If you prefer to manually configure:

@@ -432,6 +469,30 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
]
```

- Wan 2.2 image-to-video runs create CLIP conditioning caches. In the **video** dataset entry, point at a dedicated backend and (optionally) override the cache path:

```json
{
"id": "disney-black-and-white",
"type": "local",
"dataset_type": "video",
"conditioning_image_embeds": "disney-conditioning",
"cache_dir_conditioning_image_embeds": "cache/conditioning_image_embeds/disney-black-and-white"
}
```

- Define the conditioning backend once and reuse it across datasets if needed (full object shown here for clarity):

```json
{
"id": "disney-conditioning",
"type": "local",
"dataset_type": "conditioning_image_embeds",
"cache_dir": "cache/conditioning_image_embeds/disney-conditioning",
"disabled": false
}
```

- In the `video` subsection, we have the following keys we can set:
- `num_frames` (optional, int) is how many frames of data we'll train on.
- At 15 fps, 75 frames is 5 seconds of video, standard output. This should be your target.
@@ -488,6 +549,8 @@ simpletuner train
simpletuner train
```

> ℹ️ Append `--model_flavour i2v-14b-2.2-high` (or `low`) and, if desired, `--wan_validation_load_other_stage` inside `TRAINER_EXTRA_ARGS` or your CLI invocation when you train Wan 2.2. Add `--wan_force_2_1_time_embedding` only when the checkpoint reports a time-embedding shape mismatch.

**Option 3 (Legacy method - still works):**
```bash
./train.sh
8 changes: 3 additions & 5 deletions setup.py
@@ -69,9 +69,7 @@ def build_rocm_wheel_url(package: str, version: str, rocm_version: str) -> str:
py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
platform_tag = _rocm_platform_tag()
filename = f"{package}-{version}%2Brocm{rocm_version}-{py_tag}-{py_tag}-{platform_tag}.whl"
base_url = os.environ.get(
"SIMPLETUNER_ROCM_BASE_URL", f"https://download.pytorch.org/whl/rocm{rocm_version}"
)
base_url = os.environ.get("SIMPLETUNER_ROCM_BASE_URL", f"https://download.pytorch.org/whl/rocm{rocm_version}")
return f"{package} @ {base_url}/{filename}"


@@ -86,6 +84,7 @@ def get_cuda_dependencies():
"torchao>=0.12.0",
"nvidia-cudnn-cu12",
"nvidia-nccl-cu12",
"nvidia-ml-py>=12.555",
"lm-eval>=0.4.4",
]

@@ -183,7 +182,7 @@ def _collect_package_files(*directories: str):
"wandb>=0.21.0",
"requests>=2.32.4",
"pillow>=11.3.0",
"trainingsample>=0.2.1",
"trainingsample>=0.2.10",
"accelerate>=1.5.2",
"safetensors>=0.5.3",
"compel>=2.1.1",
@@ -218,7 +217,6 @@ def _collect_package_files(*directories: str):
"imageio[pyav]>=2.37.0",
"hf-xet>=1.1.5",
"peft-singlora>=0.2.0",
"trainingsample>=0.2.1",
"cryptography>=41.0.0",
]

Expand Down
15 changes: 9 additions & 6 deletions simpletuner/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@
from pathlib import Path
from typing import List, Optional

from simpletuner.simpletuner_sdk.server.utils.paths import get_config_directory, get_template_directory


def find_config_file() -> Optional[str]:
"""Find config file in current directory or config/ subdirectory."""
@@ -609,6 +611,13 @@ def cmd_server(args) -> int:
os.environ["SIMPLETUNER_SSL_KEYFILE"] = ssl_config["keyfile"]
os.environ["SIMPLETUNER_SSL_CERTFILE"] = ssl_config["certfile"]

# Ensure template resolution points to packaged templates unless overridden
os.environ.setdefault("TEMPLATE_DIR", str(get_template_directory()))

# Ensure a configuration directory exists and record it for downstream services
config_dir = get_config_directory()
os.environ.setdefault("SIMPLETUNER_CONFIG_DIR", str(config_dir))

try:
import uvicorn

@@ -622,12 +631,6 @@ def cmd_server(args) -> int:
# Create app with specified mode
app = create_app(mode=server_mode, ssl_no_verify=ssl_no_verify)

# Create necessary directories
os.makedirs("static/css", exist_ok=True)
os.makedirs("static/js", exist_ok=True)
os.makedirs("templates", exist_ok=True)
os.makedirs("configs", exist_ok=True)

# Configure uvicorn SSL
uvicorn_config = {"app": app, "host": host, "port": port, "reload": reload, "log_level": "info"}
