508 changes: 234 additions & 274 deletions README.md

Large diffs are not rendered by default.

248 changes: 102 additions & 146 deletions docs/source/index.md

Large diffs are not rendered by default.

14 changes: 8 additions & 6 deletions docs/source/instance_segmentation.md
@@ -2,6 +2,8 @@

# Instance Segmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_instance_segmentation.ipynb)

```{note}
🔥 LightlyTrain now supports training **DINOv3**-based instance segmentation models
with the [EoMT architecture](https://arxiv.org/abs/2503.19108) by Kerssies et al.!
@@ -21,12 +23,12 @@ You can also explore running inference and training these models using our Colab

### COCO

| Implementation | Model | #Params (M) | Input Size | Val mAP mask | Avg. FPS |
|----------------|----------------|-------------|------------|----------|----------|
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 21.6 | 640x640 | 32.6 | 51.5 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 85.7 | 640x640 | 40.3 | 25.2 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | 303.2 | 640x640 | **46.2** | 12.5 |
| Original EoMT | dinov3/vitl16-eomt-inst-coco | 303.2 | 640x640 | 45.9 | - |
| Implementation | Model | Val mAP mask | Avg. FPS | Params (M) | Input Size |
|----------------|----------------|-------------|----------|-----------|------------|
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 32.6 | 51.5 | 21.6 | 640×640 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 40.3 | 25.2 | 85.7 | 640×640 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | **46.2** | 12.5 | 303.2 | 640×640 |
| Original EoMT | dinov3/vitl16-eomt-inst-coco | 45.9 | - | 303.2 | 640×640 |

Training follows the protocol in the original [EoMT paper](https://arxiv.org/abs/2503.19108).
Models are trained for 90K steps (~12 epochs) on the COCO dataset with batch size `16`
4 changes: 2 additions & 2 deletions docs/source/methods/dinov2.md
@@ -10,8 +10,8 @@ generate high-quality features that can be used without fine-tuning the model.

```{table}

| Implementation | Model | ImageNet k-NN |
|--------------|----------|---------------|
| Implementation | Model | Val ImageNet k-NN |
|--------------|----------|-------------------|
| LightlyTrain | ViT-L/16 | 81.9% |
| [Official](https://github.com/facebookresearch/dinov2) | ViT-L/16 | 81.6% |

4 changes: 4 additions & 0 deletions docs/source/methods/distillation.md
@@ -2,6 +2,8 @@

# Distillation (recommended 🚀)

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/quick_start.ipynb)

Knowledge distillation involves transferring knowledge from a large, compute-intensive teacher model to a smaller, efficient student model by encouraging similarity between the student and teacher representations. It addresses the challenge of bridging the gap between state-of-the-art large-scale vision models and smaller, more computationally efficient models suitable for practical applications.
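
As a toy illustration of this idea (a sketch, not LightlyTrain's implementation), a student can be trained to match a frozen teacher's embeddings with a simple cosine-similarity objective:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    # 1 - mean cosine similarity between L2-normalized embeddings: minimizing
    # this pulls the student's representations toward the teacher's.
    student_emb = F.normalize(student_emb, dim=-1)
    teacher_emb = F.normalize(teacher_emb, dim=-1)
    return 1.0 - (student_emb * teacher_emb).sum(dim=-1).mean()


# Toy usage: a batch of 8 student embeddings trained toward fixed teacher ones.
student = torch.randn(8, 256, requires_grad=True)
teacher = torch.randn(8, 256)  # frozen teacher outputs, no gradients needed
loss = distillation_loss(student, teacher)
loss.backward()
```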

```{note}
@@ -12,6 +14,8 @@ that achieves higher accuracy and trains up to 3x faster. The previous version is

## Use Distillation in LightlyTrain

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/quick_start.ipynb)

Follow the code below to distill the knowledge of the default DINOv2 ViT-B/14 teacher model into your model architecture. The example uses a `torchvision/resnet18` model as the student:

````{tab} Python
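```python
# A minimal sketch of the quick-start call (paths are placeholders; per the
# docs, method="distillation" uses the default DINOv2 ViT-B/14 teacher).
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # output directory for logs and checkpoints
        data="my_data_dir",            # directory with unlabeled training images
        model="torchvision/resnet18",  # student model to distill into
        method="distillation",
    )
```
````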
26 changes: 16 additions & 10 deletions docs/source/object_detection.md
@@ -2,30 +2,36 @@

# Object Detection

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/object_detection.ipynb)

```{note}
🔥 LightlyTrain now supports training **LT-DETR**: **DINOv3**- and **DINOv2**-based object detection models
🔥 LightlyTrain now supports training **LTDETR**: **DINOv3**- and **DINOv2**-based object detection models
with the super-fast RT-DETR detection architecture! Our largest model achieves an mAP<sub>50:95</sub> of 60.0 on the COCO validation set!
```

(object-detection-benchmark-results)=

## Benchmark Results

Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub> and inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO dataset. You can check [here](object-detection-use-model-weights) for how to use these model checkpoints for further fine-tuning. The average FPS values were measured using TensorRT version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.
Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub> and
inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO dataset.
You can check [here](object-detection-use-model-weights) for how to use these model
checkpoints for further fine-tuning. The average FPS values were measured using TensorRT
version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.
Comment on lines +16 to +20
Copilot AI Nov 19, 2025

Lines 16-20 exceed the 88-character limit specified in the Markdown documentation style guidelines. These lines should be wrapped at 88 characters. For example, line 16 is 91 characters, line 17 is 93 characters, line 18 is 91 characters, and line 19 is 94 characters.

Suggested change
Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub> and
inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO dataset.
You can check [here](object-detection-use-model-weights) for how to use these model
checkpoints for further fine-tuning. The average FPS values were measured using TensorRT
version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.
Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub>
and inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO
dataset. You can check [here](object-detection-use-model-weights) for how to use these
model checkpoints for further fine-tuning. The average FPS values were measured using
TensorRT version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.

<!-- TODO (Lionel, 10/25) Add Notebook for OD. -->

### COCO

| Implementation | Backbone Model | AP<sub>50:95</sub> | Latency (ms) | # Params (M) | Input Size | Checkpoint Name |
|:--------------:|:----------------------------:|:------------------:|:------------:|:------------:|:----------:|:---------------------------------:|
| LightlyTrain | dinov2/vits14-ltdetr | 55.7 | 16.87 | 55.3 | 644×644 | dinov2/vits14-noreg-ltdetr-coco |
| LightlyTrain | dinov3/convnext-tiny-ltdetr | 54.4 | 13.29 | 61.1 | 640×640 | dinov3/convnext-tiny-ltdetr-coco |
| LightlyTrain | dinov3/convnext-small-ltdetr | 56.9 | 17.65 | 82.7 | 640×640 | dinov3/convnext-small-ltdetr-coco |
| LightlyTrain | dinov3/convnext-base-ltdetr | 58.6 | 24.68 | 121.0 | 640×640 | dinov3/convnext-base-ltdetr-coco |
| LightlyTrain | dinov3/convnext-large-ltdetr | 60.0 | 42.30 | 230.0 | 640×640 | dinov3/convnext-large-ltdetr-coco |
| Implementation | Model | Val mAP<sub>50:95</sub> | Latency (ms) | Params (M) | Input Size |
|:--------------:|:----------------------------:|:------------------:|:------------:|:-----------:|:----------:|
| LightlyTrain | dinov2/vits14-ltdetr | 55.7 | 16.87 | 55.3 | 644×644 |
| LightlyTrain | dinov3/convnext-tiny-ltdetr-coco | 54.4 | 13.29 | 61.1 | 640×640 |
| LightlyTrain | dinov3/convnext-small-ltdetr-coco | 56.9 | 17.65 | 82.7 | 640×640 |
| LightlyTrain | dinov3/convnext-base-ltdetr-coco | 58.6 | 24.68 | 121.0 | 640×640 |
| LightlyTrain | dinov3/convnext-large-ltdetr-coco | 60.0 | 42.30 | 230.0 | 640×640 |
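
For intuition, batch-1 latency and FPS numbers of this kind come from timing repeated forward passes after a warmup. Below is a rough sketch with a stand-in torchvision model; it does not reproduce the TensorRT-on-T4 setup used for the table:

```python
import time

import torch
import torchvision

# Stand-in model for illustration; the benchmarked LTDETR checkpoints are not loaded here.
model = torchvision.models.resnet50().eval()
x = torch.randn(1, 3, 640, 640)  # batch size 1, 640x640 input

with torch.no_grad():
    for _ in range(10):  # warmup iterations
        model(x)
    n = 50
    start = time.perf_counter()
    for _ in range(n):
        model(x)
    latency_ms = (time.perf_counter() - start) / n * 1000

print(f"avg latency: {latency_ms:.2f} ms, avg FPS: {1000 / latency_ms:.1f}")
```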

## Object Detection with LT-DETR
## Object Detection with LTDETR

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/object_detection.ipynb)

20 changes: 12 additions & 8 deletions docs/source/predict_autolabel.md
@@ -18,14 +18,18 @@ The pseudo masks were generated in the following way:

The validation results are listed in the table below, where you can notice significant improvements when using the auto-labeled data:

| Implementation | Model Name | Autolabel | Val mIoU | # Params (M) | Input Size | Checkpoint Name |
|:--------------:|:------------------:|:------:|:--------------------:|:------------:|:----------:| :----------------:|
| LightlyTrain | dinov3/vits16-eomt | ❌ | 0.466 | 21.6 | 518×518 | |
| LightlyTrain | dinov3/vits16-eomt | ✅ | **0.533** | 21.6 | 518×518 | dinov3/vits16-eomt-ade20k |
| LightlyTrain | dinov3/vitb16-eomt | ❌ | 0.544 | 85.7 | 518×518 | |
| LightlyTrain | dinov3/vitb16-eomt-ade20k | ✅ | **0.573** | 85.7 | 518×518 | dinov3/vitb16-eomt-ade20k |

We also released the model checkpoints fine-tuned with the auto-labeled SUN397 dataset in the table above. You can use these checkpoints by specifying the checkpoint name in the `model` argument of the `predict_semantic_segmentation` function. See the [Predict Semantic Segmentation Masks](#predict-semantic-segmentation) section below for more details.
| Implementation | Model | Autolabel | Val mIoU | Params (M) | Input Size |
|:--------------:|:-------------------------------:|:---------:|:---------:|:-----------:|:----------:|
| LightlyTrain | dinov3/vits16-eomt | ❌ | 0.466 | 21.6 | 518×518 |
| LightlyTrain | dinov3/vits16-eomt-ade20k | ✅ | **0.533** | 21.6 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt | ❌ | 0.544 | 85.7 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt-ade20k | ✅ | **0.573** | 85.7 | 518×518 |

We also released the model checkpoints fine-tuned with the auto-labeled SUN397 dataset in
the table above. You can use these checkpoints by specifying the corresponding model
name in the `model` argument of the `predict_semantic_segmentation` function. See the
[Predict Semantic Segmentation Masks](#predict-semantic-segmentation) section below
for more details.
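
For illustration, such a call might look like the sketch below; the `model` name comes from the table above, while the other argument names and paths are assumptions to be checked against the documented signature:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.predict_semantic_segmentation(
        out="out/predicted_masks",          # assumed: where predicted masks are written
        data="my_images",                   # assumed: directory of images to label
        model="dinov3/vitb16-eomt-ade20k",  # released checkpoint from the table
    )
```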

## Predict Model Checkpoint

29 changes: 17 additions & 12 deletions docs/source/semantic_segmentation.md
@@ -2,6 +2,8 @@

# Semantic Segmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_semantic_segmentation.ipynb)

```{note}
🔥 **New**: LightlyTrain now supports training **[DINOv3](#-use-eomt-with-dinov3-)** and DINOv2 models for semantic segmentation with the `train_semantic_segmentation` function! The method is based on the
state-of-the-art segmentation model [EoMT](https://arxiv.org/abs/2503.19108) by
@@ -22,27 +24,29 @@ You can also explore inferencing with these model weights using our Colab notebook

### COCO-Stuff

| Backbone Model | #Params (M) | Input Size | Val mIoU | Avg. FPS | Checkpoint |
|----------------|-------------|------------|----------|----------|------------|
| dinov3/vits16-eomt | 21.6 | 512×512 | 0.465 | 88.7 | dinov3/vits16-eomt-coco |
| dinov3/vitb16-eomt | 85.7 | 512×512 | 0.520 | 43.3 | dinov3/vitb16-eomt-coco |
| dinov3/vitl16-eomt | 303.2 | 512×512 | **0.544** | 20.4 | dinov3/vitl16-eomt-coco |
| Implementation | Model | Val mIoU | Avg. FPS | Params (M) | Input Size |
|----------------|----------------------------|----------|----------|-----------|------------|
| LightlyTrain | dinov3/vits16-eomt-coco | 0.465 | 88.7 | 21.6 | 512×512 |
| LightlyTrain | dinov3/vitb16-eomt-coco | 0.520 | 43.3 | 85.7 | 512×512 |
| LightlyTrain | dinov3/vitl16-eomt-coco | **0.544** | 20.4 | 303.2 | 512×512 |

We trained for 12 epochs (~88k steps) on the COCO-Stuff dataset with `num_queries=200` for EoMT.

### Cityscapes

| Backbone Model | #Params (M) | Input Size | Val mIoU | Avg. FPS | Checkpoint |
|----------------|-------------|------------|----------|----------|------------|
| dinov3/vits16-eomt | 21.6 | 1024×1024 | 0.786 | 18.6 | dinov3/vits16-eomt-cityscapes |
| dinov3/vitb16-eomt | 85.7 | 1024×1024 | 0.810 | 8.7 | dinov3/vitb16-eomt-cityscapes |
| dinov3/vitl16-eomt | 303.2 | 1024×1024 | **0.844** | 3.9 | dinov3/vitl16-eomt-cityscapes |
| dinov2/vitl16-eomt (original) | 319 | 1024×1024 | 0.842 | - | - |
| Implementation | Model | Val mIoU | Avg. FPS | Params (M) | Input Size |
|----------------|--------------------------------------|----------|----------|-----------|------------|
| LightlyTrain | dinov3/vits16-eomt-cityscapes | 0.786 | 18.6 | 21.6 | 1024×1024 |
| LightlyTrain | dinov3/vitb16-eomt-cityscapes | 0.810 | 8.7 | 85.7 | 1024×1024 |
| LightlyTrain | dinov3/vitl16-eomt-cityscapes | **0.844** | 3.9 | 303.2 | 1024×1024 |
| Original EoMT | dinov2/vitl16-eomt | 0.842 | - | 319 | 1024×1024 |

We trained for 107 epochs (~20k steps) on the Cityscapes dataset with `num_queries=200` for EoMT.

## Semantic Segmentation with EoMT

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_semantic_segmentation.ipynb)

Training a semantic segmentation model with LightlyTrain is straightforward and
only requires a few lines of code. See [data](#semantic-segmentation-data)
for more details on how to prepare your dataset.
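
As a sketch, such a call can look like the following; the `data` schema, paths, and class names below are assumptions for illustration, so see the data section for the authoritative format:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",     # output directory for logs and checkpoints
        model="dinov3/vits16-eomt",  # DINOv3 backbone with the EoMT decoder
        data={
            "train": {"images": "train/images", "masks": "train/masks"},
            "val": {"images": "val/images", "masks": "val/masks"},
            "classes": {0: "background", 1: "road"},  # class id -> class name
        },
    )
```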
@@ -176,7 +180,8 @@ if __name__ == "__main__":
### Use the LightlyTrain Model Checkpoints

Now you can also start with the DINOv3 model checkpoints that LightlyTrain provides.
The models are listed [here](#semantic-segmentation-benchmark-results) in the "Checkpoint" column of the tables.
The models are listed [here](#semantic-segmentation-benchmark-results) in the
"Model" column of the tables.

```python
import lightly_train
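
# A sketch: start further fine-tuning from a released checkpoint by passing
# its model name from the "Model" column above. The call shape mirrors the
# training example; the data dict is elided here and is an assumption.
if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov3/vitl16-eomt-coco",  # checkpoint name from the tables
        data=...,  # same format as in the training example above
    )
```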
2 changes: 2 additions & 0 deletions docs/source/train/index.md
@@ -2,6 +2,8 @@

# Train

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/quick_start.ipynb)

The train command is a simple interface to pretrain a large number of models using
different SSL methods. An example command looks like this:
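
A sketch of such a call, with placeholder paths and an assumed example method:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # output directory
        data="my_data_dir",            # directory with unlabeled images
        model="torchvision/resnet50",  # any supported model identifier
        method="simclr",               # assumed example; see the method docs
    )
```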
