508 changes: 234 additions & 274 deletions README.md

Large diffs are not rendered by default.

248 changes: 102 additions & 146 deletions docs/source/index.md

Large diffs are not rendered by default.

14 changes: 8 additions & 6 deletions docs/source/instance_segmentation.md
@@ -2,6 +2,8 @@

# Instance Segmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_instance_segmentation.ipynb)

```{note}
🔥 LightlyTrain now supports training **DINOv3**-based instance segmentation models
with the [EoMT architecture](https://arxiv.org/abs/2503.19108) by Kerssies et al.!
@@ -21,12 +23,12 @@ You can also explore running inference and training these models using our Colab

### COCO

| Implementation | Model | #Params (M) | Input Size | Val mAP mask | Avg. FPS |
|----------------|----------------|-------------|------------|----------|----------|
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 21.6 | 640x640 | 32.6 | 51.5 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 85.7 | 640x640 | 40.3 | 25.2 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | 303.2 | 640x640 | **46.2** | 12.5 |
| Original EoMT | dinov3/vitl16-eomt-inst-coco | 303.2 | 640x640 | 45.9 | - |
| Implementation | Model | Val mAP mask | Avg. FPS | Params (M) | Input Size |
|----------------|----------------|-------------|----------|-----------|------------|
| LightlyTrain | dinov3/vits16-eomt-inst-coco | 32.6 | 51.5 | 21.6 | 640×640 |
| LightlyTrain | dinov3/vitb16-eomt-inst-coco | 40.3 | 25.2 | 85.7 | 640×640 |
| LightlyTrain | dinov3/vitl16-eomt-inst-coco | **46.2** | 12.5 | 303.2 | 640×640 |
| Original EoMT | dinov3/vitl16-eomt-inst-coco | 45.9 | - | 303.2 | 640×640 |

Training follows the protocol in the original [EoMT paper](https://arxiv.org/abs/2503.19108).
Models are trained for 90K steps (~12 epochs) on the COCO dataset with batch size `16`
4 changes: 2 additions & 2 deletions docs/source/methods/dinov2.md
@@ -10,8 +10,8 @@ generate high-quality features that can be used without fine-tuning the model.

```{table}

| Implementation | Model | ImageNet k-NN |
|--------------|----------|---------------|
| Implementation | Model | Val ImageNet k-NN |
|--------------|----------|-------------------|
| LightlyTrain | ViT-L/16 | 81.9% |
| [Official](https://github.com/facebookresearch/dinov2) | ViT-L/16 | 81.6% |

4 changes: 4 additions & 0 deletions docs/source/methods/distillation.md
@@ -2,6 +2,8 @@

# Distillation (recommended 🚀)

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/quick_start.ipynb)

Knowledge distillation involves transferring knowledge from a large, compute-intensive teacher model to a smaller, efficient student model by encouraging similarity between the student and teacher representations. It addresses the challenge of bridging the gap between state-of-the-art large-scale vision models and smaller, more computationally efficient models suitable for practical applications.
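
As a toy illustration of this idea (a sketch, not LightlyTrain's implementation), a student can be trained to match a frozen teacher's embeddings with a simple cosine-similarity objective:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    # 1 - mean cosine similarity between L2-normalized embeddings: minimizing
    # this pulls the student's representations toward the teacher's.
    student_emb = F.normalize(student_emb, dim=-1)
    teacher_emb = F.normalize(teacher_emb, dim=-1)
    return 1.0 - (student_emb * teacher_emb).sum(dim=-1).mean()


# Toy usage: a batch of 8 student embeddings trained toward fixed teacher ones.
student = torch.randn(8, 256, requires_grad=True)
teacher = torch.randn(8, 256)  # frozen teacher outputs, no gradients needed
loss = distillation_loss(student, teacher)
loss.backward()
```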

```{note}
@@ -12,6 +14,8 @@ that achieves higher accuracy and trains up to 3x faster. The previous version is

## Use Distillation in LightlyTrain

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/quick_start.ipynb)

Follow the code below to distill the knowledge of the default DINOv2 ViT-B/14 teacher model into your model architecture. The example uses a `torchvision/resnet18` model as the student:

````{tab} Python
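```python
# A minimal sketch of the quick-start call (paths are placeholders; per the
# docs, method="distillation" uses the default DINOv2 ViT-B/14 teacher).
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # output directory for logs and checkpoints
        data="my_data_dir",            # directory with unlabeled training images
        model="torchvision/resnet18",  # student model to distill into
        method="distillation",
    )
```
````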
26 changes: 16 additions & 10 deletions docs/source/object_detection.md
@@ -2,30 +2,36 @@

# Object Detection

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/object_detection.ipynb)

```{note}
🔥 LightlyTrain now supports training **LT-DETR**: **DINOv3**- and **DINOv2**-based object detection models
🔥 LightlyTrain now supports training **LTDETR**: **DINOv3**- and **DINOv2**-based object detection models
with the super-fast RT-DETR detection architecture! Our largest model achieves an mAP<sub>50:95</sub> of 60.0 on the COCO validation set!
```

(object-detection-benchmark-results)=

## Benchmark Results

Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub> and inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO dataset. You can check [here](object-detection-use-model-weights) for how to use these model checkpoints for further fine-tuning. The average FPS values were measured using TensorRT version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.
Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub> and
inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO dataset.
You can check [here](object-detection-use-model-weights) for how to use these model
checkpoints for further fine-tuning. The average FPS values were measured using TensorRT
version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.
Comment on lines +16 to +20
Copilot AI Nov 19, 2025

Lines 16-20 exceed the 88-character limit specified in the Markdown documentation style guidelines. These lines should be wrapped at 88 characters. For example, line 16 is 91 characters, line 17 is 93 characters, line 18 is 91 characters, and line 19 is 94 characters.

Suggested change
Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub> and
inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO dataset.
You can check [here](object-detection-use-model-weights) for how to use these model
checkpoints for further fine-tuning. The average FPS values were measured using TensorRT
version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.
Below we provide the model checkpoints and report the validation mAP<sub>50:95</sub>
and inference FPS of different DINOv3 and DINOv2-based models, fine-tuned on the COCO
dataset. You can check [here](object-detection-use-model-weights) for how to use these
model checkpoints for further fine-tuning. The average FPS values were measured using
TensorRT version `10.13.3.9` on an NVIDIA T4 GPU with batch size 1.

<!-- TODO (Lionel, 10/25) Add Notebook for OD. -->

### COCO

| Implementation | Backbone Model | AP<sub>50:95</sub> | Latency (ms) | # Params (M) | Input Size | Checkpoint Name |
|:--------------:|:----------------------------:|:------------------:|:------------:|:------------:|:----------:|:---------------------------------:|
| LightlyTrain | dinov2/vits14-ltdetr | 55.7 | 16.87 | 55.3 | 644×644 | dinov2/vits14-noreg-ltdetr-coco |
| LightlyTrain | dinov3/convnext-tiny-ltdetr | 54.4 | 13.29 | 61.1 | 640×640 | dinov3/convnext-tiny-ltdetr-coco |
| LightlyTrain | dinov3/convnext-small-ltdetr | 56.9 | 17.65 | 82.7 | 640×640 | dinov3/convnext-small-ltdetr-coco |
| LightlyTrain | dinov3/convnext-base-ltdetr | 58.6 | 24.68 | 121.0 | 640×640 | dinov3/convnext-base-ltdetr-coco |
| LightlyTrain | dinov3/convnext-large-ltdetr | 60.0 | 42.30 | 230.0 | 640×640 | dinov3/convnext-large-ltdetr-coco |
| Implementation | Model | Val mAP<sub>50:95</sub> | Latency (ms) | Params (M) | Input Size |
|:--------------:|:----------------------------:|:------------------:|:------------:|:-----------:|:----------:|
| LightlyTrain | dinov2/vits14-ltdetr | 55.7 | 16.87 | 55.3 | 644×644 |
| LightlyTrain | dinov3/convnext-tiny-ltdetr-coco | 54.4 | 13.29 | 61.1 | 640×640 |
| LightlyTrain | dinov3/convnext-small-ltdetr-coco | 56.9 | 17.65 | 82.7 | 640×640 |
| LightlyTrain | dinov3/convnext-base-ltdetr-coco | 58.6 | 24.68 | 121.0 | 640×640 |
| LightlyTrain | dinov3/convnext-large-ltdetr-coco | 60.0 | 42.30 | 230.0 | 640×640 |
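
For intuition, batch-1 latency and FPS numbers of this kind come from timing repeated forward passes after a warmup. Below is a rough sketch with a stand-in torchvision model; it does not reproduce the TensorRT-on-T4 setup used for the table:

```python
import time

import torch
import torchvision

# Stand-in model for illustration; the benchmarked LTDETR checkpoints are not loaded here.
model = torchvision.models.resnet50().eval()
x = torch.randn(1, 3, 640, 640)  # batch size 1, 640x640 input

with torch.no_grad():
    for _ in range(10):  # warmup iterations
        model(x)
    n = 50
    start = time.perf_counter()
    for _ in range(n):
        model(x)
    latency_ms = (time.perf_counter() - start) / n * 1000

print(f"avg latency: {latency_ms:.2f} ms, avg FPS: {1000 / latency_ms:.1f}")
```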

## Object Detection with LT-DETR
## Object Detection with LTDETR

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/object_detection.ipynb)

20 changes: 12 additions & 8 deletions docs/source/predict_autolabel.md
@@ -18,14 +18,18 @@ The pseudo masks were generated in the following way:

The validation results are listed in the table below, where you can notice significant improvements when using the auto-labeled data:

| Implementation | Model Name | Autolabel | Val mIoU | # Params (M) | Input Size | Checkpoint Name |
|:--------------:|:------------------:|:------:|:--------------------:|:------------:|:----------:| :----------------:|
| LightlyTrain | dinov3/vits16-eomt | ❌ | 0.466 | 21.6 | 518×518 | |
| LightlyTrain | dinov3/vits16-eomt | ✅ | **0.533** | 21.6 | 518×518 | dinov3/vits16-eomt-ade20k |
| LightlyTrain | dinov3/vitb16-eomt | ❌ | 0.544 | 85.7 | 518×518 | |
| LightlyTrain | dinov3/vitb16-eomt-ade20k | ✅ | **0.573** | 85.7 | 518×518 | dinov3/vitb16-eomt-ade20k |

We also released the model checkpoints fine-tuned with the auto-labeled SUN397 dataset in the table above. You can use these checkpoints by specifying the checkpoint name in the `model` argument of the `predict_semantic_segmentation` function. See the [Predict Semantic Segmentation Masks](#predict-semantic-segmentation) section below for more details.
| Implementation | Model | Autolabel | Val mIoU | Params (M) | Input Size |
|:--------------:|:-------------------------------:|:---------:|:---------:|:-----------:|:----------:|
| LightlyTrain | dinov3/vits16-eomt | ❌ | 0.466 | 21.6 | 518×518 |
| LightlyTrain | dinov3/vits16-eomt-ade20k | ✅ | **0.533** | 21.6 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt | ❌ | 0.544 | 85.7 | 518×518 |
| LightlyTrain | dinov3/vitb16-eomt-ade20k | ✅ | **0.573** | 85.7 | 518×518 |

We also released the model checkpoints fine-tuned with the auto-labeled SUN397 dataset in
the table above. You can use these checkpoints by specifying the corresponding model
name in the `model` argument of the `predict_semantic_segmentation` function. See the
[Predict Semantic Segmentation Masks](#predict-semantic-segmentation) section below
for more details.
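
For illustration, such a call might look like the sketch below; the `model` name comes from the table above, while the other argument names and paths are assumptions to be checked against the documented signature:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.predict_semantic_segmentation(
        out="out/predicted_masks",          # assumed: where predicted masks are written
        data="my_images",                   # assumed: directory of images to label
        model="dinov3/vitb16-eomt-ade20k",  # released checkpoint from the table
    )
```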

## Predict Model Checkpoint

29 changes: 17 additions & 12 deletions docs/source/semantic_segmentation.md
@@ -2,6 +2,8 @@

# Semantic Segmentation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_semantic_segmentation.ipynb)

```{note}
🔥 **New**: LightlyTrain now supports training **[DINOv3](#-use-eomt-with-dinov3-)** and DINOv2 models for semantic segmentation with the `train_semantic_segmentation` function! The method is based on the
state-of-the-art segmentation model [EoMT](https://arxiv.org/abs/2503.19108) by
@@ -22,27 +24,29 @@ You can also explore inferencing with these model weights using our Colab notebook

### COCO-Stuff

| Backbone Model | #Params (M) | Input Size | Val mIoU | Avg. FPS | Checkpoint |
|----------------|-------------|------------|----------|----------|------------|
| dinov3/vits16-eomt | 21.6 | 512×512 | 0.465 | 88.7 | dinov3/vits16-eomt-coco |
| dinov3/vitb16-eomt | 85.7 | 512×512 | 0.520 | 43.3 | dinov3/vitb16-eomt-coco |
| dinov3/vitl16-eomt | 303.2 | 512×512 | **0.544** | 20.4 | dinov3/vitl16-eomt-coco |
| Implementation | Model | Val mIoU | Avg. FPS | Params (M) | Input Size |
|----------------|----------------------------|----------|----------|-----------|------------|
| LightlyTrain | dinov3/vits16-eomt-coco | 0.465 | 88.7 | 21.6 | 512×512 |
| LightlyTrain | dinov3/vitb16-eomt-coco | 0.520 | 43.3 | 85.7 | 512×512 |
| LightlyTrain | dinov3/vitl16-eomt-coco | **0.544** | 20.4 | 303.2 | 512×512 |

We trained for 12 epochs (~88k steps) on the COCO-Stuff dataset with `num_queries=200` for EoMT.

### Cityscapes

| Backbone Model | #Params (M) | Input Size | Val mIoU | Avg. FPS | Checkpoint |
|----------------|-------------|------------|----------|----------|------------|
| dinov3/vits16-eomt | 21.6 | 1024×1024 | 0.786 | 18.6 | dinov3/vits16-eomt-cityscapes |
| dinov3/vitb16-eomt | 85.7 | 1024×1024 | 0.810 | 8.7 | dinov3/vitb16-eomt-cityscapes |
| dinov3/vitl16-eomt | 303.2 | 1024×1024 | **0.844** | 3.9 | dinov3/vitl16-eomt-cityscapes |
| dinov2/vitl16-eomt (original) | 319 | 1024×1024 | 0.842 | - | - |
| Implementation | Model | Val mIoU | Avg. FPS | Params (M) | Input Size |
|----------------|--------------------------------------|----------|----------|-----------|------------|
| LightlyTrain | dinov3/vits16-eomt-cityscapes | 0.786 | 18.6 | 21.6 | 1024×1024 |
| LightlyTrain | dinov3/vitb16-eomt-cityscapes | 0.810 | 8.7 | 85.7 | 1024×1024 |
| LightlyTrain | dinov3/vitl16-eomt-cityscapes | **0.844** | 3.9 | 303.2 | 1024×1024 |
| Original EoMT | dinov2/vitl16-eomt | 0.842 | - | 319 | 1024×1024 |

We trained for 107 epochs (~20k steps) on the Cityscapes dataset with `num_queries=200` for EoMT.

## Semantic Segmentation with EoMT

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_semantic_segmentation.ipynb)

Training a semantic segmentation model with LightlyTrain is straightforward and
only requires a few lines of code. See [data](#semantic-segmentation-data)
for more details on how to prepare your dataset.
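
As a sketch, such a call can look like the following; the `data` schema, paths, and class names below are assumptions for illustration, so see the data section for the authoritative format:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",     # output directory for logs and checkpoints
        model="dinov3/vits16-eomt",  # DINOv3 backbone with the EoMT decoder
        data={
            "train": {"images": "train/images", "masks": "train/masks"},
            "val": {"images": "val/images", "masks": "val/masks"},
            "classes": {0: "background", 1: "road"},  # class id -> class name
        },
    )
```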
@@ -176,7 +180,8 @@ if __name__ == "__main__":
### Use the LightlyTrain Model Checkpoints

Now you can also start with the DINOv3 model checkpoints that LightlyTrain provides.
The models are listed [here](#semantic-segmentation-benchmark-results) in the "Checkpoint" column of the tables.
The models are listed [here](#semantic-segmentation-benchmark-results) in the
"Model" column of the tables.

```python
import lightly_train
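
# A sketch: start further fine-tuning from a released checkpoint by passing
# its model name from the "Model" column above. The call shape mirrors the
# training example; the data dict is elided here and is an assumption.
if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov3/vitl16-eomt-coco",  # checkpoint name from the tables
        data=...,  # same format as in the training example above
    )
```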
2 changes: 2 additions & 0 deletions docs/source/train/index.md
@@ -2,6 +2,8 @@

# Train

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/quick_start.ipynb)

The train command is a simple interface to pretrain a large number of models using
different SSL methods. An example command looks like this:
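
A sketch of such a call, with placeholder paths and an assumed example method:

```python
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",       # output directory
        data="my_data_dir",            # directory with unlabeled images
        model="torchvision/resnet50",  # any supported model identifier
        method="simclr",               # assumed example; see the method docs
    )
```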
