Merged
Changes from 14 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ All notable changes to this project will be documented in this file.
- Add ONNX metadata to detection, instance segmentation, and segmentation models (<https://github.com/openvinotoolkit/training_extensions/pull/2418>)
- Add a new feature to configure input size (<https://github.com/openvinotoolkit/training_extensions/pull/2420>)
- Introduce the OTXSampler and AdaptiveRepeatDataHook to achieve faster training in the small-data regime (<https://github.com/openvinotoolkit/training_extensions/pull/2428>)
- Add a new object detector Lite-DINO (<https://github.com/openvinotoolkit/training_extensions/pull/2457>)

### Enhancements

@@ -100,6 +100,8 @@ In addition to these models, we supports experimental models for object detectio
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Custom_Object_Detection_Gen3_DINO <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/resnet50_dino/template_experimental.yaml>`_ | DINO | 235 | 182.0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Custom_Object_Detection_Gen3_Lite_DINO <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/resnet50_litedino/template_experimental.yaml>`_ | Lite-DINO | 140 | 190.0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Custom_Object_Detection_Gen3_ResNeXt101_ATSS <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/resnext101_atss/template_experimental.yaml>`_ | ResNeXt101-ATSS | 434.75 | 344.0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Object_Detection_YOLOX_S <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/cspdarknet_yolox_s/template_experimental.yaml>`_ | YOLOX_S | 33.51 | 46.0 |
@@ -110,6 +112,7 @@ In addition to these models, we supports experimental models for object detectio
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+

`Deformable_DETR <https://arxiv.org/abs/2010.04159>`_ is a `DETR <https://arxiv.org/abs/2005.12872>`_-based model that addresses the slow convergence of DETR. `DINO <https://arxiv.org/abs/2203.03605>`_ improves on Deformable-DETR-based methods by using denoising anchor boxes. Current state-of-the-art (SOTA) object detection models are based on DINO.
`Lite-DINO <https://arxiv.org/abs/2303.07335>`_ is an efficient variant of DINO. It reduces the FLOPs of the transformer encoder, which accounts for the largest share of the computational cost.
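A rough token count shows why the transformer encoder dominates the cost. A multi-scale deformable encoder updates every token of every feature level in every layer, and the highest-resolution level holds most of those tokens. The sketch below is a back-of-the-envelope illustration under hypothetical settings (an 800×1333 input, four levels at strides 8, 16, 32 and 64, six encoder layers). The schedule that refreshes the stride-8 level only every third layer is an assumed example in the spirit of Lite DETR's interleaved updates, not the paper's or OTX's actual configuration, and token updates are only a proxy for FLOPs.

```python
import math
from typing import Dict, List

# Hypothetical settings, chosen for illustration only.
IMAGE_H, IMAGE_W = 800, 1333
STRIDES = [8, 16, 32, 64]
NUM_LAYERS = 6


def tokens_per_level(h: int, w: int, strides: List[int]) -> Dict[int, int]:
    """Approximate token count per feature level (ceil division; real backbones may round differently)."""
    return {s: math.ceil(h / s) * math.ceil(w / s) for s in strides}


def token_updates(levels: Dict[int, int], num_layers: int, low_level_every: int) -> int:
    """Count token updates when the finest level is refreshed only every `low_level_every` layers.

    low_level_every=1 corresponds to a plain encoder that updates every token in every layer.
    """
    finest = min(levels)
    coarse_tokens = sum(n for s, n in levels.items() if s != finest)
    low_updates = sum(1 for layer in range(num_layers) if (layer + 1) % low_level_every == 0)
    return num_layers * coarse_tokens + low_updates * levels[finest]


levels = tokens_per_level(IMAGE_H, IMAGE_W, STRIDES)
share = levels[min(levels)] / sum(levels.values())
dense = token_updates(levels, NUM_LAYERS, low_level_every=1)
interleaved = token_updates(levels, NUM_LAYERS, low_level_every=3)
print(f"stride-8 share of tokens: {share:.1%}")  # stride-8 share of tokens: 75.1%
print(f"interleaved / dense updates: {interleaved / dense:.1%}")  # interleaved / dense updates: 49.9%
```

Under these assumptions about three quarters of the tokens sit at stride 8, so refreshing that level less often roughly halves the token updates.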
Although transformer-based models show notable performance on various object detection benchmarks, CNN-based models still perform well at reasonable latency.
Therefore, we added a new experimental CNN-based method, ResNeXt101-ATSS. ATSS still performs well among `RetinaNet <https://arxiv.org/abs/1708.02002>`_-based models. We integrated a large ResNeXt101 backbone into our custom ATSS head, and it shows good transfer learning performance.
In addition, we added YOLOX variants to cover a wider range of user scenarios.
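ATSS's distinguishing step is how it picks positive anchors for each ground-truth box. Below is a minimal pure-Python sketch of the rule described in the ATSS paper: take the k anchors per level whose centres are closest to the box centre, set the IoU threshold to the mean plus the standard deviation of those candidates' IoUs, and keep candidates at or above the threshold whose centres lie inside the box. It is not OTX's or mmdet's assigner; the `(x1, y1, x2, y2)` box format, the `k=9` default, the population standard deviation, and the helper names are assumptions of this sketch.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def atss_positives(gt: Box, anchors_per_level: Sequence[Sequence[Box]], k: int = 9) -> List[Tuple[int, int]]:
    """Return (level, anchor_index) pairs selected as positives for one ground-truth box."""
    gc = center(gt)
    # 1. Per level, keep the k anchors whose centres are closest to the GT centre.
    candidates: List[Tuple[int, int]] = []
    for lvl, anchors in enumerate(anchors_per_level):
        order = sorted(range(len(anchors)), key=lambda i: math.dist(center(anchors[i]), gc))
        candidates.extend((lvl, i) for i in order[:k])
    if not candidates:
        return []
    # 2. Adaptive IoU threshold: mean + standard deviation over all candidates.
    ious = [iou(anchors_per_level[lvl][i], gt) for lvl, i in candidates]
    threshold = mean(ious) + pstdev(ious)
    # 3. Keep candidates at or above the threshold whose centres fall inside the GT box.
    positives = []
    for (lvl, i), v in zip(candidates, ious):
        cx, cy = center(anchors_per_level[lvl][i])
        if v >= threshold and gt[0] < cx < gt[2] and gt[1] < cy < gt[3]:
            positives.append((lvl, i))
    return positives


gt_box = (0.0, 0.0, 10.0, 10.0)
level0 = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0), (20.0, 20.0, 30.0, 30.0)]
print(atss_positives(gt_box, [level0], k=3))  # [(0, 0)]
```

Because the threshold adapts to each box's candidate statistics, there is no single global IoU cut-off to tune, which is the property the ATSS paper emphasises.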
@@ -154,6 +157,8 @@ We trained each model with a single Nvidia GeForce RTX3090.
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| ResNet50-DINO | 49.0 (66.4) | 47.2 | 99.5 | 62.9 | 93.5 | 99.1 |
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| ResNet50-Lite-DINO | 48.1 (64.4) | 47.0 | 99.0 | 62.5 | 93.6 | 99.4 |
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| YOLOX_S | 40.3 (59.1) | 37.1 | 93.6 | 54.8 | 92.7 | 98.8 |
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| YOLOX_L | 49.4 (67.1) | 44.5 | 94.6 | 55.8 | 91.8 | 99.0 |
@@ -78,6 +78,7 @@ def _custom_grid_sample(im: torch.Tensor, grid: torch.Tensor, align_corners: boo
     Returns:
         torch.Tensor: A tensor with sampled points, shape (N, C, Hg, Wg)
     """
+    device = im.device
     n, c, h, w = im.shape
     gn, gh, gw, _ = grid.shape
     assert n == gn
@@ -113,14 +114,14 @@ def _custom_grid_sample(im: torch.Tensor, grid: torch.Tensor, align_corners: boo
     x0, x1, y0, y1 = x0 + 1, x1 + 1, y0 + 1, y1 + 1

     # Clip coordinates to padded image size
-    x0 = torch.where(x0 < 0, torch.tensor(0), x0)
-    x0 = torch.where(x0 > padded_w - 1, torch.tensor(padded_w - 1), x0)
-    x1 = torch.where(x1 < 0, torch.tensor(0), x1)
-    x1 = torch.where(x1 > padded_w - 1, torch.tensor(padded_w - 1), x1)
-    y0 = torch.where(y0 < 0, torch.tensor(0), y0)
-    y0 = torch.where(y0 > padded_h - 1, torch.tensor(padded_h - 1), y0)
-    y1 = torch.where(y1 < 0, torch.tensor(0), y1)
-    y1 = torch.where(y1 > padded_h - 1, torch.tensor(padded_h - 1), y1)
+    x0 = torch.where(x0 < 0, torch.tensor(0).to(device), x0)
+    x0 = torch.where(x0 > padded_w - 1, torch.tensor(padded_w - 1).to(device), x0)
+    x1 = torch.where(x1 < 0, torch.tensor(0).to(device), x1)
+    x1 = torch.where(x1 > padded_w - 1, torch.tensor(padded_w - 1).to(device), x1)
+    y0 = torch.where(y0 < 0, torch.tensor(0).to(device), y0)
+    y0 = torch.where(y0 > padded_h - 1, torch.tensor(padded_h - 1).to(device), y0)
+    y1 = torch.where(y1 < 0, torch.tensor(0).to(device), y1)
+    y1 = torch.where(y1 > padded_h - 1, torch.tensor(padded_h - 1).to(device), y1)

     im_padded = im_padded.view(n, c, -1)

@@ -6,6 +6,7 @@
 from .custom_atss_detector import CustomATSS
 from .custom_deformable_detr_detector import CustomDeformableDETR
 from .custom_dino_detector import CustomDINO
+from .custom_lite_dino import CustomLiteDINO
 from .custom_maskrcnn_detector import CustomMaskRCNN
 from .custom_maskrcnn_tile_optimized import CustomMaskRCNNTileOptimized
 from .custom_single_stage_detector import CustomSingleStageDetector
@@ -19,6 +20,7 @@
 __all__ = [
     "CustomATSS",
     "CustomDeformableDETR",
+    "CustomLiteDINO",
     "CustomDINO",
     "CustomMaskRCNN",
     "CustomSingleStageDetector",
@@ -0,0 +1,21 @@
+"""OTX Lite-DINO Class for object detection."""
+
+# Copyright (C) 2023 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+
+from mmdet.models.builder import DETECTORS
+
+from otx.algorithms.common.utils.logger import get_logger
+from otx.algorithms.detection.adapters.mmdet.models.detectors import CustomDINO
+
+logger = get_logger()
+
+
+@DETECTORS.register_module()
+class CustomLiteDINO(CustomDINO):
+    """Custom Lite-DINO <https://arxiv.org/pdf/2303.07335.pdf> for object detection."""
+
+    def load_state_dict_pre_hook(self, model_classes, ckpt_classes, ckpt_dict, *args, **kwargs):
+        """Modify official lite dino version's weights before weight loading."""
+        super(CustomDINO, self).load_state_dict_pre_hook(model_classes, ckpt_classes, ckpt_dict, *args, **kwargs)
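Two mechanisms in this file are easy to miss. `@DETECTORS.register_module()` records the class by name when its module is imported (which is why the package `__init__.py` imports `custom_lite_dino`), so a config can later refer to it by its `type` string. And `super(CustomDINO, self)` starts method lookup after `CustomDINO` in the MRO, so `CustomDINO`'s own `load_state_dict_pre_hook` is bypassed and the next class's hook runs instead. The sketch below illustrates both with a hypothetical minimal `Registry` and a hypothetical base class `DeformableDETRStub`; it is not mmcv, mmdet, or OTX code, and each hook body is a stub that only records that it ran.

```python
from typing import Any, Callable, Dict, List


class Registry:
    """Hypothetical minimal name-to-class registry (not mmcv's Registry)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            # Runs when the class statement executes, i.e. when its module is imported.
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} is already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls

        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        # Look up the class named by cfg["type"] and pass the remaining keys as kwargs.
        kwargs = dict(cfg)
        cls = self._modules[kwargs.pop("type")]
        return cls(**kwargs)


DETECTORS = Registry("detectors")
calls: List[str] = []  # records which hooks ran, in order


class DeformableDETRStub:
    """Hypothetical stand-in for whatever class follows CustomDINO in the MRO."""

    def load_state_dict_pre_hook(self, *args: Any, **kwargs: Any) -> None:
        calls.append("base")


@DETECTORS.register_module()
class CustomDINO(DeformableDETRStub):
    def load_state_dict_pre_hook(self, *args: Any, **kwargs: Any) -> None:
        calls.append("dino")  # stub for DINO-specific checkpoint handling
        super().load_state_dict_pre_hook(*args, **kwargs)


@DETECTORS.register_module()
class CustomLiteDINO(CustomDINO):
    def load_state_dict_pre_hook(self, *args: Any, **kwargs: Any) -> None:
        # Lookup starts after CustomDINO in the MRO, so CustomDINO's hook is skipped.
        super(CustomDINO, self).load_state_dict_pre_hook(*args, **kwargs)


DETECTORS.build({"type": "CustomLiteDINO"}).load_state_dict_pre_hook([], [], {})
print(calls)  # ['base']
```

Calling the same hook on a `CustomDINO` instance would record `['dino', 'base']`, which makes the effect of passing `CustomDINO` rather than `CustomLiteDINO` to `super()` visible.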
@@ -5,5 +5,13 @@

 from .dino import CustomDINOTransformer
 from .dino_layers import CdnQueryGenerator, DINOTransformerDecoder
+from .lite_detr_layers import EfficientTransformerEncoder, EfficientTransformerLayer, SmallExpandFFN

-__all__ = ["CustomDINOTransformer", "DINOTransformerDecoder", "CdnQueryGenerator"]
+__all__ = [
+    "CustomDINOTransformer",
+    "DINOTransformerDecoder",
+    "CdnQueryGenerator",
+    "EfficientTransformerEncoder",
+    "EfficientTransformerLayer",
+    "SmallExpandFFN",
+]
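Extending `__all__` matters mainly for `from package import *` (and for tools that treat `__all__` as the public API); the explicit `from .lite_detr_layers import ...` line already makes the new layers importable by name either way. A self-contained sketch using a throwaway in-memory module: the module name `layers_demo` is hypothetical, and the two class names are reused only as empty stand-ins, not OTX's implementations.

```python
import sys
import types

# Build a throwaway module whose __all__ lists only one of its two classes.
demo = types.ModuleType("layers_demo")
exec(
    "class CdnQueryGenerator: pass\n"
    "class EfficientTransformerEncoder: pass\n"
    "__all__ = ['CdnQueryGenerator']\n",
    demo.__dict__,
)
sys.modules["layers_demo"] = demo

# A star-import only pulls in the names listed in __all__.
star_ns: dict = {}
exec("from layers_demo import *", star_ns)
print(sorted(k for k in star_ns if not k.startswith("__")))  # ['CdnQueryGenerator']

# An explicit import works regardless of __all__.
from layers_demo import EfficientTransformerEncoder  # noqa: E402
```

This is why the old `__all__` did not break direct imports of the new layers, but did leave them out of star-imports until the list was extended.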