
Add oriented_box_iou_batch function to detection.utils #1502

Merged
LinasKo merged 5 commits into roboflow:develop from patel-zeel:feat/oriented_box_iou_batch
Sep 24, 2024

Conversation

@patel-zeel
Contributor

@patel-zeel patel-zeel commented Sep 5, 2024

Description

Implemented oriented_box_iou_batch function in detection.utils as discussed with @LinasKo in #1295.

No additional dependencies are required for this change.

Context

  • This function computes IoU (intersection over union) for oriented/rotated bounding boxes (aka OBB).

Important details for the reviewers:

  1. The implementation is generic and works for axis-aligned, oriented, and irregular polygons. There is no check in the function to verify that the passed box is an oriented rectangular bounding box.
  2. The polygon_to_mask function requires a resolution_wh argument, but I have bypassed it by constructing a pseudo-image that covers both the true and detected boxes. I am assuming that box values are always passed at the true scale of the image and are not normalized to [0, 1].
# adding 1 because coordinates are 0-indexed
max_height = max(boxes_true[:, :, 0].max(), boxes_detection[:, :, 0].max()) + 1
max_width = max(boxes_true[:, :, 1].max(), boxes_detection[:, :, 1].max()) + 1
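For illustration, the rasterize-then-count approach described above can be sketched end to end in plain numpy. This is a hypothetical reference implementation, not the PR's code: supervision's polygon_to_mask rasterizes with cv2.fillPoly, whose boundary-pixel handling differs from the even-odd point test used here, and this sketch takes index 0 of each vertex as x (width) and index 1 as y (height).

```python
import numpy as np

def polygon_to_mask_ref(polygon, resolution_wh):
    """Even-odd rasterization of one (4, 2) polygon onto an integer pixel grid.

    Hypothetical stand-in for supervision's polygon_to_mask; boundary pixels
    may be classified differently than by cv2.fillPoly.
    """
    w, h = resolution_wh
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    px, py = polygon[:, 0].astype(float), polygon[:, 1].astype(float)
    inside = np.zeros((h, w), dtype=bool)
    for i in range(len(px)):
        j = i - 1  # previous vertex (wraps around via Python indexing)
        dy = py[j] - py[i]
        dy = np.where(dy == 0, 1e-12, dy)  # avoid 0-division on horizontal edges
        crosses = (py[i] > ys) != (py[j] > ys)
        x_at_y = (px[j] - px[i]) * (ys - py[i]) / dy + px[i]
        inside ^= crosses & (xs < x_at_y)
    return inside

def oriented_box_iou_ref(boxes_true, boxes_detection):
    """IoU matrix for two (N, 4, 2) batches of polygons via a shared pseudo-image."""
    # pseudo-image just large enough to contain every vertex (pixel-scale coords)
    w = int(max(boxes_true[..., 0].max(), boxes_detection[..., 0].max())) + 1
    h = int(max(boxes_true[..., 1].max(), boxes_detection[..., 1].max())) + 1
    masks_true = np.stack([polygon_to_mask_ref(b, (w, h)) for b in boxes_true])
    masks_det = np.stack([polygon_to_mask_ref(b, (w, h)) for b in boxes_detection])
    inter = np.einsum("nhw,mhw->nm", masks_true.astype(np.int64),
                      masks_det.astype(np.int64))
    union = masks_true.sum((1, 2))[:, None] + masks_det.sum((1, 2))[None, :] - inter
    return inter / np.maximum(union, 1)  # guard 0/0 when both masks are empty
```

Under these assumptions, identical boxes give an IoU of 1.0 and the disjoint example from the plots below gives 0.0.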

Questions for the reviewers:

  1. Once this function is reviewed and merged, would/should it be used as a generic function and replace the box_iou_batch function?
  2. Do we have corner cases where we must use np.nan_to_num(ious) before returning ious?
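Regarding question 2, one concrete corner case: if both masks are empty (fully degenerate boxes), the union is 0 and the division yields nan rather than 0. A minimal sketch of what np.nan_to_num would guard against:

```python
import numpy as np

# If both masks are empty, intersection and union are both 0,
# and 0 / 0 produces nan (with a RuntimeWarning unless suppressed).
intersection = np.array([0.0])
union = np.array([0.0])
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    ious = intersection / union      # nan
cleaned = np.nan_to_num(ious)        # nan replaced with 0.0
```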

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested, please provide a testcase or example of how you tested the change?

Thus far, I have not added any tests, but I have exercised the function with the following code snippet. The examples and plots are shown first, followed by the snippet:

Examples and plots

box_detection = np.array([[1, 1], [2, 0], [4, 2], [3, 3]]) # overlap
box_true = np.array([[1, 0], [0, 1], [3, 4], [4, 3]])

[image: plot for the overlap case]

box_detection = np.array([[4, 1], [5, 0], [5, 2], [6, 1]]) # no overlap
box_true = np.array([[1, 0], [0, 1], [3, 4], [4, 3]])

[image: plot for the no-overlap case]

box_detection = np.array([[1, 0], [0, 1], [3, 4], [4, 3]]) # perfect match

[image: plot for the perfect-match case]

Code snippet

import numpy as np
import matplotlib.pyplot as plt
from supervision.detection.utils import polygon_to_mask, oriented_box_iou_batch

# config
box_detection = np.array([[1, 1], [2, 0], [4, 2], [3, 3]]) # overlap
# box_detection = np.array([[4, 1], [5, 0], [5, 2], [6, 1]]) # no overlap
# box_detection = np.array([[1, 0], [0, 1], [3, 4], [4, 3]]) # perfect match

box_true = np.array([[1, 0], [0, 1], [3, 4], [4, 3]])

height = 5
width = 8

# plotting function
def plot_img(img, alpha, ax):
    mappable = ax.imshow(img, interpolation="none", alpha=alpha, vmin=0, vmax=255, cmap="coolwarm")
    ax.set_xticks(np.arange(0.5, width, 1))
    ax.set_xticklabels(np.arange(0, width, 1))
    ax.set_yticks(np.arange(0.5, height, 1))
    ax.set_yticklabels(np.arange(0, height, 1))
    ax.grid(True, which='both', linestyle='--', linewidth=1, color='k')
    return mappable

# create image
img = np.zeros((height, width)).astype(int) + 128

# manual process
mask_true = polygon_to_mask(box_true, (img.shape[1], img.shape[0])).astype(int)
mask_detection = polygon_to_mask(box_detection, (img.shape[1], img.shape[0])).astype(int)
union_pixels = np.sum((mask_true + mask_detection).clip(0, 1))
intersection_pixels = np.sum((mask_true * mask_detection))
iou = intersection_pixels / union_pixels

# Using the function
func_iou = oriented_box_iou_batch(box_true, box_detection)
assert np.isclose(iou, func_iou), f"{iou=}, {func_iou=}"

# plot
fig, axs = plt.subplots(1, 3, figsize=(12, 3.5))

img_true = (img + (mask_true * 255)).clip(0, 255)
plot_img(img_true, 1, axs[0])
axs[0].set_title("True")

img_detection = (img - (mask_detection * 255)).clip(0, 255)
plot_img(img_detection, 1, axs[1])
axs[1].set_title("Detection")

plot_img(img_true, 0.8, axs[2])
plot_img(img_detection, 0.5, axs[2])
axs[2].set_title("Overlay")

manual = f"Manual: Intersection={intersection_pixels}, Union={union_pixels}, IoU={intersection_pixels}/{union_pixels}={intersection_pixels/union_pixels}"
automatic = f"{oriented_box_iou_batch(box_true, box_detection)=}"
fig.suptitle(f"{manual}\n{automatic}", ha='center')

Any specific deployment considerations (this section is not yet edited)

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs (this section is not yet edited)

  • Docs updated? What were the changes:

@CLAassistant

CLAassistant commented Sep 5, 2024

CLA assistant check
All committers have signed the CLA.

@LinasKo
Contributor

LinasKo commented Sep 5, 2024

Excellent work, @patel-zeel! I especially like the visuals.

To answer your questions:

  1. No, the usage of box_iou_batch and mask_iou_batch in the codebase should remain unchanged. Especially the former, as it provides a faster and less complex implementation than the one required by OBB. Keeping oriented_box_iou_batch separate also allows its implementation to vary independently of others.
  2. I wonder how small an OBB can be. Suppose you had an xyxyxyxy where all x values are equal, and all y values are equal too. I don't know whether a model can produce that in practice, but let's try to support it. Could you check what masks are created and what IoU is produced?

@patel-zeel
Contributor Author

Excellent work, @patel-zeel! I especially like the visuals.

Thank you, @LinasKo. While making the plots, I realized that gridlines pass through the middle of each pixel, making it hard to visually count the union, intersection, and IoU. So I shifted the gridlines by half a pixel to make sure pixels align with the cells of the grid rather than sitting on gridline intersections. It was fun working on this!

Another question:

  • Do we need to / can we vectorize this further? In my trials, cv2.fillPoly was not vectorizable.
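For context on the vectorization question: while the rasterization step runs one polygon at a time, everything after it can be fully vectorized. A hedged numpy sketch (pairwise_mask_iou is a hypothetical helper, not the PR's code), assuming the per-polygon masks have already been produced:

```python
import numpy as np

def pairwise_mask_iou(masks_a, masks_b):
    """Pairwise IoU for two stacks of boolean masks, shapes (N, H, W) and (M, H, W)."""
    a = masks_a.reshape(len(masks_a), -1).astype(np.int64)
    b = masks_b.reshape(len(masks_b), -1).astype(np.int64)
    inter = a @ b.T                                   # (N, M) intersection pixel counts
    union = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :] - inter
    return inter / np.maximum(union, 1)               # avoid 0/0 for empty masks
```

So only the mask-creation loop is sequential; the N x M IoU matrix itself comes from a single matrix product.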

Now, coming to other points raised by you:

No, the usage of box_iou_batch and mask_iou_batch in the codebase should remain unchanged. Especially the former, as it provides a faster and less complex implementation than the one required by OBB. Keeping oriented_box_iou_batch separate also allows its implementation to vary independently of others.

Yes, that makes sense! Thank you.

I wonder how small an OBB can be. Suppose you had an xyxyxyxy where all x values are equal, and all y values are equal too. I don't know if a model can produce that in reality, let's try to support it. Could you check what masks are created and what IoU is produced?

With all values equal, cv2.fillPoly fills a single pixel, which should be error-free. Here are two more examples:

box_detection = np.array([[0, 0], [0, 0], [0, 0], [0, 0]])
box_true = np.array([[0, 0], [0, 0], [0, 0], [0, 0]])

[image: plot for two coincident degenerate boxes]

box_detection = np.array([[1, 1], [1, 1], [1, 1], [1, 1]])
box_true = np.array([[0, 0], [0, 0], [0, 0], [0, 0]])

[image: plot for two disjoint degenerate boxes]

@patel-zeel
Contributor Author

@LinasKo, this is a gentle ping to let you know that I can make the desired changes before you review this PR again. Feel free to let me know whenever you take a look.

@LinasKo
Contributor

LinasKo commented Sep 19, 2024

Hi @patel-zeel, I have it in my sights - I'll 100% include it before the new release, and very likely in the next few days. Apologies for the long wait!

@patel-zeel
Contributor Author

Thank you for the quick response, @LinasKo. No worries at all.

@LinasKo
Contributor

LinasKo commented Sep 19, 2024

Hi @patel-zeel, here's my feedback:

  1. Please pull before making any edits - I pushed a small change adding the new function to the global scope and the docs.
  2. xyxyxyxy arrays are expected to be of type np.float32, but an error is thrown when that is the case.
  3. I don't think we should be adding +1 to the width / height. The result is too large. An xyxy box [0, 0, 0, 0] should have an area of 0, and so should a mask that is entirely empty. We'd like OBB to match that behaviour.

Here's the Colab I worked with.

@patel-zeel
Contributor Author

patel-zeel commented Sep 24, 2024

Hi @LinasKo, thank you for the feedback.

An xyxy box [0, 0, 0, 0] should have area of 0, and so should a mask that is entirely empty.

Yes, that seems logically correct, but the current implementation of polygon_to_mask does not generate an empty mask for a zero-area polygon. Even for xyxy, a simpler case than xyxyxyxy, we would have to remove the last/first row and column of the mask to match the area formula. I think the xyxyxyxy mask will be even more complex to handle. Any ideas on how to tackle this?

To add more context: the area of the mask depends on whether we include the pixels on which the box boundary lies, but neither choice matches the area formula. For example, box [0, 0, 2, 2] has area 2 x 2 = 4. Including the pixels where the boundary lies, the mask area is 3 x 3 = 9; excluding them, it is 1 x 1 = 1.

This raises a question: shouldn't this match how object detection models handle bounding boxes - that is, how do they treat the pixels on which the box boundary lies?

import numpy as np

from supervision import polygon_to_mask

def xyxy_to_xyxyxyxy(xyxy) -> np.ndarray:
    results = []
    for box in xyxy:
        obb_single = np.array([
            (box[0], box[1]),
            (box[2], box[1]),
            (box[2], box[3]),
            (box[0], box[3]),
        ])
        results.append(obb_single)
    return np.array(results)

xyxy = np.array([[0, 0, 0, 0]])
xyxyxyxy = xyxy_to_xyxyxyxy(xyxy)
mask = polygon_to_mask(xyxyxyxy, (3, 3))
print(mask)

Output

[[1. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

@LinasKo
Contributor

LinasKo commented Sep 24, 2024

Hi @patel-zeel, thank you for your thoughts.

Indeed, I can see the issue. The problem preceding this discussion is that xyxy and xyxyxyxy are in subpixel precision (as returned by the models), and masks, as we represent them today, are integer-shaped.

For example, box [0, 0, 2, 2] has an area of 2 x 2 = 4. Including the pixels where the boundary lies, the area is 3 x 3 = 9. The area is 1 x 1 = 1, excluding the pixels where the boundary lies.

Quick note: here we should not consider the standard where all boundary pixels are excluded. The viable options are:

  • Include start, exclude end - match how programming languages treat array indexing (area = 2 x 2 = 4)
  • Include start, include end - match how masks are treated, necessitating a special case for 0-sized boxes (area = 3 x 3 = 9)
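The two conventions can be made concrete with a small sketch (illustrative only; box_mask_area and the inclusive_end flag are hypothetical names, not part of supervision's API):

```python
def box_mask_area(xyxy, inclusive_end):
    """Pixel area of an axis-aligned box under the two boundary conventions."""
    x0, y0, x1, y1 = xyxy
    extra = 1 if inclusive_end else 0  # also count the final boundary row/column?
    return (x1 - x0 + extra) * (y1 - y0 + extra)
```

Under the include-end convention a zero-sized box still covers one pixel (area 1), which is exactly the special case mentioned above.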

Putting that aside, let's sacrifice correctness for now and focus on enabling more use cases. Areas are primarily used to compare detections of the same type, after all.

I'll take over, carry out the minor doc fixes, run a few tests, and merge this in. I've also noted down that polygon_to_mask needs looking into.

@patel-zeel
Contributor Author

Sure, @LinasKo. It was interesting to work on this issue. I'm looking forward to seeing it in the next release.

Contributor

@LinasKo LinasKo left a comment


I've checked the code, verified the results are what we expected and built the docs. Looks good; merging!

Thank you @patel-zeel! 🤝
It is a true delight to see a thoughtfully made and tested PR such as yours, especially when it reveals further flaws we can investigate to make our library even better. It has been a pleasure working with you.

@LinasKo LinasKo merged commit 93190b2 into roboflow:develop Sep 24, 2024
@MoAbbasid

Hi @patel-zeel, kind of an unrelated question: how do you generate these visuals?

[image: example visual from the PR]

@patel-zeel
Contributor Author

patel-zeel commented Oct 1, 2024

Hi @MoAbbasid, you can generate these plots with the code snippet given in the PR description.
