@valentijnscholten
Member

@valentijnscholten valentijnscholten commented Oct 16, 2025

Because the tags field is an M2M relationship behind the scenes, filtering on tag names can result in duplicate rows due to the nature of SQL JOINs: a row is returned once for each matching tag.

This PR resolves this by applying DISTINCT when filtering on tag names.

                            # distinct has a performance impact, so only apply it if needed.
                            # we considered PostgreSQL's DISTINCT ON, but it would enforce ordering by id
                            # we considered changing to an EXISTS subquery, but it would make
                            # our code dependent on some of the django-tagulous internals.
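
To make the duplicate-row effect concrete, here is a minimal sketch using plain SQL through Python's built-in sqlite3 module rather than the Django ORM. The table names (`finding`, `tag`, `finding_tags`) and the sample data are hypothetical and only mimic the shape of an M2M relationship; they are not DefectDojo's or django-tagulous' actual schema.

```python
import sqlite3

# Hypothetical miniature schema: findings, tags, and an M2M "through" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE finding (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE finding_tags (finding_id INTEGER, tag_id INTEGER);
    INSERT INTO finding VALUES (1, 'SQL injection'), (2, 'Weak cipher');
    INSERT INTO tag VALUES (1, 'prod-web'), (2, 'prod-api'), (3, 'staging');
    INSERT INTO finding_tags VALUES (1, 1), (1, 2), (2, 3);
""")

# A "contains" filter on the tag name, expressed as a JOIN.
JOIN_QUERY = """
    SELECT {select} f.id
    FROM finding f
    JOIN finding_tags ft ON ft.finding_id = f.id
    JOIN tag t ON t.id = ft.tag_id
    WHERE t.name LIKE '%' || ? || '%'
    ORDER BY f.id
"""


def finding_ids(term, distinct):
    """Return finding ids whose tag names contain `term`."""
    sql = JOIN_QUERY.format(select="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql, (term,))]


# Finding 1 carries two tags containing "prod", so the JOIN returns it twice.
duplicated = finding_ids("prod", distinct=False)
deduplicated = finding_ids("prod", distinct=True)
print(duplicated)    # [1, 1]
print(deduplicated)  # [1]
```

An exact match on a unique tag name can match at most one tag per finding, which is why only the "contains" style of lookup shows the problem in this sketch.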

Fixes #13429

Contributor

@mtesauro mtesauro left a comment

Approved

@Maffooch
Contributor

@fopina can you please take a look at this one? It may pique your interest 😄

Contributor

@fopina fopina left a comment

Very interesting topic, thanks for tagging!

dojo/filters.py Outdated
)


class TagExistsIContainsFilter(CharFilter):
Contributor

Is this filter picked up automatically by something, or was it an experiment that was forgotten?

Member Author

removed the leftover.

dojo/filters.py Outdated
for name, f in self.filters.items():
    field_name = getattr(f, "field_name", "") or ""
    # filtering on tag names would result in duplicate rows, one for each matching tag
    if "tags__name" in field_name:
Contributor

Both the tags and tag filters use the tags__name field, but only tag does a contains.
The exact match of tags should never yield duplicates as tags__name is unique, right?

Maybe a more generic condition would be: if the filter's lookup_expr is icontains, exclude is False, and the field is an m2m, then use distinct?
Not sure how to check for the latter (field is an m2m); we'd need to inspect the filter object to see if it is already bound to the actual model field (if so, it would be clean, otherwise a bit hacky).
Of course, restricting the distinct to "icontains" would only make sense for joins on unique fields. If it's a join on a non-unique field, it could yield duplicate results even with iexact matches...

Leaving some food for thought, but maybe the simple condition (to cover all cases) would be: if not exclude and the field is an m2m. It would apply to more cases than needed compared to the previous suggestion (e.g. exact matches on unique relations), but it is the same as the current condition, extended to any field.

It not only covers other existing m2m fields (if there are any, I didn't check), but also prepares for future fields that might drop into the models/filters, and it looks "better" (in the sense that it doesn't have "tags__name" hardcoded).

WDYT?
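
The two conditions proposed in this comment can be sketched roughly as below. This is an illustration only, not the code that was merged: `FilterStub` is a hypothetical stand-in for a filter object exposing `field_name`, `lookup_expr` and `exclude`, the set of m2m field names is passed in explicitly instead of being derived from the model, and the `not_tag` filter is invented for the example.

```python
from typing import NamedTuple

# Values treated as "filter not used", mirroring the tuple in the diff.
EMPTY_VALUES = (None, "", [], (), {})


class FilterStub(NamedTuple):
    """Hypothetical stand-in for a filter object; not a real django-filter class."""
    field_name: str
    lookup_expr: str = "exact"
    exclude: bool = False


def needs_distinct(filters, cleaned_data, m2m_fields, only_lookups=None):
    """Return True if any active, non-exclude filter traverses an m2m field.

    `m2m_fields` is an assumed, explicit set of relation names. Passing
    `only_lookups` narrows the check to specific lookups such as "icontains".
    """
    for name, f in filters.items():
        if cleaned_data.get(name) in EMPTY_VALUES:
            continue  # this filter was not used in the request
        if f.exclude:
            continue  # exclude filters are left out, as suggested above
        if f.field_name.split("__")[0] not in m2m_fields:
            continue  # not a join through an m2m relation
        if only_lookups is not None and f.lookup_expr not in only_lookups:
            continue  # narrower variant: only selected lookups trigger distinct
        return True
    return False


filters = {
    "tag": FilterStub("tags__name", "icontains"),
    "tags": FilterStub("tags__name", "exact"),
    "title": FilterStub("title", "icontains"),
    "not_tag": FilterStub("tags__name", "icontains", exclude=True),
}

# Simple condition: any non-exclude filter on an m2m field.
broad = needs_distinct(filters, {"tags": "prod"}, {"tags"})
# Narrow condition: additionally require an icontains lookup.
narrow = needs_distinct(filters, {"tags": "prod"}, {"tags"}, only_lookups={"icontains"})
contains = needs_distinct(filters, {"tag": "prod"}, {"tags"}, only_lookups={"icontains"})
print(broad, narrow, contains)  # True False True
```

In a real FilterSet the m2m check might be done through Django's model field metadata rather than a hard-coded set; whether that works cleanly with django-tagulous fields is not verified here.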

Member Author

I've narrowed the condition which triggers the distinct.

dojo/filters.py Outdated
if value not in (None, "", [], (), {}):
    # distinct has a performance impact, so only apply it if needed.
    # we considered PostgreSQL's DISTINCT ON, but it would enforce ordering by id
    # we considered changing to an EXISTS subquery, but it would make
Contributor

It's a shame for DISTINCT ON 😄 it's the price to pay for the performance boost!

The EXISTS option however sounds appealing! What internals would it be pinned to? Just the .tags.through, right?

Member Author

Just some stuff around mapping the fields through the m2m model. Nothing crazy, but I feel the current approach is easier for everyone to understand and has fewer corner cases where some magical combination of filters/ordering might break if we rewrite the tag filter to an EXISTS query.
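
For comparison, the EXISTS alternative discussed in this thread could look roughly like the following at the SQL level. As before, this is a sketch against a hypothetical sqlite3 schema (`finding`, `tag`, `finding_tags`), not the Django ORM and not the django-tagulous through model; in the ORM, the subquery would have to reference that through model, which is the coupling mentioned above.

```python
import sqlite3

# Hypothetical miniature schema, as an illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE finding (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE finding_tags (finding_id INTEGER, tag_id INTEGER);
    INSERT INTO finding VALUES (1, 'SQL injection'), (2, 'Weak cipher'), (3, 'XSS');
    INSERT INTO tag VALUES (1, 'prod-web'), (2, 'prod-api'), (3, 'staging');
    INSERT INTO finding_tags VALUES (1, 1), (1, 2), (2, 3), (3, 1);
""")

# Each finding is tested once against the subquery, so no row can appear
# twice and the outer query stays free to use any ordering.
EXISTS_QUERY = """
    SELECT f.id
    FROM finding f
    WHERE EXISTS (
        SELECT 1
        FROM finding_tags ft
        JOIN tag t ON t.id = ft.tag_id
        WHERE ft.finding_id = f.id
          AND t.name LIKE '%' || ? || '%'
    )
    ORDER BY f.title DESC
"""

ids = [row[0] for row in conn.execute(EXISTS_QUERY, ("prod",))]
print(ids)  # [3, 1]
```

Finding 1 matches two tags but is listed once, without DISTINCT and without any constraint on the ORDER BY clause.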

dojo/filters.py Outdated
    # we considered changing to an EXISTS subquery, but it would make
    # our code dependent on some of the django-tagulous internals
    return qs.distinct()
except Exception:
Contributor

What exceptions are expected here? Shouldn't we avoid these broad excepts? Especially as it's also returning a .distinct() (in case that would be the source for unexpected exceptions)

Member Author

removed the leftover

@valentijnscholten
Member Author

@fopina Thanks for reviewing. To be honest, initially I just planned on slapping on the distinct() always, but I decided to at least make it a little smarter. I guess I should have upped my game before showing it to the Django gurus out there :-D. The bug had been around for 5 years before we got the first report of it, so I think the PR is now fine to deal with it.

@fopina
Contributor

fopina commented Oct 17, 2025

Far from a guru, sorry if I made myself sound like one! I'd likely just slap the distinct() on everything and only notice when the production list of findings continuously timed out.

I guess tag contains is rarely used, as the whole point of tags is to pinpoint them (even as a list), so it makes sense that this went unnoticed.

thanks!

@valentijnscholten
Member Author

I was just joking. I think it's good to have people around to keep me sharp.

@valentijnscholten valentijnscholten merged commit 6661035 into DefectDojo:bugfix Oct 17, 2025
149 checks passed
@Maffooch
Contributor

@fopina FWIW I am/was impressed with your insight 😄

6 participants