tag based filtering: avoid duplicate rows in results #13442

valentijnscholten · 2025-10-16T16:29:37Z

Because the tags field is an M2M relationship behind the scenes, filtering on tag names can return duplicate rows: the SQL JOIN produces one result row for each matching tag.

This PR resolves this by applying DISTINCT when filtering on tag names.

                            # distinct has a performance impact, so only apply it if needed.
                            # we considered PostgreSQL's DISTINCT ON, but it would enforce ordering by id
                            # we considered changing to an EXISTS subquery, but it would make
                            # our code dependent on some of the django-tagulous internals.

Fixes #13429
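
A minimal illustration of the duplication described above, using Python's built-in sqlite3 rather than Django. The finding, tag and finding_tags tables and their rows are hypothetical stand-ins for the real models, and LIKE plays the role of the icontains lookup. It is a sketch of the general SQL behaviour, not the query DefectDojo actually generates.

```python
import sqlite3

# Hypothetical schema: findings, tags, and the M2M join table between them.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE finding (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE finding_tags (finding_id INTEGER, tag_id INTEGER);
    INSERT INTO finding VALUES (1, 'SQL injection'), (2, 'XSS');
    INSERT INTO tag VALUES (1, 'web-prod'), (2, 'web-staging'), (3, 'infra');
    INSERT INTO finding_tags VALUES (1, 1), (1, 2), (2, 3);
    """
)


def finding_ids(operator, value, distinct=False):
    """Return finding ids whose tag name matches, using a plain JOIN."""
    keyword = "DISTINCT" if distinct else ""
    sql = f"""
        SELECT {keyword} finding.id
        FROM finding
        JOIN finding_tags ON finding_tags.finding_id = finding.id
        JOIN tag ON tag.id = finding_tags.tag_id
        WHERE tag.name {operator} ?
    """
    return [row[0] for row in conn.execute(sql, (value,))]


# Finding 1 carries two tags containing "web", so the JOIN returns it twice.
print(finding_ids("LIKE", "%web%"))                 # [1, 1]
# DISTINCT collapses the duplicates.
print(finding_ids("LIKE", "%web%", distinct=True))  # [1]
# An exact match on a unique tag name matches at most one tag per finding.
print(finding_ids("=", "web-prod"))                 # [1]
```

The last call shows why an exact match on a unique tag name does not need DISTINCT, while a contains match does.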

mtesauro

Approved

Maffooch · 2025-10-17T02:42:02Z

@fopina can you please take a look at this one? It may pique your interest 😄

fopina

Very interesting topic, thanks for tagging!

fopina · 2025-10-17T08:46:46Z

dojo/filters.py

        )


+class TagExistsIContainsFilter(CharFilter):


Is this filter picked up automatically by something or was it some experiment and forgotten?

removed the leftover.

fopina · 2025-10-17T08:58:57Z

dojo/filters.py

+                for name, f in self.filters.items():
+                    field_name = getattr(f, "field_name", "") or ""
+                    # filtering on tag names would result in duplicate rows, one for each matching tag
+                    if "tags__name" in field_name:


Both the tags and tag filters use the tags__name field, but only tag does a contains match.
The exact match used by tags should never yield duplicates, since tags__name is unique, right?

Maybe a more generic condition would be: if the filter's lookup_expr is icontains, exclude is False, and the field is an M2M, then use distinct?
I'm not sure how to check the last part (field is an M2M). We'd need to inspect the filter object to see whether it is already bound to the actual model field (if so, it would be clean; otherwise a bit hacky).
Of course, restricting the distinct to "icontains" only makes sense for joins on unique fields. A join on a non-unique field could yield duplicate results even with iexact matches...

Leaving this as food for thought, but maybe the simplest condition (to cover everything) would be: not exclude and field is M2M. It would apply to more cases than needed compared to the previous suggestion (e.g. exact matches on unique relations), but it is the same as the current condition, just extended to any field.

That not only covers other existing M2M fields (if there are any, I didn't check), but also prepares for future fields that might drop into the models/filters, and it looks "better" (in the sense that it doesn't have "tags__name" hardcoded).

WDYT?

I've narrowed the condition which triggers the distinct.
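
The thread does not show the final narrowed condition, so the following is only a sketch of the two generic variants fopina describes, written in plain Python so it runs without Django. FilterSpec, needs_distinct, only_lookups and the m2m_fields set are hypothetical names introduced for illustration. The field_name, lookup_expr and exclude attributes mirror the ones mentioned in the review, and the empty-value tuple is copied from the diff.

```python
from dataclasses import dataclass

# Values treated as "filter not applied"; copied from the diff in this PR.
EMPTY_VALUES = (None, "", [], (), {})


@dataclass(frozen=True)
class FilterSpec:
    """Hypothetical stand-in for the attributes read from a filter object."""

    field_name: str
    lookup_expr: str = "exact"
    exclude: bool = False


def needs_distinct(spec, value, m2m_fields, only_lookups=None):
    """Decide whether a filter may produce duplicate rows.

    m2m_fields: names of relations assumed to be many-to-many.
    only_lookups: if given, restrict DISTINCT to these lookups (the narrower
    suggestion); if None, any non-exclude M2M filter triggers DISTINCT.
    """
    if value in EMPTY_VALUES:
        return False  # the filter is not active
    if spec.exclude:
        return False  # the review proposes DISTINCT only for non-exclude filters
    relation = spec.field_name.split("__", 1)[0]
    if relation not in m2m_fields:
        return False  # no M2M JOIN, so no row multiplication
    if only_lookups is not None and spec.lookup_expr not in only_lookups:
        return False
    return True


m2m = {"tags"}  # assumption: the only M2M relation being filtered on
contains = FilterSpec("tags__name", "icontains")
exact = FilterSpec("tags__name", "exact")

print(needs_distinct(contains, "web", m2m))                          # True
print(needs_distinct(exact, "web", m2m))                             # True
print(needs_distinct(exact, "web", m2m, only_lookups={"icontains"})) # False
```

As the review notes, the narrower variant is only safe when the joined column is unique. The broader variant applies DISTINCT more often than strictly needed but avoids hardcoding "tags__name".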

fopina · 2025-10-17T10:09:32Z

dojo/filters.py

+                        if value not in (None, "", [], (), {}):
+                            # distinct has a performance impact, so only apply it if needed.
+                            # we considered PostgreSQL's DISTINCT ON, but it would enforce ordering by id
+                            # we considered changing to an EXISTS subquery, but it would make


It's a shame about DISTINCT ON 😄 That's the price to pay for the performance boost!

The EXISTS option however sounds appealing! What internals would it be pinned to? Just the .tags.through, right?

Just some stuff around mapping the fields through the M2M model. Nothing crazy, but I feel the current approach is easier for everyone to understand and leaves fewer corner cases where some magical combination of filters/ordering might break if we rewrite the tag filter as an EXISTS query.
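
For comparison, a sketch of the EXISTS alternative discussed in this thread, again using Python's built-in sqlite3 with hypothetical finding, tag and finding_tags tables. It shows only the SQL shape. In Django the subquery would presumably have to be built against the M2M through model, which is the dependency on internals mentioned above.

```python
import sqlite3

# Hypothetical schema: findings, tags, and the M2M join table between them.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE finding (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE finding_tags (finding_id INTEGER, tag_id INTEGER);
    INSERT INTO finding VALUES (1, 'SQL injection'), (2, 'XSS'), (3, 'CSRF');
    INSERT INTO tag VALUES (1, 'web-prod'), (2, 'web-staging'), (3, 'infra');
    INSERT INTO finding_tags VALUES (1, 1), (1, 2), (2, 3), (3, 1);
    """
)

# The outer query never joins the tag tables, so each finding appears at most
# once and no DISTINCT is needed. ORDER BY is free to use any column.
EXISTS_QUERY = """
    SELECT finding.id
    FROM finding
    WHERE EXISTS (
        SELECT 1
        FROM finding_tags
        JOIN tag ON tag.id = finding_tags.tag_id
        WHERE finding_tags.finding_id = finding.id
          AND tag.name LIKE ?
    )
    ORDER BY finding.title
"""

exists_rows = [row[0] for row in conn.execute(EXISTS_QUERY, ("%web%",))]
print(exists_rows)  # [3, 1]
```

Finding 1 has two matching tags but is returned once, and the result can still be ordered by title rather than by id.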

fopina · 2025-10-17T10:11:48Z

dojo/filters.py

+                            # we considered changing to an EXISTS subquery, but it would make
+                            # our code dependent on some of the django-tagulous internals
+                            return qs.distinct()
+        except Exception:


What exceptions are expected here? Shouldn't we avoid these broad excepts? Especially as it's also returning a .distinct() (in case that would be the source for unexpected exceptions)

removed the leftover

valentijnscholten · 2025-10-17T15:26:08Z

@fopina Thanks for reviewing. To be honest, initially I just planned on always slapping on the distinct(), but I decided to at least make it a little smarter. I guess I should have upped my game before showing it to the Django gurus out there :-D. The bug had been around for 5 years before we got the first report of it, so I think the PR is now fine to deal with it.

fopina · 2025-10-17T16:35:00Z

Far from a guru, sorry if I made myself sound like one..! I'd likely just slap the distinct() on everything and only notice when the production list of findings kept timing out

I guess tag contains is rarely used, as the whole point of tags is to pinpoint them (even as a list), so it makes sense that this went unnoticed

thanks!

valentijnscholten · 2025-10-17T16:36:27Z

I was just joking. I think it's good to have people around to keep me sharp.

Maffooch · 2025-10-17T16:39:05Z

@fopina FWIW I am/was impressed with your insight 😄

valentijnscholten added this to the 2.51.2 milestone Oct 16, 2025

valentijnscholten requested review from Maffooch and mtesauro as code owners October 16, 2025 16:29

mtesauro approved these changes Oct 16, 2025

mtesauro requested review from blakeaowens and dogboat October 17, 2025 00:48

Maffooch approved these changes Oct 17, 2025

fopina reviewed Oct 17, 2025

blakeaowens approved these changes Oct 17, 2025

valentijnscholten added 3 commits October 17, 2025 17:26

tag based filtering: avoid duplicate rows in results

cda5cc6

tag based filtering: avoid duplicate rows in results

028bfa1

improvements

ed3f9d8

valentijnscholten force-pushed the tag-contains-filter-fix branch from 13f587a to ed3f9d8 Compare October 17, 2025 15:27

dogboat approved these changes Oct 17, 2025

valentijnscholten merged commit 6661035 into DefectDojo:bugfix Oct 17, 2025
149 checks passed
