
Commit 12f1107

deduplication logic: add docs
1 parent 2ec11d9 commit 12f1107

3 files changed (+30 -3 lines changed)

`docs/content/en/working_with_findings/finding_deduplication/deduplication_tuning_os.md`

Lines changed: 30 additions & 3 deletions

## After changing deduplication settings

- Changes to the deduplication configuration (e.g., `HASHCODE_FIELDS_PER_SCANNER`, `HASH_CODE_FIELDS_ALWAYS`, `DEDUPLICATION_ALGORITHM_PER_PARSER`) are not automatically applied to existing findings. To re-evaluate existing findings, you must run the `dedupe` management command below.

Run the command inside the `uwsgi` container. For example, to recompute hash codes only, without running deduplication:

```bash
docker compose exec uwsgi /bin/bash -c "python manage.py dedupe --hash_code_only"
```

Help/usage:

```text
options:
  --parser PARSER   List of parsers for which hash_code needs recomputing
                    (defaults to all parsers)
  --hash_code_only  Only compute hash codes
  --dedupe_only     Only run deduplication
  --dedupe_sync     Run dedupe in the foreground, default false
```

If you submit dedupe to Celery (that is, run it without `--dedupe_sync`), allow time for the tasks to complete before evaluating results.
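The two phases can also be run as separate, explicit steps. The script below is a minimal sketch and is not part of DefectDojo: the function names `run_dedupe_in_uwsgi` and `reevaluate_findings` and the `DRY_RUN` switch are hypothetical, and it assumes the Compose service is named `uwsgi` as in the example above. It uses only the flags listed in the help output, recomputing hash codes first and then running deduplication in the foreground with `--dedupe_sync`, so there are no pending Celery tasks when the script returns.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run `python manage.py dedupe` with the given options
# inside the uwsgi container. With DRY_RUN=1 it prints the command instead.
run_dedupe_in_uwsgi() {
  local args=""
  # Shell-quote each option so values containing spaces (e.g. parser names)
  # survive the extra `/bin/bash -c` layer inside the container.
  if (( $# > 0 )); then
    args=$(printf ' %q' "$@")
  fi
  local cmd="python manage.py dedupe${args}"

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf 'docker compose exec uwsgi /bin/bash -c "%s"\n' "$cmd"
  else
    docker compose exec uwsgi /bin/bash -c "$cmd"
  fi
}

# Hypothetical two-step re-evaluation: recompute hash codes, then run dedupe
# synchronously. Extra options (e.g. --parser) are passed to both steps.
reevaluate_findings() {
  run_dedupe_in_uwsgi --hash_code_only "$@"
  run_dedupe_in_uwsgi --dedupe_only --dedupe_sync "$@"
}
```

For example, `reevaluate_findings --parser "ZAP Scan"` would limit both steps to a single parser, assuming `--parser` expects the scan type name. Prefixing the call with `DRY_RUN=1` prints the two commands without running them. Splitting the phases makes it possible to check the new hash codes before deduplication relationships change.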
## Where to configure

- Prefer environment variables in deployments. For local development or advanced overrides, use `local_settings.py`.
- See `configuration.md` for details on how to set environment variables and configure local overrides.

## Troubleshooting

The following tools help when troubleshooting deduplication:

- Observe the log output of the `dojo.specific-loggers.deduplication` logger. This class-independent logger outputs details about the deduplication process and the settings in effect while findings are processed.
- Inspect the `unique_id_from_tool` and `hash_code` values by hovering over the `ID` field on the View Finding page or over the `Status` column on the Finding List page:

![Unique ID from Tool and Hash Code on the View Finding page](images/hash_code_id_field.png)

![Unique ID from Tool and Hash Code on the Finding List Status Column](images/hash_code_status_column.png)
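Both checks can also be done from a terminal. The sketch below is hypothetical and not part of DefectDojo. It makes four assumptions: the Compose service names are `uwsgi` and `celeryworker`, the configured log format includes the logger name, the `Finding` model has fields named `hash_code`, `unique_id_from_tool`, `duplicate` and `duplicate_finding`, and `python` is on the `uwsgi` container's `PATH`. The function names are made up for illustration. `show_finding_hashes` uses Django's `manage.py shell -c` to print the stored values instead of reading them from a tooltip.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Logger name from the troubleshooting notes above.
DEDUPE_LOGGER="dojo.specific-loggers.deduplication"

# Pass through only log lines that mention the deduplication logger.
# Assumes the log format includes the logger name; `|| true` keeps
# "no matching lines" from counting as an error under `set -e`.
filter_dedupe_logs() {
  grep -F -- "$DEDUPE_LOGGER" || true
}

# Hypothetical: build a one-line Python snippet for `manage.py shell -c` that
# prints id, hash_code, unique_id_from_tool, duplicate and duplicate_finding
# (tab-separated) for the given finding IDs. Field names are assumptions.
finding_hashes_py() {
  local id ids
  (( $# > 0 )) || { echo "usage: finding_hashes_py ID..." >&2; return 1; }
  for id in "$@"; do
    # Only numeric IDs are accepted, because they are pasted into Python source.
    [[ "$id" =~ ^[0-9]+$ ]] || { echo "not a finding id: $id" >&2; return 1; }
  done
  ids=$(IFS=,; printf '%s' "$*")
  printf '%s\n' "from dojo.models import Finding; rows = Finding.objects.filter(id__in=[${ids}]).order_by('id').values_list('id', 'hash_code', 'unique_id_from_tool', 'duplicate', 'duplicate_finding'); print('\\n'.join('\\t'.join(map(str, r)) for r in rows))"
}

# Run the snippet inside the uwsgi container.
show_finding_hashes() {
  local py
  py=$(finding_hashes_py "$@")
  docker compose exec uwsgi python manage.py shell -c "$py"
}
```

For example, `docker compose logs -f uwsgi celeryworker | filter_dedupe_logs` follows only the deduplication logger. `show_finding_hashes 12 34` prints one row per finding, so the hash codes of two suspected duplicates can be compared side by side.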
## Related documentation

- [Deduplication Algorithms](deduplication_algorithms): conceptual overview and endpoint behavior.
