-
Notifications
You must be signed in to change notification settings - Fork 845
Description
Describe the bug
_process_names in class DicomImageRedactorEngine unconditionally adds all DICOM metadata strings as PHI
To Reproduce
Code inspection: first line is phi_list = text_metadata.copy() and then it is just extended when is_name is
True:
def _process_names(cls, text_metadata: list, is_name: list) -> list:
"""Process names to have multiple iterations in our PHI list.
:param text_metadata: List of all the instance's element values
(excluding pixel data).
:param is_name: True if the element is specified as being a name.
:return: Metadata text with additional name iterations appended.
"""
phi_list = text_metadata.copy()
for i in range(0, len(text_metadata)):
if is_name[i] is True:
original_text = str(text_metadata[i])
phi_list += cls.augment_word(original_text)
return phi_list
A second issue is that it is converting MultiValue values into strings directly so you end up with an array of strings looking like: ["['1000', '234566']", 'Joe'] instead of the function returning an array of strings: ['1000', '234566', 'Joe'].
Expected behavior
_process_names should only add words where is_name is True and should handle MultiValue metadata fields correctly.
Additional context
Too many words are being removed when all we should be removing is PHI and multi-value meta strings are being added incorrectly.