Skip to content

Conversation

@jklubienski
Copy link
Contributor

@jklubienski jklubienski commented Sep 10, 2025

Adds TIGER tumour detection task to eva using the WSIBULK dataset from the TIGER grand challenge (tiger_tumour.py), along with a suggested config to run it (tiger_tumour.yaml)

Docs for all the TIGER tasks found in tiger.md

Testing to come precipitantly.

@jklubienski jklubienski force-pushed the feature/tiger-tumour branch 7 times, most recently from 66f3b34 to b753856 Compare September 10, 2025 13:22
Copy link
Collaborator

@nkaenzig nkaenzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice, thanks for this contribution. 🚀 I left a couple of comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related comment from other PR: #885 (comment)

@jklubienski jklubienski changed the title Feature/tiger tumour Implement Feature/tiger-wsibulk Sep 24, 2025
Copy link
Collaborator

@nkaenzig nkaenzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 🚀 Left a few last small comments. Last thing missing would be the unit tests as discussed offline.

if not all_paths:
raise FileNotFoundError(f"No .tif files found in {image_dir}")

rng = random.Random(self._seed) # nosec B311
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we have a splitting module which provide functions to do random splits:

train_indices, val_indices, test_indices = splitting.random_split(

| │ ├── 104S.tiff
│ | └── ...
| |__tissue-masks/ * Not used in eva
| |__tiger-tils-scores-wsitils.csv * Target variable file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it tiger-til-scores-wsitils.csv or tiger-tils-scores-wsitils.csv?





Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can remove those blank lines

def prepare_data(self) -> None:
_validators.check_dataset_exists(self._root, False)

# @override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we uncomment this? (I assume this was commented used for testing)

return coord_dict

@classmethod
def from_dict(cls, row: Dict[str, Any]) -> "PatchCoordinates":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This classmethod might not required, I think we probably can also use the main constructor like this:

Either like this PatchCoordinates(**row.as_dict()), or with explicit assignments: PatchCoordinates(x_y=row["x_y"], ...)

def _load_file_paths(self, split: Literal["train", "val", "test"] | None = None) -> List[str]:
"""Loads the file paths of WSIs from wsibulk/images.
Splits are assigned 70% train, 15% val, 15% test by filename sorting.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by filename sorting can be removed as it's not the case anymore

df = pd.read_csv(csv_path)
n_rows = len(df)

print(f"Annotating split '{split}' with {n_rows} images...")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the print or use logger.info instead.

"WsiClassificationDataset",
"PANDA",
"PANDASmall",
"Camelyon16",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR but just saw that Camelyon16 appears twice

_val_split_ratio: float = 0.15

# target microns per pixel (mpp) for patches.
_target_mpp: float = 0.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as here #885 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add this config to tests/eva/vision/test_vision_cli.py (at least to test_configuration_initialization, ideally also to test_predict_fit_from_configuration), so we can test for instantiation errors

Copy link
Collaborator

@nkaenzig nkaenzig Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some feedback I got from claude / codex here:

• src/eva/vision/data/datasets/tiger.py:108: _load_file_paths only glob matches *.tif. The TIGER docs you just added (docs/datasets/tiger.md, see the sample tree) show the WSIs using the .tiff extension.

• src/eva/vision/data/datasets/classification/tiger_wsibulk.py:86: the dataset now emits progress via print every time annotations are built. In multi-worker loaders this will spam STDOUT and can even deadlock under some multiprocessing backends; please route this through the project logger (or make it optional).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants