-
Notifications
You must be signed in to change notification settings - Fork 33
Implement Feature/tiger-wsibulk #884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
66f3b34 to
b753856
Compare
b753856 to
41e96c7
Compare
nkaenzig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice, thanks for this contribution. 🚀 I left a couple of comments.
configs/vision/pathology/offline/classification/tiger_tumour.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related comment from other PR: #885 (comment)
nkaenzig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work 🚀 Left a few last small comments. Last thing missing would be the unit tests as discussed offline.
| if not all_paths: | ||
| raise FileNotFoundError(f"No .tif files found in {image_dir}") | ||
|
|
||
| rng = random.Random(self._seed) # nosec B311 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we have a splitting module which provide functions to do random splits:
| train_indices, val_indices, test_indices = splitting.random_split( |
docs/datasets/tiger.md
Outdated
| | │ ├── 104S.tiff | ||
| │ | └── ... | ||
| | |__tissue-masks/ * Not used in eva | ||
| | |__tiger-tils-scores-wsitils.csv * Target variable file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it tiger-til-scores-wsitils.csv or tiger-tils-scores-wsitils.csv?
docs/datasets/tiger.md
Outdated
|
|
||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can remove those blank lines
| def prepare_data(self) -> None: | ||
| _validators.check_dataset_exists(self._root, False) | ||
|
|
||
| # @override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we uncomment this? (I assume this was commented used for testing)
| return coord_dict | ||
|
|
||
| @classmethod | ||
| def from_dict(cls, row: Dict[str, Any]) -> "PatchCoordinates": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This classmethod might not required, I think we probably can also use the main constructor like this:
Either like this PatchCoordinates(**row.as_dict()), or with explicit assignments: PatchCoordinates(x_y=row["x_y"], ...)
| def _load_file_paths(self, split: Literal["train", "val", "test"] | None = None) -> List[str]: | ||
| """Loads the file paths of WSIs from wsibulk/images. | ||
| Splits are assigned 70% train, 15% val, 15% test by filename sorting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by filename sorting can be removed as it's not the case anymore
| df = pd.read_csv(csv_path) | ||
| n_rows = len(df) | ||
|
|
||
| print(f"Annotating split '{split}' with {n_rows} images...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the print or use logger.info instead.
| "WsiClassificationDataset", | ||
| "PANDA", | ||
| "PANDASmall", | ||
| "Camelyon16", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to this PR but just saw that Camelyon16 appears twice
| _val_split_ratio: float = 0.15 | ||
|
|
||
| # target microns per pixel (mpp) for patches. | ||
| _target_mpp: float = 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as here #885 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz add this config to tests/eva/vision/test_vision_cli.py (at least to test_configuration_initialization, ideally also to test_predict_fit_from_configuration), so we can test for instantiation errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some feedback I got from claude / codex here:
• src/eva/vision/data/datasets/tiger.py:108: _load_file_paths only glob matches *.tif. The TIGER docs you just added (docs/datasets/tiger.md, see the sample tree) show the WSIs using the .tiff extension.
• src/eva/vision/data/datasets/classification/tiger_wsibulk.py:86: the dataset now emits progress via print every time annotations are built. In multi-worker loaders this will spam STDOUT and can even deadlock under some multiprocessing backends; please route this through the project logger (or make it optional).
Adds TIGER tumour detection task to eva using the WSIBULK dataset from the TIGER grand challenge (
tiger_tumour.py), along with a suggested config to run it (tiger_tumour.yaml)Docs for all the TIGER tasks found in
tiger.mdTesting to come precipitantly.