Commit b753856

Jklubienski authored and committed
Implement TIGER Tumour classification task
1 parent 37d4950 commit b753856

6 files changed, +411 -1 lines changed

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 20}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:MODEL_NAME, dino_vits16}/offline/tiger_tumour}
    max_epochs: &MAX_EPOCHS ${oc.env:MAX_EPOCHS, 100}
    checkpoint_type: ${oc.env:CHECKPOINT_TYPE, best}
    callbacks:
      - class_path: eva.callbacks.ConfigurationLogger
      - class_path: lightning.pytorch.callbacks.TQDMProgressBar
        init_args:
          refresh_rate: ${oc.env:TQDM_REFRESH_RATE, 1}
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: ${oc.env:SAVE_LAST, false}
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/BinaryBalancedAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 20}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings/${oc.env:MODEL_NAME, dino_vits16}/tiger_tumour}
          dataloader_idx_map:
            0: train
            1: val
            2: test
          metadata_keys: ["wsi_id"]
          backbone:
            class_path: eva.vision.models.ModelFromRegistry
            init_args:
              model_name: ${oc.env:MODEL_NAME, universal/vit_small_patch16_224_dino}
              model_extra_kwargs: ${oc.env:MODEL_EXTRA_KWARGS, null}
          overwrite: false
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: eva.vision.models.networks.ABMIL
      init_args:
        input_size: ${oc.env:IN_FEATURES, 384}
        output_size: &NUM_CLASSES 1
        projected_input_size: 128
    criterion: torch.nn.BCEWithLogitsLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.001}
        betas: [0.9, 0.999]
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.BinaryClassificationMetrics
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
          embeddings_transforms:
            class_path: eva.core.data.transforms.Pad2DTensor
            init_args:
              pad_size: &N_PATCHES ${oc.env:N_PATCHES, 200}
          target_transforms:
            class_path: eva.core.data.transforms.dtype.ArrayToFloatTensor
      val:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.TIGERTumour
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/training/wsibulk}
            sampler:
              class_path: eva.vision.data.wsi.patching.samplers.ForegroundGridSampler
              init_args:
                max_samples: *N_PATCHES
            width: 224
            height: 224
            target_mpp: 0.5
            split: train
            coords_path: ${data.init_args.datasets.train.init_args.root}/coords_${.split}.csv
            image_transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                size: ${oc.env:RESIZE_DIM, 224}
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.TIGERTumour
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.TIGERTumour
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 32}
        num_workers: &N_DATA_WORKERS ${oc.env:N_DATA_WORKERS, 4}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      test:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
        num_workers: *N_DATA_WORKERS
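Almost every value in this config is written as `${oc.env:NAME, default}`, so settings such as `MODEL_NAME`, `N_PATCHES` or `BATCH_SIZE` can be overridden from the environment without editing the file, and placeholders can nest (see `default_root_dir`). The helper below is a hypothetical, standard-library sketch of that lookup rule, resolving the innermost placeholder first. It is not eva's or OmegaConf's implementation: it handles only `oc.env`, and it returns plain strings where the real config loader would produce typed values.

```python
from __future__ import annotations

import os
import re

# Innermost ${oc.env:NAME} or ${oc.env:NAME, default} placeholder; the default
# may not contain "$", "{" or "}", so nested placeholders are resolved first.
_ENV_PLACEHOLDER = re.compile(
    r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,\s*([^${}]*))?\}"
)


def resolve_env_placeholders(value: str, env: dict[str, str] | None = None) -> str:
    """Hypothetical helper: substitute ${oc.env:NAME, default} placeholders.

    A set environment variable wins over the inline default; a placeholder
    with neither raises KeyError. Results are plain strings (no type parsing).
    """
    env = dict(os.environ) if env is None else env

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is None:
            raise KeyError(f"environment variable {name!r} is not set")
        return default.strip()

    # Resolve from the inside out until the string stops changing.
    while True:
        resolved = _ENV_PLACEHOLDER.sub(_substitute, value)
        if resolved == value:
            return resolved
        value = resolved
```

With no variables set, the `default_root_dir` placeholder resolves to `logs/dino_vits16/offline/tiger_tumour`; with `MODEL_NAME=my_model` (an illustrative value) it becomes `logs/my_model/offline/tiger_tumour`.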

docs/datasets/tiger.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# TIGER (Tumor Infiltrating Lymphocytes in breast cancER)

TIGER contains whole-slide images (WSIs) of Her2-positive (Her2+) and triple-negative (TNBC) breast cancer, together with manual annotations. The training data comes from multiple sources. A subset of Her2+ and TNBC cases is provided by the Radboud University Medical Center (RUMC) (Nijmegen, Netherlands). A second subset of Her2+ and TNBC cases is provided by the Institut Jules Bordet (JB) (Brussels, Belgium). A third subset, containing TNBC cases only, is derived from the TCGA-BRCA archive obtained from the Genomic Data Commons Data Portal.

The challenge provides three datasets (WSIBULK, WSIROIS and WSITILS), each corresponding to a different task. However, only two of them are currently used in eva:

- WSIBULK: WSI-level classification task, detecting tumour presence in patches of a given slide.
- WSITILS: regression task, predicting the TIL score of a whole-slide image.
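In the accompanying config, the WSIBULK task is set up as multiple-instance learning: each slide becomes a bag of up to `N_PATCHES` patch embeddings (padded by `Pad2DTensor`), which an `ABMIL` head pools into a single logit trained with `BCEWithLogitsLoss`. The plain-Python sketch below illustrates the core idea, non-gated attention pooling with weights a_k = softmax_k(w · tanh(V h_k)) and z = Σ_k a_k h_k. It is a simplified illustration, not eva's `ABMIL` implementation; the names `V`, `w` and `mask`, and the choice to mask padding rows out of the softmax, are assumptions.

```python
from __future__ import annotations

import math


def attention_pool(
    bag: list[list[float]],
    V: list[list[float]],
    w: list[float],
    mask: list[bool] | None = None,
) -> tuple[list[float], list[float]]:
    """Non-gated attention MIL pooling in plain Python (illustrative sketch).

    bag:  K instance (patch) embeddings h_k, each of length D
    V:    L x D projection matrix
    w:    attention vector of length L
    mask: True for real patches, False for padding rows (assumed convention)
    Returns the pooled slide embedding z and the attention weights a.
    """
    if mask is None:
        mask = [True] * len(bag)
    if not any(mask):
        raise ValueError("the bag contains no real instances")

    # Scores s_k = w . tanh(V h_k); padding rows get -inf so their weight is 0.
    scores = []
    for h, keep in zip(bag, mask):
        if not keep:
            scores.append(-math.inf)
            continue
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * tj for wj, tj in zip(w, hidden)))

    # Numerically stable softmax over the real instances.
    top = max(scores)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]

    # z = sum_k a_k h_k, skipping padding rows (their values may be -inf).
    dim = len(bag[0])
    pooled = [
        sum(a * h[d] for a, h, keep in zip(weights, bag, mask) if keep)
        for d in range(dim)
    ]
    return pooled, weights
```

A linear layer on `z` would then produce the slide-level logit; with `output_size: 1` in the config, that single logit is what `BCEWithLogitsLoss` consumes.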

Source: https://tiger.grand-challenge.org/Data/

## Raw data

### Key stats

|                           |                                                          |
|---------------------------|----------------------------------------------------------|
| **Modality**              | Vision (WSI)                                             |
| **Tasks**                 | Binary Classification / Regression                       |
| **Cancer type**           | Breast                                                   |
| **Data size**             | 182 GB                                                   |
| **Image dimension**       | ~20k x 20k x 3                                           |
| **Magnification (μm/px)** | 20x (0.5) - Level 0                                      |
| **File format**           | `.tif`                                                   |
| **Number of images**      | 178 WSIs (96 for WSIBULK and 82 for WSITILS)             |
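To relate these stats to the sampler settings in the config (224 x 224 px tiles at `target_mpp: 0.5`, at most `N_PATCHES = 200` tiles per slide), the snippet below counts how many non-overlapping tiles fit into a slide. It assumes a slide of exactly 20,000 x 20,000 px, whereas the table only gives "~20k x 20k"; the `ForegroundGridSampler` presumably considers only tiles on tissue, so the real candidate pool is smaller.

```python
def grid_tile_count(width_px: int, height_px: int, tile_px: int) -> int:
    """Number of full, non-overlapping square tiles that fit in an image."""
    return (width_px // tile_px) * (height_px // tile_px)


# Assumption: a slide of exactly 20,000 x 20,000 px at level 0 (0.5 um/px);
# the key stats only give "~20k x 20k".
SLIDE_SIDE_PX = 20_000
TILE_PX = 224       # sampler width/height in the config
MAX_SAMPLES = 200   # default N_PATCHES (max_samples) in the config

n_tiles = grid_tile_count(SLIDE_SIDE_PX, SLIDE_SIDE_PX, TILE_PX)
fraction = MAX_SAMPLES / n_tiles
print(f"{n_tiles} grid tiles, at most {MAX_SAMPLES} sampled ({fraction:.1%})")
```

This prints `7921 grid tiles, at most 200 sampled (2.5%)`. At 0.5 µm/px, each 224 px tile covers 112 x 112 µm of tissue.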

### Organization

The data `tiger.zip` from [grand challenge](https://tiger.grand-challenge.org/) is organized as follows:

training/
├── wsibulk/                          (used for classification task)
│   ├── annotations-tumor-bulk/       * manual annotations of "tumor bulk" regions (see https://tiger.grand-challenge.org/Data/ for details)
│   │   ├── masks/                    * annotations in multiresolution TIF format
│   │   └── xmls/                     (not used in eva)
│   ├── images/
│   └── tissue-masks/                 (not used in eva)
├── wsirois/                          (not used in eva yet)
└── wsitils/                          (used for regression task)
    ├── images/
    │   ├── 104S.tiff
    │   └── ...                       * whole-slide images
    ├── tissue-masks/                 (not used in eva)
    └── tiger-tils-scores-wsitils.csv (target variable file)
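Because the data has to be fetched manually, it can help to sanity-check the WSIBULK folder before pointing `DATA_ROOT` at it (the config defaults to `./data/training/wsibulk`). The hypothetical helper below only checks for the WSIBULK sub-folders that the tree does not mark as unused; which of them the `TIGERTumour` class actually reads is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Sub-folders of training/wsibulk/ not marked "(not used in eva)" in the tree
# (assumption: tissue-masks/ and xmls/ are not needed by the dataset class).
REQUIRED_WSIBULK_DIRS = (
    "images",
    "annotations-tumor-bulk/masks",
)


def missing_wsibulk_dirs(root: str | Path) -> list[str]:
    """Return the required sub-folders that are absent under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED_WSIBULK_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_wsibulk_dirs("./data/training/wsibulk")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("WSIBULK layout looks complete.")
```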

## Download and preprocessing

The `TIGERTumour` dataset class does not download the data at runtime, so the data must be downloaded manually as follows:

- Make sure that the latest version of the AWS CLI is installed on your system by following [these instructions](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

With the AWS CLI installed, you can download the public training set (no AWS account required) by running:

`aws s3 cp s3://tiger-training/ /path/to/destination/ --recursive --no-sign-request`

We then generate random stratified train / validation / test splits using a 0.7 / 0.15 / 0.15 ratio.
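A stratified 0.7 / 0.15 / 0.15 split can be sketched with the standard library: slides are grouped by label, each group is shuffled with a fixed seed and cut by the ratios, so every split keeps roughly the same class balance. The function name, the `labels` input format, the seed and the rounding rule are assumptions for illustration, not the procedure used to produce eva's splits.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_split(
    labels: dict[str, int],
    ratios: tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 42,
) -> dict[str, list[str]]:
    """Split slide ids into train/val/test, preserving label proportions.

    `labels` maps a slide id to its class label (hypothetical input format).
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group ids by label; sorting makes the result independent of dict order.
    by_label: dict[int, list[str]] = defaultdict(list)
    for slide_id, label in sorted(labels.items()):
        by_label[label].append(slide_id)

    rng = random.Random(seed)
    splits: dict[str, list[str]] = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train : n_train + n_val]
        # Whatever is left goes to test, so no slide is dropped by rounding.
        splits["test"] += ids[n_train + n_val :]
    return splits
```

Assigning the remainder to `test` rather than rounding it independently guarantees that every slide lands in exactly one split, at the cost of the test share drifting slightly for very small classes.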

src/eva/vision/data/datasets/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@
     GleasonArvaniti,
     PANDASmall,
     PatchCamelyon,
+    TIGERTumour,
     UniToPatho,
     WsiClassificationDataset,
 )
@@ -49,4 +50,5 @@
     "VisionDataset",
     "MultiWsiDataset",
     "WsiDataset",
+    "TIGERTumour",
 ]

src/eva/vision/data/datasets/classification/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@
 from eva.vision.data.datasets.classification.mhist import MHIST
 from eva.vision.data.datasets.classification.panda import PANDA, PANDASmall
 from eva.vision.data.datasets.classification.patch_camelyon import PatchCamelyon
+from eva.vision.data.datasets.classification.tiger_tumour import TIGERTumour
 from eva.vision.data.datasets.classification.unitopatho import UniToPatho
 from eva.vision.data.datasets.classification.wsi import WsiClassificationDataset

@@ -26,4 +27,5 @@
     "PANDA",
     "PANDASmall",
     "Camelyon16",
+    "TIGERTumour",
 ]
