134 changes: 134 additions & 0 deletions configs/vision/pathology/offline/classification/tiger_tumour.yaml
@@ -0,0 +1,134 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 20}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:MODEL_NAME, dino_vits16}/offline/tiger_tumour}
    max_epochs: &MAX_EPOCHS ${oc.env:MAX_EPOCHS, 100}
    checkpoint_type: ${oc.env:CHECKPOINT_TYPE, best}
    callbacks:
      - class_path: eva.callbacks.ConfigurationLogger
      - class_path: lightning.pytorch.callbacks.TQDMProgressBar
        init_args:
          refresh_rate: ${oc.env:TQDM_REFRESH_RATE, 1}
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: ${oc.env:SAVE_LAST, false}
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/BinaryBalancedAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 20}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings/${oc.env:MODEL_NAME, dino_vits16}/tiger_tumour}
          dataloader_idx_map:
            0: train
            1: val
            2: test
          metadata_keys: ["wsi_id"]
          backbone:
            class_path: eva.vision.models.ModelFromRegistry
            init_args:
              model_name: ${oc.env:MODEL_NAME, universal/vit_small_patch16_224_dino}
              model_extra_kwargs: ${oc.env:MODEL_EXTRA_KWARGS, null}
          overwrite: false
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: eva.vision.models.networks.ABMIL
      init_args:
        input_size: ${oc.env:IN_FEATURES, 384}
        output_size: &NUM_CLASSES 1
        projected_input_size: 128
    criterion: torch.nn.BCEWithLogitsLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.001}
        betas: [0.9, 0.999]
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.BinaryClassificationMetrics
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
          embeddings_transforms:
            class_path: eva.core.data.transforms.Pad2DTensor
            init_args:
              pad_size: &N_PATCHES ${oc.env:N_PATCHES, 200}
          target_transforms:
            class_path: eva.core.data.transforms.dtype.ArrayToFloatTensor
      val:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.TIGERTumour
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/training/wsibulk}
            sampler:
              class_path: eva.vision.data.wsi.patching.samplers.ForegroundGridSampler
              init_args:
                max_samples: *N_PATCHES
            width: 224
            height: 224
            target_mpp: 0.5
            split: train
            coords_path: ${data.init_args.datasets.train.init_args.root}/coords_${.split}.csv
            image_transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                size: ${oc.env:RESIZE_DIM, 224}
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.TIGERTumour
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.TIGERTumour
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 32}
        num_workers: &N_DATA_WORKERS ${oc.env:N_DATA_WORKERS, 4}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      test:
        batch_size: *BATCH_SIZE
        num_workers: *N_DATA_WORKERS
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
        num_workers: *N_DATA_WORKERS
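
Reader note on the `${oc.env:NAME, default}` interpolations used throughout this config: each one resolves to the environment variable `NAME` when it is set and to the default otherwise, and defaults may themselves contain interpolations (as in `OUTPUT_ROOT`). The following is only a rough standard-library sketch of that fallback behaviour for illustration. It is not OmegaConf's implementation, and `resolve_env` is a hypothetical helper that does not exist in eva or OmegaConf.

```python
import os
import re

# Matches an innermost "${oc.env:NAME, default}" (one whose default contains no
# further "${...}"), so nested interpolations are resolved from the inside out.
_ENV_PATTERN = re.compile(
    r"\$\{oc\.env:\s*([A-Za-z_][A-Za-z0-9_]*)\s*(?:,\s*([^${}]*))?\}"
)


def resolve_env(value, env=None):
    """Hypothetical helper: substitute oc.env-style interpolations in a string."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is None:
            raise KeyError(f"{name} is not set and has no default")
        return default.strip()

    previous = None
    while previous != value:  # repeat until nothing is left to substitute
        previous = value
        value = _ENV_PATTERN.sub(_substitute, value)
    return value


if __name__ == "__main__":
    root = "${oc.env:OUTPUT_ROOT, logs/${oc.env:MODEL_NAME, dino_vits16}/offline/tiger_tumour}"
    print(resolve_env(root, env={}))
    print(resolve_env(root, env={"MODEL_NAME": "my_model"}))
```

Note that this sketch always returns strings, whereas the real config loader also converts resolved values to numbers, booleans and lists.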
69 changes: 69 additions & 0 deletions docs/datasets/tiger.md
@@ -0,0 +1,69 @@
# TIGER (Tumor Infiltrating Lymphocytes in breast cancER)

TIGER contains whole-slide images (WSIs) of Her2 positive (Her2+) and Triple Negative (TNBC) breast cancer, together with manual annotations. The training data comes from three sources: a subset of Her2+ and TNBC cases provided by the Radboud University Medical Center (RUMC) (Nijmegen, Netherlands); a second subset of Her2+ and TNBC cases provided by the Jules Bordet Institut (JB) (Brussels, Belgium); and a third subset of TNBC cases only, derived from the TCGA-BRCA archive obtained from the Genomic Data Commons Data Portal.

TIGER comprises three datasets (WSIBULK, WSIROIS and WSITILS), each corresponding to a separate task that could be added to eva. Only two are currently added:

- WSIBULK: WSI-level classification task, detecting tumour presence in patches of a given slide.
- WSITILS: regression task, predicting the "TIL" score of a whole-slide image.

Source: https://tiger.grand-challenge.org/Data/


## Raw data

### Key stats

| | |
|---------------------------|----------------------------------------------------------|
| **Modality** | Vision (WSI) |
| **Tasks** | Binary Classification / Regression |
| **Cancer type** | Breast |
| **Data size** | 182 GB |
| **Image dimension** | ~20k x 20k x 3 |
| **Magnification (μm/px)** | 20x (0.5) - Level 0 |
| **File format**           | `.tif`                                                   |
| **Number of images** | 178 WSIs (96 for WSIBULK and 82 for WSITILS) |


### Organization

The `tiger.zip` archive from [Grand Challenge](https://tiger.grand-challenge.org/) is organized as follows:

    training/
    |_wsibulk/                              (used for classification task)
    | |__annotations-tumor-bulk/            * manual annotations of "tumor bulk" regions (see https://tiger.grand-challenge.org/Data/ for details)
    | | |___masks/                          * annotations in multiresolution TIF format
    | | |___xmls/                           (not used in eva)
    | |__images/
    | |__tissue-masks/                      (not used in eva)
    |
    |_wsirois/                              (not used in eva yet)
    |
    |_wsitils/                              (used for regression task)
    | |__images/                            * whole-slide images
    | | |___104S.tiff
    | | |___...
    | |__tissue-masks/                      (not used in eva)
    | |__tiger-tils-scores-wsitils.csv      (target variable file)
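
Since the data has to be downloaded manually, it can help to check that the folders eva relies on are in place before launching a run. The snippet below is a minimal sketch under the assumption that the layout matches the tree above; `EXPECTED_PATHS` and `missing_tiger_paths` are hypothetical names for illustration and are not part of eva.

```python
from pathlib import Path

# Paths (relative to the "training/" folder) that the tree above marks as used by eva.
EXPECTED_PATHS = (
    "wsibulk/annotations-tumor-bulk/masks",
    "wsibulk/images",
    "wsitils/images",
    "wsitils/tiger-tils-scores-wsitils.csv",
)


def missing_tiger_paths(training_root):
    """Return the expected relative paths that do not exist under training_root."""
    root = Path(training_root)
    return [relative for relative in EXPECTED_PATHS if not (root / relative).exists()]


if __name__ == "__main__":
    # "./data/training" is an assumed location; adjust it to your download folder.
    missing = missing_tiger_paths("./data/training")
    if missing:
        print("Missing TIGER paths:", ", ".join(missing))
    else:
        print("TIGER layout looks complete.")
```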




## Download and preprocessing

The `TIGER` dataset class doesn't download the data at runtime, so the data must be downloaded manually as follows:

- Make sure that the latest version of the AWS CLI is installed on your system by following [these instructions](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

With the AWS CLI installed, you can download the public training set (no AWS account required) by running:

`aws s3 cp s3://tiger-training/ /path/to/destination/ --recursive --no-sign-request`


We then generate random stratified train / validation / test splits using a 0.7 / 0.15 / 0.15 ratio.
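
The splitting step can be sketched roughly as follows. This is an illustrative standard-library version, not the code used in eva: it assumes one class label per slide, and the function name `stratified_split`, the seed and the rounding rule are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Split slide ids into train/val/test while preserving class proportions.

    labels: dict mapping slide id (str) -> class label (assumed one per slide).
    """
    rng = random.Random(seed)

    # Group slide ids by class so that each class is split independently.
    by_class = defaultdict(list)
    for slide_id, label in sorted(labels.items()):
        by_class[label].append(slide_id)

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_train = round(ratios[0] * len(ids))
        n_val = round(ratios[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits


if __name__ == "__main__":
    # Toy example: 40 hypothetical slides, half with label 1.
    toy_labels = {f"slide_{i:02d}": i % 2 for i in range(40)}
    for name, ids in stratified_split(toy_labels).items():
        print(name, len(ids))
```

Fixing the seed keeps the splits reproducible across runs, which matters here because the embeddings and the `coords_<split>.csv` files are written per split.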




Review comment (Collaborator): nit: we can remove those blank lines

2 changes: 2 additions & 0 deletions src/eva/vision/data/datasets/__init__.py
@@ -11,6 +11,7 @@
    GleasonArvaniti,
    PANDASmall,
    PatchCamelyon,
    TIGERTumour,
    UniToPatho,
    WsiClassificationDataset,
)
@@ -49,4 +50,5 @@
    "VisionDataset",
    "MultiWsiDataset",
    "WsiDataset",
    "TIGERTumour",
]
2 changes: 2 additions & 0 deletions src/eva/vision/data/datasets/classification/__init__.py
@@ -9,6 +9,7 @@
from eva.vision.data.datasets.classification.mhist import MHIST
from eva.vision.data.datasets.classification.panda import PANDA, PANDASmall
from eva.vision.data.datasets.classification.patch_camelyon import PatchCamelyon
from eva.vision.data.datasets.classification.tiger_tumour import TIGERTumour
from eva.vision.data.datasets.classification.unitopatho import UniToPatho
from eva.vision.data.datasets.classification.wsi import WsiClassificationDataset

@@ -26,4 +27,5 @@
    "PANDA",
    "PANDASmall",
    "Camelyon16",
Review comment (Collaborator): not related to this PR but just saw that Camelyon16 appears twice

    "TIGERTumour",
]