Self-supervised patch-level model with a shared latent z, a reconstruction decoder, and a contrastive objective using an EMA target encoder/projector.
- src/data.py
- src/models.py
- src/train.py
- src/eval.py
- checkpoints/
- logs/
- Python 3.9–3.11 recommended. GPU optional but recommended.
- Install deps (PowerShell):

  python -m pip install -r requirements.txt
- Place images under data/ or any other folder. Supported formats are those TensorFlow can decode (jpg/png/etc.).
- Images are resized to 1024×1024 on the fly.
- From each image, num_patches_per_image random 16×16 patches are sampled.
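The random patch sampling described above can be sketched as follows. This is an illustrative, framework-agnostic version in NumPy; the function name `sample_patches` and its signature are assumptions, not the repo's actual API in src/data.py:

```python
import numpy as np

def sample_patches(image, num_patches, patch_size=16, rng=None):
    """Sample num_patches random patch_size x patch_size crops from an HxWxC image."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    patches = []
    for _ in range(num_patches):
        # Top-left corner chosen uniformly so the crop stays inside the image.
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

img = np.zeros((1024, 1024, 3), dtype=np.float32)  # a resized image
patches = sample_patches(img, num_patches=8)
print(patches.shape)  # (8, 16, 16, 3)
```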
Run from the project root using the module form (so imports resolve):
python -m src.train --file_pattern "data/*" --epochs 2 --batch_size 64 --cache

Examples (Windows globbing):
- All jpg recursively: --file_pattern "data/**/*.jpg"
- Single folder: --file_pattern "data/*"
Key flags:
- --patch_size 16 (default)
- --resize 1024
- --latent_dim 256
- --proj_dim 128
- --pred_hidden 128
- --tau 0.2
- --lambda_contrast 0.2
- --momentum 0.996
- --batch_size 256 (reduce if OOM)
- --mixed_precision to enable the float16 policy on supported GPUs
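The role of --tau in the contrastive objective can be sketched with a standard InfoNCE-style loss, where online projections are matched against their EMA-target counterparts. This is a minimal NumPy sketch under the assumption that positives sit on the diagonal of the similarity matrix; it is not the exact loss in src/train.py:

```python
import numpy as np

def info_nce(z_online, z_target, tau=0.2):
    """InfoNCE loss: each online projection should be most similar to its
    index-matched EMA-target projection; all other pairs act as negatives."""
    a = z_online / np.linalg.norm(z_online, axis=1, keepdims=True)
    b = z_target / np.linalg.norm(z_target, axis=1, keepdims=True)
    logits = a @ b.T / tau                       # (N, N) cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 128))
loss = info_nce(z, z + 0.05 * rng.normal(size=z.shape))
```

A lower --tau sharpens the softmax, penalizing hard negatives more strongly.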
TensorBoard:

tensorboard --logdir logs

Eval:

python -m src.eval --file_pattern "data/*" --max_steps 50

Prints MSE/PSNR/SSIM and a simple Retrieval@1 on two augmented views.
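The Retrieval@1 metric on two augmented views can be sketched as a nearest-neighbour check between the two embedding sets. This is an assumed formulation (the helper name `retrieval_at_1` is illustrative, not the function in src/eval.py):

```python
import numpy as np

def retrieval_at_1(z_a, z_b):
    """Fraction of view-A embeddings whose nearest neighbour (by cosine
    similarity) among view-B embeddings is the index-matched one."""
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    nearest = (a @ b.T).argmax(axis=1)           # index of best match in view B
    return float((nearest == np.arange(len(a))).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 128))
score = retrieval_at_1(z, z + 0.01 * rng.normal(size=z.shape))  # near 1.0 for mild perturbations
```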
- For large datasets, consider creating TFRecords of resized images.
- EMA momentum can be annealed upward during training.
- If reconstruction loss is high, consider milder augmentations or a larger latent_dim.
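Annealing the EMA momentum upward, as suggested above, is commonly done with a cosine schedule from the base value toward 1.0 (a BYOL-style schedule). A minimal sketch, assuming the base is the --momentum flag's 0.996 default; the function names are illustrative:

```python
import math

def ema_momentum(step, total_steps, base=0.996):
    """Cosine-anneal EMA momentum from base toward 1.0 over training."""
    return 1.0 - (1.0 - base) * (math.cos(math.pi * step / total_steps) + 1) / 2

def ema_update(target_params, online_params, m):
    """target <- m * target + (1 - m) * online, applied per parameter."""
    return [m * t + (1 - m) * o for t, o in zip(target_params, online_params)]

print(ema_momentum(0, 100))    # 0.996 at the start
print(ema_momentum(100, 100))  # 1.0 at the end (target frozen)
```

A momentum near 1.0 late in training stabilizes the target network as representations converge.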