Voice-Face Homogeneity Tells Deepfake

Detects deepfakes by exploiting the natural identity-level homogeneity between voice and face — a cross-modal consistency that deepfake generation breaks.

Authors

Harry Cheng1, Yangyang Guo2*, Tianyi Wang3, Qi Li1, Xiaojun Chang4, Liqiang Nie5*

1 School of Computer Science and Technology, Shandong University
2 School of Computing, National University of Singapore
3 Department of Computer Science, The University of Hong Kong
4 Faculty of Engineering and Information Technology, University of Technology Sydney
5 Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
* Corresponding authors

Updates

  • [04/2026] Repository transferred to iLearn-Lab
  • [2023] Paper published in ACM Transactions on Multimedia Computing, Communications and Applications (ToMM), Vol. 20, Issue 3

Introduction

This repository is the official implementation of Voice-Face Homogeneity Tells Deepfake, published in ACM ToMM 2023.

Real videos exhibit a natural identity-level homogeneity between a person's voice and face — their vocal and visual characteristics are correlated through shared identity. Deepfake generation typically manipulates only one modality, breaking this natural cross-modal consistency.

VFD (Voice-Face Deepfake detection) detects deepfakes by measuring the matching degree between the voice and face in a video clip. A mismatch signals a potential forgery.


Highlights

  • Exploits voice-face identity homogeneity as a natural, annotation-free detection signal
  • Detects audio-visual deepfakes across DFDC, DF-TIMIT, and FakeAVCeleb
  • Provides pretrained checkpoints for DFDC and FakeAVCeleb

Method / Framework

VFD trains a cross-modal matching model to determine whether the voice and face in a video clip belong to the same identity. Real videos produce high consistency scores; deepfakes that manipulate one modality produce a detectable mismatch.
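The decision rule can be illustrated with a minimal sketch. The cosine metric, function names, and threshold value here are assumptions for illustration, not the repository's actual API:

```python
import numpy as np

def matching_score(face_emb: np.ndarray, voice_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalized face and voice embeddings."""
    face = face_emb / np.linalg.norm(face_emb)
    voice = voice_emb / np.linalg.norm(voice_emb)
    return float(np.dot(face, voice))

def is_fake(face_emb: np.ndarray, voice_emb: np.ndarray,
            threshold: float = 0.5) -> bool:
    """Flag a clip as a deepfake when the voice-face identity match is weak."""
    return matching_score(face_emb, voice_emb) < threshold
```

A matched pair (same identity) yields a score near 1; manipulating one modality pulls the score down past the threshold.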


Project Structure

.
├── FaceModel/              # Face feature extraction model
├── configs/                # Dataset-specific configuration files
│   ├── DFDC/
│   └── FakeAVCeleb/
├── datasets/               # Dataset class definitions
├── lists/                  # Annotation list files (train/test splits)
├── utils/                  # Utility functions
├── finetune_deepfake.py    # Fine-tuning script
├── pretrain_general.py     # General pretraining script
├── test.py                 # Testing script
├── test_vfd.py             # Main evaluation script
└── README.md

Checkpoints / Models

Download the pretrained checkpoints and place them under ./exp/[Dataset]/.


Dataset / Benchmark

Supports DFDC, DF-TIMIT, and FakeAVCeleb. Steps:

1. Download datasets

Download the original datasets from their official sources.

2. Extract frames and audio

Extract frames and audio, and organize annotation files under ./lists/[Dataset]/:

/data/FakeAVCeleb/test/face/RealVideo-RealAudio/African/women/id04245/00001.jpg 0
/data/FakeAVCeleb/test/voice/RealVideo-RealAudio/African/women/id04245/00001.wav 0

Format: <file_path> <label>, where label is 0 (real) or 1/2/3 (fake).


Usage

Testing

python test_vfd.py --config ./configs/DFDC/test.yaml
python test_vfd.py --config ./configs/FakeAVCeleb/test.yaml

TODO

  • Add training script documentation
  • Release DF-TIMIT configuration and checkpoint

Citation

If you find our paper useful, please cite:

@article{cheng2023voice,
  title={Voice-face homogeneity tells deepfake},
  author={Cheng, Harry and Guo, Yangyang and Wang, Tianyi and Li, Qi and Chang, Xiaojun and Nie, Liqiang},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  volume={20},
  number={3},
  pages={1--22},
  year={2023},
}

Acknowledgement

  • Thanks to the creators of DFDC, DF-TIMIT, and FakeAVCeleb for making their datasets available.
  • Thanks to our supervisors and collaborators for their support.

License

This project is released under the Apache License 2.0.
