
Voice-Face Homogeneity Tells Deepfake

Detects deepfakes by exploiting the natural identity-level homogeneity between voice and face — a cross-modal consistency that deepfake generation breaks.

Authors

Harry Cheng1, Yangyang Guo2*, Tianyi Wang3, Qi Li1, Xiaojun Chang4, Liqiang Nie5*

1 School of Computer Science and Technology, Shandong University 2 School of Computing, National University of Singapore 3 Department of Computer Science, The University of Hong Kong 4 Faculty of Engineering and Information Technology, University of Technology Sydney 5 Department of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) * Corresponding authors

Updates

  • [04/2026] Repositories transferred to iLearn-Lab
  • [2023] Paper published in ACM Transactions on Multimedia Computing, Communications and Applications (ToMM), Vol. 20, Issue 3

Introduction

This repository is the official implementation of Voice-Face Homogeneity Tells Deepfake, published in ACM ToMM 2023.

Real videos exhibit a natural identity-level homogeneity between a person's voice and face — their vocal and visual characteristics are correlated through shared identity. Deepfake generation typically manipulates only one modality, breaking this natural cross-modal consistency.

VFD (Voice-Face Deepfake detection) detects deepfakes by measuring the matching degree between the voice and face in a video clip. A mismatch signals a potential forgery.


Highlights

  • Exploits voice-face identity homogeneity as a natural, annotation-free detection signal
  • Detects audio-visual deepfakes across DFDC, DF-TIMIT, and FakeAVCeleb
  • Provides pretrained checkpoints for DFDC and FakeAVCeleb

Method / Framework

VFD trains a cross-modal matching model to determine whether the voice and face in a video clip belong to the same identity. Real videos produce high consistency scores; deepfakes that manipulate one modality produce a detectable mismatch.
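The matching idea can be sketched as follows. This is an illustrative toy example, not the repository's actual model: the function names, the cosine-similarity scoring, and the decision threshold are all assumptions made for clarity.

```python
import math

def consistency_score(face_emb, voice_emb):
    """Cosine similarity between a face embedding and a voice embedding
    projected into a shared identity space (illustrative)."""
    dot = sum(f * v for f, v in zip(face_emb, voice_emb))
    norm_f = math.sqrt(sum(f * f for f in face_emb))
    norm_v = math.sqrt(sum(v * v for v in voice_emb))
    return dot / (norm_f * norm_v)

def is_fake(face_emb, voice_emb, threshold=0.5):
    """Flag a clip as a potential forgery when voice-face consistency
    drops below a threshold (threshold value is a placeholder)."""
    return consistency_score(face_emb, voice_emb) < threshold
```

In the real pipeline the embeddings would come from the trained face and voice encoders; the point is only that real clips yield high scores and single-modality forgeries yield a detectable mismatch.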


Project Structure

.
├── FaceModel/              # Face feature extraction model
├── configs/                # Dataset-specific configuration files
│   ├── DFDC/
│   └── FakeAVCeleb/
├── datasets/               # Dataset class definitions
├── lists/                  # Annotation list files (train/test splits)
├── utils/                  # Utility functions
├── finetune_deepfake.py    # Fine-tuning script
├── pretrain_general.py     # General pretraining script
├── test.py                 # Testing script
├── test_vfd.py             # Main evaluation script
└── README.md

Checkpoints / Models

Download the pretrained checkpoints and place them under ./exp/[Dataset]/.


Dataset / Benchmark

Supports DFDC, DF-TIMIT, and FakeAVCeleb. Steps:

1. Download datasets

Download the original datasets from their official sources.

2. Extract frames and audio

Extract frames and audio, and organize annotation files under ./lists/[Dataset]/:

/data/FakeAVCeleb/test/face/RealVideo-RealAudio/African/women/id04245/00001.jpg 0
/data/FakeAVCeleb/test/voice/RealVideo-RealAudio/African/women/id04245/00001.wav 0

Format: <file_path> <label>, where label is 0 (real) or 1/2/3 (fake).
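A minimal parser for this list format might look like the following. The helper name and the binary real/fake mapping (label 0 = real, any nonzero label = fake) are assumptions for this sketch, not code from the repository.

```python
def parse_list_file(lines):
    """Parse annotation lines of the form '<file_path> <label>' into
    (path, label, is_real) tuples. Blank lines are skipped."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so paths containing spaces still work.
        path, label = line.rsplit(" ", 1)
        label = int(label)
        samples.append((path, label, label == 0))
    return samples
```

For example, the two lines shown above would parse to one real face sample and one real voice sample, both with label 0.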


Usage

Testing

python test_vfd.py --config ./configs/DFDC/test.yaml
python test_vfd.py --config ./configs/FakeAVCeleb/test.yaml

TODO

  • Add training script documentation
  • Release DF-TIMIT configuration and checkpoint

Citation

If you find our paper useful, please cite:

@article{cheng2023voice,
  title={Voice-face homogeneity tells deepfake},
  author={Cheng, Harry and Guo, Yangyang and Wang, Tianyi and Li, Qi and Chang, Xiaojun and Nie, Liqiang},
  journal={ACM Transactions on Multimedia Computing, Communications and Applications},
  volume={20},
  number={3},
  pages={1--22},
  year={2023},
}

Acknowledgement

  • Thanks to the creators of DFDC, DF-TIMIT, and FakeAVCeleb for making their datasets available.
  • Thanks to our supervisors and collaborators for their support.

License

This project is released under the Apache License 2.0.