MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion

Jihao Gu1, Fei Wang2,5, Kun Li3 πŸ“§, Yanyan Wei2, Zhiliang Wu3, and Dan Guo2,4,5

1 University College London (UCL), Gower Street, London, WC1E 6BT, UK
2 School of Computer Science and Information Engineering, School of Artificial Intelligence, Hefei University of Technology (HFUT)
3 ReLER, CCAI, Zhejiang University, China
4 Key Laboratory of Knowledge Engineering with Big Data (HFUT), Ministry of Education
5 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China

πŸ†Champion Solution for Micro-gesture Classification in 3rd MiGA @ IJCAI 2025


πŸŽ‰ The generated ensemble/prediction.zip represents our final submission, achieving an impressive πŸ† Top-1 Accuracy of 73.213%! 🌟

Figure: overall framework of MM-Gesture.

πŸ“š 0. Table of Contents

  • πŸ“¦ 1. Installation
  • πŸ“‚ 2. Data preparation
  • πŸ‹οΈβ€β™‚οΈ 3. Training & Testing
  • πŸ’₯ 4. Ensemble (Multi-modal Fusion)
  • πŸ™ 5. Acknowledgement
  • πŸ“§ 6. Contact

πŸ“¦ 1. Installation

git clone https://github.com/momiji-bit/MM-Gesture
cd MM-Gesture

πŸ“‚ 2. Data preparation

πŸ”½ 2.1 Download our pre-processed dataset (Recommended)

πŸ” To facilitate your access to our preprocessed video data, you can download it directly from HuggingFace.

πŸ” To comply with the dataset’s usage policy, we have restricted access to the processed files. Please request access through HuggingFace, and we will approve it promptly.

cd dataset 
pip install huggingface_hub
huggingface-cli login
# export HF_ENDPOINT=https://hf-mirror.com  # (Optional) For users in China, enable the mirror
mkdir -p ./iMiGUE_SRTFD
huggingface-cli download Geo2425/iMiGUE_SRTFD --repo-type dataset --local-dir ./iMiGUE_SRTFD
unzip ./iMiGUE_SRTFD/Skeleton.zip -d .
unzip ./iMiGUE_SRTFD/RGB.zip -d .
unzip ./iMiGUE_SRTFD/Taylor.zip -d .
unzip ./iMiGUE_SRTFD/Flow.zip -d .
unzip ./iMiGUE_SRTFD/Depth.zip -d .

mkdir RGB/clips
cp -r RGB/train/* RGB/clips
cp -r RGB/val/* RGB/clips
cp -r RGB/test/* RGB/clips

# rm -r ./iMiGUE_SRTFD
cd ..
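
After extraction, the dataset/ directory should contain the five modality folders plus the merged RGB/clips folder. The snippet below is a minimal sanity check based on the commands above (run it from dataset/; the exact contents of Skeleton/ may be organized differently in your setup):

# Quick sanity check of the expected layout after extraction (sketch only;
# the subfolder list mirrors the unzip/cp commands above).
import os

expected = {
    "Skeleton": [],
    "RGB": ["train", "val", "test", "clips"],
    "Taylor": ["train", "val", "test"],
    "Flow": ["train", "val", "test"],
    "Depth": ["train", "val", "test"],
}

for root, subdirs in expected.items():
    if not os.path.isdir(root):
        print(f"[missing] {root}/")
        continue
    print(f"[ok] {root}/")
    for sub in subdirs:
        path = os.path.join(root, sub)
        count = len(os.listdir(path)) if os.path.isdir(path) else 0
        print(f"  {'[ok]' if count else '[missing/empty]'} {path} ({count} entries)")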

βš™οΈ 2.2 Process dataset by yourself [Optional]

If you've already downloaded the preprocessed data, feel free to skip this step.

cd dataset
mkdir Skeleton RGB Taylor Flow Depth MiGA

2.2.1 Download MiGA'25 Official Dataset (Track 1)

Download here: Kaggle MiGA Challenge Track 1

You just need to download the following files:

  • 1️⃣ imigue_skeleton_phase1.zip β†’ imigue_data_phase1
  • 2️⃣ imigue_rgb_phase1.zip β†’ imigue_rgb_phase1
  • 3️⃣ imigue_skeleton_phase2.zip β†’ imigue_data_phase2 πŸ”’
  • 4️⃣ imigue_rgb_phase2.zip β†’ imigue_rgb_phase2 πŸ”’

Or use these commands to download and unzip:

cd MiGA

# πŸ‹οΈβ€β™‚οΈ Train and Validation dataset
wget https://miga3.a3s.fi/imigue_skeleton_phase1.zip
wget https://miga3.a3s.fi/imigue_rgb_phase1.zip
unzip imigue_skeleton_phase1.zip
unzip imigue_rgb_phase1.zip

# πŸ§ͺ Test dataset
# πŸ”’ Note: Links might expire based on organizer’s access policy.
wget https://miga3.a3s.fi/imigue_skeleton_phase2.zip
wget https://miga3.a3s.fi/imigue_rgb_phase2.zip
unzip imigue_skeleton_phase2.zip
unzip imigue_rgb_phase2.zip

2.2.2 Generate Skeleton Data

To generate the skeleton data, simply run the code provided in the Jupyter notebook:

Open and execute `dataset/tools/processing_Skeleton.ipynb`.
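
For reference, PoseConv3D in PYSKL reads skeleton clips from a single annotation pickle. The sketch below only illustrates the general shape of that conversion with a fabricated loader; the joint layout, file names, and helper functions are placeholders, not the notebook's actual code.

# Illustrative sketch of packing skeleton clips into a PYSKL-style annotation
# pickle. The loader fabricates random joints as a stand-in; the real
# conversion lives in dataset/tools/processing_Skeleton.ipynb.
import pickle
import numpy as np

def load_raw_skeleton(sample_id, num_frames=64, num_joints=17):
    """Placeholder loader: returns (T, V, 2) coordinates and (T, V) scores."""
    kpt = np.random.rand(num_frames, num_joints, 2).astype(np.float32)
    score = np.ones((num_frames, num_joints), dtype=np.float32)
    return kpt, score

def make_annotation(sample_id, label, img_shape=(1080, 1920)):
    kpt, score = load_raw_skeleton(sample_id)
    return dict(
        frame_dir=sample_id,                 # clip identifier
        label=int(label),                    # micro-gesture class id
        img_shape=img_shape,
        original_shape=img_shape,
        total_frames=kpt.shape[0],
        keypoint=kpt[None],                  # (M=1, T, V, 2)
        keypoint_score=score[None],          # (M=1, T, V)
    )

samples = [("clip_0001", 0), ("clip_0002", 3)]   # (id, label) placeholders
data = dict(
    split=dict(train=["clip_0001"], val=["clip_0002"]),
    annotations=[make_annotation(sid, lbl) for sid, lbl in samples],
)
with open("imigue_skeleton_sketch.pkl", "wb") as f:
    pickle.dump(data, f)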

2.2.3 Generate RGB Videos

For RGB video generation, use the provided Jupyter notebook:

Open and execute `dataset/tools/processing_RGB.ipynb`.
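
In essence, this step organizes and re-encodes the official per-clip RGB videos into RGB/train, RGB/val, and RGB/test. The snippet below is a purely illustrative OpenCV sketch of that kind of operation; the source path, target resolution, and codec are assumptions, not the notebook's actual settings.

# Illustrative sketch: re-encode RGB clips at a fixed resolution into a split
# folder. The 256x256 size and paths are assumptions for illustration only.
import os
import cv2

def reencode_clip(src_path, dst_path, size=(256, 256)):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(cv2.resize(frame, size))
    cap.release()
    writer.release()

src_root, dst_root = "MiGA/imigue_rgb_phase1", "RGB/train"  # example paths
os.makedirs(dst_root, exist_ok=True)
for name in sorted(os.listdir(src_root)):
    if name.endswith(".mp4"):
        reencode_clip(os.path.join(src_root, name), os.path.join(dst_root, name))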

2.2.4 Generate Taylor Videos

To generate Taylor-encoded videos:

cd ../tools

python taylor.py ../RGB/train ../Taylor/train
python taylor.py ../RGB/val ../Taylor/val
python taylor.py ../RGB/test ../Taylor/test
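
Conceptually, a Taylor video replaces raw frames with a truncated temporal Taylor expansion, so static appearance is suppressed and motion terms (first- and higher-order temporal differences) dominate. The numpy sketch below only illustrates that idea; the window length and number of terms are assumptions, and tools/taylor.py is the authoritative implementation.

# Conceptual sketch of Taylor-style temporal encoding: estimate the k-th
# temporal derivative by finite differences over a short window and combine
# the terms with 1/k! weights, as in a truncated Taylor series.
import numpy as np
from math import factorial

def taylor_frame(window, order=2):
    """window: (K, H, W, C) consecutive frames. Returns one encoded frame."""
    diff = window.astype(np.float32)
    terms = np.zeros_like(diff[0])
    for k in range(1, order + 1):
        diff = np.diff(diff, axis=0)          # k-th order forward difference
        terms += diff[0] / factorial(k)       # derivative estimate at the window start
    return terms

# Example: encode a random 8-frame window
frames = np.random.randint(0, 256, size=(8, 112, 112, 3), dtype=np.uint8)
encoded = taylor_frame(frames)
print(encoded.shape)  # (112, 112, 3)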

2.2.5 Generate Optical Flow Videos

We use memflow for optical flow generation.

  1. Setup: Follow memflow’s official instructions to install dependencies and download the pretrained models.
  2. Optimized execution: Use the custom script inference_mp4.py for efficient GPU utilization.
  3. Run the following commands:
python inference_mp4.py \
  --name MemFlowNet \
  --stage things \
  --restore_ckpt ckpts/MemFlowNet_things.pth \
  --input_dir ../../MiGA/RGB/train \
  --output_dir ../../MiGA/Flow/train

python inference_mp4.py \
  --name MemFlowNet \
  --stage things \
  --restore_ckpt ckpts/MemFlowNet_things.pth \
  --input_dir ../../MiGA/RGB/val \
  --output_dir ../../MiGA/Flow/val

python inference_mp4.py \
  --name MemFlowNet \
  --stage things \
  --restore_ckpt ckpts/MemFlowNet_things.pth \
  --input_dir ../../MiGA/RGB/test \
  --output_dir ../../MiGA/Flow/test

2.2.6 Generate Depth Videos

We use Video-Depth-Anything to generate depth videos.

  1. Setup: Follow the official instructions to configure the environment and download the pretrained models.
  2. Optimized execution: Use the custom script run_dir.py for efficient GPU utilization (a generic sketch of this dispatch pattern follows the commands below).
  3. Run the following commands:
# For training data
python3 run_dir.py \
  --input_dir ../../MiGA/RGB/train \
  --output_dir ../../MiGA/Depth/train \
  --encoder vits \
  --grayscale \
  --procs_per_gpu 2

# For validation data
python3 run_dir.py \
  --input_dir ../../MiGA/RGB/val \
  --output_dir ../../MiGA/Depth/val \
  --encoder vits \
  --grayscale \
  --procs_per_gpu 2

# For test data
python3 run_dir.py \
  --input_dir ../../MiGA/RGB/test \
  --output_dir ../../MiGA/Depth/test \
  --encoder vits \
  --grayscale \
  --procs_per_gpu 2
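
Both wrapper scripts (inference_mp4.py and run_dir.py) exist mainly to keep every GPU busy by sharding the per-clip videos across worker processes. The snippet below is a generic sketch of that dispatch pattern; the worker body is a placeholder and this is not the scripts' actual code.

# Generic sketch: shard a directory of .mp4 clips across GPUs with a fixed
# number of worker processes per GPU. run_inference() is a placeholder for
# the wrapped model's per-video inference.
import os
from itertools import cycle
from multiprocessing import Process

def run_inference(video_path, output_dir):
    """Placeholder for the wrapped model's per-video inference."""
    print(f"processing {video_path} -> {output_dir}")

def worker(video_list, output_dir, gpu_id):
    # Pin this worker to one GPU before any model is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    for path in video_list:
        run_inference(path, output_dir)

def dispatch(input_dir, output_dir, num_gpus=1, procs_per_gpu=2):
    videos = sorted(os.path.join(input_dir, f)
                    for f in os.listdir(input_dir) if f.endswith(".mp4"))
    num_workers = num_gpus * procs_per_gpu
    shards = [videos[i::num_workers] for i in range(num_workers)]
    gpus = cycle(range(num_gpus))
    procs = [Process(target=worker, args=(shard, output_dir, next(gpus)))
             for shard in shards]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    dispatch("../../MiGA/RGB/train", "../../MiGA/Flow/train", num_gpus=1, procs_per_gpu=2)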

πŸ‹οΈβ€β™‚οΈ 3. Training & Testing

✨ Pre-trained models are available for download here. πŸ“₯🎯

| Model (Size) | Modality | Link |
| --- | --- | --- |
| PoseConv3D | Joint | Download |
| PoseConv3D | Limb | Download |
| PoseConv3D | RGB+Joint | Download |
| PoseConv3D | RGB+Limb | Download |
| VideoSwinT (Base/Small/Tiny) | RGB | Download |
| VideoSwinT (Small/Tiny) | RGB* | Download |
| VideoSwinT (Base/Small/Tiny) | Taylor | Download |
| VideoSwinT (Base) | Optical Flow | Download |
| VideoSwinT (Base/Small) | Depth | Download |

3.1 PoseConv3D

# Install dependencies
conda env create -f pyskl_environment.yml -y
conda activate pyskl  # Or: source activate pyskl
cd pyskl

Then, run the code in pyskl/RUN.ipynb for training and testing.

3.2 VideoSwinT

# Install dependencies
conda env create -f openmmlab_environment.yml -y
conda activate openmmlab  # Or: source activate openmmlab
cd mmaction2

Then, run the code in mmaction2/RUN.ipynb for training and testing.

πŸ’₯ 4. Ensemble (Multi-modal Fusion)

We provide a script for combining six modalities (Joint, Limb, RGB, Taylor, Optical Flow, Depth) to leverage their complementary strengths and improve accuracy:

  • Run ensemble/ensemble.py to generate the final competition results.
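
The fusion is a score-level (late) ensemble: each single-modality model is tested once, its per-clip class scores are saved, and the final label is the argmax of a weighted sum of those scores. Below is a minimal sketch of that logic; the pickle file names, score format, and weights are placeholders, and ensemble/ensemble.py holds the actual configuration used for the 73.213% submission.

# Minimal sketch of score-level fusion across six modalities. Score pickles
# are assumed to hold arrays of shape (num_clips, num_classes) in a shared
# clip order; file names and weights are placeholders, not the real ones.
import pickle
import numpy as np

score_files = {          # modality -> dumped test scores (placeholder paths)
    "joint":  "scores/joint.pkl",
    "limb":   "scores/limb.pkl",
    "rgb":    "scores/rgb.pkl",
    "taylor": "scores/taylor.pkl",
    "flow":   "scores/flow.pkl",
    "depth":  "scores/depth.pkl",
}
weights = {m: 1.0 for m in score_files}   # placeholder: uniform weights

fused = None
for modality, path in score_files.items():
    with open(path, "rb") as f:
        scores = np.asarray(pickle.load(f), dtype=np.float32)
    # Softmax so that modalities with different logit scales are comparable.
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores /= scores.sum(axis=1, keepdims=True)
    fused = weights[modality] * scores if fused is None else fused + weights[modality] * scores

predictions = fused.argmax(axis=1)        # one class id per test clip
np.savetxt("prediction.csv", predictions, fmt="%d")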

πŸ™ 5. Acknowledgement

This codebase is built on the PYSKL and MMAction2 toolboxes. We thank the developers for doing most of the heavy lifting.

If you found this code useful, please consider citing:

@article{gu2025mm,
  title={MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion},
  author={Gu, Jihao and Wang, Fei and Li, Kun and Wei, Yanyan and Wu, Zhiliang and Guo, Dan},
  journal={arXiv preprint arXiv:2507.08344},
  year={2025}
}

@article{guo2024benchmarking,
  title={Benchmarking Micro-action Recognition: Dataset, Methods, and Applications},
  author={Guo, Dan and Li, Kun and Hu, Bin and Zhang, Yan and Wang, Meng},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  volume={34},
  number={7},
  pages={6238-6252}
}

@misc{2020mmaction2,
  title={OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark},
  author={MMAction2 Contributors},
  howpublished={\url{https://github.com/open-mmlab/mmaction2}},
  year={2020}
}
 

πŸ“§ 6. Contact

For any questions, feel free to contact: Dr. Kun Li (kunli.hfut@gmail.com) and Mr. Jihao Gu (jihao.gu.23@ucl.ac.uk).
