Code for ECMLPKDD 2019 Paper: A Framework for Deep Constrained Clustering - Algorithms and Advances
git clone https://github.com/blueocean92/deep_constrained_clustering
cd deep_constrained_clustering
Python: see requirements.txt for the complete list of packages used. We recommend a clean installation of the requirements in a fresh virtual environment:
conda create -n testenv python=3.6
source activate testenv
pip install -r requirements.txt
If you don't want to do the clean installation above, you can also install the requirements directly with:
pip install -r requirements.txt --no-index
PyTorch: Note that you need PyTorch. We used version 1.0.0. If you use the virtual environment above, PyTorch will be installed in it automatically.
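To confirm that the environment picked up the PyTorch release mentioned above, a small check along these lines may help. It is only a sketch: the helper names `installed_torch_version` and `matches_release` are hypothetical and not part of this repository, and other PyTorch versions may or may not work with the code.

```python
import importlib.util


def installed_torch_version():
    """Return the installed PyTorch version string, or None if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check itself never crashes

    return torch.__version__


def matches_release(version, expected="1.0.0"):
    """True when `version` equals the expected release, ignoring local
    build suffixes such as "+cu90"."""
    return version is not None and version.split("+")[0] == expected


if __name__ == "__main__":
    version = installed_torch_version()
    if version is None:
        print("PyTorch is not installed in this environment.")
    elif matches_release(version):
        print("Found PyTorch", version, "(the version used by the authors).")
    else:
        print("Found PyTorch", version, "(the authors used 1.0.0).")
```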
While in deep_constrained_clustering folder:
sh download_model.sh
Step 2: Download Processed Reuters Data (optional; MNIST and Fashion are available in torchvision.datasets)
sh download_data.sh
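For reference, MNIST and Fashion-MNIST can be fetched through torchvision roughly as sketched below. The helper `load_image_dataset`, the `./data` root, and the mapping from the names "MNIST" and "Fashion" to torchvision classes are assumptions for illustration; the experiment scripts may load the data differently.

```python
def load_image_dataset(name, root="./data", train=True):
    """Load MNIST or Fashion-MNIST via torchvision (downloads on first use).

    `name` follows the README's data set names; the mapping below is an assumption.
    """
    class_names = {"MNIST": "MNIST", "Fashion": "FashionMNIST"}
    if name not in class_names:
        raise ValueError("torchvision only covers 'MNIST' and 'Fashion', got %r" % name)

    # Imported lazily so that argument validation works without torchvision.
    from torchvision import datasets, transforms

    dataset_cls = getattr(datasets, class_names[name])
    return dataset_cls(
        root=root,
        train=train,
        download=True,
        transform=transforms.ToTensor(),  # images as float tensors in [0, 1]
    )
```

Calling `load_image_dataset("Fashion")` downloads the data into `./data` the first time it runs; Reuters is not covered and still requires the download script.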
cd experiments/
While in deep_constrained_clustering/experiments folder:
To run the pairwise constrained clustering using pre-trained weights (AE features, 6000 constraints), do:
python run_DCC_pairwise.py --data $DATA
For the --data flag, which specifies the data set being used, the options are "MNIST", "Fashion" and "Reuters".
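For intuition about what a pairwise constraint is, the sketch below samples must-link and cannot-link pairs from ground-truth labels. It illustrates the general idea only: the function `sample_pairwise_constraints` is hypothetical, and the sampling procedure in run_DCC_pairwise.py may differ.

```python
import random


def sample_pairwise_constraints(labels, num_constraints, seed=0):
    """Sample random index pairs and sort them into must-link (same label)
    and cannot-link (different label) constraints."""
    if len(labels) < 2:
        raise ValueError("need at least two labelled points")
    rng = random.Random(seed)
    n = len(labels)
    must_link, cannot_link = [], []
    while len(must_link) + len(cannot_link) < num_constraints:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue  # a point cannot be constrained against itself
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1, 2, 2]
    ml, cl = sample_pairwise_constraints(toy_labels, num_constraints=10)
    print(len(ml), "must-link and", len(cl), "cannot-link constraints")
```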
To run the pairwise constrained clustering from raw features (without pre-trained weights), do:
python run_DCC_pairwise.py --data $DATA --without_pretrain
To run the pairwise constrained clustering without KMeans initialization, do:
python run_DCC_pairwise.py --data $DATA --without_kmeans
To run the pairwise constrained clustering with noisy pairwise constraints, do:
python run_DCC_pairwise.py --data $DATA --noisy $NOISE
For the --noisy flag, which specifies the noise level, the value should be a positive float equal to the ratio of noisy constraints to ground-truth constraints.
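One possible reading of this ratio is sketched below: the ground-truth constraints are kept, and round(ratio × N) additional constraints are appended whose type contradicts the labels. This interpretation, and the helper `add_noisy_constraints`, are assumptions for illustration and may not match how the script injects noise.

```python
import random


def add_noisy_constraints(labels, constraints, noise_ratio, seed=0):
    """Append wrong constraints to a clean constraint list.

    `constraints` holds (i, j, is_must_link) tuples. The number of noisy
    constraints is round(noise_ratio * len(constraints)) (assumed reading).
    """
    if noise_ratio <= 0:
        raise ValueError("noise_ratio should be a positive float")
    rng = random.Random(seed)
    n = len(labels)
    num_noisy = round(noise_ratio * len(constraints))
    noisy = []
    while len(noisy) < num_noisy:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue
        same_cluster = labels[i] == labels[j]
        # Record the opposite of the truth: must-link for different labels,
        # cannot-link for equal labels.
        noisy.append((i, j, not same_cluster))
    return list(constraints) + noisy


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]
    clean = [(0, 1, True), (0, 2, False), (1, 3, False), (2, 3, True)]
    print(add_noisy_constraints(toy_labels, clean, noise_ratio=0.5))
```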
To save data for plotting, do:
python run_DCC_pairwise.py --data $DATA --plotting
This will save the experiment data for plotting in folders under ./plotting
To plot the results, do:
python ./plotting/plot_pairwise.py
To run the instance difficulty constrained clustering, do:
python run_DCC_instance.py --data $DATA
To run the triplets constrained clustering (6000 constraints), do:
python run_DCC_triplets.py --data $DATA
To run the global size constrained clustering, do:
python run_DCC_global.py --data $DATA
To run the baseline Improved DEC, do:
python run_improved_DEC.py --data $DATA
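To run several of the experiments above in one go, a small driver such as the following could be placed in the experiments folder. It is a sketch under assumptions: the helpers `build_command` and `run_all` are hypothetical, and it assumes every script accepts the same three --data values listed for the pairwise experiment.

```python
import subprocess
import sys

# Script names as listed in this README.
SCRIPTS = [
    "run_DCC_pairwise.py",
    "run_DCC_instance.py",
    "run_DCC_triplets.py",
    "run_DCC_global.py",
    "run_improved_DEC.py",
]
DATASETS = ["MNIST", "Fashion", "Reuters"]


def build_command(script, dataset, extra_flags=()):
    """Build the argument list for one experiment run."""
    if dataset not in DATASETS:
        raise ValueError("unknown data set: %r" % dataset)
    return [sys.executable, script, "--data", dataset, *extra_flags]


def run_all(dry_run=True):
    """Print (dry run) or execute every script on every data set."""
    commands = [build_command(s, d) for s in SCRIPTS for d in DATASETS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which makes it easy to inspect them before starting long training runs.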