Adaptive Self-paced Deep Clustering with Data Augmentation (ASPC-DA)
Tensorflow implementation for our paper:
Abstract
Deep clustering gains superior performance than conventional clustering by jointly performing feature learning and cluster
assignment. Although numerous deep clustering algorithms have emerged in various applications, most of them fail to learn robust
cluster-oriented features which in turn hurts the final clustering performance. To solve this problem, we propose a two-stage deep
clustering algorithm by incorporating data augmentation and self-paced learning. Specifically, in the first stage, we learn robust features
by training an autoencoder with examples that are augmented by random shifting and rotating the given clean examples. Then in the
second stage, we encourage the learned features to be cluster-oriented by alternatively finetuning the encoder with the augmented
examples and updating the cluster assignments of the clean examples. During finetuning the encoder, the target of each augmented
example in the loss function is the center of the cluster to which the clean example is assigned. The targets may be computed
incorrectly, and the examples with incorrect targets could mislead the encoder network. To stabilize the network training, we select
most confident examples in each iteration by utilizing the adaptive self-paced learning. Extensive experiments validate that our
algorithm outperforms the state of the arts on four image datasets.
Usage
1. Prepare environment
Install Anaconda with Python 3.6 version (Optional).
Create a new env (Optional):
conda create -n aspc python=3.6 -y
source activate aspc # Linux
# or
conda activate aspc # Windows
Install required packages:
pip install tensorflow-gpu==1.10 scikit-learn h5py
2. Clone the code and prepare the datasets.
git clone https://github.com/XifengGuo/ASPC-DA.git ASPC-DA
cd ASPC-DA
3. Run experiments.
python run_exp.py