Unsupervised Clustering and Feature Learning with HPC

Oliver Nina (AFRL/RYAT)

Dustin Angerhofer

Intersection of Digital Engineering and High Performance Computing/High End Computing

Unsupervised clustering is an open area of machine learning research with many real-world applications. Learning the manifold on which images lie and measuring the distance between sample points and clusters in the latent space are non-trivial. Recent deep learning methods use autoencoders for manifold learning and dimensionality reduction in order to cluster image samples better. However, training autoencoders offline is cumbersome, and the trained models are tedious to update. Moreover, trained autoencoders tend to be biased toward the training set and are impractical for data augmentation. In this paper, we introduce a novel method that uses a triplet network architecture to avoid the need for pre-trained autoencoders. Because our framework can be trained online, we can train the network with data-augmented pairs, which allows us to build a more robust encoder and improve accuracy. In contrast to other clustering methods that require nearest-neighbor comparisons at every step, our method introduces a novel approach for selecting random training sample pairs with an adaptive distance metric.
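
As a rough illustration of the triplet idea described above, the following minimal sketch forms a triplet from an anchor sample, an augmented copy of it (the positive), and a randomly chosen different sample (the negative), then evaluates a standard hinge-style triplet loss. It is not the authors' implementation. The function names, the Gaussian-jitter augmentation, the fixed margin, and the toy 2-D data are all assumptions made for illustration. The abstract does not specify the adaptive distance metric, so a plain Euclidean distance is used here, and in practice the vectors would be encoder outputs rather than raw inputs.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def augment(x, rng, noise=0.05):
    """Hypothetical augmentation: small Gaussian jitter standing in for image transforms."""
    return [xi + rng.gauss(0.0, noise) for xi in x]


def sample_triplet(data, rng):
    """Pick a random anchor, use its augmented copy as the positive, and use a
    different randomly chosen sample as the negative (no nearest-neighbor search)."""
    i, j = rng.sample(range(len(data)), 2)
    return data[i], augment(data[i], rng), data[j]


# Toy data (assumed): two loose groups of 2-D points.
rng = random.Random(0)
data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
anchor, positive, negative = sample_triplet(data, rng)
loss = triplet_loss(anchor, positive, negative)
```

Because the positive is generated by augmentation and the negative is drawn at random, no labels and no pre-trained autoencoder are needed to form a training triplet. A random negative may occasionally come from the same underlying cluster as the anchor, which is presumably the case an adaptive distance metric is meant to handle.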