Deep learning has been successfully applied to learn non-linear feature mappings and to perform dimensionality reduction. In this paper, we present supervised embedding techniques that use a deep neural network to collapse classes. The network is pre-trained using a stack of Restricted Boltzmann Machines (RBMs), and finetuned using approaches that try to collapse classes. The finetuning is inspired by ideas from Neighborhood Components Analysis (NCA), but it uses a Student t-distribution to model the probabilities of pairwise data points belonging to the same class in the embedding. We investigate two types of objective functions: deep t-distributed MCML (dt-MCML) and deep t-distributed NCA (dt-NCA). Our experiments on two handwritten digit datasets reveal the strong performance of dt-MCML in supervised parametric data visualization, whereas dt-NCA outperforms alternative techniques when embeddings with more than two or three dimensions are constructed, e.g., to obtain good classification performances. Overall, our results demonstrate the advantage of using a deep architecture and a heavy-tailed t-distribution for measuring pairwise similarities in supervised embedding.
Download PDF