We present a probabilistic model for clustering objects represented by pairwise dissimilarities. We argue that even if an underlying vectorial representation exists, it is better to work directly with the dissimilarity matrix, thereby avoiding the unnecessary bias and variance introduced by vector space embeddings. By using a Dirichlet process prior, we are not obliged to fix the number of clusters in advance. Furthermore, our clustering model is invariant under label and object permutations, scale transformations, and translations. The proposed model is called the Translation-invariant Wishart Dirichlet (TIWD) process. On the algorithmic side, we present a highly efficient MCMC sampling algorithm that avoids costly matrix operations. Experiments on both synthetic and real-world data show that the TIWD process exhibits several advantages over competing approaches.