Bayesian Kernel Two-Sample Testing

Published in Journal of Computational and Graphical Statistics, 2022

Recommended citation: Qinyi Zhang, Veit Wild, Sarah Filippi, Seth Flaxman and Dino Sejdinovic. "Bayesian Kernel Two-Sample Testing." Journal of Computational and Graphical Statistics, 2022.

In modern data analysis, nonparametric measures of discrepancy between random variables are particularly important. The subject is well studied in the frequentist literature, but development in the Bayesian setting is limited, with applications often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference between kernel mean embeddings in the reproducing kernel Hilbert space, utilising the framework established by Flaxman et al. (2016). The use of kernel methods enables its application to random variables in generic domains beyond multivariate Euclidean spaces. The proposed procedure yields a posterior inference scheme that allows automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real-data experiments (testing network heterogeneity from high-dimensional data and comparing six-membered monocyclic ring conformations), we illustrate the advantages of our approach.
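As background for the quantity being modelled, the following sketch computes the squared RKHS norm of the difference between two empirical kernel mean embeddings (the biased MMD² statistic of Gretton et al.) with a Gaussian kernel. This is only an illustration of the frequentist statistic underlying the embeddings, not the Bayesian procedure of the paper; the lengthscale value and sample sizes are arbitrary choices for the example.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Gram matrix of the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 l^2))
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * lengthscale**2))

def mmd2_biased(X, Y, lengthscale=1.0):
    # Squared distance between empirical mean embeddings in the RKHS:
    # ||mu_X - mu_Y||^2 = mean k(x, x') - 2 mean k(x, y) + mean k(y, y')
    Kxx = rbf_kernel(X, X, lengthscale)
    Kyy = rbf_kernel(Y, Y, lengthscale)
    Kxy = rbf_kernel(X, Y, lengthscale)
    return Kxx.mean() - 2 * Kxy.mean() + Kyy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
Y = rng.normal(1.0, 1.0, size=(200, 2))  # sample from Q (shifted mean)
Z = rng.normal(0.0, 1.0, size=(200, 2))  # second sample from P

print(mmd2_biased(X, Y))  # clearly positive: P and Q differ
print(mmd2_biased(X, Z))  # near zero: same distribution
```

The paper's contribution is to place a Bayesian model on the embedding difference rather than use this point estimate directly, which is what enables posterior inference over the kernel parameters.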

Download paper here