|Replica techniques can predict learning curves (Vol. 44 No. 1)|
Predicted mean square errors vs number of examples per vertex, for different noise levels σ2 (triangles), agree very well with numerical simulations (circles, graphs with 500 vertices) and improve significantly on existing approximations (dashed line).
We show that statistical physics approaches, in particular the replica method, can be used to accurately predict the learning curve of a Gaussian process (GP) inferring a function from noisy data, for a wide range of discrete input spaces. The learning curve quantifies performance as average mean square error versus number of training examples.
GPs are a popular Bayesian inference technique. A GP prior is placed over a function space, and combined with the likelihood of the observed data given a function. Bayes’ theorem then gives a posterior distribution over functions. For a likelihood describing Gaussian noise corrupting the observed function values, this is again a GP, which can be used to make predictions about the function. GPs are “non-parametric”: they effectively represent functions with infinitely many parameters. This makes analysis of their learning curves non-trivial, and much has been achieved for GPs learning functions whose inputs are real-valued. However, predictions are generally only qualitatively correct, with exact solutions only for special cases. We show for the case where inputs are discrete, specifically vertices on a random graph, that replica techniques can be used to predict learning curves exactly in the limit of large graphs.
The starting point is to represent the average error as the derivative of a partition function. We rewrite this so that only neighbouring vertices are directly coupled. From here one can apply the replica method to find the required quenched average over the randomness in the data set. The results apply to random graph ensembles constrained by any fixed degree distribution, and can be generalised to more complicated ensembles.
M. J. Urry and P. Sollich, ‘Replica theory for learning curves for Gaussian processes on random graphs’, J. Phys. A: Math. Theor. 45, 425005 (2012)