|   | Neural networks | Gaussian Processes |
| --- | --- | --- |
| 1 | \(\checkmark\) Adaptive basis functions with effective inductive biases | \(\crossmark\) Fixed basis functions |
| 2 | \(\checkmark\) Fast inference | \(\crossmark\) Expensive inference |
| 3 | \(\crossmark\) Needs a lot of training data | \(\checkmark\) Works well with small datasets |
| 4 | \(\crossmark\) Finitely many basis functions | \(\checkmark\) Infinitely many basis functions, non-parametric flexibility |
| 5 | \(\crossmark\) Inflexible, learns a single function | \(\checkmark\) Learns a distribution over functions, estimates uncertainty |
- Prior knowledge can only be specified in rather limited ways, e.g. through the choice of architecture
- NNs need to be trained from scratch for each new function, whereas GPs allow reasoning about multiple functions
Where neural networks used finitely many highly adaptive basis functions, Gaussian processes typically used infinitely many fixed basis functions.
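This contrast can be made concrete with a minimal sketch of GP regression under a fixed RBF kernel, which corresponds to infinitely many fixed Gaussian basis functions. The example below is illustrative only (the function names, data, and hyperparameter values are our own assumptions, not taken from the text); note that the linear solve against the full kernel matrix is the source of the "expensive inference" entry in the table above, scaling as \(\mathcal{O}(n^3)\) in the number of training points.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Fixed-form RBF kernel: equivalent to infinitely many fixed Gaussian basis functions.
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2, lengthscale=1.0):
    # Standard GP regression posterior mean: K_* (K + sigma^2 I)^{-1} y.
    # The solve against the full n x n kernel matrix costs O(n^3).
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, lengthscale)
    return K_star @ np.linalg.solve(K, y_train)

# Illustrative data: a noisy sine (assumed for this sketch, not from the text).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=30)
Xs = np.array([[0.0], [1.5]])
mu = gp_posterior_mean(X, y, Xs)
```

No basis functions are adapted here: all flexibility comes from the prior over functions induced by the fixed kernel, which is exactly the "smoothing device" view discussed next.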
As argued by MacKay (1998), according to the hype of 1987, neural networks were meant to be intelligent models that discovered features and patterns in data, whereas Gaussian processes, in contrast, are simply smoothing devices.
Recent approaches (e.g., Wilson, 2014; Wilson and Adams, 2013; Lloyd et al., 2014; Yang et al., 2015) have demonstrated that one can develop more expressive kernel functions, which are indeed able to discover rich structure in data without human intervention. Such methods effectively use infinitely many adaptive basis functions. The relevant question then becomes not which paradigm (e.g., kernel methods or neural networks) replaces the other, but whether we can combine the advantages of each approach. Indeed, deep neural networks provide a powerful mechanism for creating adaptive basis functions, with inductive biases which have proven effective for learning in many application domains, including visual object recognition, speech perception, language understanding, and information retrieval.
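One way to sketch this combination is a "deep kernel": a base kernel applied to features produced by a neural network, so that the basis functions become adaptive while the GP machinery is unchanged. The sketch below is a minimal illustration under our own assumptions (a tiny random-weight network standing in for a trained feature extractor; in practice the network weights would be learned jointly with the kernel hyperparameters, e.g. by maximizing the GP marginal likelihood), not the specific construction of any of the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical two-layer feature extractor (weights random here purely for
# illustration; deep kernel learning would train them jointly with the GP).
W1, b1 = rng.normal(size=(1, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 2)), rng.normal(size=2)

def phi(X):
    # Adaptive basis: the network warps inputs before the kernel is applied.
    return np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)

def deep_rbf_kernel(X1, X2, lengthscale=1.0):
    # k(x, x') = k_RBF(phi(x), phi(x')): an RBF kernel on learned features.
    Z1, Z2 = phi(X1), phi(X2)
    d2 = np.sum(Z1**2, 1)[:, None] + np.sum(Z2**2, 1)[None, :] - 2 * Z1 @ Z2.T
    return np.exp(-0.5 * d2 / lengthscale**2)

X = rng.uniform(-1, 1, size=(5, 1))
K = deep_rbf_kernel(X, X)  # a valid (symmetric, PSD) Gram matrix
```

Because the composition of a feature map with a valid kernel is itself a valid kernel, the resulting Gram matrix remains symmetric and positive semi-definite, so standard GP inference applies unchanged on top of the learned representation.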