\(\text{“I can’t believe it’s not Bayesian”}\) \(\tiny \text{- Chelsea Finn, ICML 2019 meta-learning workshop}\)
Figure 2: Hyperparameter searches (image source)
The salient characteristic of contemporary neural-network meta-learning is an explicitly defined meta-level objective, and end-to-end optimization of the inner algorithm with respect to this objective (Hospedales et al., 2020).
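Concretely, this can be written as a bilevel optimization over the \(M\) source tasks, sketched here in the notation of Hospedales et al. (2020), with \(\mathcal{L}^{\mathrm{task}}\) the inner (task) objective and \(\mathcal{L}^{\mathrm{meta}}\) the outer (meta) objective:

\[
\color{blue}{\omega^*} \;=\; \underset{\color{blue}{\omega}}{\operatorname{arg\,min}} \sum_{i=1}^{M} \mathcal{L}^{\mathrm{meta}}\!\left(\theta^{*(i)}(\color{blue}{\omega}),\; \color{blue}{\omega},\; \mathcal{D}^{\mathrm{val}\,(i)}_{\mathrm{source}}\right)
\quad \text{s.t.} \quad
\theta^{*(i)}(\color{blue}{\omega}) \;=\; \underset{\theta}{\operatorname{arg\,min}}\; \mathcal{L}^{\mathrm{task}}\!\left(\theta,\; \color{blue}{\omega},\; \mathcal{D}^{\mathrm{train}\,(i)}_{\mathrm{source}}\right)
\]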
Figure 3: Conventional vs meta-learning for 1D function regression (image source)
Figure 4: “3-way-2-shot” (few-shot) classification. Each source task’s train set contains 3 classes of 2 examples each (Vinyals et al., 2016). Image modified from Borealis AI blogpost.
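For concreteness, here is a minimal sketch of how one such N-way-K-shot episode could be sampled from a labelled pool (`sample_episode` is a hypothetical helper, plain NumPy):

```python
import numpy as np

def sample_episode(images, labels, n_way=3, k_shot=2, k_query=2, rng=None):
    """Sample one N-way-K-shot episode: a small train (support) set plus a
    query set drawn from the same randomly chosen classes."""
    rng = np.random.default_rng() if rng is None else rng
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_x, support_y, query_x, query_y = [], [], [], []
    for new_label, c in enumerate(classes):
        # Relabel classes 0..N-1 within the episode; pick k_shot + k_query examples.
        idx = rng.permutation(np.where(labels == c)[0])[: k_shot + k_query]
        support_x.append(images[idx[:k_shot]])
        support_y += [new_label] * k_shot
        query_x.append(images[idx[k_shot:]])
        query_y += [new_label] * k_query
    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))
```

At meta-train time these episodes are sampled from \(\mathcal{D}_{\mathrm{source}}\); at meta-test time from \(\mathcal{D}_{\mathrm{target}}\).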
Figure 5: Visualized MAML (image modified from BAIR blogpost).
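To make the picture concrete, here is a minimal sketch of the MAML objective in JAX, assuming a toy linear model and a single inner gradient step (both simplifications). The meta-gradient with respect to the shared initialization \(\color{blue}{\omega}\) flows through the inner update:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    """Toy linear-regression loss standing in for the network's training loss."""
    pred = x @ w["weight"] + w["bias"]
    return jnp.mean((pred - y) ** 2)

def inner_update(w, x_train, y_train, alpha=0.01):
    """Inner loop: one gradient step on the task's train set, theta = omega - alpha * grad."""
    grads = jax.grad(loss)(w, x_train, y_train)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, w, grads)

def maml_objective(w, task):
    """Outer (meta) loss: evaluate the adapted weights theta on the task's val set."""
    x_tr, y_tr, x_val, y_val = task
    theta = inner_update(w, x_tr, y_tr)
    return loss(theta, x_val, y_val)

# Meta-gradient w.r.t. the shared initialization omega; jax.grad differentiates
# through the inner gradient step, so second-order terms are included.
meta_grad_fn = jax.grad(maml_objective)

omega = {"weight": jnp.zeros((1, 1)), "bias": jnp.zeros(1)}
x = jnp.linspace(-1.0, 1.0, 8).reshape(-1, 1)
task = (x, 2.0 * x, x, 2.0 * x)   # toy task; train and val sets coincide here
grads = meta_grad_fn(omega, task)
```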
Figure 7: Rapid learning entails efficient but significant change from \(\color{blue}{\omega^*}\) to \(\theta^*\); feature reuse is where \(\color{blue}{\omega^*}\) already provides high-quality representations. Figure 1 from the paper.
Conventional ML: fixed/pre-specified \(\color{blue}{\omega}\)
| | Shared meta-knowledge \(\color{blue}{\omega}\) | Task-specific \(\theta\) |
|---|---|---|
| NN | Hyperparameters (e.g. learning rate, weight initialization scheme, optimizer, architecture design) | Network weights |
Meta-learning: learnt \(\color{blue}{\omega}\) from \(\mathcal{D}_{\mathrm{source}}\)
| | Shared meta-knowledge \(\color{blue}{\omega}\) | Task-specific \(\theta\) |
|---|---|---|
| Hyperopt | Hyperparameters (e.g. learning rate) | Network weights |
| MAML | Network weights (initialization learnt from \(\mathcal{D}_{\mathrm{source}}\)) | Network weights (tuned on \(\mathcal{D}_{\mathrm{target}}\)) |
| NP | Encoder/decoder network weights | Aggregated target context [latent] representation |
| Meta-GP | Deep mean/kernel function parameters | None (a GP is fit on \(\mathcal{D}_{\mathrm{target}}^{\mathrm{train}}\)) |
Figure 8: MAML and its corresponding probabilistic graphical model. Figure 2 from Griffiths et al. (2019).
Figure 9: Same at meta-train \(\left(\mathcal{D}_{\text {source}}^{\text {train}}, \mathcal{D}_{\text {source}}^{\text {val}}\right)\) and meta-test \(\left(\mathcal{D}_{\text {target}}^{\text {train}}, \mathcal{D}_{\text {target}}^{\text {test}}\right)\) time.
Figure 10: In an NP, meta-parameters \(\color{blue}{\omega}\) are the weights of the encoder and decoder NNs.
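Complementing Figure 10, a minimal sketch of the NP forward pass in plain NumPy (toy linear layers and the deterministic rather than latent-variable variant, both assumptions): \(\color{blue}{\omega}\) corresponds to `w_enc` and `w_dec`, while the aggregated representation `r` is the task-specific quantity.

```python
import numpy as np

def encoder(xy, w_enc):
    """Map each context pair (x, y) to a representation r_i (toy single layer)."""
    return np.tanh(xy @ w_enc)

def decoder(r, x_target, w_dec):
    """Predict targets from the aggregated representation and target inputs."""
    h = np.concatenate([np.repeat(r[None, :], len(x_target), axis=0), x_target], axis=1)
    return h @ w_dec

def np_forward(x_context, y_context, x_target, w_enc, w_dec):
    """Deterministic NP forward pass: encode, aggregate (mean), decode.
    w_enc and w_dec play the role of omega; r plays the role of theta."""
    r_i = encoder(np.concatenate([x_context, y_context], axis=1), w_enc)
    r = r_i.mean(axis=0)  # permutation-invariant aggregation over the context set
    return decoder(r, x_target, w_dec)

rng = np.random.default_rng(0)
x_c, y_c = rng.normal(size=(5, 1)), rng.normal(size=(5, 1))
x_t = rng.normal(size=(10, 1))
w_enc, w_dec = rng.normal(size=(2, 8)), rng.normal(size=(9, 1))
y_pred = np_forward(x_c, y_c, x_t, w_enc, w_dec)
```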
Figure 11: Figure 1 from Fortuin et al. (2019). Corresponding GPflow code in purple.
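For flavour, a hedged GPflow sketch of the Meta-GP setup at meta-test time, assuming the GPflow 2.x API; the meta-learned deep mean function of Fortuin et al. (2019) is replaced here by a simple linear placeholder:

```python
import gpflow
import numpy as np

# Per-task data: D_target^train for one regression task.
X_train = np.random.rand(10, 1)
Y_train = np.sin(6 * X_train) + 0.1 * np.random.randn(10, 1)

# In Fortuin et al. (2019) the mean function is a neural network whose weights
# are meta-learned on the source tasks; a fixed linear mean stands in here.
mean_fn = gpflow.mean_functions.Linear(A=np.ones((1, 1)), b=np.zeros(1))

# At meta-test time no task-specific parameters are optimized; we simply
# condition the GP posterior on D_target^train (cf. the Meta-GP table row).
model = gpflow.models.GPR(
    data=(X_train, Y_train),
    kernel=gpflow.kernels.SquaredExponential(),
    mean_function=mean_fn,
)
mean, var = model.predict_f(np.linspace(0.0, 1.0, 100).reshape(-1, 1))
```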
Figure 13: Modified from Figure 1 in Hospedales et al. (2020).
To recap, conventional ML: fixed/pre-specified \(\color{blue}{\omega}\)
| | Shared meta-knowledge \(\color{blue}{\omega}\) | Task-specific \(\theta\) |
|---|---|---|
| NN | Hyperparameters (e.g. learning rate, weight initialization scheme, optimizer, architecture design) | Network weights |
Meta-learning: learnt \(\color{blue}{\omega}\) from \(\mathcal{D}_{\mathrm{source}}\)
| | Shared meta-knowledge \(\color{blue}{\omega}\) | Task-specific \(\theta\) |
|---|---|---|
| Hyperopt | Hyperparameters (e.g. learning rate) | Network weights |
| MAML | Network weights (initialization learnt from \(\mathcal{D}_{\mathrm{source}}\)) | Network weights (tuned on \(\mathcal{D}_{\mathrm{target}}\)) |
| NP | Encoder/decoder network weights | Aggregated target context [latent] representation |
| Meta-GP | Deep mean/kernel function parameters | None (a GP is fit on \(\mathcal{D}_{\mathrm{target}}^{\mathrm{train}}\)) |