ICML 2019 Meta-Learning Tutorial (Video part 1)
Types
There are three common approaches to meta-learning: metric-based, model-based, and optimization-based. Lilian Weng's Meta-Learning: Learning to Learn Fast covers them in great detail. As Finn states in Learning to Learn (The Berkeley Artificial Intelligence Research Blog), there are two optimizations at play: the learner, which learns new tasks, and the meta-learner, which trains the learner. Methods for meta-learning have typically fallen into one of three categories: recurrent models, metric learning, and learning optimizers.
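To make the two optimizations concrete, here is a minimal toy sketch (not from the tutorial) in the optimization-based style: a first-order, Reptile-like update on hypothetical 1-D regression tasks, where `learner` adapts to a single task and the outer loop plays the meta-learner. The task family, learning rates, and step counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Toy 1-D regression task: y = a * x with a task-specific slope a.
    a = rng.uniform(-2.0, 2.0)
    x = rng.uniform(-1.0, 1.0, size=10)
    return x, a * x

def learner(theta, x, y, lr=0.1, steps=5):
    # Learner: adapt theta to one task by plain gradient descent on the
    # squared error of the linear model y_hat = theta * x.
    for _ in range(steps):
        grad = np.mean(2.0 * (theta * x - y) * x)
        theta = theta - lr * grad
    return theta

# Meta-learner: update the shared initialization across many sampled tasks
# (a first-order, Reptile-style meta-update).
theta_meta = 0.0
for _ in range(1000):
    x, y = sample_task()
    theta_task = learner(theta_meta, x, y)
    theta_meta += 0.01 * (theta_task - theta_meta)
```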
Model-based, e.g. recurrent models
The meta-learner uses gradient descent, whereas the learner simply rolls out the recurrent network. This is one of the most general approaches and has been used for few-shot classification, regression, and meta-reinforcement learning. Because of its flexibility, it also tends to be less (meta-)efficient than other methods, since the learner network has to come up with its learning strategy from scratch.
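A minimal sketch of this idea, assuming PyTorch: the learner is nothing more than a forward rollout of a GRU over a task's (x, y) pairs, while the meta-learner is ordinary gradient descent on the GRU's weights across tasks. The architecture, the toy y = a * x tasks, and all hyperparameters are illustrative assumptions, not the tutorial's.

```python
import torch
import torch.nn as nn

class RNNLearner(nn.Module):
    # The learner is a forward rollout of the recurrent net over the task's
    # (x, y) examples; no gradient steps happen at adaptation time.
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden + 1, 1)

    def forward(self, support_xy, query_x):
        # support_xy: (batch, shots, 2) pairs of (x, y); query_x: (batch, 1)
        _, h = self.rnn(support_xy)   # ingest the support set
        task_state = h[-1]            # task representation from the rollout
        return self.head(torch.cat([task_state, query_x], dim=-1))

# The meta-learner is ordinary gradient descent on the RNN's weights,
# averaged over many sampled toy tasks of the form y = a * x.
model = RNNLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(1000):
    a = torch.rand(32, 1) * 4 - 2          # task-specific slopes
    xs = torch.rand(32, 5, 1) * 2 - 1      # 5 support inputs per task
    ys = a.unsqueeze(1) * xs
    qx = torch.rand(32, 1) * 2 - 1         # one query input per task
    pred = model(torch.cat([xs, ys], dim=-1), qx)
    loss = ((pred - a * qx) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```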
Metric-based
Meta-learning is performed using gradient descent, whereas the learner corresponds to a comparison scheme, e.g. nearest neighbors, in the meta-learned metric space. These approaches work quite well for few-shot classification, though they have yet to be demonstrated in other meta-learning domains such as regression or reinforcement learning.
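A prototypical-networks-style sketch of this idea, assuming PyTorch; the embedding network, episode shapes, and hyperparameters are illustrative assumptions. The learner simply compares query embeddings to class prototypes (nearest centroid), and meta-learning is gradient descent on the embedding across episodes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Embedding net that defines the meta-learned metric space (sizes are illustrative).
embed = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))
opt = torch.optim.Adam(embed.parameters(), lr=1e-3)

def episode_loss(support_x, support_y, query_x, query_y, n_way):
    # Learner = nearest class prototype (mean support embedding) in metric space.
    z_s, z_q = embed(support_x), embed(query_x)
    protos = torch.stack([z_s[support_y == c].mean(0) for c in range(n_way)])
    logits = -torch.cdist(z_q, protos) ** 2   # negative squared distance as logits
    return F.cross_entropy(logits, query_y)

# Meta-learning = gradient descent on the embedding over sampled episodes;
# a real loop would draw N-way K-shot episodes from Omniglot or miniImageNet.
sx = torch.randn(3 * 5, 784)                      # toy 3-way 5-shot support set
sy = torch.arange(3).repeat_interleave(5)
qx, qy = torch.randn(6, 784), torch.arange(3).repeat(2)
loss = episode_loss(sx, sy, qx, qy, n_way=3)
opt.zero_grad()
loss.backward()
opt.step()
```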
Motivations
Human motivation
"Humans have a remarkable ability to quickly grasp new concepts from a very small number of examples or a limited amount of experience, leveraging prior knowledge and context."
Comparison to supervised DL
- In deep supervised learning, we have large, diverse datasets (and large models) from which we make broad generalizations. But…
- What if you have a small dataset, e.g. personalized education, recommendations, translation for rare languages, medical imaging, robotics?
- What if you need an AI system in the real world that continuously adapts and learns on the job?
- What if your data has a long tail?
- All these settings break the supervised learning paradigm.
Goal
Two views of meta-learning
Applying the probabilistic view to existing algos
Perhaps like ML-PIP (Gordon et al., 2019)?
Parametric model
"Why not nonparametric? We could stop here and treat meta-learning as a nonparametric model, but if we want to learn a high-capacity model we can train a parametric model."
Terminology
- There is a distribution over tasks, and we assume all meta-training and meta-testing tasks are drawn from that distribution.
- What does this mean in terms of structural similarity between tasks? The same piece of code / the same natural process generates these tasks (see the sampling sketch below).
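As a toy illustration of "the same procedure generates every task", here is a hypothetical N-way K-shot episode sampler; the `dataset` layout (a dict from class label to a list of examples) and the parameter names are assumptions for the sketch.

```python
import random

def sample_task(dataset, n_way=5, k_shot=1, k_query=15):
    # dataset: dict mapping class label -> list of examples (assumed layout).
    classes = random.sample(list(dataset), n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        examples = random.sample(dataset[c], k_shot + k_query)
        support += [(x, new_label) for x in examples[:k_shot]]
        query += [(x, new_label) for x in examples[k_shot:]]
    return support, query

# Meta-train and meta-test tasks are drawn by the same procedure, but from
# disjoint class pools (e.g. disjoint sets of Omniglot characters).
```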
Datasets
- Omniglot is like the transpose of ImageNet: many classes (1,623 handwritten characters) with only a handful of examples (20) per class, rather than relatively few classes with many examples each.
- miniImageNet for few-shot image classification
Meta-learning vs multitask learning, transfer learning
From (Wei et al., 2018):
Furthermore, transfer learning and multi-task learning typically have large dataset sizes for each task, whereas meta-learning has small task-specific datasets. Meta-learning also tends to have a larger number of tasks.
Bibliography
Gordon, J., Bronskill, J., Bauer, M., Nowozin, S., & Turner, R. (2019). Meta-learning probabilistic inference for prediction. In International Conference on Learning Representations.
Wei, Y., Zhang, Y., Huang, J., & Yang, Q. (2018). Transfer learning via learning to transfer. In J. Dy & A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning (pp. 5085–5094). Stockholmsmässan, Stockholm, Sweden: PMLR.