Neural Ordinary Differential Equations
Presented by Christabella Irwanto
Available as slides
Neural ODE
- A new model class (Chen et al., 2018) that can be used as
- Continuous-depth residual networks
- Continuous-time latent variable models
- Proposes continuous normalizing flows, a generative model
- Scalable backpropagation through ODE solver
- Paper shows various proofs of concept
ODE?
"In the 300 years since Newton, mankind has come to realize that the laws of physics are always expressed in the language of differential equations."
- Steven Strogatz
- An ODE is an equation involving an unknown function \(y = f(t)\) and at least one of its derivatives \(y', y''\), etc.
- Univariate functions ("time") vs. partial DEs with multivariate input
- Solve ODE by finding satisfying function \(f\)
- Useful whenever it's easier to describe change than absolute quantity, e.g. in many dynamical systems
- E.g. radioactive decay, kinematic systems, or drug concentration in a body, over time
Bunny example

- Model system of dynamics with first-order ODE, \(B’(t) = \frac{dB}{dt} = rB\)
- \(B(t)\): bunny population at time \(t\)
- \(r\): growth rate of new bunnies for every bunny, per \(\Delta t\)
Visualize ODE

Slope field of the derivative at each point \((B, t)\)
Solve ODE

- Analytical solution via integration is \(B(t) = B(t_0)e^{r(t - t_0)}\) (derivation sketched below)
- Known initial value \(B(t_0)\)
- Infinitely many solutions, but generally only one satisfies the initial condition
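For completeness, a quick separation-of-variables derivation of that solution (my addition, not on the original slide):
\begin{align}
\frac{dB}{B} &= r\,dt \\
\int_{B(t_0)}^{B(t)} \frac{d\tilde{B}}{\tilde{B}} &= \int_{t_0}^{t} r\,ds \\
\ln\frac{B(t)}{B(t_0)} &= r(t - t_0) \\
B(t) &= B(t_0)\, e^{r(t - t_0)}
\end{align}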
Numerical ODE solver
- Not all ODEs have a closed-form solution
- Even when a closed-form solution exists, it can be very hard to find
- We then have to solve the ODE numerically
- E.g. Euler's method, Runge-Kutta methods (Euler sketched below)
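To make the idea concrete, here is a minimal sketch (my own, not from the paper) of Euler's method applied to the bunny ODE \(B'(t) = rB\); the growth rate, initial population, and step size are illustrative values:

```python
import math

def euler_solve(f, y0, t0, t1, dt):
    """Approximate y(t1) for dy/dt = f(t, y) with initial value y(t0) = y0."""
    t, y = t0, y0
    while t < t1:
        y = y + dt * f(t, y)  # step along the local slope
        t = t + dt
    return y

# Bunny ODE: dB/dt = r * B (illustrative values: r = 0.3, B(0) = 10)
r = 0.3
approx = euler_solve(lambda t, B: r * B, y0=10.0, t0=0.0, t1=5.0, dt=0.001)
exact = 10.0 * math.exp(r * 5.0)  # analytical solution B(t) = B(0) * exp(r * t)
print(approx, exact)  # the two agree closely for small dt
```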
Enter neural networks
- Regular neural networks transform input with a series of functions \(f\),
\(\mathbf{h}_{t+1} = f(\mathbf{h}_t)\)
- Each layer introduces error that compounds
- Mitigate this by adding more layers and limiting the complexity of each step
- Infinite layers, with infinitesimal step-changes?
ResNet
- Instead of \(\mathbf{h}_{t+1} = f(\mathbf{h}_t)\), learn \(\mathbf{h}_{t+1} = \mathbf{h}_t + f(\mathbf{h}_t, \theta_t)\)
- Similarly, RNN decoders and normalizing flows build complicated transformations by chaining sequences of simpler ones
- This looks like an Euler discretization of a continuous transformation

Neural ODE
- \(\mathbf{h}_{t+1} = \mathbf{h}_t + f(\mathbf{h}_t, \theta_t)\)
- In the continuous limit w.r.t. depth \(t\), parameterize hidden state dynamics with an ODE specified by a neural network:
- \(\frac{d\mathbf{h}(t)}{dt} = f(\mathbf{h}(t), t, \theta)\), where neural network \(f\) has parameters \(\theta\)
- Function approximation is now over a continuous hidden-state dynamic

ResNet vs. neural ODE

Forward pass
- Evaluate \(\mathbf{h}(t_1)\) by solving the integral
- \(\mathbf{h}(t_1) = \mathbf{h}(t_0) + \int_{t_0}^{t_1} f(\mathbf{h}(t), t, \theta)\,dt\)
- If we use Euler's method, we get exactly the residual state update (see the sketch below)!
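A minimal sketch of that observation (my own code, not the paper's): integrating the neural ODE with fixed-step Euler and step size 1 reproduces the ResNet update \(\mathbf{h}_{t+1} = \mathbf{h}_t + f(\mathbf{h}_t, \theta_t)\). The two-layer dynamics network and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Neural network f(h, t) defining the hidden-state dynamics dh/dt."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, h):
        return self.net(h)

f = ODEFunc()
h = torch.randn(1, 4)  # initial hidden state h(t0)

# Euler's method: h(t + dt) = h(t) + dt * f(t, h(t)).
# With dt = 1, each step is exactly a residual block update h <- h + f(h).
t, dt = 0.0, 1.0
for _ in range(3):  # three "layers" of the continuous-depth network
    h = h + dt * f(t, h)
    t += dt
```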

Numerical ODE solver
- Paper solves ODE initial value problems numerically with implicit Adams method
- Not unconditionally stable
- Current set of PyTorch ODE solvers only applicable to (some) non-stiff ODEs
- Without a reversible integrator (implicit Adams is not reversible), the method drifts from the true solution when integrating backwards
Advantages
- Use existing efficient solvers to integrate neural network dynamics
- Memory cost is \(O(1)\) due to reversibility of ODE net
- Tuning ODE solver tolerance gives trade-off between accuracy and speed
- Gives more fine-grained tuning than using lower-precision floating-point numbers

Backward pass
- How do we train the function in the ODE?
- Output (hidden state at final depth) is used to compute loss
\begin{equation} L(\mathbf{z}(t_1)) = L\left( \mathbf{z}(t_0) + \int_{t_0}^{t_1} f(\mathbf{z}(t), t, \theta)dt \right) = L(\textrm{ODESolve}(\mathbf{z}(t_0), f, t_0, t_1, \theta)) \end{equation}
- Scalable backpropagation through ODE solver with adjoint method
- Vectorization to compute multiple derivatives in a single call
How to train?
- Backpropagate through the ODE solver layer
- Standard reverse mode chain rule equations in backprop
- Take continuous time limit of chain rule to recover the adjoint sensitivity equations

Adjoint sensitivity

- Reverse-mode differentiation of an ODE solution
- Solve for the adjoint state with the same ODE solver used in the forward pass (usage sketch below)
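A usage sketch, assuming the paper's companion PyTorch library torchdiffeq is installed: `odeint_adjoint` solves the forward ODE with a black-box solver and computes gradients by solving the adjoint ODE backwards, so intermediate activations need not be stored. The dynamics module, tolerances, and shapes below are illustrative.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # gradients via the adjoint method

class ODEFunc(nn.Module):
    """Learned dynamics dh/dt = f(h, t, theta)."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, h):
        return self.net(h)

func = ODEFunc()
h0 = torch.randn(8, 4)            # batch of initial hidden states h(t0)
t = torch.tensor([0.0, 1.0])      # integrate from t0 = 0 to t1 = 1

# rtol/atol trade accuracy for speed (the tolerance knob mentioned above)
h1 = odeint(func, h0, t, rtol=1e-3, atol=1e-4)[-1]  # hidden state at t1
loss = h1.pow(2).mean()
loss.backward()  # adjoint backward pass: a second ODE solve, no stored activations
```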
Results of ODE-Net vs. ResNet

- Fewer parameters with same accuracy
- However, it is harder to train, and mini-batching is tricky to implement
- Training time is hard to control: the number of function evaluations (and thus computational cost) can grow as training progresses

Generative latent time-series model
- Continuous-time approach to modeling time series
- Standard VAE algorithm with ODESolve as decoder

- Using ODEs as a generative model allows us to make predictions for arbitrary time points \(t_1 \ldots t_M\) on a continuous timeline
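A sketch of the decoding step only (not the paper's full latent ODE model): given a latent initial state and learned dynamics, the solver can evaluate the latent trajectory at arbitrary, irregularly spaced time points. torchdiffeq is assumed installed, and `LatentDynamics` and the time grid are illustrative.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class LatentDynamics(nn.Module):
    """Illustrative dynamics f(z, t) for the latent trajectory."""
    def __init__(self, latent_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 16), nn.Tanh(), nn.Linear(16, latent_dim))

    def forward(self, t, z):
        return self.net(z)

func = LatentDynamics()
z0 = torch.randn(1, 2)  # latent initial state, e.g. sampled from the VAE encoder

# Irregularly spaced observation times, plus extrapolation beyond them
times = torch.tensor([0.0, 0.13, 0.7, 1.55, 3.0])
z_traj = odeint(func, z0, times)  # shape: (len(times), 1, latent_dim)
# A decoder network would then map each z(t_i) to an observation x(t_i).
```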
Results of experiments
- Spiral dataset at irregular time intervals, with Gaussian noise

Advantages
- For supervised learning, its main benefit is extra flexibility in the speed/precision tradeoff.
- For time-series problems, it allows handling of data observed at irregular intervals
Normalizing flows
- Normalizing flows define a parametric density by iteratively transforming a Gaussian sample:
\begin{align}
z_0 &\sim \mathcal{N}(0, I) \\
z_1 &= f_0(z_0) \\
&\vdots \\
x &= f_t(z_t)
\end{align}
- Use the change of variables formula to compute \(p(x)\): \(\log p(z_{t+1}) = \log p(z_t) - \log \left| \det \frac{\partial f(z_t)}{\partial z_t} \right|\)
- Paper proposes continuous-time version, Continuous NF
- Derived the continuous-time analogue of the change of variables formula (instantaneous change of variables): \(\frac{\partial \log p(z(t))}{\partial t} = -\mathrm{tr}\left( \frac{\partial f}{\partial z(t)} \right)\)

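Integrating that expression over depth gives the log-density of a sample, with a trace (cheap) replacing the log-determinant (expensive) of the discrete case, as stated in the paper:
\begin{equation}
\log p(z(t_1)) = \log p(z(t_0)) - \int_{t_0}^{t_1} \mathrm{tr}\left( \frac{\partial f}{\partial z(t)} \right) dt
\end{equation}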
Results

- Compare Continuous NF to NF family of models on density estimation, image generation, and variational inference
- A follow-up paper, FFJORD (Grathwohl et al., 2018), achieves state of the art among exact-likelihood methods with efficient sampling
Pros and cons

- 4-5x slower than other methods (Glow, Real-NVP)
Summary
- Novel application of differential equation solvers as components of a neural net
- Viewing ResNets/RNNs as Euler discretizations is a neat insight
- Much work remains on both the numerical differential equations side and the ML side
References
Code:
- https://nbviewer.jupyter.org/github/urtrial/neural%5Fode/blob/master/Neural%20ODEs.ipynb
- https://github.com/kmkolasinski/deep-learning-notes/tree/master/seminars/2019-03-Neural-Ordinary-Differential-Equations (ODE solver implementations etc.), companion to very awesome slides including adjoint method derivations
Blogposts
- https://jontysinai.github.io/jekyll/update/2019/01/18/understanding-neural-odes.html
- https://blog.acolyer.org/2019/01/09/neural-ordinary-differential-equations/
- https://braindump.jethro.dev/posts/neural%5Fode/
- https://towardsdatascience.com/neural-odes-breakdown-of-another-deep-learning-breakthrough-3e78c7213795
- https://towardsdatascience.com/paper-summary-neural-ordinary-differential-equations-37c4e52df128
For adjoint sensitivity:
Original content from author(s)
Bibliography
Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural Ordinary Differential Equations.
Grathwohl, W., Chen, R. T. Q., Bettencourt, J., Sutskever, I., & Duvenaud, D. (2018). FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models.