Act norm is a variant of §batch_normalization introduced in Glow (Kingma & Dhariwal, 2018) that doesn’t use batch statistics \(\mathbf{\tilde{\mu}}\) and \(\mathbf{\tilde{\sigma}}\). Instead, before training begins, a batch is passed through the flow, and the scale \(\alpha\) and translation \(\beta\) are set such that the transformed batch has zero mean and unit variance. After this data-dependent initialization, \(\alpha\) and \(\beta\) are optimized as ordinary model parameters. Act norm is preferable when training with small mini-batches, where batch norm’s statistics become noisy and can destabilize training.
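
A minimal sketch of this scheme, assuming PyTorch and a flat \((\text{batch}, \text{features})\) tensor; the class name `ActNorm` and the `initialized` buffer are illustrative, not Glow’s actual code, which applies the transform per channel of a 4-D image tensor:

```python
import torch
import torch.nn as nn

class ActNorm(nn.Module):
    """Affine y = alpha * x + beta with data-dependent initialization."""

    def __init__(self, num_features):
        super().__init__()
        # Per-feature scale (stored as log alpha for stability) and translation beta.
        self.log_alpha = nn.Parameter(torch.zeros(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        # Flag so initialization runs exactly once, on the first batch.
        self.register_buffer("initialized", torch.tensor(False))

    @torch.no_grad()
    def _data_init(self, x):
        # Choose alpha = 1/std and beta = -mean/std so that the first batch
        # is mapped to zero mean and unit variance per feature.
        mean = x.mean(dim=0)
        std = x.std(dim=0) + 1e-6
        self.log_alpha.copy_(-std.log())
        self.beta.copy_(-mean / std)
        self.initialized.fill_(True)

    def forward(self, x):
        if not self.initialized:
            self._data_init(x)
        y = x * self.log_alpha.exp() + self.beta
        # Log-determinant of the Jacobian, needed in the flow's likelihood:
        # sum of log alpha, identical for every example in the batch.
        log_det = self.log_alpha.sum().expand(x.shape[0])
        return y, log_det
```

After the first forward pass, \(\alpha\) and \(\beta\) are updated by gradient descent like any other parameters, so the layer's output statistics are no longer tied to any batch.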

Bibliography

Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems (NeurIPS). ↩