
Essay

Generative Models for Video Prediction

Legacy presentation notes on GANs, VAEs, and autoregressive models for video prediction.

These notes introduce three generative models from the perspective of video prediction.


1631722 吴思远

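Why should generative models be applied to video prediction?

Video prediction is inherently uncertain: the same past frames can be followed by many plausible futures, so a model should represent a distribution over outcomes rather than a single answer.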

Discriminative models and Generative models

Discriminative models $\longrightarrow p(\mathbf{y} | \mathbf{x})$: the model is fed $\mathbf{x}$ and is expected to produce the correct $\mathbf{y}$ with high probability.

Generative models $\longrightarrow p(\mathbf{y}, \mathbf{x})$ or $p(\mathbf{x})$: the model is expected to capture the distribution of the real data, $p(\mathbf{y}, \mathbf{x})$ or $p(\mathbf{x})$.
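The two views are linked by the product rule: a model of the joint distribution can always be turned into a conditional one, while a conditional model alone says nothing about $p(\mathbf{x})$. As a short worked identity (standard probability, not something stated in the original slides):

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{p(\mathbf{y}, \mathbf{x})}{p(\mathbf{x})}
  = \frac{p(\mathbf{y}, \mathbf{x})}{\sum_{\mathbf{y}'} p(\mathbf{y}', \mathbf{x})}
```

In the video setting one might read $\mathbf{x}$ as the observed frames and $\mathbf{y}$ as the future frames; that reading is an assumption of this note rather than a definition from the slides.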

“What I cannot create, I do not understand.” - Richard Feynman

  • Generative Adversarial Network

  • Variational Autoencoder

  • PixelRNN / PixelCNN (Autoregressive Network)

Papers:

  • Carl Vondrick, Hamed Pirsiavash, Antonio Torralba. “Generating videos with scene dynamics”, in NIPS 2016.
  • Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. “An Uncertain Future: Forecasting from Static Images using Variational Autoencoders”, in ECCV 2016.
  • Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu. “Video Pixel Networks”, arXiv, 2016.

GAN (conceptual)

A typical GAN consists of a generator, which maps random noise to samples, and a discriminator, which tries to tell generated samples from real ones.
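The two parts are trained against each other. The sketch below shows the usual loss computation for one batch, assuming the discriminator outputs a probability in (0, 1) and using the common non-saturating generator loss. The function names and the example scores are hypothetical, and this is not the exact training code of VideoGAN.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for D: real samples are labelled 1, generated samples 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: G wants D to output 1 on generated samples."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(-np.mean(np.log(d_fake + eps)))

# Hypothetical discriminator outputs: D(x) on real clips, D(G(z)) on generated clips.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
print(discriminator_loss(d_real, d_fake))  # small: D separates real from generated well
print(generator_loss(d_fake))              # large: G is not yet fooling D
```

When the discriminator cannot tell the two apart (all outputs 0.5), the discriminator loss is 2 ln 2 and the generator loss is ln 2.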

Generator part of VideoGAN.

This generator produces all 32 frames of a clip in a single forward pass, which makes it efficient.

Selected generated clips (scene categories: beach, golf, train, baby).

Pros:

  • Beautiful, state-of-the-art samples!

Cons:

  • Trickier and less stable to train.
  • Can’t answer inference queries such as $p(x)$ or $p(z|x)$.

Lucas Theis, Aäron van den Oord, Matthias Bethge. “A note on the evaluation of generative models”, in ICLR 2016.

Variational Autoencoders

Autoencoder Encoder of VAE (inference)

Diederik P Kingma, Max Welling. “Auto-Encoding Variational Bayes”, in ICLR 2014.

Maximize lower bound
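The bound in question is the evidence lower bound (ELBO) from Kingma and Welling. Written out with an encoder $q_\phi(z \mid x)$, a decoder $p_\theta(x \mid z)$ and a prior $p(z)$ (the subscripts $\phi$ and $\theta$ follow the paper's notation and do not appear in these notes):

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction of $x$ from $z$; the second keeps the approximate posterior close to the prior. The gap between the two sides is $D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)$, which is why the bound is tight only when the encoder matches the true posterior.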

Reparameterization trick

$$z \sim \mathcal{N}(\mu,\,\Sigma) \longrightarrow z = \mu + L\varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I)$$
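A minimal numerical sketch of this transformation, assuming $L$ is the Cholesky factor of $\Sigma$ (so that $\Sigma = LL^\top$), which the notes do not state explicitly. The function name and the example values of `mu` and `sigma` are hypothetical.

```python
import numpy as np

def reparameterize(mu, sigma, rng, num_samples=1):
    """Draw z ~ N(mu, sigma) as z = mu + L @ eps with eps ~ N(0, I), where L @ L.T == sigma."""
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(np.asarray(sigma, dtype=float))  # lower-triangular factor
    eps = rng.standard_normal((num_samples, mu.shape[0]))   # the only source of randomness
    return mu + eps @ L.T                                    # row-wise mu + L @ eps

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
z = reparameterize(mu, sigma, rng, num_samples=200_000)
print(z.mean(axis=0))           # close to mu
print(np.cov(z, rowvar=False))  # close to sigma
```

Because the noise `eps` does not depend on `mu` or `L`, the sample is a deterministic, differentiable function of the distribution parameters, which is what allows gradients to flow through the sampling step during training.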

Predict dense trajectory

Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. “An Uncertain Future: Forecasting from Static Images using Variational Autoencoders”, in ECCV 2016.

Results

Pros:

  • Principled approach to generative models.
  • Allows inference of $q(z|x)$, which can be a useful feature representation for other tasks.

Cons:

  • Maximizes only a lower bound on the likelihood: acceptable, but a weaker basis for evaluation than the exact likelihood of PixelRNN/PixelCNN.
  • Samples are blurrier and of lower quality than the state of the art (GANs).

Negative log-likelihood for generative models on CIFAR-10 expressed as bits per sub-pixel.
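Bits per sub-pixel (often called bits per dimension) is the negative log-likelihood of an image, expressed in base 2 and divided by the number of sub-pixels. The conversion from a total in nats can be sketched as follows; the helper name is hypothetical, and the uniform model is only a sanity check, not a result from the paper.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a total negative log-likelihood in nats into bits per dimension."""
    return nll_nats / (num_dims * math.log(2.0))

# A CIFAR-10 image has 32 * 32 pixels with 3 colour channels each.
cifar_dims = 32 * 32 * 3

# Sanity check: a uniform model over 256 intensity values costs 8 bits per sub-pixel.
uniform_nll_nats = cifar_dims * math.log(256.0)
print(bits_per_dim(uniform_nll_nats, cifar_dims))  # approximately 8
```

Any model worth reporting should score below this 8-bit baseline; lower is better.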

Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma. “PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications”, in ICLR 2017.

PixelRNN / PixelCNN

Basic formula, for an image $X$ with $n^2$ pixels $x_1, \dots, x_{n^2}$:

$$\begin{aligned} P(X) &= P(x_1, \dots, x_{n^2}) \\ &= P(x_{n^2} | x_1, \dots, x_{n^2-1}) \, P(x_1, \dots, x_{n^2-1}) \\ &= \dots \\ &= \prod_{i=1}^{n^2} P(x_i | x_1, \dots, x_{i-1}) \end{aligned}$$
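To make the factorization concrete, here is a toy sketch for binary pixels. The conditional `conditional_p1` is a hypothetical logistic stand-in for the network, with made-up weights; only the chain-rule structure is the point.

```python
import itertools
import math

def conditional_p1(prev_pixels, w=0.8, b=-0.5):
    """Toy conditional P(x_i = 1 | x_1..x_{i-1}): a logistic function of the earlier pixels' sum."""
    return 1.0 / (1.0 + math.exp(-(b + w * sum(prev_pixels))))

def log_likelihood(pixels):
    """log P(X) as the sum of per-pixel conditional log-probabilities."""
    total = 0.0
    for i, x in enumerate(pixels):
        p1 = conditional_p1(pixels[:i])
        total += math.log(p1 if x == 1 else 1.0 - p1)
    return total

n = 2  # a 2 x 2 binary image flattened to n * n = 4 pixels
image = (1, 0, 1, 1)
print(log_likelihood(image))

# Every factor is a normalised conditional, so the probabilities of all 2^(n*n) images sum to 1.
total_prob = sum(math.exp(log_likelihood(x)) for x in itertools.product((0, 1), repeat=n * n))
print(total_prob)
```

The fact that the probabilities sum to one without any extra normalization step is what makes the likelihood exact and directly comparable across models.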


Model each pixel in turn with an RNN, conditioning on all previously generated pixels.

Results in nats/frame on the Moving MNIST dataset.

Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu. “Video Pixel Networks”, arXiv, 2016.

Pros:

  • Can explicitly compute likelihood p(x).
  • Explicit likelihood of training data gives good evaluation metric.
  • Good samples.

Con:

  • Sequential generation => slow.
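The slowness comes from the chain of dependencies: pixel i cannot be sampled until pixels 1 to i-1 exist, so generating an n x n frame takes n² model evaluations in sequence. A minimal sketch, where `toy_cond_prob` is a hypothetical stand-in for the trained network:

```python
import numpy as np

def sample_image(n, cond_prob, rng):
    """Sample an n x n binary image pixel by pixel; returns the image and the number of model calls."""
    pixels = np.zeros(n * n, dtype=int)
    calls = 0
    for i in range(n * n):
        p1 = cond_prob(pixels[:i])      # pixel i depends on every earlier pixel
        calls += 1
        pixels[i] = int(rng.random() < p1)
    return pixels.reshape(n, n), calls

def toy_cond_prob(prev):
    """Hypothetical stand-in for the network: P(x_i = 1 | earlier pixels)."""
    return 1.0 / (1.0 + np.exp(-(0.8 * prev.sum() - 0.5)))

rng = np.random.default_rng(0)
image, calls = sample_image(8, toy_cond_prob, rng)
print(image.shape, calls)  # (8, 8) 64
```

The loop cannot be parallelized across pixels at sampling time, in contrast to a GAN generator that emits a whole clip in one pass.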

Thank you