Generative Models for Video Prediction

An introduction to three generative models from the perspective of video prediction.

1631722 吴思远

Why should generative models be applied to video prediction?

Video prediction is inherently uncertain: the same past frames can be followed by many plausible futures.

Discriminative models and Generative models

Discriminative Models $\longrightarrow p(\mathbf{y} | \mathbf{x})$: the model is fed $\mathbf{x}$ and is expected to produce the correct $\mathbf{y}$ with high probability.

Generative Models $\longrightarrow p(\mathbf{y} , \mathbf{x})$ or $p(\mathbf{x})$: the model is expected to capture the distribution of the real data, $p(\mathbf{y} , \mathbf{x})$ or $p(\mathbf{x})$.
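To make the distinction concrete, here is a minimal sketch with a toy joint table. The probabilities and the helper names `marginal_x` and `conditional_y_given_x` are made up for illustration. It shows that a model of the joint $p(\mathbf{y}, \mathbf{x})$ also yields $p(\mathbf{x})$ and $p(\mathbf{y} | \mathbf{x})$, while a discriminative model only targets the last of these.

```python
# Toy joint distribution p(y, x) over two binary variables.
# Keys are (y, x); the numbers are invented for illustration only.
joint = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.3}


def marginal_x(x):
    """p(x) = sum over y of p(y, x)."""
    return sum(p for (y_, x_), p in joint.items() if x_ == x)


def conditional_y_given_x(y, x):
    """p(y | x) = p(y, x) / p(x), the quantity a discriminative model targets."""
    return joint[(y, x)] / marginal_x(x)


if __name__ == "__main__":
    print("p(x=1)     =", marginal_x(1))
    print("p(y=1|x=1) =", conditional_y_given_x(1, 1))
```

The reverse direction does not work: knowing only $p(\mathbf{y} | \mathbf{x})$ is not enough to recover $p(\mathbf{x})$ or to sample new $\mathbf{x}$.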

“What I cannot create, I do not understand.” (Richard Feynman)

Generative Adversarial Network
Variational Autoencoder
PixelRNN / PixelCNN (Autoregressive Network)

Papers:

Carl Vondrick, Hamed Pirsiavash, Antonio Torralba. “Generating videos with scene dynamics”, in NIPS 2016.
Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. “An Uncertain Future: Forecasting from Static Images using Variational Autoencoders”, in ECCV 2016.
Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu. “Video Pixel Networks”, arXiv, 2016.

GAN (conceptual)

A typical GAN consists of a generator, which maps random noise to samples, and a discriminator, which tries to tell generated samples from real data.
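The two-player setup can be sketched as a pair of losses computed from the discriminator's outputs. This is a minimal illustration of the standard GAN formulation with the commonly used non-saturating generator loss, not the VideoGAN training code. The function names and the plain-list inputs are assumptions made for the example.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator outputs D(x) in (0, 1) on real samples.
    d_fake: discriminator outputs D(G(z)) in (0, 1) on generated samples.
    The loss is small when D(x) is near 1 and D(G(z)) is near 0.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: small when D(G(z)) is near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake outputs 0.5 everywhere.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(generator_loss([0.5, 0.5]))
```

Training alternates between lowering `discriminator_loss` with respect to the discriminator and lowering `generator_loss` with respect to the generator. That alternation is a main source of the instability listed under the cons.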

Generator part of VideoGAN.

This generator produces all 32 frames of a clip in a single forward pass, which makes it efficient.

Selected generated clips

Beach
Golf
Train
Baby

Pros:

Beautiful, state-of-the-art samples!

Cons:

Trickier / more unstable to train.
Can't answer inference queries such as $p(x)$ or $p(z|x)$.

Lucas Theis, Aäron van den Oord, Matthias Bethge. “A note on the evaluation of generative models”, in ICLR 2016.

Variational Autoencoders

Autoencoder	Encoder of VAE (inference)

Diederik P Kingma, Max Welling. “Auto-Encoding Variational Bayes”, in ICLR 2014.

Maximize lower bound
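For reference, the bound in question is usually written as follows, with $q(z|x)$ the encoder (approximate posterior), $p(x|z)$ the decoder and $p(z)$ the prior. The parameter subscripts used in the Kingma and Welling paper are omitted here for brevity.

$\log p(x) \geq \mathbb{E}_{q(z|x)}\left[\log p(x|z)\right] - D_{\mathrm{KL}}\left(q(z|x) \,\|\, p(z)\right)$

The first term rewards accurate reconstruction; the second keeps the encoder's distribution close to the prior.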

Reparameterization trick

$z \sim \mathcal{N}(\mu,\,\Sigma) \longrightarrow z = \mu + L\varepsilon,\ \varepsilon \sim \mathcal{N}(0, I)$, where $L$ is a lower-triangular factor with $\Sigma = LL^\top$.
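The substitution above can be sketched in a few lines of plain Python. The function name `reparameterize` and the example values of `mu` and `L` are hypothetical. In a real VAE, `mu` and `L` would be outputs of the encoder network.

```python
import random


def reparameterize(mu, L, rng):
    """Return z = mu + L @ eps with eps ~ N(0, I).

    mu:  mean vector (list of floats).
    L:   lower-triangular matrix (list of rows) with Sigma = L L^T.
    rng: a random.Random instance, the only source of randomness.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [
        m + sum(L[i][j] * eps[j] for j in range(len(mu)))
        for i, m in enumerate(mu)
    ]


# Example values chosen for illustration: Sigma = L L^T = [[4, 1], [1, 1.25]].
rng = random.Random(0)
mu = [1.0, -2.0]
L = [[2.0, 0.0], [0.5, 1.0]]
samples = [reparameterize(mu, L, rng) for _ in range(20000)]
mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
var0 = sum((s[0] - mean[0]) ** 2 for s in samples) / len(samples)

if __name__ == "__main__":
    print("empirical mean:", mean)
    print("empirical variance of z[0]:", var0)
```

Because the noise `eps` is drawn independently of `mu` and `L`, the sample `z` is a deterministic, differentiable function of them, which is what lets gradients flow through the sampling step.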

Predicting dense trajectories

Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. “An Uncertain Future: Forecasting from Static Images using Variational Autoencoders”, in ECCV 2016.

Results

Pros:

Principled approach to generative models.
Allows inference of $q(z|x)$, which can be a useful feature representation for other tasks.

Cons:

Maximizes a lower bound on the likelihood: acceptable, but a weaker basis for evaluation than the exact likelihood of PixelRNN/PixelCNN.
Samples are blurrier and of lower quality than the state of the art (GANs).

Negative log-likelihood for generative models on CIFAR-10 expressed as bits per sub-pixel.

Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma. “PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications”, in ICLR 2017.

PixelRNN / PixelCNN

Basic formula, for an $n \times n$ image $X$ with pixels $x_1, ..., x_{n^2}$: $\begin{aligned} P(X) &= P(x_1,...,x_{n^2}) \\ &= P(x_{n^2} | x_1,..., x_{n^2-1}) P(x_1,..., x_{n^2-1}) \\ &= ... \\ &= {\displaystyle \prod_{i=1}^{n^2} P(x_i|x_1,...,x_{i-1})} \end{aligned}$

Model the pixels one at a time with an RNN, each conditioned on all previously generated pixels.
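The chain-rule factorization can be checked on a toy example. In the sketch below, `p_on` is a hypothetical hand-written conditional that stands in for the network and has nothing to do with the actual PixelRNN architecture. The point is only that summing log-conditionals in raster order gives an exact, normalized likelihood.

```python
import itertools
import math


def p_on(prev_pixels):
    """Hypothetical conditional P(x_i = 1 | x_1..x_{i-1}) for binary pixels.

    Stands in for the RNN: here a pixel simply tends to copy its predecessor.
    """
    if not prev_pixels:
        return 0.5
    return 0.8 if prev_pixels[-1] == 1 else 0.2


def log_prob(image):
    """log P(X) = sum_i log P(x_i | x_1, ..., x_{i-1}) in raster order."""
    total = 0.0
    for i, x in enumerate(image):
        p1 = p_on(image[:i])
        total += math.log(p1 if x == 1 else 1.0 - p1)
    return total


n = 2  # a 2x2 binary "image", flattened to n^2 = 4 pixels
total_mass = sum(
    math.exp(log_prob(img)) for img in itertools.product([0, 1], repeat=n * n)
)

if __name__ == "__main__":
    print("P(1,1,1,1) =", math.exp(log_prob((1, 1, 1, 1))))
    print("sum over all images =", total_mass)
```

Since every conditional is a proper distribution, the probabilities of all $2^{n^2}$ images sum to one, so the model's likelihood is exact rather than a bound. Sampling, however, has to proceed pixel by pixel.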

Results in nats/frame on the Moving MNIST dataset.

Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu. “Video Pixel Networks”, arXiv, 2016.
