This technique is quite different from deep dream. An autoencoder is a model trained to compress and then reconstruct an image. After trying to reconstruct many images, it learns to build a "compressed representation" of them by being forced through this compress -> uncompress bottleneck. The reconstructions are typically blurry/lossy, with artifacts quite different from the ones we see in conventional lossy compression schemes like JPEG.
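To make the compress -> uncompress idea concrete, here is a toy linear autoencoder in numpy (a hypothetical illustration, not the model discussed): 8-dim inputs are squeezed through a 2-dim code and reconstructed, trained by plain gradient descent on squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data that is intrinsically 2-dimensional, so a 2-dim code
# can in principle reconstruct it well.
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 8))

W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder: 8 dims -> 2-dim code
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder: 2-dim code -> 8 dims
lr = 0.05

def recon_loss(X):
    code = X @ W_enc       # compress
    recon = code @ W_dec   # uncompress
    return ((recon - X) ** 2).mean()

initial = recon_loss(X)
for _ in range(3000):
    code = X @ W_enc
    recon = code @ W_dec
    err = 2.0 * (recon - X) / len(X)       # gradient of squared error
    W_enc -= lr * X.T @ (err @ W_dec.T)    # backprop through the decoder
    W_dec -= lr * code.T @ err
final = recon_loss(X)
print(initial, final)  # reconstruction error drops as the code is learned
```

A real autoencoder would use nonlinear layers and train on images, but the structure is the same: the only path from input to output is through the narrow code.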
In deep dream, a network whose goal is to predict image categories is "run in reverse" and used as a generative model, even though it was never intended to be generative. Autoencoders, by contrast, are often intended to be generative, or at least to learn relationships between images (the space they learn these in is sometimes called a latent space, with latent variables), whereas classification models only care about getting the category right - there is no explicit pressure to learn relationships between categories or between the images within them.
Yes, although the particular technique used here (a variational autoencoder, or VAE) doesn't benefit from a larger code space, because of a particular penalty applied during training (a KL divergence against a fixed N(0, 1) Gaussian prior). Even if you grow the code space to 1000 dimensions, the model will probably still choose to use only 5 or 10 of them unless the KL penalty is relaxed, which would make it closer to a standard autoencoder.
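You can see why extra dimensions go unused from the standard VAE KL term itself. For a diagonal Gaussian code, KL(N(mu, sigma^2) || N(0, 1)) decomposes per dimension, and a dimension that collapses to the prior (mu = 0, sigma = 1) costs exactly zero. A quick numerical illustration (toy numbers, not the actual model discussed):

```python
import numpy as np

# Per-dimension KL(N(mu, sigma^2) || N(0, 1)) - the penalty referred to above.
def kl_to_standard_normal(mu, sigma):
    return 0.5 * (mu**2 + sigma**2 - np.log(sigma**2) - 1.0)

# A 1000-dim code where the encoder only "uses" the first 5 dimensions:
mu = np.zeros(1000)
sigma = np.ones(1000)
mu[:5] = 2.0     # informative dims deviate from the prior...
sigma[:5] = 0.3  # ...and become more confident (smaller variance)

kl = kl_to_standard_normal(mu, sigma)
print(kl[:5].sum())  # only the used dims pay a KL cost
print(kl[5:].sum())  # unused dims sit at the prior and cost exactly 0
```

Since every informative dimension costs KL, the model only pays for as many dimensions as the reconstruction term justifies; the remaining 995 sit at the prior for free, which is the "5 or 10 dimensions" behavior above.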
An autoencoder with a much larger code space could be an improvement, or some newer work such as Adversarially Learned Inference / Adversarial Feature Learning [0, 1] or Real NVP [2, 3] would probably do a much better job at this task, at the cost of increased computation.
Something like inpainting parts of the frames with PixelRNN [4] would also be interesting.
Given that Adversarially Learned Inference came out yesterday, the best approach may be to just wait a couple of months and see what the state of the art is then.
Sure - that is always an option, especially in deep learning right now. But this current crop of models (counting DCGAN and LAPGAN/Eyescream) has, in my eyes, made a real leap from "oh, cool, a generative model" to "are these thumbnails real?". They generate a lot of coherent global structure, which is pretty awesome!