
I believe it's similar to the images created by Google's Deep Dream, but with video.


This technique is quite different from Deep Dream. An autoencoder is a model trained to compress and then reconstruct an image. After seeing and trying to reconstruct many images, it learns to build a "compressed representation" of its inputs through this compress -> uncompress scheme. This typically results in blurry, lossy reconstructions whose artifacts are very different from those of standard lossy compression schemes like JPEG.
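For intuition, here is a minimal sketch of the compress -> uncompress idea as a toy linear autoencoder trained on random data. Everything here (sizes, learning rate, the linear model itself) is made up for illustration; the linked experiment uses a far richer model:

```python
import numpy as np

# Toy linear autoencoder: compress 16-D inputs down to a 4-D code,
# then reconstruct. Trained by plain gradient descent on squared error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))               # 200 fake "images"
W_enc = rng.normal(scale=0.1, size=(16, 4))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(4, 16))  # decoder weights

def reconstruct(X):
    code = X @ W_enc      # compress: 16 numbers -> 4 numbers
    return code @ W_dec   # uncompress: 4 numbers -> 16 numbers

lr = 0.01
first_loss = np.mean((reconstruct(X) - X) ** 2)
for _ in range(500):
    code = X @ W_enc
    err = code @ W_dec - X                    # reconstruction error
    grad_dec = code.T @ err / len(X)          # dLoss/dW_dec (up to a constant)
    grad_enc = X.T @ (err @ W_dec.T) / len(X) # dLoss/dW_enc (up to a constant)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = np.mean((reconstruct(X) - X) ** 2)
```

Because the 4-D code can't hold all 16 dimensions of the input, the reconstruction is necessarily lossy; the training just chooses *what* to lose, which is where the characteristically smooth, blurry look comes from.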

In Deep Dream, a network trained to predict image categories is "run in reverse" and used as a generative model, even though it was never intended to be generative. Autoencoders, by contrast, are often intended to be generative, or at least to learn relationships between images (sometimes called a latent space / latent variables), whereas classification models only care about getting the category right, with no explicit concern for learning the relationships between categories or the images within them.
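To see what "run in reverse" means, here is a toy sketch: freeze a classifier's weights and do gradient ascent on the *input* so it excites a chosen class. Everything here is illustrative (a random linear+softmax "classifier" rather than a trained convnet, which is what Deep Dream actually uses):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 10))  # frozen weights of a toy 10-class classifier
x = np.zeros(64)               # start from a blank "image"
target = 3                     # class we want the input to excite

def class_score(x):
    # log-softmax probability of the target class
    logits = x @ W
    return logits[target] - np.log(np.sum(np.exp(logits)))

for _ in range(100):
    logits = x @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()                 # softmax over classes
    grad = W[:, target] - W @ p  # d(log-softmax)/dx
    x += 0.1 * grad              # ascend: make the input "more class 3"
```

The weights never change; only the input does. Deep Dream does the same thing but maximizes activations inside a deep convnet, which is why the resulting images are full of the textures the network learned to detect.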


Would it be possible to make it look better, or even identical, by increasing the number of digits per frame (e.g. 1000 instead of 200)?


Yes, although the particular technique used here (variational autoencoder, or VAE) doesn't benefit from increasing the code space, due to a particular penalty (KL divergence against a fixed N(0, 1) Gaussian prior) applied during training. So even if the code space is increased to 1000 digits, the model will probably still choose to use only 5 or 10 dimensions unless the KL penalty is relaxed, which makes it closer to a standard autoencoder.
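The mechanism is easy to see in the standard per-dimension form of the VAE's KL penalty. The numbers below are made up for illustration; they are not from the linked experiment:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) for one code dimension,
    # parameterized by mu and log(sigma^2) as in a standard VAE.
    return 0.5 * (mu**2 + np.exp(log_var) - log_var - 1.0)

# A dimension that actually encodes information pays a penalty...
active = kl_to_standard_normal(mu=1.5, log_var=np.log(0.1))

# ...while a dimension collapsed onto the prior (mu=0, sigma=1) costs
# exactly nothing. That's why adding more code dimensions doesn't help:
# the model just leaves the extra ones collapsed.
collapsed = kl_to_standard_normal(mu=0.0, log_var=0.0)
```

Every informative dimension costs KL, so the model keeps only as many active dimensions as the reconstruction term can pay for; the rest sit at the prior and carry no information.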

An autoencoder with a much larger code space could be an improvement, and some newer work such as Adversarially Learned Inference / Adversarial Feature Learning [0, 1] or Real NVP [2, 3] could probably do a much better job on this task, at the cost of increased computation.

Also, inpainting parts of the frames with something like PixelRNN [4] would be interesting.

[0] http://arxiv.org/abs/1606.00704

[1] http://arxiv.org/abs/1605.09782

[2] http://www-etud.iro.umontreal.ca/~dinhlaur/real_nvp_visual/

[3] http://arxiv.org/abs/1605.08803

[4] http://arxiv.org/abs/1601.06759


Given that Adversarially Learned Inference came out yesterday, the best approach may be to just wait a couple of months and see what the state of the art is then.


Sure - that is always an option, especially in deep learning right now. But this current crop of models (counting DCGAN and LAPGAN/Eyescream) has really made a leap in my eyes, from "oh cool, a generative model" to "are these thumbnails real?". They are generating a lot of cohesive global structure, which is pretty awesome!


Technically, yes, although I'd say that's not what this experiment is aiming at.

In the future we may be able to colourise black-and-white movies and increase their framerate automatically with this kind of software.




