This technique is quite different from deep dream. An autoencoder is a model trained to compress and then reconstruct an image. After trying to reconstruct many images, it learns to build a "compressed representation" of them by being forced through this compress -> uncompress bottleneck. The reconstructions are typically blurry/lossy, with artifacts quite different from the ones we see in conventional lossy compression schemes like JPEG.
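To make the compress -> uncompress idea concrete, here is a toy linear autoencoder in numpy (a hypothetical illustration, not the model discussed): 8-dim inputs are squeezed through a 2-dim code and reconstructed, trained by plain gradient descent on squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data that is intrinsically 2-dimensional, so a 2-dim code
# can in principle reconstruct it well.
X = rng.normal(size=(256, 2)) @ rng.normal(size=(2, 8))

W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder: 8 dims -> 2-dim code
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder: 2-dim code -> 8 dims
lr = 0.05

def recon_loss(X):
    code = X @ W_enc       # compress
    recon = code @ W_dec   # uncompress
    return ((recon - X) ** 2).mean()

initial = recon_loss(X)
for _ in range(3000):
    code = X @ W_enc
    recon = code @ W_dec
    err = 2.0 * (recon - X) / len(X)       # gradient of squared error
    W_enc -= lr * X.T @ (err @ W_dec.T)    # backprop through the decoder
    W_dec -= lr * code.T @ err
final = recon_loss(X)
print(initial, final)  # reconstruction error drops as the code is learned
```

A real autoencoder would use nonlinear layers and train on images, but the structure is the same: the only path from input to output is through the narrow code.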
In deep dream, a network whose goal is to predict image categories is "run in reverse" and used as a generative model, even though it was never intended to be generative. Autoencoders, by contrast, are often intended to be generative, or at least to learn relationships between images (the space they learn these in is sometimes called a latent space, with latent variables), whereas classification models only care about getting the category right - there is no explicit pressure to learn relationships between categories or between the images within them.
Yes, although the particular technique used here (a variational autoencoder, or VAE) doesn't benefit from a larger code space, because of a particular penalty applied during training (a KL divergence against a fixed N(0, 1) Gaussian prior). Even if you grow the code space to 1000 dimensions, the model will probably still choose to use only 5 or 10 of them unless the KL penalty is relaxed, which would make it closer to a standard autoencoder.
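You can see why extra dimensions go unused from the standard VAE KL term itself. For a diagonal Gaussian code, KL(N(mu, sigma^2) || N(0, 1)) decomposes per dimension, and a dimension that collapses to the prior (mu = 0, sigma = 1) costs exactly zero. A quick numerical illustration (toy numbers, not the actual model discussed):

```python
import numpy as np

# Per-dimension KL(N(mu, sigma^2) || N(0, 1)) - the penalty referred to above.
def kl_to_standard_normal(mu, sigma):
    return 0.5 * (mu**2 + sigma**2 - np.log(sigma**2) - 1.0)

# A 1000-dim code where the encoder only "uses" the first 5 dimensions:
mu = np.zeros(1000)
sigma = np.ones(1000)
mu[:5] = 2.0     # informative dims deviate from the prior...
sigma[:5] = 0.3  # ...and become more confident (smaller variance)

kl = kl_to_standard_normal(mu, sigma)
print(kl[:5].sum())  # only the used dims pay a KL cost
print(kl[5:].sum())  # unused dims sit at the prior and cost exactly 0
```

Since every informative dimension costs KL, the model only pays for as many dimensions as the reconstruction term justifies; the remaining 995 sit at the prior for free, which is the "5 or 10 dimensions" behavior above.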
An autoencoder with a much larger code space could be an improvement, or some newer work such as Adversarially Learned Inference / Adversarial Feature Learning [0, 1] or Real NVP [2, 3] would probably do a much better job at this task, at the cost of increased computation.
Something like inpainting parts of the frames with PixelRNN [4] would also be interesting.
Given that Adversarially Learned Inference came out yesterday, the best approach may be to just wait a couple of months and see what the state of the art is then.
Sure - that is always an option, especially in deep learning right now. But this current crop of models (counting DCGAN and LAPGAN/Eyescream) has, in my eyes, made a real leap from "oh, cool, a generative model" to "are these thumbnails real?". They generate a lot of coherent global structure, which is pretty awesome!