This is possibly the next generation of video/audio/image codecs.
What he did was create a specialized compression algorithm that compresses the individual frames of Blade Runner very well, and decompresses them (lossily, like MP3) back into a video stream.
To put this into perspective: Blade Runner is 117 minutes long. At 25 frames per second, that is 175,500 frames.
As he says, the input data he used was 256x144 with 3 colour channels, meaning each frame was 110,592 bytes. This results in an input amount of 18,509 MB, uncompressed.
His neural network compresses each image down to 200 floats though, i.e. 800 bytes. So the whole movie, as compressed by the NN, is about 134 MB.
A friend of mine who works with neural networks estimated his NN to be roughly 90MB at 256x144, so the storage needed for the movie he created is about 222 MB.
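The arithmetic above can be checked in a few lines (a sketch; floats taken as 32-bit, 1 MB as 1024 * 1024 bytes):

```python
# Back-of-the-envelope check of the numbers above.
frames = 117 * 60 * 25                # 175,500 frames at 25 fps
raw_frame = 256 * 144 * 3             # 110,592 bytes, one byte per channel
latent_frame = 200 * 4                # 800 bytes for 200 float32 values

raw_mb = frames * raw_frame / 1024**2        # ~18,509 MB uncompressed
latent_mb = frames * latent_frame / 1024**2  # ~134 MB of latent vectors
print(int(raw_mb), int(latent_mb))           # 18509 133
```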
This means that if this technology can be made fast enough, and the reproduction high enough in fidelity, we could be looking at replacing compressors (Fraunhofer MP3, LAME, MP4, DivX, JPEG) made to handle all types of input equally well with compressors that are so good for one specific set of data that compressor + data is smaller than anything the traditional compressors could create.
And even if it's not fast enough, it can still be a very efficient compressor, trading size for CPU.
Plus, lastly, the NN itself can be compressed as well, and classic compressors can possibly be applied to the frame data.
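To illustrate that last point, here is a minimal sketch of running a classic compressor over quantized per-frame floats. The latent values here are hypothetical random stand-ins; real latents would be correlated and compress better:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for one frame's latent vector: 200 float32 values.
latent = rng.standard_normal(200).astype(np.float32)

# Quantizing to 8 bits per value is already a 4x saving (800 -> 200 bytes);
# a classic compressor like zlib can then squeeze out remaining redundancy
# (little here, since these values are random; real latents would fare better).
quantized = np.clip(latent * 32 + 128, 0, 255).astype(np.uint8)
packed = zlib.compress(quantized.tobytes(), level=9)

print(len(latent.tobytes()), len(quantized.tobytes()))  # 800 200
```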
"with compressors that can are so good for one specific set of data that compressor + data is smaller than anything the traditional compressors could create."
This is, well, hopeful but probably wrong, because
we already know how to make it smaller for this type of data.
The algorithms used for interframe/intraframe prediction are chosen to trade off speed against size. If you built a really large-scale predictor that was able to generate very small representations for changes, you would get what he's built.
(Note that encoders already make complex selections among tons of different prediction algorithms for each set of frames, etc.)
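A toy version of that per-block mode selection might look like this (illustrative only; real encoders weigh rate-distortion costs across many more modes):

```python
import numpy as np

def encode_block(block, prev_block):
    """Pick whichever toy prediction mode leaves the smaller residual:
    'inter' copies the co-located block from the previous frame,
    'intra' predicts a flat block at the block's mean value."""
    residual_inter = block - prev_block
    residual_intra = block - block.mean()
    # Sum of absolute differences as a crude cost; real encoders cost
    # the actual entropy-coded bits against the distortion.
    if np.abs(residual_inter).sum() <= np.abs(residual_intra).sum():
        return "inter", residual_inter
    return "intra", residual_intra

prev = np.zeros((8, 8))
still = np.zeros((8, 8))        # unchanged block: inter copy wins
flat = np.full((8, 8), 7.0)     # new flat block: intra prediction wins
print(encode_block(still, prev)[0], encode_block(flat, prev)[0])  # inter intra
```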
Someone could already do this if they wanted to.
We just don't.
Because it's not fast enough (and nothing in that work changes this)
Would it be useful to apply NNs to video encoders to better select among prediction modes, etc.? Probably. But that's already being done, and it is not this guy's work.
"And even if its not fast enough, it can still be a very efficient compressor, trading size for CPU.
"
The problem you have is not just compression time.
It's decompression time.
The bitstream of H.265 etc. is designed to be fast to decode.
What this guy is building is not.
If you were to make it so, it would probably look closer to a normal video codec bitstream, and take up that much space.
In fact, he hasn't built anything truly new; he's just using existing papers and making an implementation. He also says, in his master's thesis, that it is primarily an artistic exploration.
Even with hardware decoding, you can only make stuff so fast.
TL;DR: While interesting, there are people working on the things you are talking about, and it's not this guy (at least not in this work).
I would not expect magic here. We already create video codec algorithms by trading off CPU cost against size. The trick is trying to get better size without increasing CPU cost significantly.
As these resources change (and remember, Moore's law is pretty much dead), video codecs will change and videos will get smaller, but you aren't likely to see serious breakthroughs.
We already could produce very small videos by applying tremendous amounts of CPU power.
It wouldn't be decoded on the CPU, but on the GPU. Or even specialized hardware. As convnets are being used more in image processing, that isn't too unrealistic. There is interest in making specialized consumer hardware for convnets.
And it doesn't need to work on every frame, it could pump out I-frames every 15 seconds or so.
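In codec terms that's just a long GOP; a toy schedule at the 25 fps assumed above, with a hypothetical 15-second cadence:

```python
# One I-frame every 15 seconds at 25 fps; everything in between predicted (P).
fps, cadence_s = 25, 15
gop = fps * cadence_s                     # 375 frames per group
schedule = ["I" if i % gop == 0 else "P" for i in range(2 * gop)]
print(schedule[0], schedule[1], schedule[gop])  # I P I
```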
Except that, in my understanding, the decoded version looks like shit. So, while I'm optimistic that this technology might eventually be quite good, more heuristics are probably needed to find the right way to extract an optimal encoding.
As part of a compression suite, that may not matter all that much. I can imagine a hybridized version of this that starts with the ANN and then applies 'fixes' via a traditional video codec stream. If the ANN can represent most of the bits of a frame pretty well, the ANN + codec data could match a traditional codec-only approach with substantially less data.
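A rough sketch of that hybrid idea, with random pixels and a fake near-perfect "ANN reconstruction" standing in for the real thing: because the residual is mostly near zero, it entropy-codes far better than the raw frame.

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)
# One hypothetical 256x144 grayscale frame of random pixels.
frame = rng.integers(0, 256, size=(144, 256), dtype=np.uint8)

# Hypothetical ANN reconstruction: right except for a little noise.
noise = rng.integers(-2, 3, size=frame.shape)
recon = np.clip(frame.astype(np.int16) + noise, 0, 255).astype(np.uint8)

# Store only the diff; small residuals have low entropy.
residual = (frame.astype(np.int16) - recon.astype(np.int16)).astype(np.int8)

raw_size = len(zlib.compress(frame.tobytes(), 9))
residual_size = len(zlib.compress(residual.tobytes(), 9))
print(raw_size, residual_size)  # the residual stream is a few times smaller
```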
The thing is that the diffs between the film and the "encoded" version look pretty fuzzy/muddy, so compressing that diff would be difficult.
Anyway, the article doesn't mention this idea. The first HN discussion had someone claim some other ridiculous use for the fuzzy/muddy images that are supposed to mean something, and ultimately I'd say it's an example of "my research project has cool images, I can use them for publicity".
Right, but my point is that your numbers don't indicate anything about the actual potential as a compression format. It is certainly possible to use it that way, but there is still much that is uncertain about whether a neural network can do better than hand-crafted compression formats. So it's a little early to call it "next generation." It is literally meaningless to point out that "his neural network compresses each image down to 200 floats", since it doesn't actually work well with such a small latent space.
It's not exactly unknown that autoencoders can reconstruct images, so I don't see why it's "impressive that it works at all."
> So it's a little early to call it "next generation."
Please don't ignore words other people write, or at least reread before hitting post. I did say possibly.
> I don't see why it's "impressive that it works at all."
I thought it was implicit from my explanation that the "works at all" includes the fact that the data sizes are reasonable already. If we had the same result, but the intermediary data was 18 gigabytes, then there would be nothing impressive about it. As it is, we're a lot below that, before further compression, so it is.
Remember: Prototype, Proof of Concept. You're looking at the first step, not the last step. You're looking at a motor carriage, not a Tesla.
> I thought it was implicit from my explanation that the "works at all" includes the fact that the data sizes are reasonable already.
Right, but why is that impressive if it doesn't actually result in a good reconstruction? I can take any collection of numbers and summarize it with the mean value, that doesn't imply that averaging is a good compression method.
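The averaging point is easy to make concrete: a hypothetical one-number "codec" has a spectacular ratio and a terrible reconstruction.

```python
import numpy as np

rng = np.random.default_rng(2)
frame = rng.integers(0, 256, size=(144, 256)).astype(np.float64)

# 'Compress' the whole frame to a single number...
code = frame.mean()
# ...and 'decompress' by repeating it everywhere.
recon = np.full_like(frame, code)

# The compression ratio is fantastic; the mean squared error is not.
mse = ((frame - recon) ** 2).mean()
print(round(code), round(mse))
```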
If you consider this a proof of concept, then what concept, exactly, does it prove? That statistics can represent a dataset?
You would store the diff between the reconstructed version and the original. The diff could be compressed to much less space than the original image, because the NN can predict most of the pixels exactly and get within a few bits on the rest.
It sounds to me, if only superficially, that this technique might benefit from work done in compressive sensing. The mechanisms of encoding and decoding seem very related.