Hacker News

It's not inevitable, it's essentially impossible.

There are a few things that can cause tremendously widespread outages, and essentially all of them are network configuration changes. Actually deleting customer data is dramatically more difficult, to the point of being impossible: there are so many different services in so many different locations with so many layers of access control. There is no "one command" that can do such a thing; at the scale of a worldwide network of data centers, there is no "rm -rf /".



Ah, but you fail to account for Google's incredible knack for building tools designed to do things at scale, or for putting AI in things that don't need it.

The possibility that Google will either unleash a malicious AI on their infrastructure, develop a way to destroy a lot of data at scale quite efficiently, or some combination of the two is far from zero.

Bear in mind, this "Little Oops" should also have been impossible: https://www.techspot.com/news/103207-google-reveals-how-blan...


.....no?

"We deployed this private cloud with a missing parameter and it wasn't caught" is as different from "we wiped out all customer data" as hello world is from Kubernetes.

No one promised this "should be impossible". Did you confuse it with "we'll take steps to ensure this never happens again"?


It's pretty much half the puzzle actually.

You contend there's no global rm -rf for a global cloud provider, but clearly a missing parameter can rm -rf a customer in an irrecoverable manner.

The only half you're missing is... how every major cloud outage happens today... a bad configuration update. These companies have hundreds of thousands of servers, but they also use orchestration tools to distribute sets of changes to all of them.

You only need a command to rm -rf one box if you are distributing that command to every box.
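That fan-out risk is exactly why providers stage their rollouts. A minimal sketch, assuming a hypothetical orchestrator (the names `push_config` and `rollout` are illustrative, not any real tool's API): the same loop that ships a good config to every box ships a bad one, and a small canary wave is the standard defense that stops a bad change before it reaches the fleet.

```python
# Hypothetical fleet rollout sketch. The "missing parameter" failure
# mode is modeled as a config lacking a required key.

def push_config(host: str, config: dict) -> None:
    """Apply a config to one host (simulated)."""
    if config.get("data_dir") is None:  # the missing parameter
        raise RuntimeError(f"{host}: config would wipe data_dir")

def rollout(hosts: list, config: dict, canary_fraction: float = 0.01) -> None:
    """Staged rollout: fail fast on a small canary wave before the fleet."""
    n_canary = max(1, int(len(hosts) * canary_fraction))
    for host in hosts[:n_canary]:       # canary wave: a bad config dies here
        push_config(host, config)
    for host in hosts[n_canary:]:       # only then does the rest of the fleet get it
        push_config(host, config)

hosts = [f"host{i:04d}" for i in range(1000)]
rollout(hosts, {"data_dir": "/var/data"})   # good config: reaches all 1000 hosts
```

The point of the sketch is the asymmetry: skip the canary wave (or rush it, as in the zero-day scenario above) and the blast radius of one bad line is the entire fleet.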

Now sure, there are tons of security precautions and checks and such to prevent this! But pretending it's impossible is delusional. People do stupid stuff, at scale, every day.

The most likely scenario is a zero day in an environment necessitating an extremely rapid global rollout, combined with a plain, simple error.


And the most telling thing about most of these outages is that the provider later admits in their postmortem that they didn't really understand how the system they made worked until it fell over and they were forced to learn how it really works.

It's the sort of thing that used to keep me up at night.


When was the last time it wasn't a cascading failure caused by Rube Goldberg levels of interdependency on their own systems?


The release process, monitoring checks, etc. for a customer's private cloud is generally significantly different from the release process for a global product. I'm not going to get any more specific for all the standard NDA reasons, but having worked for Google and Microsoft among others....no, the risk you describe doesn't translate from one to the other.


Do you not remember CrowdStrike?


Again: an outage caused by a config change is different from data loss.

The remediation was painful but it was not data loss.


What if a machine was supposed to be running to capture data?


Yet.


I understand you believe the checks cannot fail that catastrophically, and I agree that the likelihood they do is quite low.

But it can happen, and it only has to happen once. (Also, FYI: telling me your work history just tells me you've drunk the Kool-Aid; it ain't proof you know more.)


Delete a decryption key. Good luck! I'll see you at the end of time.

Break your control plane, and you can't stop the propagation of poison.

Propagate the wrong trust bundle... everywhere.

Also, it's not about the delete command. It's about the automatic cleanup following behind it that shreds everything, or repurposes the storage.
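The "delete a decryption key" scenario is crypto-shredding, and it's easy to see why recovery is hopeless. A minimal illustration, using a one-time-pad XOR in place of a real cipher (the variable names are illustrative): once the key is gone, the ciphertext on disk is unrecoverable no matter how many backups of the ciphertext exist.

```python
import os

def xor(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (one-time-pad encrypt/decrypt)."""
    return bytes(a ^ b for a, b in zip(data, key))

plaintext = b"customer records"
key = os.urandom(len(plaintext))           # the decryption key
ciphertext = xor(plaintext, key)           # what actually sits on disk

assert xor(ciphertext, key) == plaintext   # recoverable while the key exists
del key                                    # "delete a decryption key"
# The ciphertext alone is indistinguishable from random noise: for a
# one-time pad, every plaintext of the same length is equally likely.
```

With a real cipher the argument is computational rather than information-theoretic, but the operational consequence is the same: key deletion plus the automatic cleanup behind it turns "restore from backup" into "see you at the end of time".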


Children of the kubernetic line.


Cyclic infrastructure dependencies suck :(


Google accidentally deleted customer location history data from customer devices (after intentionally deleting it from Google servers) just last year.

If you didn't back it up yourself, it's gone forever.




