Hacker News

> It was super awesome when we were able to delete a huge chunk of the GCS Connector for Hadoop

It's been a few years, but I lost customer data on gs:// due to listing consistency: create a bunch of files, list the directory, then rename them one by one to commit the directory into Hive. The listing missed a file, and the rename step silently skipped it.

Set a breakpoint and the bug disappears; run the rename and the listing from the same JVM and the bug disappears.

Consistency issues are hell.

> It was super awesome when we were able to delete a huge chunk of the GCS Connector for Hadoop, and I hope to see the same across the S3-focused connectors.

Yes!

S3Guard was a complete pain to secure. Since it was built on top of DynamoDB, and any client could be a writer for any file, you had to give every client write access to the entire DynamoDB table, which was quite a security headache.
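As an illustration of that headache (the table name, ARN, and account ID here are hypothetical, not from the original), the IAM policy every S3Guard client needed looked roughly like:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:UpdateItem",
      "dynamodb:DeleteItem",
      "dynamodb:Query",
      "dynamodb:BatchWriteItem"
    ],
    "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/s3guard-metadata"
  }]
}
```

Because the metadata for every object lived in one shared table, there was no way to scope a client's writes down to just its own paths.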



If it's been a few years, it was likely before the migration to Spanner:

https://cloud.google.com/blog/products/gcp/how-google-cloud-...

We posted that blog in early 2018, but it stated:

> Last year we migrated all of Cloud Storage metadata to Spanner, Google’s globally distributed and strongly consistent relational database.

So maybe pre-2017?


Had no idea this happened. Very cool to see. We use both GCS and S3.



