> Use One Big Database. > Seriously. If you are a backend engineer, nothing is w...

makeitdouble · on Aug 2, 2022

I don't know what's the complexity of your project, but more often than not the feeling of doom coming from hitting that wall is bigger than the actual effort it takes to solve it.

People often feel they should have anticipated and avoid the scaling issues altogether, but moving from a single DB to master/replica model, and/or shards or other solutions is fairly doable, and it doesn't come with worse tradeoffs than if you sharded/split services from the start. It always feels fragile and bolt on compared to the elegance of the single DB, but you'd also have many dirty hacks to have a multi DB setup work properly.

Also, you do that from a position where you usually have money, resources and a good knowledge of your core parts, which is not true when you're still growing full speed.

bigtunacan · on Aug 3, 2022

I can't speak for cogman10, but in my experience when you start to encounter issues of hitting the limit of "one big database" you are generally dealing with some really complicated shit and refactoring to dedicated read instances, shards, and other DB hacks are just short term solutions to buy time.

The long term solutions end up being difficult to implement and can be high risk because now you have real customers (maybe not so happy because now slow db) and probably not much in house experience for dealing with such large scale data; and an absolute lack of ability to hire existing talent as the few people that really can solve for it are up to their ears in job offers.

Retric · on Aug 3, 2022

The other side of this is once you actually can’t scale a single DB the project has proved it’s value and you have a solid idea what you actually want.

Designing let alone building something scaleable on the other hand is a great way to waste extreme effort up front when it’s completely superfluous. That’s vastly more likely to actually kill a project than some growing pains especially when most projects never scale past a single reasonably optimized database.

bigtunacan · on Aug 3, 2022

You're not wrong. Probably more than 95% of applications will never outgrow one large relational database. I just think that this leads to an unfortunate, but mostly inevitable issue of complexity for the few that do hit such a level of success and scale.

Alex DeVrie (author of 'The DynamoDB Book') discusses that his approach is to essentially start all new projects with DynamoDB.

Now I don't really agree with him, yet I can't fully say he's wrong either. While we won't need it most of the time, reaching for a tool like this before we need it provides more time to really understand it when/if we reach that point.

bigtunacan · on Aug 3, 2022

@ithrow, yeah I know he is clearly biased which is why I don't really agree with him. I do however think it would have helped me to start using/learning before I needed it since the paradigm is so foreign to the relational model that is now second nature.

Aeolun · on Aug 3, 2022

DynamoDB (and Mongo) is nice, right up until you need those relations. I haven’t found a document oriented database that gives me the consistency guarantees of a RDBMS yet.

loxias · on Aug 3, 2022

Would a hybrid key-value & document store, indices on any user defined path, with optional strong consistency DWYW?

jd_mongodb · on Aug 3, 2022

You must not have looked at MongoDB. We have been delivering fully consistent ACID transactions since 4.0 which shipped several years. Yes, Jepsen did find some issues with the initial release of ACID transactions and yes, we fixed those problems pretty rapidly.

arinlen · on Aug 3, 2022

> Yes, Jepsen did find some issues with the initial release of ACID transactions (...)

By "some issues" you mean lost data and violated causal by default.

https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6/

And no, the issues were not fixed. Jepsen pointed out that "the newer MongoDB 4.2.6 has more problems" including “retrocausal transactions”.

https://jepsen.io/analyses/mongodb-4.2.6

From Jepsen's report:

> Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations.

jd_mongodb · on Aug 3, 2022

Also read the addendum from the same report:

1 Updates 2020-05-26: MongoDB identified a bug in the transaction retry mechanism which they believe was responsible for the anomalies observed in this report; a patch is scheduled for 4.2.8.

arinlen · on Aug 4, 2022

> Also read the addendum from the same report:

Your initial claim was that these issues were addressed in 4.0.

Jepsen's report refutes your claim,and demonstrates MongoDB had serious reliability problems even in 4.2.6.

Frankly, your insistence in pulling the wool over everyone's eyes, specially on a topic that's easily verified, does not help built up trust on MongoDB

jd_mongodb · on Aug 4, 2022

I can see the source of confusion. Apologies. I mentioned ACID transactions were released in 4.0 but did not explicitly mention when the problems arose which of course was in 4.2 which was actually released a year later. The version numbers are clearly referenced in the Jepsen article.

redwood · on Aug 4, 2022

Another take https://arxiv.org/pdf/2111.14946.pdf

krageon · on Aug 4, 2022

This is the core culture of MongoDB - cutting corners to optimise things a little more and cater to a NoSQL crowd. It's entire mindset is fundamentally different from what you'd get in a proper relational database and ignoring those things isn't going to do any software you write any favours.

bigtunacan · on Aug 3, 2022

It's been a long time since I've used Mongo so I don't know if it only supports eventual consistency, but DynamoDB does support transactions and traditional consistency, but it comes at the cost of reduced read throughput.

DynamoDB also supports relations, but they aren't called relations because they don't resemble anything like relations in traditional relational databases.

You may already know this, but just to clarify DynamoDB isn't really a document oriented database. It's both a key/value database and a columnar database, so in that sense I'd closer to Redis and Cassandra than Mongo, but there's definitely a lot of misinformation on this front.

ithrow · on Aug 3, 2022

Devrie works AWS ;)

lelanthran · on Aug 3, 2022

> The long term solutions end up being difficult to implement and can be high risk because now you have real customers (maybe not so happy because now slow db) and probably not much in house experience for dealing with such large scale data; and an absolute lack of ability to hire existing talent as the few people that really can solve for it are up to their ears in job offers.

This is a problem of having succeeded beyond your expectations, which is a problem only unicorns have.

At that point you have all this income from having fully saturated the One Big Server (which, TBH, has unimaginably large capacity when everything is local with no network requests), so you can use that money to expand your capacity.

Any reason why the following won't work:

Step 1: Move the DB onto it's own DBOneBigServer[1]. Warn your customers of the downtime in advance. Keep the monolith as-is on the current OriginalOneBigServer.

Step 2: OriginalOneBigServer still saturated? Put copies of the monolith on separate machines behind a load-balancer.

Step 3: DBOneBigServer is still saturated, in spite of being the biggest Oxide rack there is? Okay, now go ahead and make RO instances, shards, etc. Monolith needs to connect to RO instances for RO operations, and business as usual for everything else.

Okay, so Step 3 is not as easy as you'd like, but until you get to the point that your DBOneBigServer cannot handle the loads, there's no point in spending the dev effort on sharding. Replication doesn't usually require a team of engineers f/time, like a distributed DB would.

If, after Step 3, you're still saturated, then it might be time to hire the f/time team of engineers to break up everything into microservices. While they get up to speed you're making more money than god.

Competitors who went the distributed route from day one have long since gone out of business because while they were still bugfixing in month 6, and solving operational issues for half of each workday (all at a higher salary) in month 12, and blowing their runway cash on AWS for the first 24 months, you had already deployed in month 2, spending less than they did.

I guess the TLDR is "don't architect your system as if you're gonna be a unicorn". It's the equivalent of you, personally, setting your two-year budget to include the revenue from winning a significant lottery.

You don't plan your personal life "just in case I win the lottery", so why do it with a company?

[1] backedup/failover as needed

otagekki · on Aug 3, 2022

^ This. Not so long ago, I had worked in the finance department of a $350M company as one of the five IT guys and we had just begun implementing Step 2, after OriginalOneBigServer had shown its limits. DBOneBigServer was really big though, 256 GB RAM and 128 cores if I remember correctly. So big in fact that I implemented some of my ETL tasks as stored SQL procedures to be run directly on the server. The result? A task that would easily take a big fraction of OneBigServer memory and 15 hours (expected to increase correlatedly with the revenue) is run in 30 minutes.

It's worth noting that when I left we still were nowhere close to saturate DBOneBigServer.

hnlmorg · on Aug 3, 2022

> This is a problem of having succeeded beyond your expectations, which is a problem only unicorns have.

Nope. I've worked on a few projects that are not "unicorns" yet have legitimately hit that wall. Particularly around on line gaming and gambling.

spopejoy · on Aug 4, 2022

Maybe unicorn is not the right word? If your app has millions of DAUs choking your DB, you should at least be tacking your next big investment round or some other success milestone.

Otherwise, your product is on it's way to failure, so good thing you did One Big DB...

hnlmorg · on Aug 6, 2022

You’re thinking too Silicon Valley.

These services didn’t need additional rounds of funding and aren't the kind of thing that would scale like a unicorn.

Some services might only been transient (like services based around a particular sports league or TV series) or be regional (like government sites or, also, sports leagues).

Not every service out there has aspirations to “change the world”. Some exist to fill a niche. But sometimes that “niche” still covers millions of people.

cogman10 · on Aug 2, 2022

> I don't know what's the complexity of your project, but more often than not the feeling of doom coming from hitting that wall is bigger than the actual effort it takes to solve it.

We've spent and failed at multiple multi year projects to "solve" the problem. I'm sure there are more simple problems that are easier to disentangle. But not in our case.

endisneigh · on Aug 3, 2022

Without details it's hard to really get anything from this. Could you share some?

throwfh80h82 · on Aug 3, 2022

I can share some. Had a similar experience as the parent comment. I do support "one big database" but it requires a dedicated db admin team to solve the tragedy of the commons problem.

Say you have one big database. You have 300 engineers and 30-50 product managers shipping new features every day accountable to the C-Suite. They are all writing queries to retrieve the data they want. One more join, one more N+1 query. Tons of indexes to support all the different queries, to the point where your indexes exceed the size of your tables in many cases. Database maintenance is always someone else's problem, because hey, it's one big shared database. You keep scaling up the instance size cause "hardware is cheap". Eventually you hit the m6g.16xlarge. You add read replicas. Congratulations, Now you have an eventually consistent system. You have to start figuring out which queries can hit the replica and which ones always need the fresh data. You start getting long replication lag, but it varies and you don't know why. If you decide to try to optimize a single table, you find dozens or 100+ queries that access it. You didn't write them. The engineers who did don't work here anymore....

I could go on, and all these problems are certainly solvable and could have been avoided with a little foresight, but you don't always have good engineers at a startup doing the "right thing" before you show up.

sanderjd · on Aug 3, 2022

I think this hits the nail right on the head, and it's the same criticism I have of and article itself: the framing is that you split up a database or use small vms or containers for performance reasons, but that's not the primary reason these things are useful; they are useful for people scaling first and foremost, and for technical scaling only secondarily.

The tragedy of the commons with one big shared database is real and paralyzing. Teams not having the flexibility to evolve their own schemas because they have no idea who depends on them in the giant shared schema is paralyzing. Defining service boundaries and APIs with clarity around backwards compatibility is a good solution. Sometimes this is taken too far, into services that are too small, but the service boundaries and explicit APIs are nonetheless good, mostly for people scaling.

tablespoon · on Aug 3, 2022

> Defining service boundaries and APIs with clarity around backwards compatibility is a good solution.

Can't you do that with one big database? Every application gets an account that only gives it access to what it needs. Treat database tables as APIs: if you want access to someone else's, you have to negotiate to get it, so it's known who uses what. You don't have to have one account with access to everything that everyone shares. You could

sanderjd · on Aug 3, 2022

It would be easier to create different databases to achieve the same thing. Those could be in the same database server, but clear boundaries is the key.

cerved · on Aug 3, 2022

Schemas can be useful in this regard

Wicher · on Aug 3, 2022

Indeed! And functions with security definers can be useful here too. With those one can define a very strict and narrow API that way, with functions that write or query tables that users don't have any direct access to.

Look at it as an API written in DB functions, rather than in HTTP request handlers. One can even have neat API versioning through, indeed, the schema, and give different users (or application accounts) access to different (combinations of) APIs.

The rest is "just" a matter of organizational discipline, and a matter of teams to internalize externalities so that it doesn't devolve into a tragedy of the commons — a phenomenon that occurs in many shapes, not exclusively in shared databases; we can picture how it can happen for unfettered access to cloud resources just as easily.

But here's the common difference: through the cloud, there's clear accounting per IOP, per TB, per CPU hour, so incentive to use resources efficiently is can be applied on a per-team basis — often through budgeting. "Explain to me why your team uses 100x more resources than this other team" / "Explain to me why your team's usage has increased 10-fold in three months".

Yet there's no reason to think that you can only get accounting for cloud stuff. You could have usage accounting on your shared DB. Does anyone here have experience with any kind of usage accounting system for, say, PostgreSQL?

sanderjd · on Aug 3, 2022

I think we're getting hung up on database server vs. database as conceptual entity. I think separation between the entities is more important (organizationally) and don't think it matters as much whether or not the server is shared.

andreimackenzie · on Aug 3, 2022

These are real problems, but there can also be mitigations, particularly when it comes to people scaling. In many orgs, engineering teams are divided by feature mandate, and management calls it good-enough. In the beginning, the teams are empowered and feel productive by their focused mandates - it feels good to focus on your own work and largely ignore other teams. Before long, the Tragedy of the Commons effect develops.

I've had better success when feature-focused teams have tech-domain-focused "guilds" overlaid. Guilds aren't teams per-se, but they provide a level of coordination, and more importantly, permanency to communication among technical stakeholders. Teams don't make important decisions within their own bubble, and everything notable is written down. It's important for management to be bought in and value participation in these non-team activities when it comes to career advancement (not just pushing features).

In the end, you pick your poison, but I have certainly felt more empowered and productive in an org where there was effective collaboration on a smaller set of shared applications than the typical application soup that develops with full team ownership.

krmboya · on Aug 3, 2022

In uni we learnt about federated databases, i.e multiple autonomous, distributed, possibly heterogeneous databases joined together by some middleware to service user queries. I wonder how that would work for this situation, in the place of one single large database.

Federated databases are never usually mentioned in these kind of discussions involving 'web scale'. Maybe because of latency? I don't know

Aeolun · on Aug 3, 2022

> Teams not having the flexibility to evolve their own schemas because they have no idea who depends on them

This sounds like a problem of testing and organization to me, not a problem with single big databases.

sanderjd · on Aug 3, 2022

Sure. My point is that the organization problems are more difficult and interesting than the technical problems being discussed in the article and in most of the threads.

krageon · on Aug 4, 2022

Introducing an enormous amount of overhead because training your software engineers to use acceptable amounts of resources instead of just accidentally crashing a node and not caring is a little ridiculous.

makeitdouble · on Aug 3, 2022

For whatever reason I've been thrown into a lot of companies at that exact moment when "hardware is cheap" and "not my problem" approaches couldn't cut it anymore...

So yes, it's super painful, and requires a lot of change in processes, mindsets, and it's hard to get everyone to understand things will get slower from there.

On the other end, micro-services and/or multi-DB is also super hard to get right. One of the surprise I had was all the "cache" that each services started silently adding on their little island when they realized the performance penalty they had from fetching data from half a dozen services on the more complicated operations. Or the same way DB abuse from one group could slow down everyone, and service abuse on the core parts (e.g. the "user" service) would impact most of the other services. More that a step forward, it felt a lot like a step sideways and continuing to do the same stuff, just in a different way.

My take from it was that teams that are good at split architectures are also usually good at monolith, and vice-versa. I feel from the parent who got stuck in the transition.

chii · on Aug 3, 2022

> teams that are good at split architectures are also usually good at monolith, and vice-versa.

aka, low-competency engineers will not outperform using better processes or project management.

The way, imho, is to up-skill the team (which is only possible if it was small unfortunately).

ozb · on Aug 3, 2022

Sure, you'll get to m6g.16xlarge; but how many companies actually have oltp requirements that exceed the limits of single servers on AWS, eg u-12tb1.112xlarge or u-24tb1.metal (that's 12-24tb memory)? I think these days the issues with high availability, cost/autoscaling/commitment, "tragedy of the commons", bureaucracy, and inter-team boundaries are much more likely to be the drawback than lack of raw power.

khalilmohammad · on Aug 3, 2022

You do not need that many database developers, it's a myth. Facebook has 2 dedicated database engineers managing it. I work in United Nations, there is only 1 dedicated database developer in 1000+ team.

If you have a well designed database system. You do not need that many database engineers.

endisneigh · on Aug 3, 2022

I do not disagree at all that what you are describing can happen. What I'm not understanding is why they're failing at multi year attempts to fix this.

Even in your scenario you could identify schemas and tables that can be separated and moved into a different database or at maturity into a more scalable NoSQL variety.

Generally once you get to the point that is being described that means you have a very strong sense on the of queries you are making. Once you have that it's not strictly necessary to even use a RDBMS, or at the very least, a single database server.

lmm · on Aug 3, 2022

> Even in your scenario you could identify schemas and tables that can be separated and moved into a different database or at maturity into a more scalable NoSQL variety.

How? There's nothing tracking or reporting that (unless database management instrumentation has improved a lot recently), SQL queries aren't versioned or typechecked. Usually what happens is you move a table out and it seems fine, and then at the end of the month it turns out the billing job script was joining on that table and now your invoices aren't getting sent out.

> Generally once you get to the point that is being described that means you have a very strong sense on the of queries you are making.

No, just the opposite; you have zillions of queries being run from all over the case and no idea what they all are, because you've taught everyone that everything's in this one big database and they can just query for whatever it is they need.

lazide · on Aug 3, 2022

Every database I know of can generate query logs. Why not just log every query and do some statistical analysis on it?

lmm · on Aug 3, 2022

> Every database I know of can generate query logs.

Every one I know of warns that it comes with significant performance implications and isn't intended to be used in production.

> Why not just log every query and do some statistical analysis on it?

And then what? If you know this table only gets queried a few times a month, what does that actually tell you that you can use?

lazide · on Aug 3, 2022

It's resource intensive - but so is being in a giant tarpit/morass. Adding client query logging is cheaper and can be distributed. I just double checked, and neither Oracle nor Postgres warn 'never use it in production'

And if you have logs, you can see what actually gets queried, and by whom, and what doesn't get queried, and by whom.

That will also potentially let you start constructing views and moving actual underlying tables out of the way to where you can control them.

Which can let you untangle the giant spaghetti mess you're in.

But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.

lmm · on Aug 3, 2022

> It's resource intensive - but so is being in a giant tarpit/morass.

Agreed, but it means it's not really a viable option for digging yourself out of that hole if you're already in it. Most of the time if you're desperately trying to split up your database it's because you're already hitting performance issues.

> Adding client query logging is cheaper and can be distributed.

Right, but that only works if you've got a good handle on what all your clients are. If you've got a random critical script that you don't know about, client logging isn't going to catch that one's queries.

> But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.

I've done it a few times too, it's always been a shitshow. Query logging is a useful tool to have in some cases but it's often not an option, and even when it is not a quick or easy fix. You're far better off not getting into that situation in the first place, by enforcing proper datastore ownership and scalable data models from the start, or at least from well before you start hitting the performance limits of your datastores.

GeorgeDewar · on Aug 3, 2022

If you are in the hole where you really cannot add load to your database server but want to log the queries, there is a technique called zero impact monitoring where you literally mirror the network traffic going to your database server, and use a separate server to reconstruct it into query logs. These logs identify the queries that are being run, and critically, who/what is running them.

A past workplace of mine has used this approach.

lazide · on Aug 3, 2022

Excellent pointer! Do you happen to know of anything which can do this out of the box?

banku_brougham · on Aug 3, 2022

I've seen this too. I guess 50% of query load were jobs that got deprecated in the next quarterly baseline.

It felt a system was needed to allocate query resource to teams, some kind of tradeable tokens that were scarce maybe, to incentivise more care and consciousness of the resource from the many users.

What we did was have a few levels of priority managed by a central org. It resulted in a lot of churn and hectares of indiscriminately killed query jobs every week, many that had business importance mixed in with the zombies.

pythonaut_16 · on Aug 4, 2022

Let me tell you about my new Blockchain CorpoCoin... (/s)

branko_d · on Aug 3, 2022

Do you think it would make it better to have the tables hidden behind an API of views and stored procedures? Perhaps a small team of engineers maintaining that API would be be able to communicate effectively enough to avoid this "tragedy of commons" and balance the performance (and security!) needs of various clients?

dalyons · on Aug 3, 2022

This is so painfully painfully true. I’ve seen in born out personally at three different companies so far. Premature splitting up is bad too, but I think the “just use one Postgres for everything” crowd really underestimate how bad it gets in practice at scale

Aeolun · on Aug 3, 2022

Maybe it’s all a matter of perspective? I’ve seen the ‘split things everywhere’ thing go wrong a lot more times than the ‘one big database’ thing. So I prefer the latter, but I imagine that may be different for other people.

Ultimately I think it’s mostly up to the quality of the team, not the technical choice.

dalyons · on Aug 3, 2022

I’ve seen splitting things go bad too. But less often and to a lesser degree of pain than mono dbs - a bad split is much easier to undo than monodb spaghetti.

However I think it’s “thou shall” rules like this blog post that force useless arguments. The reality is it depends, and you should be using your judgement, use the simplest thing (monodb) until it doesn’t work for you, then pursue splitting (or whatever). Just be aware of your problem domain, your likely max scale, and design for splitting the db sooner than you think before you’re stuck in mud.

And if you’re building something new in an already-at-scale company you should perhaps be starting with something like dynamo if it fits your usecase.

cogman10 · on Aug 3, 2022

I did in the original comment.

We have over 200 monolith applications each accessing overlapping schemas of data with their own sets of stored procedures, views, and direct queries. To migrate a portion of that data out into it's own database requires, generally, refactoring a large subset of the 200 monolith apps to no longer get all the data in one query, but rather a portion of the data with the query and the rest of the data with a new service.

Sharding the data is equally difficult because even tracing who is writing the data is spread from one side of the system to the next. We've tried to do that trough an elaborate system of views, but as you can imagine, those are too slow and cover too much data for some critical applications so they end up breaking the shard. That, in and of itself, introduces additional complexity with the evolution of the products.

Couple that with the fact that even with these solutions, getting a large portion of the organization is not on board with these solutions (why can't we JUST buy more hardware? Get JUST bigger databases?) and these efforts end up being sabotaged from the beginning because not everyone thinks it's a good idea (And if you think you are different, I suggest just looking at the rest of the comments here in HN that provide 20 different solutions to the problem some of which are "why can't you just buy more hardware?")

But, to add to all of this, we also just have organizational deficiencies that have really harmed these efforts. Including things like a bunch of random scripts checked into who knows where that are apparently mission critical and reading/writing across the entire database. General for things like "the application isn't doing the right thing, so this cron job run every Wednesday will go in and fix things up" Quiet literally 1000s of those scripts have been written.

This isn't to say we've been 100% unsuccessful at splitting some of the data into it's own server. But, it's a long and hard slog.

banku_brougham · on Aug 3, 2022

>Including things like a bunch of random scripts checked into who knows where that are apparently mission critical and reading/writing across the entire database.

This hits pretty hard right now, after reading this whole discussion.

When there is a galaxy with countless star systems of data its good to have locality owners of data who publish for their usage as domain leaders, and build a system that makes subscription and access grants frictionless.

cogman10 · on Aug 3, 2022

100% agreed and that's what I've been trying to promote within the company. It's simply hard to get the momentum up to really affect this change. Nobody likes the idea that things have to get a little slower (because you add a new layer between the data) before they can get faster.

gfody · on Aug 3, 2022

fwiw hacking hundreds of apps literally making them worse by fragmenting their source of record doesn't sound like a good plan. it's no surprise you have saboteurs, your company probably wants to survive and your plan is to shatter its brain.

outside view: you should be trying to debottleneck your sql server if that's the plan the whole org can get behind. when they all want you to succeed you'll find a way.

cogman10 · on Aug 3, 2022

> fwiw hacking hundreds of apps literally making them worse by fragmenting their source of record doesn't sound like a good plan. it's no surprise you have saboteurs, your company probably wants to survive and your plan is to shatter its brain.

The brain is already shattered. This wouldn't "literally make them worse", instead it would say that "now instead of everyone in the world hitting the users table directly and adding or removing data from that table, we have one service in charge of managing users".

Far too often we have queries like

SELECT b.*, u.username FROM Bar b JOIN users u ON b.userId = u.id

And why is this query doing that? To get a human readable username that isn't needed but at one point years ago made it nicer to debug the application.

> you should be trying to debottleneck your sql server if that's the plan the whole org can get behind.

Did you read my post? We absolutely HAVE been working, for years now, at "debottlenecking our sql server". We have a fairly large team of DBAs (about 30) who's whole job is "debottlenecking our sql server". What I'm saying is that we are, and have been, at the edge (and more often than not over the edge) of tipping over. We CAN'T buy our way out of this with new hardware because we already have the best available hardware. We already have read only replicas. We already have tried (and failed at) sharding the data.

The problem is data doesn't have stewards. As a result, we've spent years developing application code where nobody got in the way of saying "Maybe you shouldn't join these two domains together? Maybe there's another way to do this?"

gfody · on Aug 4, 2022

assuming some beastly server with terabytes of ram, hundreds of fast cores, and an exotic io subsystem capable of ridiculous amounts of low latency iops, I'd guess the perf issue with that example is not sql server struggling with load but rather lock contention from the users table being heavily updated. unless that beast of a server is sitting pegged with a hardware bottleneck it can probably be debottlenecked by vertically partitioning the users table. ie: split the table into two (or more) to isolate the columns that change frequently from the ones that don't, replace the table with a view that joins it back together w/instead-of triggers conditionally updating the appropriate tables, etc. etc. then when this happens:

    SELECT b.*, u.username FROM Bar b JOIN users u ON b.userId = u.id

sql server sees that you're only selecting username from the users view and eliminates the joins for the more contentious tables and breathes easy peasy

> And why is this query doing that? To get a human readable username that isn't needed but at one point years ago made it nicer to debug the application.

imo users should be able to do this and whatever else they want and it's not even unreasonable to want usernames for debugging purposes forever. I'd expect the db team to support the requirements of the apps teams and wouldn't want to have to get data from different sources

cogman10 · on Aug 4, 2022

> assuming some beastly server with terabytes of ram, hundreds of fast cores, and an exotic io subsystem capable of ridiculous amounts of low latency iops, I'd guess the perf issue with that example is not sql server struggling with load but rather lock contention from the users table being heavily updated.

You'd guess wrong. The example above is not the only query our server runs. It's an example of some of the queries that can be run. We have a VERY complex relationship graph, far more than what you'll typically find. This is finance, after all.

I used the user example for something relatable without getting into the weeds of the domain.

We are particularly read heavy and write light. The issue is quiet literally that we have too many applications doing too many reads. We are literally running into problems where our tempDb can't keep up with the requests because there are too many of them doing too complex of work.

You are assuming we can just partition a table here or there and everything will just work swimmingly, that's simply not the case. Our tables do not so easily partition. (perhaps our users table would, but again, that was for illustrative purposes and by no means the most complex example).

Do you think that such a simple solution hasn't been explored by a team of 50 DBAs? Or that this sort of obvious problem wouldn't have been immediately fixed?

gfody · on Aug 4, 2022

> Do you think that such a simple solution hasn't been explored by a team of 50 DBAs? Or that this sort of obvious problem wouldn't have been immediately fixed?

based on what you've shared, yeah. I also wouldn't expect a million DBAs to replace a single DBE

SixDouble5321 · on Aug 3, 2022

Just trust me, I can refute anything with vague explanations.

jeffrallen · on Aug 3, 2022

One nice compromise is to migrate to using read-only database connections for read tasks from the moment you upgrade from medium sized DB hardware to big hardware. Keep talking to the one big DB with both connections.

Then when you are looking at the cost of upgrading from big DB hardware to huge DB hardware, you've got another option available to compare cost-wise: a RW main instance and one more read-only replicas, where your monolith talks to both: read/write to the master and read-only to the replicas via a load balancer.

withinboredom · on Aug 3, 2022

I don’t see anyone mentioning memcached which can reduce the load on the db tremendously.

bigiain · on Aug 3, 2022

I've basically been building CRUD backends for websites and later apps since about 1996.

I've fortunately/unfortunately never yet been involved in a project that we couldn't comfortably host using one big write master and a handful of read slaves.

Maybe one day a project I'm involved with will approach "FAANG scale" where that stops working, but you can 100% run 10s of millions of dollars a month in revenue with that setup, at least in a bunch of typical web/app business models.

Early on I did hit the "OMG, we're cooking our database" where we needed to add read cacheing. When I first did that memcached was still written in Perl. So that joined my toolbox very early on (sometime in the late 90s).

Once read cacheing started to not keep up, it was easy enough to make the read cache/memcached layer understand and distribute reads across read slaves. I remember talking to Monty Widenius at The Open Source Conference, I think in Sad Jose around 2001 or so, about getting MySQL replication to use SSL so I could safely replicate to read slaves in Sydney and London from our write master in PAIX.

I have twice committed the sin of premature optimisation and sharded databases "because this one was _for sure_ going to get too big for our usual database setup". It only ever brought unneeded grief and never actually proved necessary.

kedean · on Aug 2, 2022

Many databases can be distributed horizontally if you put in the extra work, would that not solve the problems you're describing? MariaDB supports at least two forms of replication (one master/replica and one multi-master), for example, and if you're willing to shell out for a MaxScale license it's a breeze to load balance it and have automatic failover.

danudey · on Aug 2, 2022

I worked at a mobile game company for years and years, and our #1 biggest scaling concern was DB write throughput. We used Percona's MySQL fork/patch/whatever, we tuned as best we could, but when it comes down to it, gaming is a write-heavy application rather than the read-heavy applications I'm used to from ecommerce etc.

Sharding things out and replicating worked for us, but only because we were microservices-y and we were able to split our schemas up between different services. Still, there was one service that required the most disk space, the most write throughput, the most everything.

(IIRC it was the 'property' service, which recorded everything anyone owned in our games and was updated every time someone gained, lost, or used any item, building, ally, etc).

We did have two read replicas and the service didn't do reads from the primary so that it could focus on writes, but it was still a heavy load that was only solved by adding hardware, improving disks, adding RAM, and so on.

hot_gril · on Aug 2, 2022

Not without big compromises and a lot of extra work. If you want a truly horizontally scaling database, and not just multi-master for the purpose of availability, a good example solution is Spanner. You have to lay your data out differently, you're very restricted in what kinds of queries you can make, etc.

hot_gril · on Aug 3, 2022

Clarification, you can make unoptimized queries on Spanner with a great degree of freedom when you're doing offline analysis, but even then it's easy to hit something that's too slow to work at all, whereas in Postgres I know it'd not be a problem.

kbenson · on Aug 2, 2022

For what it's worth, I think distributing horizontally is also much easier if you're already limited your database to specific concerns by splitting it up in different ways. Sharding a very large database with lots of data deeply linked sounds like much more of a pain than something with a limited scope that isn't too deeply linked with data because it's already in other databases.

To some degree, sharding brings in a lot of the same complexities as different microservices with their own data store, in that you sometimes have to query across multiple sources and combine in the client.

cogman10 · on Aug 2, 2022

We've done that (MSSQL Always on) and that's what keeps the lights on today. It's not, however, something that'll remain sustainable.

katbyte · on Aug 3, 2022

Shouldn't your company have started to split things out and plan for hitting the limit of hardware a couple box sizes back? I feel there is a happy middle ground between "spend months making everything a service for our 10 users" and "welp i looks like we cant upsize the DB anymore, guess we should split things off now?"

rsync · on Aug 3, 2022

Isn’t this easily solved with sharding ?

That is, one huge table keyed by (for instance) alphabet and when the load gets too big you split it into a-m and n-z tables, each on either their own disk or their own machine.

Then just keep splitting it like that. All of your application logic stays the same … everything stays very flat and simple … you just point different queries to different shards.

I like this because the shards can evolve from their own disk IO to their own machines… and later you can reassemble them if you acquire faster hardware, etc.

Aeolun · on Aug 3, 2022

> Once you get to that point, it becomes SUPER hard to start splitting things out.

Maybe, but if you split it from the start you die by a thousand cuts, and likely pay the cost up front, even if you’d never get to the volumes that’d require a split.

moonchrome · on Aug 4, 2022

>Once you get to that point, it becomes SUPER hard to start splitting things out. All the sudden you have 10000 "just a one off" queries against several domains that are broken by trying carve out a domain into a single owner.

But that's survivorship bias and looking back at things from current problems perspective.

You know what's the least future proof and scalable project ? The one that gets canceled because they failed to deliver any value in reasonable time in the early phase. Once you get to "huge project status" you can afford glacial pace. Most of the time you can't afford that early on - so even if by some miracle you knew what scaling issues you're going to have long term and invested in fixing them early on - it's rarely been a good tradeoff in my experience.

I've seen more projects fail because they tangle themselves up in unnecessary complexity early on and fail to execute on core value proposition, than I've seen fail from being unable to manage the tech debt 10 years in. Developers like to complain about the second, but they get fired on the first kind. Unfortunately in todays job market they just resume pad their failures as "relevant experience" and move on to the next project - so there is not correcting feedback.

n0ts0gr33n · on Aug 2, 2022

I'd be curious to know what your company does which generates this volume of data (if you can disclose), what database you are using and how you are planning to solve this issue.

cogman10 · on Aug 2, 2022

Finance. MSSQL.

There are multiple plans on how to fix this problem but they all end up boiling down to carving out domains and their owners and trying to pull apart the data from the database.

What's been keeping the lights on is "Always On" and read only replicas. New projects aren't adding load to the db and it's simply been a slow going getting stuff split apart.

What we've tried (and failed at) is sharding the data. The main issue we have is a bunch of systems reading directly from the db for common records rather than hitting other services. That means any change in structure requires a bunch of system wide updates.

winrid · on Aug 4, 2022

You can get a machine with multiple terabytes of ram and hundreds of CPU cores easily. If you can afford that, you can afford a live replica to switch to during maintenance.

FastComments runs on one big DB in each region, with a hot backup... no issues yet.

Before you go to microservices you can also shard, as others have mentioned.

Flow · on Aug 2, 2022

Do you have Spectre countermeasures active in the kernel of that machine?

runjake · on Aug 2, 2022

What does it matter, in this context?

If it's about bare metal vs. virtual machines, know that Spectre affects virtual machines, too.

rkagerer · on Aug 2, 2022

I think they are implying disabling them (if on) could squeeze you out a bit more performance.

runjake · on Aug 3, 2022

Ah, I did not think of that angle, which is substantial. Thanks.

egorfine · on Aug 4, 2022

Substantial as in "up to 30% of performance".

conradev · on Aug 3, 2022

Why can’t the databases handle the load? That is to say, did you see this coming from a while away or was it a surprise?

pottertheotter · on Aug 3, 2022

Wonder if this would be a good use case for Tidalscale?