Hacker News

We use Postgres partitioning quite successfully to handle our customer-based data. Everything is partitioned by customer, and some customers are partitioned further.

One gotcha to be careful with: if you run a query spanning multiple partitions, it will hit them all at once, and if your database isn't super big, that will bring it to its knees.

Outside of that, really no issues. We also use Timescale quite heavily, which works fantastically as well.
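For anyone curious what this kind of setup looks like, here's a minimal sketch in plain SQL using PostgreSQL's declarative partitioning. The table and column names (`orders`, `customer_id`, etc.) are illustrative, not from the parent comment:

```sql
-- Hypothetical sketch: list-partitioning a table by customer,
-- with one large customer sub-partitioned further by time.
CREATE TABLE orders (
    customer_id bigint NOT NULL,
    created_at  timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY LIST (customer_id);

-- One partition per customer:
CREATE TABLE orders_cust_1 PARTITION OF orders FOR VALUES IN (1);

-- A very large customer can be partitioned again, e.g. by range on time:
CREATE TABLE orders_cust_2 PARTITION OF orders
    FOR VALUES IN (2) PARTITION BY RANGE (created_at);
CREATE TABLE orders_cust_2_2024 PARTITION OF orders_cust_2
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```

Queries that filter on `customer_id` will then touch only that customer's partition(s).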



IIRC, it won't run them all at the _same time_; it will run them in parallel, which ends up being limited by the value of `max_parallel_workers` in your config. So if you find that parallel operations are swamping your DB, maybe try stepping that config down a bit. In general, queries that span several partitions shouldn't be a problem. Hitting _all_ partitions of a really big table, that's another thing entirely. But since you're partitioned, you really ought to be limiting by the value of the partition key in your queries anyway...
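Concretely, something like the following. `max_parallel_workers` is a real PostgreSQL setting; the `orders`/`customer_id` names are just placeholders for whatever your partitioned table looks like:

```sql
-- Lower the cluster-wide ceiling on parallel workers if parallel
-- partition scans are swamping the box (needs ALTER SYSTEM privileges):
ALTER SYSTEM SET max_parallel_workers = 4;
SELECT pg_reload_conf();

-- Filtering on the partition key lets the planner prune partitions;
-- the EXPLAIN output should show only the matching partition being scanned:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```

Without the partition-key filter, the plan appends scans over every partition, which is where the "brings it to its knees" behavior comes from.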


> But since you're partitioned you really ought to be limiting by the value of the partition key in your queries anyways...

Not if these are analytical queries...

If you're wondering what the point is of partitioning a table that's going to be used for analytical queries: if you partition by insertion time, and your table is append-only, then you're only ever writing to one partition (the newest one), with all the partitions other than the newest one being "closed" — VACUUM FREEZEd and then never written to again. So, rather than one huge set of indices that gets gradually more expensive to update as you insert more records, those write costs only rise until the end of the month/week/day, and then jump back down to zero as you begin writing to a new, empty partition.
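The scheme described above might be sketched like this; the `events` table and its columns are hypothetical stand-ins for an append-only analytics table:

```sql
-- Hypothetical append-only table, range-partitioned by insertion time.
CREATE TABLE events (
    inserted_at timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (inserted_at);

CREATE TABLE events_2024_05 PARTITION OF events
    FOR VALUES FROM ('2024-05-01') TO ('2024-06-01');
CREATE TABLE events_2024_06 PARTITION OF events
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

-- Once May is over, "close" its partition: freezing all tuples means
-- autovacuum never needs to rewrite it, and inserts now land only in June.
VACUUM (FREEZE) events_2024_05;
```

New partitions have to be created ahead of each period, either by a scheduled job or via an extension such as pg_partman that automates it.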


Do you have any notes on how you set this up? Did you use extensions and if so which?



