I happen to know of several companies working on physics problems that scale poorly across cores, and they spend far north of that, usually building out small clusters. Then you run hundreds of independent simulations, since each individual one doesn't really scale.
They can; what I'm saying is that a single application doesn't scale well across multiple cores. Multiple instances on a single CPU generally work fine, but the biggest impact on performance is per-core speed.
Edit: I was really just responding to "who spends $15,000 on a mid-high end server to run single threaded applications anyway?". I would absolutely consider this a "single threaded application".
Fluid flow and most particle simulations with a large number of particles. The limiting factor is the inter-particle interactions, so all the calculations have to feed back into each other.
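A toy sketch of that feedback (hypothetical names and a made-up 1D interaction, not any particular solver): every particle's update reads every other particle's state, which is why a naive split across cores forces constant synchronization.

```python
def pairwise_forces(positions):
    """Net 1D force on each particle from a toy inverse-square attraction."""
    n = len(positions)
    forces = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = positions[j] - positions[i]
            # every force term depends on another particle's position:
            # this is the all-to-all coupling that resists parallelism
            forces[i] += d / (abs(d) ** 3 + 1e-9)
    return forces

forces = pairwise_forces([0.0, 1.0, 2.0])
# by symmetry the middle particle feels ~zero net force,
# and the outer two feel equal and opposite pulls
```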
Both of those problems are well worn and can scale to as many cores as we can put in a single computer.
Whether it is a Navier-Stokes grid/image fluid simulation, arbitrary points in space that work off of nearest neighbors, or a combination of both (by rasterizing into a grid and using that to move the particles), there are many straightforward ways to use lots of CPUs.
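A minimal sketch of the "rasterize into a grid" idea (hypothetical helper names, assuming a 2D point cloud): once particles are binned by cell, a neighbor query only touches the 9 surrounding cells, and each cell's pass is independent, so cells can be farmed out to cores.

```python
from collections import defaultdict

def bin_particles(points, cell_size):
    """Map each (x, y) point index into its integer grid cell."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell_size), int(y // cell_size))].append(idx)
    return grid

def neighbors(grid, cell):
    """Indices of particles in `cell` and its 8 surrounding cells."""
    cx, cy = cell
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid.get((cx + dx, cy + dy), []))
    return out

grid = bin_particles([(0.1, 0.1), (0.2, 0.2), (5.0, 5.0)], cell_size=1.0)
# the two clustered points share a cell; the far one is isolated
```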
Fork-join parallelism is a start. Sorting particles into a kd-tree is done by recursively partitioning, and the partitions can be distributed among cores. The sorted structure can be read but not written by as many cores as you want, so all cores can search for their neighbors at once.
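An illustrative sketch of that pattern (my own toy code, not any engine's): the two halves of each kd-tree split are independent, so each recursive call is a natural fork point, and once built the tree is read-only, so any number of threads can query it without locking.

```python
from concurrent.futures import ThreadPoolExecutor

def build_kdtree(points, depth=0):
    """Recursively partition 2D points; each split's halves are independent."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    # the two recursive calls below share nothing: this is the fork-join point
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, depth=0, best=None):
    """Standard nearest-neighbor descent; only reads the tree."""
    if node is None:
        return best
    dist = lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    if best is None or dist(node["point"]) < dist(best):
        best = node["point"]
    axis = depth % 2
    diff = target[axis] - node["point"][axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, depth + 1, best)
    if diff ** 2 < dist(best):  # the far side may still hold a closer point
        best = nearest(node[far], target, depth + 1, best)
    return best

tree = build_kdtree([(0, 0), (1, 1), (2, 2), (5, 5)])
# read-only structure: concurrent queries need no synchronization
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda q: nearest(tree, q), [(0.9, 0.9), (4.8, 5.1)]))
```

(Real codes would fork the build itself across cores too; Python threads only help here because the query phase is read-only and the example is illustrative.)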
If you spawn 100 independent instances, it's not really the problem itself scaling. The point is that given a single set of operating conditions you won't see any meaningful gains going from 2 to 100 cores. Using idle resources for other simulations doesn't make the problem itself scale.