I found the original email equally disappointing, though. It boils down to "We pushed the envelope on size, it's too slow, we'd like to speed it up." Well, duh.
He uses the word 'scalability' early in the email, but shows no indication that he knows what it means. I'd love to hear if different operations slow down at different rates as the repo accumulates commits. Do they scale linearly, sublinearly, or superlinearly as the repo grows? Are there step functions at which there's a sudden dramatic slowdown (ran out of RAM, etc.)?
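To make the linear / sublinear / superlinear question concrete, here's a rough sketch of how you'd read it off a handful of timings: fit a straight line to log(time) versus log(repo size) and look at the slope. The function names, the 0.1 tolerance, and the sample numbers are all made up for illustration; none of them come from the email.

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    If time ~ c * size**k, the slope is k: about 1 means linear,
    below 1 sublinear, above 1 superlinear.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def classify(exponent, tolerance=0.1):
    # The tolerance is an arbitrary illustrative threshold.
    if exponent > 1 + tolerance:
        return "superlinear"
    if exponent < 1 - tolerance:
        return "sublinear"
    return "roughly linear"

# Hypothetical measurements, NOT real data: commits in the repo
# versus seconds for some operation.
commits = [10_000, 20_000, 40_000, 80_000]
seconds = [0.5, 1.0, 2.0, 4.0]
k = scaling_exponent(commits, seconds)
print(f"exponent ~ {k:.2f}: {classify(k)}")
```

A single power-law fit won't reveal a step function, which is why you'd also want to look at the individual points rather than just the slope.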
It's intentionally vague, but with enough detail that if you're actually in a position to help, you'll recognize what's going on and contact him directly for more information.
You don't spill internal processes and configurations without some kind of non-disclosure agreement, and certainly not in a public forum.
There's no need to spill internal processes and configurations. The fellow said he had a synthetic repo that he used to benchmark various operations. Surely whatever generated that test repo can scale it up or down to whatever size they like, so you can benchmark at various points and collect the data that would tell us if there is some horrible non-linear scaling going on under the covers.
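Something like the following is all I mean by benchmarking at various points. It's only a sketch: `operation` is a hypothetical callable that stands in for "generate a test repo of this size and run one operation against it", and the 4x jump threshold is an arbitrary choice of mine.

```python
import time

def sweep(operation, sizes, repeats=3):
    """Time operation(size) at each size, keeping the best of `repeats` runs."""
    results = []
    for size in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            operation(size)
            best = min(best, time.perf_counter() - start)
        results.append((size, best))
    return results

def find_steps(results, factor=4.0):
    """Return sizes where the cost per unit of size jumps by more than
    `factor` relative to the previous measurement (a crude step detector)."""
    steps = []
    for (s0, t0), (s1, t1) in zip(results, results[1:]):
        if t0 > 0 and (t1 / s1) > factor * (t0 / s0):
            steps.append(s1)
    return steps

if __name__ == "__main__":
    # Stand-in workload so the sketch runs on its own; a real run would
    # build a repo of `n` commits and time a single operation on it.
    data = sweep(lambda n: sorted(range(n, 0, -1)), [10_000, 20_000, 40_000])
    for size, secs in data:
        print(f"{size:>8} {secs:.6f}s")
    print("suspected steps at:", find_steps(data))
```

A table of (size, seconds) pairs per operation would already say far more than a single timing at the largest size.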
Right now it sounds like he's just trying to see what the possible solutions for his issues are. If he can provide additional benchmarks, etc., great. But he's under no obligation to provide any more than he has. Once there's a solution, then maybe.
Of course he's not under any obligation to provide any more info than he has. But given that he already has the test harness set up, and that only he has access to the hardware on which his benchmarks ran, it seems that he could easily enable more people to help him by providing additional data points.
I'm not asking for secrets here. I'm asking for some sign that he has a well-defined problem to solve.
How git performs as repo size grows to 15GB isn't hidden in a vault at Facebook somewhere; I suspect they just haven't done anything more detailed than a superficial time measurement.
And as much as I like true openness as an ideal, it falls apart when you're dealing with competition (not cooperation) and money. At best you try to keep things open enough.
I don't see why Facebook's build needs to be kept secret. It's a purely internal process, and while they might lose something by giving details, they could also gain if someone suggests improvements. That said, there are plenty of things they do need to keep secret. For example, letting anyone export FB's full social graph would be really stupid.