There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.
That isn't true. The whole point is to quickly pick up statistically significant variations, and with the volume of tests they are running there is plenty of data.
If you turn on the 95% CI bands you can see there is plenty of statistical significance.
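To make that concrete, here's a back-of-the-envelope sketch of what those CI bands mean, using made-up pass rates (the 87%/82% figures and sample size of 1000 are purely illustrative, not numbers from the tracker):

```python
import math

def ci_95(successes, trials):
    """Normal-approximation 95% confidence interval for a pass rate."""
    p = successes / trials
    se = math.sqrt(p * (1 - p) / trials)
    return (p - 1.96 * se, p + 1.96 * se)

# Hypothetical daily eval results: same test suite, two different days.
lo_a, hi_a = ci_95(870, 1000)  # baseline day: 87% pass rate
lo_b, hi_b = ci_95(820, 1000)  # suspect day:  82% pass rate

# Non-overlapping intervals are a conservative sign the drop is real.
print(hi_b < lo_a)  # True: even the optimistic bound on the suspect
                    # day sits below the pessimistic bound on baseline
```

At ~1000 runs per day, a 5-point drop in pass rate clears the 95% bands comfortably, which is why daily trackers at this volume can flag degradations without waiting weeks.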
[1]: https://marginlab.ai/trackers/claude-code/