> Each Tuesday I have been looking at the performance results of all the PRs merged in the past week
I really appreciate the work described here, but it seems to me that the performance tests are in exactly the wrong place. The correct time to benchmark a PR is before it is merged, not after. It's the same for any other kind of test: your tests should pass before you merge, not after.
Sure, there will be exceptions where a performance regression is appropriate. But that should be known and agreed on before merging it.
I don't see a strong reason that the performance tests have to be done afterwards. I presume that it's okay to have a delay between creating the pull request and its merge, since humans will typically want to be able to review it!
> It's the same for any other kind of test; your tests should pass before you merge, not after
Rust is very good about testing before merge, not after. It somewhat pioneered doing complete testing before merge (at least, it built significant tooling and mind-share around doing it): https://graydon2.dreamwidth.org/1597.html and https://bors.tech/
That is to say, if Rust isn't validating something pre-merge, it's more likely than not that there's a good reason for not doing so. I don't personally know the reason in Rust's case, but, for performance testing in general, it's easy to overwhelm infrastructure, because benchmarks should be run on a single physical machine that's otherwise quiet. This means the testing is completely serialised and there's no opportunity for parallelism. Based on https://perf.rust-lang.org/status.html , it looks like the performance tests currently take a bit longer than an hour each.
As others have mentioned, there's automatic tooling for running the benchmarks on PRs that may be performance sensitive, and PRs that accidentally cause a regression are reverted, so the change can be reworked and reanalysed.
I think the heuristic is really easy: "Is it slower than before?"
If it's slower, then you require a manual override to accept the merge.
That will make performance regressions a whole lot less likely. Sure, in some cases they may be necessary, but many are unintentional; those can be fixed and then merged.
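To make the heuristic concrete, here's a minimal sketch of such a gate (hypothetical function names; it assumes benchmark results are available as mean seconds per benchmark, and uses a small tolerance to absorb measurement noise):

```python
# Hypothetical pre-merge gate: compare a PR's benchmark results against
# the baseline and require a manual override for any regression.

def regressions(baseline, candidate, tolerance=0.02):
    """Return benchmarks whose candidate time exceeds the baseline by
    more than `tolerance` (2% here, to absorb noisy measurements)."""
    slow = {}
    for name, base_time in baseline.items():
        new_time = candidate.get(name)
        if new_time is not None and new_time > base_time * (1 + tolerance):
            slow[name] = (base_time, new_time)
    return slow

def gate(baseline, candidate, override=False):
    """Block the merge unless there are no regressions, or a reviewer
    explicitly signed off on them via `override`."""
    slow = regressions(baseline, candidate)
    if slow and not override:
        return False, slow   # reject: needs manual sign-off
    return True, slow        # accept (possibly with agreed-on regressions)

baseline  = {"ctfe": 1.00, "parse": 0.50}
candidate = {"ctfe": 1.08, "parse": 0.50}  # "ctfe" got 8% slower
ok, slow = gate(baseline, candidate)
print(ok, slow)  # -> False {'ctfe': (1.0, 1.08)}
```

The point isn't the exact threshold; it's that the default outcome for "slower than before" is a blocked merge, and accepting a regression becomes a deliberate, recorded decision.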
The linked llvm article about performance ( https://nikic.github.io/2020/05/10/Make-LLVM-fast-again.html ) says:
"Rust does this by running a set of benchmarks on every merge, with the data available on perf.rust-lang.org. Additionally, it is possible to run benchmarks against pull requests using the @rust-timer bot. This helps evaluate changes that are intended to improve compile-time performance, or are suspected of having non-trivial compile-time cost."
So, they do seem to have a bot to perform such tests, though I have no clue how often they use it.
In practice I do not think this will work. There is already a queue of PRs waiting to be tested and merged.
Most changes are not going to change the compiler performance. Better to watch a trend line and revert any change that hurts performance in a significant way. Releases are every 6 weeks so there is plenty of time for this type of analysis and activity.
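Watching a trend line and reverting offenders can be sketched roughly like this (an illustrative toy, not the actual perf.rust-lang.org tooling; the 5% threshold and 10-commit window are made-up parameters):

```python
# Sketch of a post-merge trend watch: flag any commit whose benchmark
# time is significantly above the rolling mean of recent commits.

def flag_regressions(history, window=10, threshold=0.05):
    """history: list of (commit, seconds) in merge order.
    Flags commits more than `threshold` (5%) above the mean of the
    preceding `window` measurements -- candidates for a revert."""
    flagged = []
    for i in range(window, len(history)):
        recent = [t for _, t in history[i - window:i]]
        mean = sum(recent) / window
        commit, t = history[i]
        if t > mean * (1 + threshold):
            flagged.append(commit)
    return flagged

history = [(f"c{i}", 1.0) for i in range(10)] + [("c10", 1.2)]
print(flag_regressions(history))  # -> ['c10']
```

With a 6-week release cadence, running something like this daily over the merged history leaves plenty of time to spot the rare significant regression and revert it, without serialising every merge behind an hour-long benchmark run.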