Well, one thought that comes to mind is that the topic is essentially a subset of ergonomics, so perhaps experimental protocols should take a few more cues from ergonomics research.
For example, controlling variables has to be done by constructing artificial systems from the ground up. You can't just pull two commercial products off the shelf and then pretend you're examining the impact of only one of the hundered different things that differs between the two.
Similarly, if we wanted to compare the impact of dynamic vs. static typing, we'd have to make sure that that is the only variable. Which means you basically have to construct a new programming language from the ground up, so that you can easily create new dialects of it that differ in only one very specific characteristic.
For example, controlling variables has to be done by constructing artificial systems from the ground up. You can't just pull two commercial products off the shelf and then pretend you're examining the impact of only one of the hundered different things that differs between the two.
Similarly, if we wanted to compare the impact of dynamic vs. static typing, we'd have to make sure that that is the only variable. Which means you basically have to construct a new programming language from the ground up, so that you can easily create new dialects of it that differ in only one very specific characteristic.