> Even more interesting, no part of the image spectrum was primarily responsible either. We could get rid of all the high-frequency information, and the AI could still recognise race in fairly blurry (non-diagnostic) images. Similarly, and I think this might be the most amazing figure I have ever seen, we could get rid of the low-frequency information to the point that a human can’t even tell the image is still an x-ray, and the model can still predict racial identity just as well as with the original image!
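For anyone who wants to picture what that filtering experiment looks like, here is a minimal sketch (not the paper's code) of FFT-based low-pass / high-pass filtering of an image. All names and cutoffs are illustrative assumptions.

```python
# Hypothetical sketch of the kind of frequency filtering the quoted experiment
# describes: keep only the low- or high-frequency part of a 2D image.
import numpy as np

def frequency_filter(image, cutoff, keep="low"):
    """Keep only low- or high-frequency components of a 2D image.

    cutoff: radius (in pixels) in the centred frequency domain.
    keep:   "low"  -> low-pass (blurry image, high frequencies removed)
            "high" -> high-pass (edge-like residue, low frequencies removed)
    """
    f = np.fft.fftshift(np.fft.fft2(image))          # centred 2D spectrum
    rows, cols = image.shape
    y, x = np.ogrid[:rows, :cols]
    dist = np.sqrt((y - rows / 2) ** 2 + (x - cols / 2) ** 2)
    mask = dist <= cutoff if keep == "low" else dist > cutoff
    filtered = np.fft.ifft2(np.fft.ifftshift(f * mask))
    return np.real(filtered)

# Example: progressively harsher low-pass filtering, as in the blurry-image test.
# xray = ...  # placeholder for however the scan is loaded
# for cutoff in [100, 50, 25, 10]:
#     blurred = frequency_filter(xray, cutoff, keep="low")
```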
I'm not totally ignorant of what these models are supposed to be doing, but only about as much as knowing "the boat should float" makes me a qualified sailor... and this doesn't sound like the boat is floating the right way up?
This is a very surprising result, putting it mildly!
Edit: On reflection, the fact that performance holds up under image degradation is not that surprising, given what we know about the sensitivity of neural networks to slight perturbations.
My best guess at this point is that while humans can detect other features like breast density, bone density, or BMI from the scan, they don't automatically interpret race as a function of these, whereas of course the NN does. The fact that the NN does so much better than a direct regression from these features to race (e.g. 0.55 AUC predicting race from BMI, 0.54 AUC predicting race from breast density) is initially very surprising. But they don't report the results of a similar experiment using all of these together. I suspect that simply predicting race from BMI + breast density + age + bone density + sex would achieve similar performance; a sketch of that combined experiment is below.
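As a rough illustration of the experiment I'd like to see, here is a hedged sketch: a single model predicting race from the tabular proxies together rather than one at a time. The column names, the metadata file, and the assumption that the label is binarised (as in the paper's pairwise AUC comparisons) are all guesses on my part, not anything the authors report.

```python
# Hypothetical sketch: combined-feature prediction of racial identity from
# tabular proxies. Column names and the metadata table are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed numerically encoded columns (e.g. sex as 0/1).
FEATURES = ["bmi", "breast_density", "age", "bone_density", "sex"]

def combined_feature_auc(df: pd.DataFrame) -> float:
    """5-fold cross-validated AUC for race prediction from the proxies above."""
    X = df[FEATURES]
    y = df["race"]  # assumed binary label, matching the single-feature AUC setup
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# Usage (with whatever metadata table the study used; the path is a placeholder):
# df = pd.read_csv("patient_metadata.csv")
# print(f"Combined-feature AUC: {combined_feature_auc(df):.2f}")
# Compare against the ~0.55 single-feature AUCs quoted above.
```

If the combined AUC came out well below the image model's, that would strengthen the case that the network is reading something beyond these obvious proxies.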