Could a Bayesian approach work for this kind of thing? You treat each question as a separate test and use it to update your belief in the hypothesis that the student is an A student. Do this for all questions to get a final degree of belief that the student is an A student.
Do the same for the hypothesis that the student is a B student, and so on for C, D, and F. Then give them the grade that ends up with the highest degree of belief.
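The scheme above can be sketched as a naive Bayesian update. The per-grade probabilities of answering a question correctly are made-up illustrative numbers, not calibrated values, and the prior is assumed uniform:

```python
# Hypothetical probability that a student of each grade level answers
# a question correctly. These values are illustrative assumptions only.
P_CORRECT = {"A": 0.9, "B": 0.75, "C": 0.6, "D": 0.45, "F": 0.3}

def grade_posterior(answers, prior=None):
    """Update belief in each grade hypothesis from a list of
    True/False (correct/incorrect) answers, one per question."""
    # Uniform prior over grades unless the caller supplies one.
    post = dict(prior) if prior else {g: 1 / len(P_CORRECT) for g in P_CORRECT}
    for correct in answers:
        # Bayes update: multiply by the likelihood of this outcome
        # under each grade hypothesis, then renormalize.
        for g, p in P_CORRECT.items():
            post[g] *= p if correct else (1 - p)
        total = sum(post.values())
        post = {g: v / total for g, v in post.items()}
    return post

# Example: 6 of 8 questions correct.
answers = [True, True, False, True, True, True, False, True]
post = grade_posterior(answers)
best_grade = max(post, key=post.get)
```

With 6 of 8 correct, the posterior concentrates on the grade whose assumed accuracy is closest to 75%, so `best_grade` comes out as "B" under these made-up numbers.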
Granted, I didn't say anything about how to assign the initial degree of belief, i.e. the prior.
You've got the right idea: there's a lot of recent literature treating the typical IRT problem as a Bayesian inference problem. Some of the references listed at the bottom of the article (especially the Kim one) provide good literature surveys.
The beauty of item response theory is that you can infer both the test's parameters and the students' abilities. You can use EM or Gibbs sampling to estimate those parameters simultaneously, so there's no need to limit yourself to estimating just question difficulty or just student ability.
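As a rough illustration of that joint estimation, here is a minimal sketch of a Rasch (1PL) model fit by alternating gradient ascent on the joint likelihood. It stands in for the EM or Gibbs approaches mentioned above, which are what you'd actually use in practice; the learning rate and iteration count are arbitrary assumptions:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_rasch(responses, iters=500, lr=0.05):
    """Jointly estimate student abilities (theta) and question
    difficulties (b) under the Rasch model
    P(correct) = sigmoid(theta_i - b_j).
    `responses[i][j]` is 1 if student i answered question j correctly.
    A sketch only; real implementations use EM or MCMC."""
    n_students, n_items = len(responses), len(responses[0])
    theta = [0.0] * n_students   # abilities, one per student
    b = [0.0] * n_items          # difficulties, one per question
    for _ in range(iters):
        # Gradient step for each student's ability.
        for i in range(n_students):
            grad = sum(responses[i][j] - sigmoid(theta[i] - b[j])
                       for j in range(n_items))
            theta[i] += lr * grad
        # Gradient step for each question's difficulty.
        for j in range(n_items):
            grad = sum(sigmoid(theta[i] - b[j]) - responses[i][j]
                       for i in range(n_students))
            b[j] += lr * grad
        # The model is only identified up to a shift, so center the
        # difficulties at zero (theta - b is unchanged by this).
        mean_b = sum(b) / n_items
        b = [x - mean_b for x in b]
        theta = [x - mean_b for x in theta]
    return theta, b
```

Given a response matrix, questions that more students miss end up with higher `b`, and students who answer more questions correctly end up with higher `theta`, both inferred from the same data at once.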
In the scenario you describe, who decides each question's difficulty? Are we sure the teacher or test author understands the true difficulty of their questions? Since we can't be sure the professor gauges each question's difficulty accurately, this kind of grading scheme is only marginally better than a super-naive approach.
Yeah, the article seems to start at a weird point. An algorithm that figures out how many points a given question is worth might be better than having the teacher just decide what each question is worth.
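The crudest version of such an algorithm is to read each question's difficulty off the class's own results rather than the teacher's guess. This is a toy sketch, not a real IRT fit (which would estimate ability and difficulty jointly):

```python
import math

def empirical_difficulty(responses):
    """Estimate each question's difficulty from student results:
    the logit of the proportion of students who missed it, so a
    higher value means a harder question.
    `responses[i][j]` is 1 if student i answered question j correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    difficulties = []
    for j in range(n_items):
        correct = sum(row[j] for row in responses)
        # Laplace smoothing keeps the logit finite even when every
        # student got the question right (or wrong).
        p = (correct + 1) / (n_students + 2)
        difficulties.append(math.log((1 - p) / p))
    return difficulties
```

Point values could then be assigned in proportion to these difficulties, so a question that most of the class missed is worth more than one everyone aced.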
But what kind of stupid school system would grade math per question instead of per difficulty?