
If you're able to, would you mind linking to some resources about the case you describe in point three, where the kernel needs to deal with an invalid stack pointer? I'm very curious about the underlying causes.

Also, regarding point two, is the cumulative result of those factors you describe (other NMIs being blocked, lack of kernel control, the unreliable heuristic) that a diagnostic NMI may end up never being run?



> If you're able to, would you mind linking to some resources about the case you describe in point three, where the kernel needs to deal with an invalid stack pointer?

Look up SYSCALL in the manual (AMD APM or Intel SDM). SYSCALL switches to kernel mode without changing RSP at all. This means that at least the first few instructions of the kernel’s SYSCALL entry point have a completely bogus RSP. An NMI, MCE, or #DB hitting there is fun. For the latter, see CVE-2018-8897. You can also read the actual NMI entry asm in Linux’s arch/x86/entry/entry_64.S for how the kernel handles an NMI while trying to return from an NMI :)

To some extent, the x86_64 architecture is a pile of kludges all on top of each other. Somehow it all mostly works.

> is the cumulative result of those factors you describe (other NMIs being blocked, lack of kernel control, the unreliable heuristic) that a diagnostic NMI may end up never being run?

No, that’s unrelated. A diagnostic NMI causes the kernel’s NMI vector to be invoked, and that’s all. Suppose that this happens concurrently with a perf NMI. There could be no indication whatsoever that the diagnostic NMI happened: the two NMIs can get coalesced, and, as far as the kernel can tell, only the perf NMI happened.
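To make the coalescing point concrete, here is a toy model in Python. It is only a sketch: `NmiLine`, `raise_nmi`, and `deliver` are made-up names, and it assumes the pending NMI behaves like a single latch that does not record who raised it.

```python
class NmiLine:
    """Toy model of a pending-NMI latch (hypothetical, not a real API)."""

    def __init__(self):
        self.pending = False   # a single bit: "an NMI is waiting"
        self.deliveries = 0    # how many times the handler was invoked

    def raise_nmi(self, source):
        # The source name is deliberately thrown away: the latch only
        # remembers that *something* raised an NMI, not what or how often.
        self.pending = True

    def deliver(self):
        # Invoke the handler once if anything is pending.
        if not self.pending:
            return False
        self.pending = False
        self.deliveries += 1
        return True


line = NmiLine()
line.raise_nmi("perf")
line.raise_nmi("diagnostic")   # arrives before the first one is delivered
line.deliver()
print(line.deliveries)
```

Two sources raised an NMI, but the handler ran once, and nothing in the latch says the diagnostic NMI was ever there.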

Once all the weird architectural junk is out of the way, the NMI handler boils down to:

    for each possible NMI cause
      did it happen?  if so, handle it.
    
    If no cause was found, complain.

Amazon’s thing is trying to hit the “complain” part. What they should do is give some readable indication that it happened so it can be added to the list of possible causes.
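A rough Python rendering of that loop, in case it helps. This is a sketch, not the Linux code: `Source`, `did_happen`, `handle`, and `handle_nmi` are hypothetical names, and the `pending` flag stands in for whatever readable status a real NMI source would expose.

```python
class Source:
    """One possible NMI cause (hypothetical stand-in for a real device)."""

    def __init__(self, name, pending=False):
        self.name = name
        self.pending = pending   # assumed readable "it was me" indication

    def did_happen(self):
        return self.pending

    def handle(self):
        self.pending = False     # acknowledge the event


def handle_nmi(sources):
    """Poll every known cause; complain if none of them claims the NMI."""
    handled = []
    for src in sources:
        if src.did_happen():
            src.handle()
            handled.append(src.name)
    if not handled:
        print("unknown NMI: no registered source claimed it")
    return handled
```

A source with no readable indication cannot appear in `sources` at all, so on its own it can only ever reach the complaint branch, and if it arrives together with a perf NMI the loop returns `["perf"]` and the diagnostic one leaves no trace.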



