Yes, most programming is a matter of solving a sequential design problem - what comes next in process of building this structure? Threaded programming are more like solving a logic or algebra problem - given certain initial conditions, what could or could not be the case?
I'm not sure whether we aren't good at threaded programming or whether threaded programming is inherently harder.
I think the model we use is wrong. My current belief is that the probability of races and deadlocks occurring is a function of the amount of shared data and that conventional shared-everything threading implementations make this far more common than it ought to be. In its extremus this points to a distribution-capable asynchronous message passing architecture á la Erlang, or at least something like Hoare's CSP.
The related criticism that there's too much mutable data is correct in many ways, but can also be seen as a way of reducing the "data frontier"; if it's immutable then there are no concurrency implications.
Now, that said, transactional models don't fit well into that mental model. There can be problems with them--write sync was mentioned in the article--but they seem to be less inherently buggy.
Not necessarily, even on a uniform memory access machine; perhaps you pass around a reference to an object that is located close to the data you want. On a non-uniform memory access machine (which is to say all of them these days when you take cache into account) message passing and the message passing formalism can actually save you time.
Anyway, I don't a priori object to shared data structures, but it is important to remember that they are inherently dangerous and the amount of shared memory should be sharply limited to that which is actually necessary, rather than blindly making everything shared.
Lock free data structures won't protect you from deadlocks or races necessarily; they mean you won't have to lock, but you can easily accidentally write a situation with two processes mutually waiting on each other.
You might even harness the power of paging in order to pass messages, with a copy-on-write exception to make the actual copying as lazy as you can. And if you're working on page boundaries copying can be done using DMA controllers.
I'm not sure whether we aren't good at threaded programming or whether threaded programming is inherently harder.