For a long time, "AI alignment" was a purely theoretical field, making very slow progress of questionable relevance because there was nothing interesting to experiment on. Now we have things to experiment on, the field is exploding, and we're finally learning how to align these systems.
But not fast enough. I really don't want to overstate the capabilities of current-generation AI systems: they're not superintelligences, and they have giant holes in their cognitive capabilities. But the rate at which these systems are improving is extreme. Given the size and speed of the jump from GPT-3 to GPT-3.5 to GPT-4 (and similar lower-profile jumps inside the other big AI labs), and looking at what exists in lab prototypes that haven't yet been scaled out into products, the risk of a superintelligence taking over the world no longer looks distant and abstract.
And that will be amazing! A superintelligent AGI could solve all of humanity's problems, eliminate poverty of all kinds, and advance medicine so far that we'll be close to immortal. But only if we get that first superintelligent system right, from an alignment perspective. If we don't get it right, that will be the end of humanity. And right now, it doesn't look like we're going to figure out how to do that in time. We need to buy time for alignment progress, and we need to do it now, before we charge head-first into superintelligence.
A lot of people seem to take the rapid improvement of LLMs from GPT-2 through GPT-4 and their brethren, and extrapolate that trendline to infinity.
But that's not logically sound.
The advances that have allowed this aren't arbitrarily scalable. Sure, we may see some more advances in AI tech that take us a few more jumps forward—but that doesn't imply that we will keep advancing at this pace until we hit AGI/superintelligence/the singularity/whatever.
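To make the extrapolation point concrete, here's a minimal numerical sketch (all constants arbitrary; this is not a model of any real capability metric): an exponential and a logistic curve can agree closely on early data, even though one grows without bound and the other plateaus.

```python
import numpy as np

L = 100.0  # carrying capacity of the logistic; arbitrary for illustration

def exponential(t):
    return np.exp(t)

def logistic(t):
    # Solution of dx/dt = x * (1 - x/L) with x(0) = 1; behaves like e^t early on.
    return L / (1 + (L - 1) * np.exp(-t))

t_early = np.linspace(0, 2, 9)    # the range you actually have data for
gap = np.max(np.abs(exponential(t_early) - logistic(t_early)) / exponential(t_early))
print(f"max relative gap on observed range: {gap:.1%}")       # ~6%

t_future = 12.0                   # the range you're extrapolating into
print(f"exponential / logistic at t={t_future:.0f}: "
      f"{exponential(t_future) / logistic(t_future):.0f}x")   # ~1600x
```

Both curves fit the early data to within a few percent, so the observed trendline alone can't tell you which world you're in.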
I've seen several people compare this logic to the discussions about self-driving technology several years ago: some very impressive advances had happened, and were continuing to happen, so people extrapolated from there and assumed full self-driving capability would reach the market by... well, about now, actually. (I admit I somewhat bought the hype at the time; possibly that makes me more cautious now. YMMV.) I find the comparison persuasive, since the underlying technological improvements are similar, and I believe ML advances will hit a similar wall fairly soon.
> A lot of people seem to take the rapid improvement of LLMs from GPT-2 through GPT-4 and their brethren, and extrapolate that trendline to infinity.
> But that's not logically sound.
Yup, five years ago I asked "Do we definitely already know it's going to be possible to deploy self-driving cars in an economically meaningful way?" and got the answer "yes", on a story titled "GM says it will put fleets of self-driving cars in cities in 2019"!
The problem is: can alignment happen before the relevant capabilities have been developed? LLMs, for example, are very good at impersonating and talking to humans and have good world models, but they are particularly poor at structured reasoning and planning, which are the capabilities that will actually be dangerous. I don't believe superintelligence will be an LLM with chain-of-thought reasoning, and if it's a different architecture, then once again a lot of alignment work won't be relevant.
Yes, many angles on the alignment problem can be studied now, and have started making good progress recently. Some things will turn out in retrospect to not have been relevant, due to architectural shifts, but not everything. Some things are specific to LLMs; some things are specific to transformers but not to language-model transformers; some things are conceptual and likely to still apply to quite-different systems; and some things are just field-building and not specific to any architecture at all.
E.g., in mechanistic interpretability, a lot of findings on LLMs turn out to generalize across a wider set of NN architectures. For instance, https://transformer-circuits.pub/2022/solu/index.html is work that couldn't have been done without access to LLMs, but which looks likely to generalize to future architectures.
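For concreteness, the activation studied in that linked paper is simple to state: SoLU(x) = x · softmax(x) (the paper additionally applies a LayerNorm after it). A minimal sketch in plain PyTorch:

```python
import torch
import torch.nn.functional as F

def solu(x: torch.Tensor) -> torch.Tensor:
    """Softmax Linear Unit: elementwise product of x with softmax over the last dim."""
    return x * F.softmax(x, dim=-1)

h = torch.randn(2, 8)    # a toy batch of MLP pre-activations
print(solu(h).shape)     # torch.Size([2, 8])
```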
>... the risk of a superintelligence taking over the world no longer looks distant and abstract.
Can we please stop floating this as a threat? This is more science fiction than reality at this point, and it does humanity a great disservice. The more we push the idea that the AI itself is the threat, rather than the people controlling it, the less we will focus on mitigating global risk.
It is far more likely that someone will leverage an AI to expand their influence or dominion. Putin has essentially already stated his views on this matter, and we should assume groups within all sufficiently advanced nations are working toward this end, either independently or cooperatively.
So once again, humans are the dangerous part. Clearly, if we didn't have destructive tendencies in our psyche, tendencies that end up in the data we use to train these models, we wouldn't build things that are interested in destruction.
Interesting.
I don't think we're as intelligent as we believe we are, which is why I doubt we will ever actually build a superintelligence; we're too stupid. Even something 10x smarter than us may actually be quite "stupid".
You're neglecting to consider the power of recursion.
Maybe the best, and perhaps necessary, tool for aligning GPT-N is GPT-(N-1).
In just the past few weeks, we've already seen the power of using models to generate instruction fine-tuning data.
Don't you think aligned models might be applied to aligning future models in ways we can't yet anticipate, given that capability discovery is now happening week by week rather than every six months or longer?
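A hedged sketch of that bootstrapping idea: use an already-aligned model (GPT-(N-1) in this framing) to generate and filter instruction fine-tuning data for the next model. Here `query_model`, the rating prompt, and the 0-to-1 score format are all hypothetical stand-ins for illustration, not any specific lab's method.

```python
from typing import Callable

def build_finetune_set(
    query_model: Callable[[str], str],   # hypothetical: prompt in, completion out
    seed_prompts: list[str],
    judge_threshold: float = 0.8,
) -> list[tuple[str, str]]:
    dataset = []
    for prompt in seed_prompts:
        completion = query_model(prompt)
        # Ask the same (aligned) model to rate the output; the 0-1 score
        # format is an assumption made for this sketch.
        score_text = query_model(
            f"Rate from 0 to 1 how safe and helpful this reply is:\n{completion}\nScore:"
        )
        try:
            score = float(score_text.strip())
        except ValueError:
            continue  # unparseable rating: drop the example
        if score >= judge_threshold:
            dataset.append((prompt, completion))  # keep only well-rated pairs
    return dataset
```

The filtered pairs would then be fine-tuning data for GPT-N, with the weaker model acting as both generator and judge.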