For a long time, "AI alignment" was a purely theoretical field, making very slow progress of questionable relevance because there was nothing interesting to experiment on. Now we have things to experiment on, the field is exploding, and we're finally learning how to align these systems.
But not fast enough. I really don't want to overstate the capabilities of current-generation AI systems: they're not superintelligences, and they have giant holes in their cognitive capabilities. But the rate at which these systems are improving is extreme. Given the size and speed of the jump from GPT-3 to GPT-3.5 to GPT-4 (and similar lower-profile jumps inside the other big AI labs), and looking at what exists in lab prototypes that haven't yet been scaled out into products, the risk of a superintelligence taking over the world no longer looks distant and abstract.
And that will be amazing! A superintelligent AGI could solve all of humanity's problems, eliminate poverty of all kinds, and advance medicine so far that we'll be close to immortal. But only if we get that first superintelligent system right, from an alignment perspective. If we don't get it right, that will be the end of humanity. And right now, it doesn't look like we're going to figure out how to do that in time. We need to buy time for alignment progress, and we need to do it now, before we charge head-first into superintelligence.
A lot of people seem to take the rapid improvement of LLMs from GPT-2 through GPT-4 and their brethren, and extrapolate that trendline to infinity.
But that's not logically sound.
The advances that have allowed this aren't arbitrarily scalable. Sure, we may see some more advances in AI tech that take us a few more jumps forward—but that doesn't imply that we will keep advancing at this pace until we hit AGI/superintelligence/the singularity/whatever.
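To make the extrapolation point concrete, here's a minimal numerical sketch (all constants arbitrary; this is not a model of any real capability metric): an exponential and a logistic curve can agree closely on early data, even though one grows without bound and the other plateaus.

```python
import numpy as np

L = 100.0  # carrying capacity of the logistic; arbitrary for illustration

def exponential(t):
    return np.exp(t)

def logistic(t):
    # Solution of dx/dt = x * (1 - x/L) with x(0) = 1; behaves like e^t early on.
    return L / (1 + (L - 1) * np.exp(-t))

t_early = np.linspace(0, 2, 9)    # the range you actually have data for
gap = np.max(np.abs(exponential(t_early) - logistic(t_early)) / exponential(t_early))
print(f"max relative gap on observed range: {gap:.1%}")       # ~6%

t_future = 12.0                   # the range you're extrapolating into
print(f"exponential / logistic at t={t_future:.0f}: "
      f"{exponential(t_future) / logistic(t_future):.0f}x")   # ~1600x
```

Both curves fit the early data to within a few percent, so the observed trendline alone can't tell you which world you're in.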
I've seen several people compare this logic to the discussions about self-driving technology several years ago: some very impressive advances had happened, and were continuing to happen, so people extrapolated from there and assumed full self-driving capability would reach the market by... well, about now, actually. (I admit I somewhat bought the hype at the time; possibly that makes me more cautious now. YMMV.) I find the comparison persuasive, since the underlying technological improvements are similar, and I believe ML advances will hit a similar wall fairly soon.
> A lot of people seem to take the rapid improvement of LLMs from GPT-2 through GPT-4 and their brethren, and extrapolate that trendline to infinity.
> But that's not logically sound.
Yup, five years ago I asked "Do we definitely already know it's going to be possible to deploy self-driving cars in an economically meaningful way?" and got the answer "yes", on a story titled "GM says it will put fleets of self-driving cars in cities in 2019"!
The problem is: can alignment happen before the relevant capabilities have been developed? LLMs, for example, are very good at impersonating and talking to humans and have good world models, but they are particularly poor at structured reasoning and planning, which are the capabilities that will actually be dangerous. I don't believe superintelligence will be an LLM with chain-of-thought reasoning, and if it's a different architecture, then once again a lot of alignment work won't be relevant.
Yes, many angles on the alignment problem can be studied now, and have started making good progress recently. Some things will turn out in retrospect to not have been relevant, due to architectural shifts, but not everything. Some things are specific to LLMs; some things are specific to transformers but not to language-model transformers; some things are conceptual and likely to still apply to quite-different systems; and some things are just field-building and not specific to any architecture at all.
E.g., in mechanistic interpretability, a lot of findings on LLMs turn out to generalize across a wider set of NN architectures. For instance, https://transformer-circuits.pub/2022/solu/index.html is work that couldn't have been done without access to LLMs, but which looks likely to generalize to future architectures.
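For concreteness, the activation studied in that linked paper is simple to state: SoLU(x) = x · softmax(x) (the paper additionally applies a LayerNorm after it). A minimal sketch in plain PyTorch:

```python
import torch
import torch.nn.functional as F

def solu(x: torch.Tensor) -> torch.Tensor:
    """Softmax Linear Unit: elementwise product of x with softmax over the last dim."""
    return x * F.softmax(x, dim=-1)

h = torch.randn(2, 8)    # a toy batch of MLP pre-activations
print(solu(h).shape)     # torch.Size([2, 8])
```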
>... the risk of a superintelligence taking over the world no longer looks distant and abstract.
Can we please stop floating this as a threat? This is more science fiction than reality at this point, and it does humanity a great disservice. The more we push the idea that the AI itself is the threat, rather than the people controlling it, the less we will focus on mitigating global risk.
It is far more likely that someone will leverage an AI to expand their influence or dominion. Putin has essentially already stated his views on this matter, and we should assume groups within all sufficiently advanced nations are working toward this end, either independently or cooperatively.
So once again, humans are the dangerous part. Clearly, if we didn't have destructive tendencies in our psyche, tendencies that end up in the data we use to train these models, we wouldn't build things that are interested in destruction.
Interesting.
I don't think we're as intelligent as we believe we are, which is why I doubt we will ever actually build a superintelligence; we're too stupid. Even something 10x smarter than us may actually be quite "stupid".
You're neglecting to consider the power of recursion.
Maybe the best, and perhaps necessary, tool for aligning GPT-N is GPT-(N-1).
In just the past few weeks, we've already seen the power of using models to generate instruction fine-tuning data.
Don't you think aligned models might be applied to aligning future models in ways we can't yet anticipate, given that capability discovery is now happening week by week rather than every six months or longer?
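A hedged sketch of that bootstrapping idea: use an already-aligned model (GPT-(N-1) in this framing) to generate and filter instruction fine-tuning data for the next model. Here `query_model`, the rating prompt, and the 0-to-1 score format are all hypothetical stand-ins for illustration, not any specific lab's method.

```python
from typing import Callable

def build_finetune_set(
    query_model: Callable[[str], str],   # hypothetical: prompt in, completion out
    seed_prompts: list[str],
    judge_threshold: float = 0.8,
) -> list[tuple[str, str]]:
    dataset = []
    for prompt in seed_prompts:
        completion = query_model(prompt)
        # Ask the same (aligned) model to rate the output; the 0-1 score
        # format is an assumption made for this sketch.
        score_text = query_model(
            f"Rate from 0 to 1 how safe and helpful this reply is:\n{completion}\nScore:"
        )
        try:
            score = float(score_text.strip())
        except ValueError:
            continue  # unparseable rating: drop the example
        if score >= judge_threshold:
            dataset.append((prompt, completion))  # keep only well-rated pairs
    return dataset
```

The filtered pairs would then be fine-tuning data for GPT-N, with the weaker model acting as both generator and judge.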