In the previous post I talked about scaling, progress in AI, and the possible paths to AGI. Today, I will set the specific paths aside and instead focus on the final product: my views on the timeline for AGI, as well as what happens after. These views are a significant motivating force in my decision to pivot my career towards alignment.
Timeline
The rate of progress in AI research is hard to measure. But just looking at the capability gains from 2012, when AlexNet was released, to Gato and the current LMs, progress seems scarily fast. Today’s models regularly beat the average human, and even experts, in certain domains such as Go and Atari.
More compute and better architectures, such as the Transformer, make it easier to create large models, and larger models predictably gain capabilities, as shown by Kaplan et al. (2020). At the same time, the number of machine learning and AI papers uploaded to ArXiv has exploded: in 2018, 35,900 papers were submitted, that figure has already been surpassed this year, and the total for the year looks set to exceed 70,000. The number of researcher-labor-hours spent on AI research therefore seems to have increased by a lot. What drives much of my timeline is the expectation of innovations that lead to discontinuous AI capability gains. These are very hard to forecast, so my timeline carries a lot of uncertainty.
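For concreteness, the scaling-law result I have in mind says that test loss falls smoothly as a power law in model size. In the notation of Kaplan et al., and treating the constants as approximate:

```latex
% Approximate form of the Kaplan et al. (2020) scaling law: test loss L falls
% as a power law in the number of non-embedding parameters N, when neither
% data nor compute is the bottleneck. The constants below are approximate.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
```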
I put 50% probability on AGI arriving within 4-30 years from now; my median estimate for when AGI is created is 2034.
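As a rough illustration, those numbers can be encoded as a lognormal distribution over years until AGI. This is only a sketch, assuming 2022 as the reference year and choosing the spread so that about half of the probability mass falls in the 4-30 year window:

```python
# Rough sketch: encode the forecast above as a lognormal distribution over
# "years until AGI". Assumptions (not exact): 2022 as the reference year,
# median = 12 years (2034 - 2022), and ~50% of the probability mass between
# 4 and 30 years from now.
import numpy as np
from scipy import optimize, stats

median_years = 12.0            # median of 2034, counted from 2022
lo, hi, target_mass = 4.0, 30.0, 0.5

mu = np.log(median_years)      # lognormal median = exp(mu)

def mass_in_window(sigma):
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))
    return dist.cdf(hi) - dist.cdf(lo)

# Pick the spread so that ~50% of the mass lands in the 4-30 year window.
sigma = optimize.brentq(lambda s: mass_in_window(s) - target_mass, 0.1, 5.0)
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

print(f"sigma = {sigma:.2f}")
print(f"P(AGI within 10 years) = {dist.cdf(10.0):.2f}")
print(f"P(AGI within 30 years) = {dist.cdf(30.0):.2f}")
```

A lognormal is a convenient choice here because it is skewed to the right, matching the long tail toward later arrival dates, but nothing in the forecast hinges on that particular functional form.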
How does this timeline compare to other public forecasts?
Metaculus currently has a median estimate of 2039 for a general AI. Note, though, that this question has very specific and strong resolution criteria. Metaculus’ median estimate for a weak general AI, with less stringent resolution criteria, is 2028. In my opinion, the gap between these two questions is overestimated: short feedback loops and strong incentives to create stronger, more general agents seem likely to lead to surprises. This is the main difference between my forecast and the Metaculus estimates: I believe weakly general AI will be created in the next 4-7 years, and that going from there to general AI will not take more than another 2-6 years.
Eliezer Yudkowsky has taken a lighthearted bet with Bryan Caplan that the world will end by misaligned AI before 2030. Elon Musk has informally tweeted that he would be surprised if we do not have AGI by 2029, and Ray Kurzweil has predicted that AI will equal human intelligence in 2029. Thus I seem to be somewhere on the early side compared to the forecasting community (the difference mostly coming from my belief that there will be less time between weak general systems and stronger ones), and on the later side compared to many prominent thinkers.
Metaculus also has an open question on the time between the first AGI and superintelligence, which currently has a median estimate of about 2 years. The operationalization of the question is somewhat flawed and I would not draw any strong conclusions from it, but it suggests that Metaculus in general expects moderate progress even after AGI is created. My personal timeline from AGI to superintelligence, using the resolution criteria from the general AI question above, would have a median of somewhere between 1 and 5 months. A system with those capabilities would very likely be capable enough at programming, math, and optimization to create even stronger systems, and it would also have the incentive to create agents that are more efficient at steering the world toward outcomes that score highly on the creator’s objective function.
P(doom|AGI)?
Finally, on to perhaps the most important question: what is p(doom|AGI)? How likely is it that humanity is doomed, given that we develop AGI? There are many reasons to be quite fearful here. Spelling out all the arguments that inform my views would take up a lot of space, and it is something that I’m planning to do over a longer timeframe, but for now I will provide some guiding resources below.
- Corrigibility. One of the most common objections to AI being dangerous is that we can “just turn it off”. Corrigibility is a much harder problem than that. It would certainly be helpful, but it seems unlikely that we will be able to create systems that allow us to shut them off without also creating incentives for them to shut themselves off.
- Meditating on the “strawberry problem”, creating an AI system that is capable enough to produce a molecular replica of a strawberry and then stop, has also been important in shaping my thinking about alignment.
- Thinking about optimization, and about agents that can guide the world towards outcomes they want, is hard, but “The Hidden Complexity of Wishes” makes it easier to understand how hard it is to create a good objective function.
- I also think more recent discussions of the ways AGI could be dangerous are well worth reading; see Eliezer’s original post and Paul Christiano’s response for two differing views on the dangers of AGI.
- Gwern’s “It Looks Like You’re Trying To Take Over The World”, a story about an AI taking over the world, is helpful for imagining how scaling and neural nets may lead to a hard takeoff.
P(doom|AGI) is uncomfortably high. I believe there is a ~30% risk that humanity becomes significantly disempowered by AGI systems, and a ~10% risk of AGI systems leading to human extinction. At the same time, I believe the odds of survival become significantly better with more alignment effort and more time. Creating AGI at the end of this century would have a much lower p(doom) than creating AGI 10 years from now.
We have lots of work to do to bring that risk down!