you can ignore that ai 2027 forecast
how to evaluate predictions that include exponential intelligence improvements
the form of the argument
you’re probably familiar with this style of argument if you follow the public conversation on ai progress:
(1) artificial intelligence will get better and better at coding and research, eventually reaching parity with the best humans
(2) because of (1), ai will continuously reduce the human effort involved in the engineering and scientific discovery that goes into improving ai systems
(3) because of (1) and (2), ai systems will eventually gain the ability to reliably improve their own performance
(4) because of (3), the capabilities of ai systems will increase exponentially until they become “generally super-intelligent”
this conclusion is called “takeoff,” so arguments for it are sometimes called “takeoff arguments.” after establishing the takeoff conclusion, prognostications about social peril typically ensue. readers are then urged to (i) mitigate the risks the forecaster has foreseen, and (ii) erect only those obstacles necessary to reduce those risks, not ones that would delay the post-superintelligence utopia.
disagreement over these arguments usually centers on “timelines,” because it’s considered unsophisticated to reject any of the premises outright. since i’m generally down for unsophistication, i’ll argue that we should reject the first premise — not by asserting its negation (that ai will never reach parity with the best humans at coding and research), but by insisting that we haven’t been given enough reason to believe it will. absent those reasons, we can safely dismiss the claims as uninteresting and under-motivated, given the obstacles considered below.
the problem with takeoff premises
to address one concern right off the bat, ai already does match or exceed expert-level performance in parts of coding and research (namely, producing research summaries and code snippets given natural language prompting). but the premise is only true if ai reaches parity with the best humans (say, top quartile in an ai lab) at all the other activities beyond just writing the code and summarizing the research. these include activities like:
assessing whether the code or research is any good — or made up!
identifying gaps in the research, like unexplored assumptions or under-appreciated methods
discussing goals of the research and coding with peers who may disagree
assessing coherence between the goals of the researcher and goals of the research
understanding the network of complex tasks that are compatible with, and conducive to, achievement of those goals
understanding what “success” looks like for general intelligence and imagining the different ways this could be approximated, modularized, simulated, etc.
understanding what humans generally want and how certain product experiences might be untenable (e.g. ones that take too long, are too costly, or insult the boss)
understanding normative relations between bits of information and methods (e.g. confidentiality, legality, originality, etc.)
evaluating tradeoffs between different research avenues while attending to all the above
this is not to say that with enough focused prompting we couldn’t find examples of human-level performance at each of these tasks. the point is just that the human researcher’s ability to do all of them reliably, and often effortlessly, is what sets her apart from our best models today.
multiple recent papers have argued that even the best reasoning models cannot reliably solve complex bounded tasks like the tower of hanoi — even when given the algorithm that enables a consistent solution. nor can they solve complex math and coding tasks that are freshly generated and not contained in the training data. worse, they still confidently assert that they have solved the problems when they haven’t. and these are limitations in bounded domains like math, coding, and algorithmic games; the complex, open-ended, nested social activities involved in the work of engineering and research are harder still. while ai can certainly help people with these tasks today, we do not have line of sight to a future in which it could do them autonomously, at or beyond human parity. and that’s what’s needed for takeoff arguments to get going.
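for a sense of how mechanical the hanoi task is, here’s the standard recursive procedure (a minimal python sketch of the algorithm at issue, not the exact prompts or test harness used in those papers). solving it requires no insight, only faithful execution of a short, fully specified recursion over hundreds of steps.

```python
# illustrative only: the classic recursive solution to the tower of hanoi.
# this is the kind of fully specified procedure the cited papers report
# reasoning models failing to execute reliably; it is not their exact setup.

def hanoi(n: int, source: str, target: str, spare: str, moves: list) -> None:
    """append the moves that transfer n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the top n-1 disks onto the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top

moves: list = []
hanoi(8, "A", "C", "B", moves)
print(len(moves))  # 2**8 - 1 = 255 moves, all determined in advance
```

the reported failure mode is exactly the unglamorous part: keeping track of state across a long, fully determined sequence of steps.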
notice that if this is a problem for the first premise, it would be just as much a problem for the second. the work of engineering and scientific discovery aimed at improving ai systems will face the same kinds of problems — and more, if it requires some intuition about the way the physical world works. so without either of these premises, there’s no compelling case for self-improvement. and with no case for self-improvement, the case for takeoff collapses.
the ai 2027 argument
i’ll show my cards up front. i’m not an ai skeptic. i use it every day, at work and for personal stuff. it’s magical, and i love it. i also work for a company that builds large language models (disclaimer: views my own).
so — with those disclaimers out of the way — let’s evaluate the ai 2027 paper against the criticism made above. in the paper, the authors argue that we’ll have ai systems that are “super-human coders” by 2027, and that within a year or two they’ll self-improve into a network of artificial superintelligences that take over the world — either destroying humanity or ushering in a new golden age of scientific progress and space colonization.
it’s sensational, wacky stuff to be sure. and yet the authors succeeded in capturing national media attention for a few weeks. none of them were famous outside the silicon valley bubble, but they ended up going on a bunch of podcasts, including two(!) new york times podcasts. to their credit, though, the paper is readable (at least, somewhat), rigorous (at least, mathematically), and imaginative (at least, for techies). and they proactively invited objections. so here’s mine.
first, their argument that we will reach superhuman coders by 2027 is that if a lab were to focus on building models that were especially good at improving ai research, they would succeed because of an “explicit focus” on this and an “extensive codebase” that can serve as “high-quality training data.”
that’s it. truly. that’s the argument. they don’t give us any story about what it would mean for a model to be “optimized” for ai research, or how such a model would be meaningfully different from what we already have in the state of the art.
to be sure, there’s a clear paradigm of fine-tuning a model for better performance in one domain if you have enough high-quality training data in that area. but what would the high-quality training data be in this case? it would need to be large enough to enable successful fine-tuning, and different enough from the data sources already incorporated into state-of-the-art models. in other words, the “extensive” codebase would have to be very large and very differentiated, and we’d need a compelling answer to that question before expecting models differentiated in this way. the strategy is also so obvious that it’s hard to believe it hasn’t already been tried and found wanting for precisely these reasons: anyone who has ever dreamed of working on ai systems has dreamed of using them to recursively accelerate that work.
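to make the fine-tuning paradigm concrete, here’s a rough sketch of what it mechanically involves (python, using the hugging face transformers and datasets libraries; the base model name and the corpus file are placeholders i made up, not anything the ai 2027 authors specify). the interesting part isn’t the training loop, it’s the corpus file, which is exactly the thing the paper never identifies.

```python
# a rough, hedged sketch of domain fine-tuning with hugging face libraries.
# the model name and dataset path below are placeholders, not recommendations.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# the crux of the objection above: where does this corpus come from, and how
# is it different from what frontier models have already been trained on?
dataset = load_dataset("json", data_files="ai_research_corpus.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```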
sadly, the rest of the argument just doubles down on that assumption — that ai which is better at enabling ai research will beget better ai which will eventually beget the super-human coder imagined below. they sprinkle in some improvements from synthetic data and long-horizon task completion data that the company pays humans to generate, but these don’t alter the fundamental structure of the problem (if anything, they introduce their own problems).
beyond these undermotivated suppositions, we aren’t given reasons to think there’s a pathway towards developing ai that is meaningfully optimized to enable more effective ai r&d such that it would recursively reduce cycle time between improvements. their argument more or less takes this achievement for granted.
Agent-1 had been optimized for AI R&D tasks, hoping to initiate an intelligence explosion. OpenBrain doubles down on this strategy with Agent-2. It is qualitatively almost as good as the top human experts at research engineering (designing and implementing experiments), and as good as the 25th percentile OpenBrain scientist at “research taste” (deciding what to study next, what experiments to run, or having inklings of potential new paradigms). While the latest Agent-1 could double the pace of OpenBrain’s algorithmic progress, Agent-2 can now triple it, and will improve further with time.
while they’re happy to concede that the next iteration only quadruples their rate of progress because of bottlenecks that require humans, they basically have what they need at this point: from here, the jump to super-human ai researchers is treated as trivial. i agree when they say (in their shorthand, SC is the superhuman coder and SAR the superhuman ai researcher):
We think that most possible barriers that would block SAR from being feasible in 2027 would also block SC from being feasible in 2027.
but we haven’t been given sufficient reason to believe superhuman coders are feasible by 2027. absent those reasons, we can safely ignore the rest of the story about space colonization and human extinction.
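to see how much of that conclusion is packed into the multipliers, here’s a toy compounding model (my own sketch, not the paper’s actual forecast; the later multipliers, and the assumption that a faster research pace proportionally shortens the calendar time between generations, are inventions for illustration).

```python
# a toy model of the recursive-speedup assumption (my sketch, not ai 2027's
# forecast). assume each agent generation multiplies the pace of algorithmic
# progress, and that a faster pace proportionally shortens the time needed
# to build the next generation. takeoff then falls out by construction.

baseline_months_per_generation = 12.0
progress_multipliers = [2, 3, 4, 10, 25]  # assumed; echoes "double, triple, quadruple..."

elapsed = 0.0
for gen, multiplier in enumerate(progress_multipliers, start=1):
    months = baseline_months_per_generation / multiplier  # the key assumption
    elapsed += months
    print(f"agent-{gen}: {multiplier}x pace, +{months:.1f} months, total {elapsed:.1f}")

# roughly 6 + 4 + 3 + 1.2 + 0.5 = ~14.7 months to "superintelligence", but
# only because the multipliers and the pace-to-calendar-time link were assumed.
```

run it and superintelligence arrives in about fifteen months, but only because the multipliers were assumed up front. that’s the structure of the objection: the takeoff lives in the premises; it isn’t derived from them.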
where we’re left
when we come back to reality, what we actually find is that — far from automating the jobs of software engineers — most of the labs are quietly flailing. here’s how you can tell:
first, the product pivot. most of the labs are pivoting to “productize” their models. if they truly had line of sight to achieving agi, they wouldn’t be doing this. instead, they’d be racing to the finish line. they could reasonably assure their investors behind the scenes that their energies were best spent not on revenue or business models, but on “solving agi, then letting agi solve the rest.” the push to put ai on “devices” is exactly this kind of giveaway.
second, the data bottlenecks. most ai labs have converged around the same levels of performance in their foundation models, where gains are now eked out through post-training reasoning models. there’s open talk among insiders that all the available high-quality data has been used up and that progress from scaling alone has hit a wall. according to other insiders, the big labs are hiring doctors and scientists just to label data — not to train models or do science, but to create more high-quality data to use in training. there are scrambles to sign contracts with publishers for more high-quality training data. synthetic data generally doesn’t help; indeed, it can ruin a model. low-quality data from social media often doesn’t help either.
third, the diminishing returns of reasoning models. one of my favorite bits of apple news coming out of their recent annual conference was a paper called “the illusion of thinking,” from their ai researchers, about how dramatically so-called “reasoning” models fail at basic algorithmic tasks — even when explicitly given the algorithm. that apple allowed this to be published right before the big conference where the world was expecting major ai announcements just goes to show how skeptical the company must be about the relative consumer value of these products.
other papers have claimed to show that state-of-the-art reasoning models actually hallucinate more than prior models. remember, in 2023 sam altman confidently predicted that we wouldn’t be talking about hallucination problems in 2025. if anything, we’re talking about them more — which means more unreliability, not less.
finally, observe the general hesitancy about future launches from the big players:
if openai had something worth calling gpt-5, they would have released it already. when they finally do release it, it will have to be paired with some other exciting announcements because the product itself will be too little, too late. similar story for other major labs.
if apple thought that gen-ai integrations in siri were genuinely worthwhile, they would have released them already. instead, their high quality bar has made them hold back. they’re much more willing to wait, and not afraid that they’re going to miss any critical boats here.
if microsoft were bullish on genai, they wouldn’t be cancelling so many data center leases. nor would satya be out there saying that “it’s good to have some skepticism” about whether progress will continue because that “will motivate more innovation.” not exactly the words of someone who is confident in the present pathways.
if customers loved interacting with ai-driven customer service reps, then klarna wouldn’t have decided to rehire hundreds of them after customers complained.
there are still several pathways that could unblock or accelerate progress: agentic stuff, multimodal stuff, hybrid symbolic + statistical models, robots, and more. some of these might yield big breakthroughs, but at this point it’s hard to assign meaningful probabilities to any of them. appeals to hybrid methods still feel pretty hand-wavy, and we’ve been attempting them for decades. robotics is incredibly difficult. agent performance plummets when tasks get complex.
so where does that leave us?
we don’t have to worry about agi in 2027. nor do we need to worry about the subsequent political, economic, and social consequences that the ai 2027 paper spends so much time fussing over. we might not even have to worry about widespread job displacement or major labor market disruption.
we still should worry about manipulation, learning, wellbeing, and youth safety. there’s a lot we can do about those things now (mostly better education and less hype). we can still be bullish about the present value of these systems. they’re really good, and we’re going to get a lot of unknown value out of their existing capabilities. but we should calibrate how much time we spend worrying about the post-agi stuff and really rigorously interrogate the assumptions in arguments that purport to show that agi is imminent or even inevitable.
technological inevitability arguments are generally quite bad. most trends are just that; they don’t last forever. transistors per chip kept doubling every two years at a fixed cost until they didn’t. cars got faster and more fuel efficient, but they’re not flying or supersonic. even the things that were flying and supersonic failed to scale and got shut down entirely. there’s little reason to think progress must continue because that is what technology does. if anything, it’s become increasingly true that technological progress is the exception, not the rule. more to say about that in a future post, but if you’ve read robert gordon and tyler cowen, it won’t be news.