Increasingly, our profession—as programmers—is splitting. Some of us seem to be vibecoding everything. Others don’t trust an AI within a ten-mile radius.
Deep splits in programming are not new. Once upon a time, compilers were widely mocked as a waste of valuable machine time, in spite of Grace Hopper's arguments. Eventually, we decided she was right, and settled that debate.
This time, it feels different. We may not settle it.
Are LLMs the future of programming? Or are they a waste of valuable silicon? We can’t even agree on the time-honored interview answer—“it depends”—let alone what that depends on. Are LLMs useful for research or for coding? For “simple tasks”? And what, exactly, counts as simple?
Recently, I opened my browser to a top article titled “Don’t fall into the anti-AI hype,” written by a brilliant programmer. Alongside it sat the top comment, which was critical:
I don’t understand the stance that AI currently is able to automate away non-trivial coding tasks. I’ve tried this consistently since GPT-3.5 came out, with every single SOTA model… I’m not sure what else I can take from the situation.
The good news is: I do understand. I’m quite sure what else to take from it.
The bad news is that it takes a long time to explain. I’ve organized this essay into sections, but I’m afraid I can’t make it any shorter.
But it really feels like two intelligent, capable camps of engineers have moved past the tabs-versus-spaces level of disagreement into completely different fields. Or they are speaking different languages, using different standards of evidence. Maybe even living in different realities.
Some of us report almost unbelievable engineering feats using AI. Others say they can’t automate even a simple programming task.
Well: which is it?
Are vibecoders deceiving themselves by not reading the code closely enough? Are skeptics simply bad at prompting? What the hell is going on?
My first goal here is to explain—in far more detail than the modern attention span will tolerate—what is going on.
My second goal is to explain, or at least gesture at, in standard engineering terms, what those of us in the “moderately pro-agents” camp are actually doing. Is this responsible engineering practice, or is it reckless?
My third goal is to clean up my own house. Frankly, agent-coding advocates—including me—keep making arguments that don’t survive even the most basic scrutiny. There are much better arguments we should be using.
There are also legal, ethical, and societal issues in the room. I can’t avoid touching them entirely, but my aim here is far narrower: to speak about programming issues, as a programmer, to other programmers, rather than gesticulating vaguely at domains outside my expertise.
So prepare to be unhappy for several reasons. Brew a strong cup of coffee, and bookmark this to read the rest of the story.
Let’s get one thing out of the way: programming goes through hype cycles. An idea is hailed as the future, rapidly adopted (or mandated), endlessly debated—and then either collapses or settles into a quieter, more realistic niche.
Classic examples include OOP, XML, Java, Web 2.0, NoSQL, and microservices. You can add your own.
I think of these as tides. Tides come in; tides go out. There’s a lifecycle so regular you can almost set your watch by it.
It’s entirely reasonable to assume that AI is just the latest tide. The agent camp has done an abysmal job of distinguishing itself from that interpretation—comedically bad, in fact. Unfortunately, I can't transition into a comedy career because only programmers will laugh at my jokes.
I’m typing this on a computer with an NPU. What parts of my codebase can I accelerate with it? Where’s the API, the instruction set, the concrete C, C++, or Rust project that demonstrates this?
If you’ve read this far, then you’re probably reading it on a device with an NPU too, and your silicon vendor won’t answer these questions, either. In fact, no manufacturer seems able to survive basic technical due diligence on... a headline feature??? The emperor has no clothes.
On the other hand, asking whether we’re in a hype cycle has a trivial answer: yes. Of course. Of course we are. But when wasn't the emperor naked?
The better question is: given that we’re in a hype cycle, what, therefore, should we do? How do we separate real engineering from marketing bluster?
Because in the dot-com era, some companies were Amazon; others were Webvan. We’re all going to work for somebody, so which one is which?
Another messaging disaster from the AI camp is constantly changing the subject to the future. I haven’t seen the future yet. Neither have they.
So let’s do a hard U-turn and talk about the past. About things that already happened, that we can all observe and measure.
Around the 2000s—and still today—there’s an ongoing debate about static versus dynamic typing. I’ve encountered it constantly in consulting work, and spent dozens of hours in meetings about it.
You may be familiar with standard hacker essays like “The Unreasonable Effectiveness of Dynamic Typing for Practical Programs” or “Parse, Don’t Validate.” Great essays. They get cited pretty often.
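For anyone who hasn’t read the latter: the core move is to turn untrusted input into a type that cannot represent the invalid case, instead of checking it and passing the raw data along. A minimal sketch, in Python rather than the essay’s original Haskell, with names of my own choosing:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NonEmpty:
    """A list that is non-empty by construction."""
    head: int
    tail: list[int]

def parse_non_empty(xs: list[int]) -> NonEmpty:
    # "Parse": convert raw input into a type that makes emptiness unrepresentable.
    if not xs:
        raise ValueError("expected a non-empty list")
    return NonEmpty(xs[0], xs[1:])

# Downstream code accepts NonEmpty, so it never has to re-check for emptiness.
def largest(xs: NonEmpty) -> int:
    return max([xs.head, *xs.tail])
```

The payoff is that the check happens exactly once, at the boundary, and the type system remembers that it happened.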
Instead of those essays, I want to focus on Dan Luu’s meta-review of the empirical research, something I’ve never seen brought up in a meeting:
Unfortunately, they all have issues that make it hard to draw a really strong conclusion… under the specific circumstances described in the studies, any effect, if it exists at all, is small… most of them are passed around to justify one viewpoint or another.
If he’s right—and I am convinced that he is—then:
Most of those meetings were a waste of everyone's time.
If you’ve read a peer-reviewed paper on agentic coding, it’s probably “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” Here it is scoring 775 points on Hacker News.
And here’s the sentence that programmers quote:
Developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%.
Big if true.
But notice what a randomized controlled trial actually tells us: it controls for our own delusions. Interpreted literally, the result suggests we’re all 39% wrong about ourselves, regardless of whether we even care about AI at all.
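To spell out where that 39 comes from, here is my own back-of-the-envelope arithmetic, simply reading both headline numbers as percentage points of the same task time:

$$
\underbrace{20\%}_{\text{self-estimated speedup}} \;+\; \underbrace{19\%}_{\text{measured slowdown}} \;\approx\; 39 \text{ percentage points of self-misjudgment}
$$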
A participant in the study later wrote that their own estimates were wildly off—hours instead of minutes—which echoes a very old lesson from Joel Spolsky’s Evidence-Based Scheduling that I still remember.
But once you start reading all the research, Dan Luu style, the picture becomes more complicated. Earlier RCTs found productivity increases of ~20–25%, with wide confidence intervals. So which is it? Does AI help or hurt? I am deliberately using a hard date cutoff, because you didn’t read any more recent papers, either.
I’ll leave the parallel problem playing out in research to the researchers. What you can verify at home is that these papers had much bigger Ns than N=16, and got less than half the discussion in programming circles! Pre-existing bias is doing most of the epistemic work around here.
Which is not surprising, actually.
A very reasonable hypothesis is that some confounding factor explains all the contradictions. Maybe some people have good results, and other people have bad results. Often this is hand-waved as a “skill issue.”
I think that’s broadly true. Practice matters.
But where is the tutorial explaining what this skill is? Where’s the well-cited essay breaking it down? Where’s that paper on arXiv? Why isn’t that the focus of all the meetings?
I need a whole section on this common programming refrain. Here it is:
I've tried this consistently since GPT 3.5 came out, with every single SOTA model up to GPT 5.1 Codex Max and Opus 4.5. Every single time,
Or:
Feel free to claim that paid versions or agentic models would be better but that’s not what I’m testing.
Or:
I don’t think that’s really a fair summary. He did try multiple assistants, including Claude.
Everybody in my camp is lying to you about this fact: the models don’t matter. Stop trying new models. Try something else. Anything else. Literally: vary any parameter other than the model, and you will accidentally stumble into new research in computer science.
Yes, models differ. Yes, some are better at some tasks. Yes, I can give you the "it depends" interview answer and speak at length about why this model is better than that model at some task.
Since this isn't an interview, here's the answer from a leaked source: We Have No Moat, And Neither Does OpenAI.
In particular, you can get absolutely astonishing results from awful models. Let me repeat that: use bad models.
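To make “any other parameter” concrete, here is one hypothetical knob that has nothing to do with which model you pick: whether the model gets to see the failing test output and try again. This is my own illustrative sketch, not a method anyone quoted in this essay has endorsed; complete_with_weak_model, apply_patch, and the pytest command are placeholders for whatever model API, patching step, and test runner you actually use.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Placeholder: run whatever test command your project uses.
    result = subprocess.run(["pytest", "-x", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def complete_with_weak_model(prompt: str) -> str:
    # Placeholder for any model call, including a deliberately "bad" model.
    raise NotImplementedError("wire up whichever model you like here")

def apply_patch(patch: str) -> None:
    # Placeholder: write the model's proposed change into the working tree.
    raise NotImplementedError("apply the proposed change however you see fit")

def fix_until_green(task: str, max_attempts: int = 10) -> bool:
    """Vary the harness, not the model: feed back the failing output and retry."""
    feedback = ""
    for _ in range(max_attempts):
        patch = complete_with_weak_model(f"{task}\n\nPrevious test output:\n{feedback}")
        apply_patch(patch)
        passed, feedback = run_tests()
        if passed:
            return True
    return False
```

The point of the sketch is not the handful of lines of Python; it is that the loop, the feedback, and the stopping condition are all parameters you can vary without ever touching the model.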
If you aren’t getting interesting results, you should ask why. Then you should ask why, again. And again, and again, until you finally discover there’s a replication crisis in the entire field of programming.
This is classic Feynman territory, which I have tried, and failed, to put any better. So instead I just need to quote it at length:
Other kinds of errors are more characteristic of poor science. When I was at Cornell, I often talked to the people in the psychology department. One of the students told me she wanted to do an experiment that went something like this—I don’t remember it in detail, but it had been found by others that under certain circumstances, X, rats did something, A. She was curious as to whether, if she changed the circumstances to Y, they would still do, A. So her proposal was to do the experiment under circumstances Y and see if they still did A. I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person—to do it under condition X to see if she could also get result A—and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control. She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1935 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happens. Nowadays there’s a certain danger of the same thing happening, even in the famous field of physics.
Well: Nowadays there's a certain danger of the same thing happening, even in the famous field of programming.
Like: if I am not kidding you, then somebody out there, somewhere, is getting impressive results from AI models.
You’re absolutely right! And here they are:
In the past week, just prompting, and inspecting the code to provide guidance from time to time, in a few hours I did the following four tasks, in hours instead of weeks - antirez
I traveled around all year, loudly telling everyone exactly what needed to be built, and I mean everyone... I went to senior folks at companies like Temporal and Anthropic, telling them they should build [a project], I went up onstage at multiple events and described my vision for the orchestrator. I went everywhere, to everyone. ...But hell, we couldn't even get people to use Claude Code - Steve Yegge
I crossed an interesting threshold yesterday, which I think many other mathematicians have been crossing recently as well. In the middle of trying to prove a result, I identified a statement that looked true and that would, if true, be useful to me. Instead of trying to prove it, I asked GPT5 about it, and in about 20 seconds received a proof. The proof relied on a lemma that I had not heard of (the statement was a bit outside my main areas), so although I am confident I'd have got there in the end, the time it would have taken me would probably have been of order of magnitude an hour - Timothy Gowers
These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, arXiv
A recent example of this occurred on the Erdos problem website, which hosts over a thousand problems attributed to Paul Erdos... Already, six of the problems have now had their status upgraded from "open" to "solved" by this AI-assisted approach... an Erdos problem (#728) was solved more or less autonomously by AI (after some feedback from an initial attempt) - Terence Tao (2)
This year, AlphaDev's new hashing algorithm was released into the open-source Abseil library, available to millions of developers around the world, and we estimate that it's now being used trillions of times a day. - DeepMind
I write all my apps in SwiftUI and I haven't written code since ~May. Everything's open source, so I can back it up. - Peter Steinberger
I was an AI skeptic. I thought LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel. I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked. - Cloudflare
An editorial note: the main problem I had writing this is that new things kept happening while I was drafting it. So I really need to get this off my plate, so I can go back to some actual work.
Also, sure. I, too, am writing the best code I have ever written in my life. I, too, have stories about how I solved X bug in Y hours that K people missed for N years, because very competent eyes were not enough to make all bugs shallow. But my stories are just more boring than accidentally solving an Erdos problem. And so are everybody's.
The main point is: this is an iceberg. For everything that is visible, there are ten things that I didn’t write a longer and much more boring essay about.
Well, what can we say about the rest of the iceberg?
A recent 2025 game of the year lost that award over some milquetoast AI controversy. Its financial backers are laughing all the way to the bank, but among its credited contributors were at least six working programmers—people who could have listed that award on their résumé. Instead, a single buried quote in an interview with El País about using “some AI” was enough to disqualify them.
The lesson is typical, but also very important:
It’s safer as a career move to appear mysteriously productive than to explain why.
There’s an iceberg—and most of it is under the ocean.
Turn around now.
If “skill” is the factor that explains the contradiction, then the most skilled practitioners may be the least likely to explain themselves.
I’m one of them, I guess. Like: of course my RSS feed is filled with people more skilled than me. But I can solve some pretty wild, previously undiscovered bugs using agents.
Many of us have noticed that Stack Overflow is dying. Here’s the chart.
The usual explanation is that people ask LLMs their programming questions instead of asking humans.
I think this explanation completely misses the point.
If programming is increasingly about wrangling agents, then “programming questions” are now questions about how to wrangle them, which may no longer fit traditional definitions of programming at all. In fact, here is the paradox many folks have noticed: those “programming questions” are now effectively banned on Stack Overflow.
The irony is painful. Here was Jeff Atwood's original essay:
There’s far too much great programming information trapped in forums, buried in online help, or hidden away in books that nobody buys any more. We’d like to unlock all that. Let’s create something that makes it easy to participate, and put it online in a form that is trivially easy to find.
And now, modern programming knowledge is trapped again: scattered across private chats, unpublished workflows, and quiet practice. Who is publishing it? Why should they? What incentive do they have?
As a side note, apparently building a website where the main cofounders disagree with each other, but nonetheless are committed to unlocking programming knowledge, is a winning formula. So I guess if you disagree with me and want to solve this problem anyway, send me an email.
There’s another profession that works this way, and has done so since ancient times: magicians.
Magicians show you what’s possible without telling you how. I know exactly what your card is, but I won’t tell you anything at all about how I know.
What many people don’t know is that professional magicians mostly don’t talk about how it works among themselves, either. The entire discourse among working magicians is to assume you’re probably 80% of the way there already, so let’s obliquely hint at the remaining 20%, and if you get the hint, that seems fair enough.
There are a few reasons why they work this way. One, it weeds out the outsiders, the people who don’t know the 80% at all, and who need to stay fooled to keep this entire thing going. Two, the 80% can actually be assembled from standard industry knowledge, so a standard interview question is to guess how a trick is done, by setting 80% industry knowledge against 20% of your own creativity. And three, on rare occasions, if you force people to reinvent an effect on their own, sometimes they do! And they accidentally invent new magic.
Right now, programming is in that awkward phase where some people are insisting the trick is fake, while others are very carefully not explaining how it works. And very quietly, some people are applying variations on standard engineering principles to invent new magic that will likely be lost to our discourse.
What’s actually happening is quieter, messier, and harder to talk about than a hype cycle. The gains are real, unevenly distributed, and tightly coupled to skills we don’t yet have names for, let alone tutorials. The people getting the most value are often the least able—or least willing—to explain how they do it, because explanation is risky, unrewarded, and professionally counterproductive.
That leaves us with a distorted picture: skeptics honestly reporting failure, advocates cautiously reporting success, and almost nobody describing the method in between. We mistake silence for absence, and marketing for substance.
The right conclusion is not that AI “works” or “doesn’t work.” It’s that our epistemology is broken. Hell, our entire field is broken. We are bad at measuring our own productivity, bad at sharing tacit technique, and increasingly bad at agreeing on what counts as evidence.
Hype cycles come and go. This is different. The trick is real—but so far it’s incomplete, uncomfortable, and mostly happening offstage.