So there's this study from METR that's absolutely breaking my brain, and I need to talk about it because it contradicts literally everything I've been telling myself about my workflow for the past year.
They took 16 experienced open-source developers—people working on massive projects with millions of lines of code—and ran a proper randomized controlled trial. Each task was randomly assigned: on some issues the developer could use AI tools (mostly Cursor Pro with Claude), on others no AI was allowed. Real issues from their actual repositories. Proper scientific method.
The result? Developers using AI were 19% slower. Not faster. Slower.
But here's the part that's genuinely unsettling: before starting, these developers predicted AI would make them 24% faster. After finishing—even though they were measurably slower—they still believed AI had sped them up by about 20%.
They were wrong. And they didn't know they were wrong. And honestly? I think I might be wrong too.
The Dopamine Trap
I've been using Cursor pretty heavily for the last few months, and I absolutely feel more productive. Like, the feedback loop is incredible. You type a comment, boom, code appears. It feels like progress. It feels like shipping.
But "feels like progress" and "actually making progress" are apparently two very different things.
The study suggests that AI coding assistants create this dopamine reward cycle. You get instant feedback. Code drops into your editor immediately. That little hit of "I'm doing something" keeps you engaged. But engaged doesn't mean effective.
It's like... you know how infinite scroll feels productive when you're "researching" something, but then three hours later you realize you absorbed nothing? Same energy, different context.
Why Slower Might Actually Make Sense
Here's what I think is happening, and it matches my own experience when I'm honest with myself:
The AI writes code fast. Really fast. But that code needs to be read, understood, verified, and often fixed. And reading AI-generated code is weirdly exhausting in a way that reading code you wrote yourself isn't.
When I write code, even if it takes longer, I understand every decision that went into it. I know why I chose that variable name, why I structured it that way, why I handled that edge case. It's all in my head already.
When AI writes code, I have to reverse-engineer those decisions. Why did it do it this way? Is this the right approach? What edge cases did it miss? And crucially: does this even do what I asked for, or just something that looks close enough?
That cognitive overhead adds up. Fast.
The Context-Switching Problem
Another study from Faros AI analyzed 10,000 developers and found something fascinating: teams with high AI adoption dealt with 47% more pull requests per day. They were juggling way more parallel workstreams because AI made it easy to scaffold multiple tasks at once.
On paper, that sounds great. In practice? That's called context-switching, and it's productivity poison.
I've noticed this in my own work. With AI, I'll start three different features because spinning them up is so easy. But then I'm mentally tracking three different implementations, three different test suites, three different sets of edge cases. The overhead of switching between them eats whatever time I saved in the initial coding.
It's like if you could instantly start cooking five different meals simultaneously. Sure, you got them all started faster, but now you're running between five different stovetops trying to keep everything from burning.
When AI Actually Helps
Okay, but AI coding tools aren't universally bad. The same research shows they're genuinely useful for specific things:
Junior developers see bigger gains than senior developers, which makes sense: if you're still learning patterns and idioms, having an AI show you "here's how this is typically done" is educational. Experienced developers already know the patterns; typing them out was never the slow part.
Throwaway code is where AI shines. Need a quick script to process some data once? Perfect AI use case. I'm not going to spend 30 minutes writing beautiful, maintainable code for something I'll run once and delete.
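To be concrete about what I mean by throwaway code, here's a hypothetical sketch of the kind of one-off script I'd happily let the AI write (the file name and the "ERROR" log format are made up for illustration):

```python
# Hypothetical one-off script: count the most common error messages in a
# log file and print the top ten. Not meant to be maintainable; run it
# once, read the output, delete it.
from collections import Counter
import sys

counts = Counter()
with open(sys.argv[1]) as f:  # e.g. python top_errors.py app.log
    for line in f:
        if "ERROR" in line:
            # Treat everything after the ERROR marker as the message.
            counts[line.split("ERROR", 1)[1].strip()] += 1

for message, n in counts.most_common(10):
    print(f"{n:5d}  {message}")
```

If it's slightly wrong, who cares? I'll see it in the output and tweak it. Those are exactly the stakes where "looks close enough" is fine.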
Boilerplate and repetitive tasks are also solid wins. Test files, configuration, that kind of thing. The stuff where you're basically a human template engine anyway.
But complex features? Architectural decisions? Debugging weird issues? AI is... not helping. Or at least, not helping as much as we think it is.
The Productivity Theater Problem
What really bothers me about these findings is the implication: we might be optimizing for the wrong thing.
Companies are measuring "lines of code written" and "commits per day" and seeing those numbers go up with AI adoption, then declaring victory. But if the code needs more review time, creates more bugs, or requires more iterations to get right, did we actually win?
I talked to a friend who works at a startup where leadership is pushing hard on AI adoption. They're tracking "AI tool usage" as a metric. People are getting dinged for not using Copilot enough. But nobody's tracking whether features are shipping faster or with fewer bugs.
It's productivity theater. We're performing productivity without necessarily achieving it.
My Conflicted Take
Look, I'm still using Cursor. This article was partially written with Claude's help. I'm not about to become some anti-AI Luddite who writes everything by hand.
But I am going to be more skeptical about when I'm actually getting value versus when I just feel like I'm getting value.
The times AI has genuinely helped me:
- Generating test cases for code I already wrote
- Converting data formats (JSON to CSV, that kind of thing; there's a quick sketch of what I mean after this list)
- Writing documentation and comments
- Explaining unfamiliar code or libraries
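For the format-conversion point above, this is roughly what I mean, a minimal JSON-to-CSV sketch (the file names and the assumption of flat records are mine; real data always needs a tweak or two):

```python
# Minimal sketch: flatten a JSON array of objects into a CSV file.
# Assumes records.json holds a list of flat objects like
# [{"id": 1, "name": "ada", "score": 0.9}, ...] -- adjust for real data.
import csv
import json

with open("records.json") as f:
    records = json.load(f)

# Union of every key that appears in any record, so no column gets dropped.
fieldnames = sorted({key for record in records for key in record})

with open("records.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
```

Mechanical, well-trodden, and trivially checkable by opening the CSV. That's the sweet spot.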
The times AI has probably made me slower:
- Complex feature implementation
- Debugging (AI is terrible at debugging)
- Architectural decisions
- Anything requiring deep understanding of the codebase
The Uncomfortable Question
Here's what keeps me up at night: if experienced developers can't accurately assess whether AI is making them faster, how is anyone supposed to make informed decisions about AI adoption?
We can't trust our intuition. The metrics companies are using are misleading. And the companies selling these tools have every incentive to cherry-pick the success stories and ignore the failures.
There's probably some optimal use case for AI coding assistants. Some sweet spot where it's genuinely productivity-enhancing. But I don't think we've figured out what that is yet. And I don't think most of us are using it that way.
Right now, we're in this weird phase where everyone's experimenting, and we're all kind of lying to ourselves about how well it's working because admitting otherwise feels like admitting we're behind the curve.
What I'm Doing Differently
Based on this research (and my own grudging self-reflection), here's how I'm changing my approach:
- Turn off AI for architectural work. If I'm making big decisions about how something should be structured, I'm doing it myself. The thinking is the point.
- Use AI for drafts, not finals. AI can give me a starting point, but I'm rewriting more of it instead of just tweaking and shipping.
- Track time honestly. I'm going to actually measure how long features take from start to "merged and working" instead of just "code written" (there's a sketch of what I mean after this list).
- Question the dopamine. If I feel really productive, I'm going to double-check whether I'm actually being productive or just feeling busy.
- Stay skeptical of benchmarks. When someone tells me their AI tool improves productivity by X%, I'm going to ask "how are you measuring that?" Because apparently, our intuition is garbage.
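As for tracking time honestly, here's the rough sketch I mentioned in the list. Nothing clever: a hand-maintained CSV of when I started a feature and when it was merged and working, plus a few lines to report elapsed days (the file name and columns are just whatever I decide to write down, not any standard format):

```python
# Sketch of honest lead-time tracking: read a hand-maintained CSV of
# feature, started, merged_and_working timestamps and print elapsed days.
# Example row: login-rate-limit,2025-06-02T09:00,2025-06-09T16:30
import csv
from datetime import datetime

with open("feature_times.csv") as f:
    for feature, started, merged in csv.reader(f):
        delta = datetime.fromisoformat(merged) - datetime.fromisoformat(started)
        days = delta.total_seconds() / 86400
        print(f"{feature}: {days:.1f} days from start to merged-and-working")
```

The point isn't the script, it's writing the end date down at "merged and working" instead of at "code written," because that's exactly the gap where my intuition lies to me.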
The productivity gains might be real for some people, in some contexts, with some tools. But they're definitely not universal, and they're definitely not as large as we think they are.
And maybe that's okay? Maybe AI coding assistants are more like spellcheck than they are like a second pair of hands. Useful for catching things, but not fundamentally changing how fast you write.
I just wish we could all be a bit more honest about that instead of pretending we're all suddenly 10x engineers because we're letting Claude write our for-loops.