GitHub Copilot just got scary good. The new agent mode doesn't just suggest lines of code anymore—it can iterate on entire features, refactor projects, and basically code autonomously while you go about your day. I spent the last week testing it on real projects, and I have complicated feelings about it.
Let me explain what actually happened when I gave an AI agent commit access to my repositories.
What Agent Mode Actually Does
Traditional Copilot suggests code as you type. It's autocomplete on steroids. Agent mode is different—you give it a high-level task and it goes to work. Like, actually writing functions, creating files, running tests, and iterating based on the results.
The magic is in the iteration. If it writes code that fails tests, it reads the error messages and fixes them. If you give it feedback, it refactors. It's not just generating code and hoping for the best—it's actually trying to solve problems.
I started small. Asked it to implement a feature I'd been putting off: adding export functionality to a dashboard app. Gave it the spec, went to get coffee, came back to a pull request with working code and tests. Took it maybe 15 minutes. Would've taken me two hours.
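To give you a sense of the scale of the task, the feature boiled down to something like the sketch below. This is my simplified reconstruction, not the agent's actual output, and names like `toCsv` and `downloadCsv` are placeholders.

```typescript
// Hypothetical sketch of a dashboard CSV export; `Row`, `toCsv`, and
// `downloadCsv` are placeholder names, not the agent's actual code.
type Row = Record<string, string | number | null>;

function toCsv(rows: Row[]): string {
  if (rows.length === 0) return "";
  // Assumes every row has the same keys as the first one.
  const headers = Object.keys(rows[0]);
  const escape = (value: string | number | null): string => {
    const s = value === null ? "" : String(value);
    // Quote fields containing commas, quotes, or newlines (RFC 4180 style).
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  return [
    headers.join(","),
    ...rows.map((row) => headers.map((h) => escape(row[h] ?? null)).join(",")),
  ].join("\n");
}

// Hand the CSV to the browser as a file download.
function downloadCsv(rows: Row[], filename = "export.csv"): void {
  const blob = new Blob([toCsv(rows)], { type: "text/csv;charset=utf-8" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url);
}
```

Nothing groundbreaking, but that's the point: it's exactly the kind of well-understood feature the agent handles end to end.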
The Good Tests
For my second test, I threw something harder at it: refactoring a legacy module that had grown into an unmaintainable mess. This thing had functions with 200+ lines, mixed concerns, no clear structure. The kind of code you avoid touching because you know it'll be a nightmare.
Told the Copilot agent to refactor it following SOLID principles. It broke the module down into logical components, separated concerns, added proper error handling, and even wrote unit tests. The PR had 47 files changed. I reviewed it expecting disaster, but it was... actually good?
Not perfect—there were a couple of logic bugs I had to fix—but way better than I expected. The refactor was solid enough that I merged it after cleaning up those issues.
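To give a flavor of the restructuring, here's a heavily simplified sketch of the pattern it applied. All the names here (`ReportSource`, `ReportService`) are invented; the real module was far bigger and messier.

```typescript
// Hypothetical before/after of the split the agent made. Before, one
// 200-line function fetched, validated, transformed, and logged. After,
// each concern is its own small unit. All names here are invented.

interface ReportSource {
  fetchRaw(id: string): Promise<unknown>;
}

interface ReportValidator {
  validate(raw: unknown): { ok: boolean; errors: string[] };
}

class ReportService {
  // Dependencies are injected, so each piece is testable with stubs.
  constructor(
    private source: ReportSource,
    private validator: ReportValidator,
  ) {}

  async build(id: string): Promise<unknown> {
    const raw = await this.source.fetchRaw(id);
    const result = this.validator.validate(raw);
    if (!result.ok) {
      throw new Error(`Report ${id} failed validation: ${result.errors.join("; ")}`);
    }
    return raw;
  }
}
```

The dependency injection is what made the unit tests it wrote possible: every collaborator can be swapped for a stub.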
Someone I know at a startup used it to migrate their entire API from REST to GraphQL. Said the agent handled probably 80% of the boilerplate and repetitive conversions, leaving them to focus on the interesting architectural decisions.
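I didn't see their codebase, but the boilerplate half of a migration like that tends to look something like this sketch, with invented names and the original REST handler left as a comment for comparison:

```typescript
// Sketch of one repetitive REST-to-GraphQL conversion. The `user` field
// and the `db` context shape are invented for illustration.

// Before: an Express-style REST handler.
// app.get("/users/:id", async (req, res) => {
//   res.json(await db.users.findById(req.params.id));
// });

// After: the same lookup exposed through a GraphQL schema and resolver.
export const typeDefs = /* GraphQL */ `
  type User {
    id: ID!
    name: String!
  }
  type Query {
    user(id: ID!): User
  }
`;

export const resolvers = {
  Query: {
    user: (
      _parent: unknown,
      args: { id: string },
      context: { db: { users: { findById(id: string): unknown } } },
    ) => context.db.users.findById(args.id),
  },
};
```

Multiply that by a few hundred endpoints and you can see why offloading 80% of it matters.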
Where It Gets Weird
Here's the uncomfortable part: I found myself reviewing AI code more than writing it. On complex features, I'd spend my time giving the agent direction, reviewing its output, and tweaking things rather than actually coding.
That sounds great in theory—working at a higher level of abstraction, focusing on architecture and design. But in practice? It's oddly unsatisfying. I became a code reviewer and prompt engineer more than a developer.
Also, the agent makes mistakes that a junior dev wouldn't. It'll sometimes use a deprecated library, or implement something in a technically correct but overly complex way. You need to actually understand what it's doing to catch these issues.
I tried letting it work overnight on a larger feature. Woke up to 15 commits and a working implementation. But the code quality was inconsistent—some files were clean and well-structured, others were messy and would've failed code review. It's like the agent got tired (which is ridiculous to say about an AI, but that's how it felt).
The "Next Edit Suggestions" Feature
This is subtler than full agent mode, but arguably more useful. As you're coding, Copilot now suggests the next logical edit. Not just the current line—the next thing you'll probably want to change.
I was debugging a component and it suggested three places where similar bugs probably existed. It was right about two of them. That kind of pattern matching is legitimately helpful.
It also notices when you're being repetitive and suggests ways to DRY up the code. I was writing similar functions for different data types and it suggested making them generic. Smart.
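The shape of that suggestion was something like this. It's my reconstruction with placeholder names, not the exact code:

```typescript
// Before: near-identical helpers duplicated per data type, e.g.
// dedupeUsers(users: User[]) and dedupeOrders(orders: Order[]),
// matching each other almost line for line.

// After: one generic function constrained on the shared shape.
function dedupeById<T extends { id: string }>(items: T[]): T[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    if (seen.has(item.id)) return false;
    seen.add(item.id);
    return true;
  });
}
```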
Performance and Speed
The agent is noticeably slower than regular Copilot. That makes sense—it's doing a lot more work. But watching it "think" for 30 seconds before generating code tests your patience.
The speed varies wildly too. Simple tasks happen fast. Complex refactors can take minutes. And if it hits an error and needs to iterate, you might be waiting a while.
There's also the issue of API costs. This thing burns through tokens. GitHub eats that cost for now, but I wonder how sustainable this is if usage explodes. We might see usage limits or tiered pricing eventually.
Who Should Use This
If you're working on greenfield projects with lots of boilerplate, agent mode is fantastic. Setting up CRUD endpoints, building out data models, implementing standard patterns—it excels at that.
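That boilerplate is stuff like the sketch below: an in-memory CRUD API with Express. Everything here (the `Item` model, the routes) is a made-up minimal example, but it's exactly the shape of work the agent knocks out in minutes.

```typescript
// Made-up minimal example of agent-friendly boilerplate: an in-memory
// CRUD API built with Express. The `Item` model is a placeholder.
import express from "express";
import { randomUUID } from "node:crypto";

interface Item {
  id: string;
  name: string;
}

const app = express();
app.use(express.json());

const items = new Map<string, Item>();

app.get("/items", (_req, res) => {
  res.json([...items.values()]);
});

app.get("/items/:id", (req, res) => {
  const item = items.get(req.params.id);
  if (item) res.json(item);
  else res.status(404).end();
});

app.post("/items", (req, res) => {
  const item: Item = { id: randomUUID(), name: req.body.name };
  items.set(item.id, item);
  res.status(201).json(item);
});

app.delete("/items/:id", (req, res) => {
  res.status(items.delete(req.params.id) ? 204 : 404).end();
});

app.listen(3000);
```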
If you're maintaining complex legacy systems where understanding context is crucial, I'd be more cautious. The agent doesn't grasp the full context of why things were built a certain way.
Solo developers and small teams will probably get the most value. You can move faster without hiring. But there's a learning curve to using it effectively—knowing what to delegate and what to handle yourself.
The Elephant in the Room
Are we automating ourselves out of jobs? Maybe eventually, but not yet. The agent is a powerful tool, but it needs skilled developers to use effectively. You need to know what good code looks like, how to review it, and when the AI is going down the wrong path.
But junior dev roles? Those are going to get weird. Why hire someone to write boilerplate when Copilot can do it? The value shifts to more senior skills—architecture, code review, system design.
I'm not sure how I feel about that. On one hand, eliminating grunt work is great. On the other, I learned a ton writing that grunt work early in my career.
My Take After a Week
GitHub Copilot's agent mode is simultaneously impressive and slightly concerning. It's genuinely useful for accelerating development, especially on well-defined tasks. But it changes the nature of the work in ways I'm still processing.
I'm going to keep using it, but thoughtfully. For certain kinds of work (implementing specs, well-scoped refactors, writing tests) it's a huge time saver. For complex problem-solving where context and nuance matter, I'm still doing the heavy lifting myself.
The tool is best when you treat it like a very fast, very capable junior developer. Give it clear direction, review its work carefully, and be ready to step in when it goes off track. Don't expect it to magically solve your hardest problems, but do let it handle the tedious stuff that's been sitting in your backlog.
Am I worried about AI replacing developers? Not immediately. But ask me again in a year when agent mode has iterated a few times and gotten even better. The pace of improvement is wild, and I'm not sure where the ceiling is.
For now, it's a powerful tool that makes me more productive. Whether that's awesome or unsettling probably depends on whether you're using it or competing with it.