What Agentic Coding Really Means
No matter which model you try or how powerful the latest AI agent seems to be, when it comes to actual software development in a company setting, you always land in the same reality. It all comes down to small steps that you need to orchestrate yourself. There is no magic button. No shortcut will suddenly turn an 8-hour day into 10 minutes of automated brilliance. I have tested this for months and spent a fair amount of money on Claude Code and other so-called agentic systems, just to see how far AI-driven coding can really take us as a company. My conclusion is simple. AI helps, but not in the way we like to dream it does.
It’s a bit of a sobering story.
The Temptation and the Reality
The temptation is obvious. You sit there, define a big task, hit enter, and watch the AI spit out pages of code. You get a rush because the machine is fast, and it looks like it knows what it is doing. In practice, the result is usually somewhere between 50 and 80 percent of what you want. Sometimes nothing works at all. And you rarely go back and finish the missing part by hand either.
What happens instead is that you start breaking down the original task and telling the AI to do the smaller steps one by one. The danger here is subtle. Over time, developers and teams start to lose touch with their own codebase. I have seen this happen quickly. Within one or two months, people can literally unlearn how to solve things themselves. That is not just inefficient; it is a kind of modern vibe coding. It feels fast, but it eats away at the qualities that matter.
The funny thing is that coding speed does not really matter in professional software work. Maybe in a contest on stage, where someone shouts, "Who can generate this function the fastest?" and you win because you have an agent. But in a real company setting, what matters are things like stability, maintainability, sustainability, and observability, and the list goes on. None of these has anything to do with raw coding speed. If a feature takes one day to build properly with human oversight, versus ten minutes of brittle AI code, the business is still better off with the day. This is the real-world tradeoff.
The Real Value of Agentic Coding
So, where does AI help? For me, the value is in reducing friction while not degrading the qualities of the system. Ideally, you even improve them. AI is not the captain of the ship. You are. But AI is an instrumental officer on that ship. Every company has economic constraints. Small companies even more so. AI can make a difference if you use it wisely. It shines especially in two places: code research and planning, and step-wise execution.
The first is even more potent than the second: exploring libraries you have never seen, summarizing documentation, and giving you quick clarity on a concept. And then, yes, helping you execute small, well-defined tasks. But always step by step.
Why is that?
Because AI looks smarter than it is. It writes fast. It recognizes patterns. But it cannot prioritize. It has no real long-term memory. It drifts easily. It hits token limits and suddenly forgets everything like a patient with dementia. That is why you must work on one expectation at a time. Define a single goal. Write the test first. Let the AI propose an implementation. Review carefully. Then move to the next step. It is not revolutionary. It is actually how good developers have always worked. The only difference is that the AI reduces the grind and the mental friction. It takes away some of the heavy lifting on the small things, so you can think about the bigger ones.
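To make that loop concrete, here is a minimal sketch of a single step in Python with pytest. The slugify function and its behavior are invented for illustration; the point is one goal, one test, one reviewed implementation.

```python
import re

# Step 1: define a single goal and write the test first.
# slugify() is a hypothetical example function, not from any real project.
def test_slugify_turns_title_into_url_slug():
    assert slugify("Agentic Coding, Really?") == "agentic-coding-really"

# Step 2: let the AI propose an implementation, then review it yourself.
def slugify(title: str) -> str:
    """Lowercase the title, drop punctuation, join the words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

Run the test, confirm the behavior, review the diff, and only then define the next expectation.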
When I say agentic coding, I do not mean replacing yourself with a machine. Forget about the term vibe coding altogether. That was always empty. Think of agentic coding as precise executive support. Use Claude Code, Cursor CLI, or Junie for this. GPT-5 and Claude Sonnet 4 are both strong, but they fail in large codebases just the same.
Do not expect one model to suddenly be night and day better than the other. And do not blame hallucinations either. The AI is usually precise enough. The problem is that humans carry silent context, and the AI does not. What you and your peers understand with a look or a nod, the AI will not. That is on us to brief better and keep things small and clear.
Large codebases, 200k lines and above, are where you really feel the limits. Humans are simply better at seeing the complex connections. AI is an extension, not a substitute. And if you ask it to do multiple tasks at once, the quality always drops.
Stick to one thing at a time.
The Role of Testing
In my own experience, testing is the glue that makes this whole stepwise approach really shine. Behavior-driven development in particular helps. One behavior, one expectation, one step. The AI can then handle flows without getting lost in juggling multiple contexts. This results in code that is not only faster to produce but also better aligned with the qualities that matter most to the business.
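As a sketch of what one behavior, one expectation can look like in practice, assuming plain pytest rather than a dedicated BDD framework, and with Cart as an invented example class:

```python
# Cart is a hypothetical example class, invented for illustration.
class Cart:
    def __init__(self):
        self.items = []

    def add(self, name: str, price: float) -> None:
        self.items.append((name, price))

    @property
    def total(self) -> float:
        return sum(price for _, price in self.items)


# One behavior, one expectation: the test reads as Given/When/Then.
def test_adding_an_item_increases_the_total():
    # Given an empty cart
    cart = Cart()
    # When one item is added
    cart.add("book", 12.50)
    # Then the total reflects exactly that item
    assert cart.total == 12.50
```

Each new behavior gets its own test, which keeps the AI focused on a single flow at a time.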
So do not chase the dream of squeezing a full workday into 10 minutes of AI output. That path only leads to fragile systems and loss of human expertise. Instead, embrace AI as what it really is. A research assistant. A planning partner. An executor of small, verifiable steps. That is what agentic coding means to me. It is not a replacement for discipline but an extension of it. And if you treat it that way, you actually get the best of both worlds.
—Adrian