I decided to write down some thoughts on agentic coding and why it’s a heavily hyped wrong turn.

Let me start with some background on my LLM experience. I adopted LLMs into my work in August 2020. I was sold when I saw that GPT-3 could generate usable SQL statements: something that used to take 4-8 hours of RTFMing now took 15 minutes. I have since worked on chatcraft.org, various RAG frameworks, etc. I use aider heavily for work, frequently switch models, and have been struggling with tool calling since before MCP was even an idea.

LLMs Can Complete Hard Software Engineering Tasks

According to the Internet hive mind, agentic coding tools can autonomously crank out multiple complete implementations of software (write code this way or FOMO). This video is a perfect example (if that were true, we would be drowning in useful software). “ADHD software agents on meth” is how I would describe that approach to software engineering.

I would double down on the influencer claims and say that LLM-enabled tools can complete software that a large fraction of currently employed engineers would fail at.

However, so far I'm aware of only one such end-to-end AI-written project: an HTTP/2 server (blog post). It is the single most impressive code-gen project I've seen. I did not think this was possible yet.

Highlights from: Building a 100% LLM-written, standards-compliant HTTP 2.0 server from scratch with Gemini 2.5 Pro

  1. A ridiculous amount of work went into feeding the LLM correct context.

  2. The author micromanaged the LLM workflow to keep it progressing. When the LLM got stuck, they would split up tests; they devised an algorithm for swapping out the LLM's context, a files-as-units-of-work strategy, etc. A sketch of that loop follows this list.

  3. It would have been impossible to complete the task by relying on tool calling. This matches the experience reported on the aider blog. That's ironic, because tool calling is the unreliable foundation of the current agentic hype. Another interesting observation: producing valid JSON is very difficult for LLMs, and ad-hoc formats that LLMs were never tuned for often work better (compare the two formats sketched below).

  4. It took days just to stream all the API calls (Google is 2-4x faster than Anthropic) and weeks to arrive at completed software.
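
To make point 2 concrete, here is a minimal sketch of what that kind of algorithmic supervision might look like. Everything in it is hypothetical: `call_llm` and `run_tests` are stand-ins for whatever model client and test harness you use, not the blog post's actual tooling.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 5

def implement_file(
    path: str,
    spec: str,
    repo_summary: str,
    call_llm: Callable[[str], str],                    # hypothetical model client
    run_tests: Callable[[str, str], Tuple[bool, str]], # returns (passed, failure log)
) -> str:
    """Treat one file as the unit of work: drive the LLM until that file's
    tests pass, rebuilding the context from scratch on every attempt
    instead of accumulating an ever-growing chat history."""
    failures = ""
    for _ in range(MAX_ATTEMPTS):
        # Fresh, curated context each round: spec + repo summary + only
        # the latest test failures. Never the full conversation so far.
        prompt = f"{spec}\n\n{repo_summary}\n\nFix these test failures in {path}:\n{failures}"
        code = call_llm(prompt)
        passed, failures = run_tests(path, code)
        if passed:
            return code
    # The stuck case: a human splits the tests into smaller units and retries.
    raise RuntimeError(f"LLM stuck on {path}; split its tests and try again")
```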
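And to make point 3 concrete, compare what a model must emit for a JSON tool call against an ad-hoc edit format (the latter loosely modeled on aider's search/replace blocks). Both payloads are invented for illustration; the point is the escaping burden.

```python
# A JSON tool call: every quote, brace, and newline inside the code must
# be escaped perfectly, or the whole call fails to parse.
json_tool_call = (
    '{"tool": "edit_file", "path": "server.py", '
    '"old": "def handle(conn):\\n    pass", '
    '"new": "def handle(conn):\\n    conn.send(b\\"...\\")"}'
)

# An ad-hoc format (loosely modeled on aider's search/replace blocks):
# newlines and quotes stay literal, so the model only has to reproduce
# the code verbatim between unambiguous markers.
adhoc_edit = """\
server.py
<<<<<<< SEARCH
def handle(conn):
    pass
=======
def handle(conn):
    conn.send(b"...")
>>>>>>> REPLACE
"""
```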

So yes, LLMs can write code, but instead of being free agents, they can only write software under intense algorithmic supervision by extremely stubborn software engineers. The irony is that under such tight controls even junior human software writers might succeed at this task.

Context is everything

The output produced by an LLM is only as good as the context fed into it (including the original question). Curating that context is a hard, unsolved problem. Current agentic programming is similar to (but much worse than) the genetic-algorithm hype I remember from the 90s: yes, brute force works, but it's often too expensive a solution.
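
As a toy illustration of the difference, compare a brute-force context dump with even a crude attempt at curation. The ranking heuristic below is invented for this sketch; real tools (aider's repo map, RAG pipelines) are far more sophisticated.

```python
def naive_context(repo_files: dict[str, str], question: str) -> str:
    # Brute force: paste the entire repo and hope the model finds the
    # relevant parts. Works sometimes, but it is expensive noise.
    return "\n\n".join(repo_files.values()) + "\n\n" + question

def curated_context(repo_files: dict[str, str], question: str,
                    budget_chars: int = 12_000) -> str:
    # Crude relevance ranking: score each file by word overlap with the
    # question, then pack the best-scoring files into a fixed budget.
    words = set(question.lower().split())

    def score(body: str) -> int:
        lowered = body.lower()
        return sum(1 for w in words if w in lowered)

    ranked = sorted(repo_files.items(), key=lambda kv: score(kv[1]), reverse=True)
    picked: list[str] = []
    used = 0
    for name, body in ranked:
        if used + len(body) > budget_chars:
            continue  # skip files that would blow the budget
        picked.append(f"### {name}\n{body}")
        used += len(body)
    return "\n\n".join(picked) + "\n\n" + question
```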

I think until we come up with a better way to curate context for LLMs, they will only be truly impressive in the hands of exceptional software engineers (Kenton and Sean are also incredible communicators).

In the meantime, continue expecting mediocre results from mediocre people feeding LLMs mediocre context.