
AI Coding Is Building a Faster Factory, Not Replacing the Factory
The authentication module was the third one I reviewed that morning. The company was aggressively pursuing new features and new products, and instead of sharing code and standardizing, I now had three modules all trying to achieve the same thing in remarkably different ways, with different levels of maturity, features, and test coverage.
I remember sitting in a modern, tech-filled conference room at a mid-size enterprise. They had a good-sized engineering organization of roughly fifty engineers, and their platform processed meaningful transaction volume daily. The reason I was there? I'd been asked to assess their AI coding adoption, tooling, and processes. The leadership team had rolled out AI-assisted development nine months prior, and it followed a typical pattern. The rollout started with a few enthusiastic early adopters, but the procurement team was starting to wonder why seat licenses had tripled and token costs kept climbing with no end in sight. The engineering leadership was proud, and there were KPIs that reinforced their pride. Cycle times were down. Pull request volume had nearly doubled. The dashboards looked spectacular.
After reviewing the product requirements for each of the teams to ground myself in what the teams were trying to achieve, I opened the different codebases.
In total the development teams were building eleven new services. Each new service needed authentication as well as half a dozen overlapping capabilities, and, like a lot of dev teams, they had never been given enough time to create and document engineering standards and architecture patterns. I counted eleven authentication implementations: not variations on a theme, but eleven architecturally distinct approaches recreating substantially the same capabilities. Each service was syntactically clean, well-commented, and passing its own tests. Each was generated by an AI assistant working with a different developer, with a different level of experience and a different mental model of how auth should work, and none of them had a reference architecture to consult. The AI had done exactly what it was asked. The problem was that none of the developers told their preferred AI assistant how to build it, what standards to apply, or what patterns already existed.
That experience keeps replaying in my head because it captures something the current discourse around AI coding consistently misses. We are in the middle of a genuine transformation in how software gets built. But most of the conversation is happening at the wrong altitude, and we all seem to be running in different directions hoping to get to the same end point.
The Arc That Should Worry You
In February 2025, Andrej Karpathy coined the term "vibe coding". While potentially a little tongue-in-cheek, it described the practice of letting AI generate code based on intent rather than specification. It was deliberately playful, a little provocative, and it captured something real about how individual productivity was changing. It also answered a long-standing problem with software development: the rest of the SDLC, and especially the product requirements, are often sub-par. Many developers just want to develop, and vibe coding provided the opportunity to operate as both product manager and developer, circumventing the most painful parts of the process. Vibe coding reinforced the desire to be their own masters, mistaking autonomy and speed for value. Twelve months later, in February 2026, Karpathy published a follow-up framing: "agentic engineering." The person who named the phenomenon had already moved past it in a very specific way. Instead of siloed development practices, the idea is that agents partner with the entire team as part of the SDLC, not replace it.
The arc here matters especially for those looking to adopt AI as part of their development strategy. The gap developing between vibe coding and the more mature practice of agentic engineering is more than a branding evolution. It is the distance between a developer proclaiming "AI writes code for me" and a development team declaring "AI operates within an engineering system we've designed." One is a productivity trick focused on the speed of new code. The other is an organizational capability able to create high-end, maintainable applications in a repeatable way. Today, in 2026, most enterprises are stuck at the first one, celebrating output metrics while the architecture, maintainability, and supportability underneath fragment.
The industry data is starting to confirm the pattern. Cortex's 2026 "State of AI Benchmark" found that AI-assisted teams produced 98% more pull requests. That metric sounds impressive, but review times also increased 91%, and incidents per pull request rose 23.5%. A further report from CodeRabbit showed that AI-generated code creates 1.7 times more issues than human-written code. If those data points weren't enough, Faros reported that developers now spend 24% of their workweek checking AI output.
Read together, those numbers paint a picture that challenges the current narrative. We are building a faster factory. The factory is producing more defects per unit.
While it may sound like I'm against the current AI movement, it's worth stating that I have been building and leading engineering organizations for fifteen years, and I use AI coding tools daily. Claude Code is part of my own development workflow, and I maintain a reference library of roughly 1,355 lines of context files, fifteen purpose-built agents, and twenty-one skills for projects I run as a solo developer. I engineered the entire Claude environment to mimic a highly tuned, architecture-led, human-in-the-loop, quality-first engineering process: going faster end to end with high quality, not just producing more code. I am writing from inside the machine, not from the sidelines, with enough years behind me to recognize when a pattern I've seen before is repeating at scale. I saw the same acceleration when open access to cloud environments 'accelerated' development, only for CFOs to then demand governance. We can learn from the past only if we choose to.
The Throughput Illusion
There is a seductive narrative in the AI coding conversation: 'finally' we can build quicker. The irony is that more and more startups are adopting AI coding, believing vibe coding is the answer to accelerating their businesses, while failing to manage quality. That lack of quality then causes a constant churn of bug fixing, incidents, and ad hoc testing, eating up the very time they hoped to save, while in parallel they believe the answer to their problems is a better prompt library. There is a belief in the startup space that the evolution of vibe coding will be AI-generated massive pull requests, with some teams creating PRs of 10,000+ lines of code, only to then use AI to review the PR. This reinforces the misinformation that developers are more productive because they generate more code, and therefore we need fewer of them. Each link in that chain is wrong, and the wrongness compounds.
SmartBear and Cisco's research on code review effectiveness established years ago that human reviewers hit a wall around 200 to 400 lines of code per hour. Beyond that threshold, review quality degrades sharply and standards enforcement declines with it. A 5,000-line pull request is not the same as five PRs of 1,000 lines each. The larger PR gets rubber-stamped, or tools and prompts are created to try to handle the unreasonable PR. The reviewer's eyes glaze over. Defects pass through, and architecture best practices and engineering standards degrade. The DORA research program at Google, conducting the most rigorous longitudinal study of engineering performance we have, has consistently shown that high-performing teams ship smaller, more frequent changes with lower change failure rates. Not larger batches. Not more volume.
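The review-capacity research above lends itself to a mechanical guardrail. Here is a minimal sketch of a pre-merge PR-size gate built on the 200 to 400 lines-per-hour finding; the threshold values and function names are illustrative assumptions, not part of any cited tooling.

```python
# Minimal sketch of a PR-size gate based on the ~200-400 reviewable
# lines-per-hour research. Thresholds are illustrative assumptions.

REVIEWABLE_LINES_PER_HOUR = 400  # optimistic upper bound from the research
MAX_SINGLE_REVIEW = 400          # roughly one focused review session

def classify_pr(changed_lines: int) -> str:
    """Classify a pull request by whether it fits effective review capacity."""
    if changed_lines <= MAX_SINGLE_REVIEW:
        return "reviewable"
    # Beyond a few sessions' worth of lines, review degrades into rubber-stamping.
    if changed_lines <= 3 * MAX_SINGLE_REVIEW:
        return "split-recommended"
    return "split-required"

def estimated_review_hours(changed_lines: int) -> float:
    """Optimistic lower bound on honest review time for a change."""
    return changed_lines / REVIEWABLE_LINES_PER_HOUR

if __name__ == "__main__":
    for size in (300, 1000, 5000):
        print(size, classify_pr(size), round(estimated_review_hours(size), 1))
```

The arithmetic makes the point: a 5,000-line PR implies at least 12.5 hours of honest review in one sitting, while five 1,000-line PRs can each be scheduled as a separate, still-effective session.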
When AI generates massive PRs, and we celebrate the output, do we as engineering leaders really believe we are measuring the right thing? We are confusing volume with throughput, and most definitely confusing volume with quality. A factory that produces twice as many widgets but sends a quarter of them back for rework is not twice as productive. It is the same factory with a bigger rework queue and a more exhausted team.
I have watched this movie before, as some kind of eerie remake. The DevOps movement a decade ago promised that with automation we would accelerate delivery, and that developers and operators would finally understand each other. The automation did accelerate delivery for organizations that had engineering discipline in place and viewed the SDLC through the lens of systems design. For organizations that did not, all that happened was that automation accelerated their existing dysfunction. Deployment frequency went up, quality went down, and change failure rate followed right behind it. Speed without quality is not acceleration. It is an accumulation of technical debt, comprehension debt, and architectural fragmentation. The specifics differ by era, but the pattern is identical: a lack of engineering leadership and standards prioritizes and measures the wrong things. AI coding guardrails are not a new problem. Just as DevOps accelerated failure for the undisciplined, AI is the same problem wearing a new hat.
Where the Value Actually Lives
The fixation on code generation obscures where AI actually creates value in the engineering lifecycle. Code generation is only one stage (development) out of at least nine distinct stages in a mature software delivery pipeline: ideation, requirements, planning, development, testing, security analysis, architecture review, code review, and release engineering. If these stages don't already exist within your organization, don't expect AI to be the answer without effort, architecture, and standards being put in place.
AI can contribute meaningfully at every stage of the SDLC, but only with careful design, rollout, and investment, both financial and structural. AI can draft requirements from stakeholder notes. It can generate test cases from specifications. It can perform static security analysis in real time. It can validate architectural decisions against documented patterns. It can automate release checklists. The real workforce economics are not "AI writes code, so we need fewer developers." The real economics are "AI accelerates every stage of the pipeline, so the same team can deliver more value with fewer bottlenecks."
That distinction matters enormously when leadership teams are trying to model headcount impact. The majority of leaders have no idea how much it really costs to develop and release software. The savings are in the pipeline: reduced cycle time across stages, fewer handoffs, less rework. The savings are never going to come from typing less. Organizations that chase the code generation narrative end up cutting developers while the review backlog, the test gap, and the architectural inconsistency all grow. These problems existed before AI and they exist after it. Many organizations failed to invest in headcount for the non-code-writing parts of the SDLC before AI, so it is not surprising that we continue to misunderstand the rest of the SDLC in a post-AI world. We are relying on AI to make things cheaper, raise quality, and increase release cadence, all while never defining what that means for its human counterparts. The scary thing now is AI determining what great looks like for your organization, without any oversight.
The Reference Library Problem
Which brings me back to that conference room and the eleven authentication patterns.
The root cause was not bad developers. It was not bad AI. It was the absence of codified engineering standards: those mystical documents that every new developer is meant to read when they join a team, the library of definitions that communicates why we build software the way we do as an organization and the architectural patterns that underpin the system's existence. Those documents are useful for humans, and they are essential for AI to read, follow, and enforce. Without a reference architecture, every AI session starts from the same place: its training data. And training data is nothing more than a popularity contest, not an engineering discipline.
Large language models are typically trained on public repositories filtered by signals like GitHub stars — often with a threshold of 100 or more stars as a quality proxy. That filter selects for popular code, not necessarily correct code. Enterprise software wasn't included in the training data. Your software almost certainly wasn't either. As multiple security researchers continue to document, LLMs tend to reflect the security practices present in their training data. When that training data includes common shortcuts, the AI will confidently reproduce those shortcuts in your enterprise codebase. And this is where things get terrifying. If you use AI-assisted code review to check AI-generated code, you get circular validation. You might try changing LLM models, but at root they are trained on largely overlapping data sets. The same biases that generated the code are present in the model reviewing it.
The real answer is not to stop using AI. The answer is simply to give it something better to work from than crowdsourced patterns, shortcuts, and security workarounds.
This is what I've started to call the 'composer model' — and it is the operating principle behind how I use AI in my own work. A composer does not play every instrument. Instead, the composer writes the score, sets the constraints, defines the voicing, and ensures every section serves the whole. In my own workflow, that means a 120-line configuration file and fifteen context documents totaling roughly 1,355 lines of codified standards. Depending on the type of project, I model how I want the software to be designed, built, tested, and released, and define purpose-built agents for specific pipeline stages. I reference my pre-existing architecture patterns and my long-running list of lessons learned and antipatterns to avoid. I tune the environment for each project's risks, challenges, and ultimately its purpose and technology stack. The governance overhead for a fifty-person enterprise engineering team is multiplicatively larger.
My project configuration is the score the orchestra reads from. Without it, you have fifty talented musicians improvising simultaneously. The result sounds like noise, even if every individual part is technically proficient.
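To make the score metaphor concrete, here is a hypothetical excerpt of what one of those context documents might look like. Everything in it is invented for illustration: the file path, the ADR numbers, and the `auth-core` library name are assumptions, not a real standard.

```markdown
<!-- engineering-standards/auth.md (hypothetical excerpt) -->

## Authentication (ADR-014)
- All services MUST delegate authentication to the shared identity service.
- Tokens are validated with the `auth-core` library; never hand-roll JWT parsing.
- New auth requirements extend ADR-014; they do not create a parallel implementation.

## When generating code
- Cite the ADR you are implementing against in the module header.
- If no ADR covers the requirement, stop and raise it for architecture review.
```

A file like this is short enough to sit in every AI session's context, which is precisely what makes it enforceable: the assistant is reading the score, not improvising from training data.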
A Diagnostic for Your Organization
If you are evaluating or expanding AI coding adoption, here are the questions I would ask before increasing velocity:
Standards readiness:
- Are your coding standards documented in a format AI tools can consume, or do they exist only in senior engineers' heads?
- Do you have current Architecture Decision Records for your major patterns (authentication, data access, error handling, observability)?
- When a new engineer joins, is there a reference implementation they can point an AI at -- or does each team have its own conventions?
- If you were to develop an app today using AI, would you be able to faithfully reproduce it in the future with the same features, security, reliability and constraints?
Review capacity:
- Has your review process scaled with your AI-generated output, or are reviewers absorbing more volume at the same capacity?
- Are you measuring review quality (defect escape rate, post-merge incidents) or just review completion?
- Is AI reviewing AI-generated code without human architectural judgment in the loop?
Pipeline balance:
- Are you deploying AI across the full lifecycle -- requirements, architecture, testing, security -- or primarily at code generation?
- Can you trace a generated component back to the standard it was built against?
- When AI generates something that contradicts your architectural patterns, does your process catch it before merge?
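The traceability question above can be enforced mechanically. As a sketch, assuming a convention where each generated module declares the standard it implements in a header comment (the tag format and ADR naming scheme are assumptions for this example), a pre-merge check might look like:

```python
import re

# Sketch of a pre-merge traceability check. Assumes a convention where each
# generated module declares the standard it implements, e.g. "# standard: ADR-014".
# The tag format and ADR naming scheme are illustrative assumptions.

STANDARD_TAG = re.compile(r"standard:\s*(ADR-\d+)")

def referenced_standards(source_text: str) -> list[str]:
    """Return the ADR identifiers a source file claims to implement."""
    return STANDARD_TAG.findall(source_text)

def check_traceability(source_text: str, known_adrs: set[str]) -> list[str]:
    """Return problems: no standard declared, or a declared standard that doesn't exist."""
    refs = referenced_standards(source_text)
    if not refs:
        return ["no standard declared"]
    return [f"unknown standard {r}" for r in refs if r not in known_adrs]

if __name__ == "__main__":
    module = "# standard: ADR-014\nclass TokenValidator: ...\n"
    print(check_traceability(module, {"ADR-014", "ADR-021"}))
```

The check is trivial by design: the hard part is not the script but the organizational agreement that every generated component must name the standard it was built against.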
Organizational discipline:
- If you removed AI tools tomorrow, would your engineering standards still be documented and enforceable?
- Are your standards the input to AI, or is AI's training data the de facto standard?
If you answered "no" or "I'm not sure" to more than three of these, you are building velocity without governance. That works for a while. It does not work at scale, and it does not work under audit.
The Principle That Holds
The principle I keep coming back to — whether in DevOps adoption, cloud migration, platform modernization, or now AI coding — is the same one: automate as fast as quality allows. Not faster. Not slower. Exactly as fast as your engineering discipline permits.
Vibe coding gave us the acceleration; it is real, and it is not going away. But acceleration without engineering process is a car without steering. A reference library gives AI something worth following to meet the outcomes you want to achieve. The composer model gives the system someone accountable for the outcome, with context engineering providing the instructions. Together they shape AI behavior within your specific organization, and they are not optional. This is the difference between AI that amplifies your engineering discipline and AI that amplifies the absence of it.
The enterprises that will lead in developer productivity over the next two years are the ones that have codified what good looks like, made it machine-readable, and put a human accountable for the architecture within every team. They will move fast because they built the guardrails first. They can be consistent, and they can reproduce the same features reliably to the same high standard.
I think about those eleven authentication patterns often. The failure was upstream. It was organizational. Leaders had failed to write the score for the orchestra.
If you are a CTO or VP of Engineering evaluating AI adoption right now, the question is not whether to adopt. The question is whether you have built something for the AI to follow. Because the faster factory is already running. The only question left is whether it is building what you intended and whether you can maintain the quality your customers demand.
References
- Karpathy, Andrej. "Vibe Coding." February 2025.
- Karpathy, Andrej. "Agentic Engineering." February 2026.
- Cortex. "2026 State of AI Benchmark." 2026.
- CodeRabbit. AI-Generated Code Quality Report. 2026.
- Faros. Developer Productivity and AI Output Report. 2026.
- SmartBear and Cisco. Code Review Effectiveness Research.
- DORA (DevOps Research and Assessment), Google. State of DevOps Reports (longitudinal study).
Is your AI adoption building velocity without governance?
A focused assessment of your engineering standards, review processes, and AI integration can surface where the real leverage is before the architecture fragments.