AI and the Illusion of Productivity

Matt Asay
10 Min Read

Producing code without a strict validation framework isn’t genuine engineering; it’s merely creating a mountain of technical debt.

[Image: Sign post reading "Danger: Slippery Slope," a warning sign near a seafront on an overcast day. Credit: P.Cartwright / Shutterstock]

Consider this sarcastic remark from developer John Crickett on X:

Software engineers: Context switching kills productivity. Also software engineers: I’m now managing 19 AI agents and doing 1,800 commits a day.

Crickett’s observation lands because it’s less a joke than a preview of the next big management trend. We’re poised to swap one flawed productivity metric (lines of code) for an even worse one (agent output), only to be surprised when quality plummets.

And yes, I understand that no one is making 1,800 *meaningful* commits daily. But that’s precisely the core issue. The metric is already being manipulated, and AI agents make this manipulation effortless. If your organization starts praising “commit velocity” in the age of AI, you’re not measuring actual productivity. You’re quantifying the speed at which your team can generate potential liabilities.

The grand promise of generative artificial intelligence was that it would finally eliminate our backlogs. Automated coding agents were supposed to generate boilerplate code at incredible speeds, enabling teams to deliver precisely what the business required. However, as we move deeper into 2026, the reality is much less comfortable. Artificial intelligence isn’t going to rescue developer productivity because writing code was never the primary bottleneck in software engineering. The real impediment lies in validation, integration, and deep system understanding. Producing code without a rigorous validation framework isn’t engineering; it’s simply mass-producing technical debt.

So, what adjustments are necessary?

Revisiting our approach to code

First, as I recently argued, we must stop treating code as an isolated asset. Each line of code is a potential attack surface that demands security, monitoring, maintenance, and seamless integration with its surroundings. Making code creation cheaper therefore doesn’t reduce the overall workload; it amplifies it by increasing the amount of liability generated per hour.

For many years, developers were treated like highly compensated Jira ticket processors. The prevailing assumption was that you could take a clearly defined requirement, transform it into syntax, and deploy it. Crickett accurately points out that if this is the extent of your work, then your role is absolutely automatable. A machine is capable of basic translation and is perfectly content to perform it ceaselessly without complaint.

However, a machine lacks the ability to grasp crucial business context. AI cannot perceive the financial implications of a compliance error or examine a customer workflow and instinctively discern that the underlying requirement is fundamentally flawed. For these tasks, we require human insight, and we need people to thoughtfully consider precisely what they intend AI to accomplish.

Crickett characterizes this shift as a necessary progression towards spec-driven development. He is correct, but we must be exceptionally clear about what a “specification” entails in the agent era. It’s not merely another Jira ticket; instead, it’s a set of constraints sufficiently stringent to prevent an LLM from deviating. In essence, it’s an executable definition of “done,” fully supported by tests, API contracts, and strict production signals. This is precisely the kind of foundational work we have neglected for decades because it doesn’t appear as tangible output; it manifests as process. You know, the “unexciting stuff” that seems to slow you down.
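To make the idea concrete, here is a minimal sketch of a “spec” expressed as executable constraints rather than a prose ticket. The function and its domain (`apply_discount`, prices in cents) are illustrative assumptions, not anything from Crickett’s post; the point is that the definition of “done” runs as code.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Implementation under review -- could be human- or agent-written."""
    return price_cents - (price_cents * percent) // 100

def check_spec():
    """The spec: an executable definition of 'done' that any
    implementation, human or agent, must satisfy before merging."""
    # Contract: a 100% discount yields a zero price, never a negative one.
    assert apply_discount(1000, 100) == 0
    # Contract: a 0% discount is the identity.
    assert apply_discount(1000, 0) == 1000
    # Contract: prices stay in integer cents -- no float drift.
    assert isinstance(apply_discount(999, 15), int)
    # Boundary: rounding favors the customer (the discount is floored).
    assert apply_discount(999, 15) == 850

check_spec()
print("spec passed")
```

A prose ticket saying “add a discount feature” leaves an LLM free to wander; a failing `check_spec` does not. The constraints, not the reviewer’s patience, become the guardrail.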

The tensions are evident in real time; just examine the responses to Crickett’s post. You’ll observe individuals desperately attempting to reconcile the complexities of agent-driven development. One commentator tries to reframe the disorder as architecture versus engineering. Another insists that overseeing 19 agents is actually orchestration, not context switching. A third bluntly states that managing more than five agents concurrently starts to resemble “vibe coding,” which is a polite way of describing gambling with live production systems. All of these highlight the central issue: the work hasn’t been eliminated. It has merely shifted from implementation to supervision and review.

The more you parallelize your code generation, the greater the “review debt” you accumulate.

Observability as the solution

This is where Charity Majors, co-founder and CTO of Honeycomb, comes in. Majors has argued for years that true understanding of code only comes from running it in production, under actual load, with real users and genuine failure conditions. With AI agents, the development burden shifts entirely from writing code to validating it, and humans are notoriously poor at validating code by reviewing large pull requests. We confirm system integrity by observing its behavior in live environments.

Now, extend that concept further into the age of AI agents. For decades, one of the most common debugging strategies was inherently social. A production alert triggers. You examine the version control history, identify the person who authored the code, inquire about their objectives, and reconstruct the architectural intent. But what becomes of that process when no human actually wrote the code? What happens when a human merely skimmed a 3,000-line, agent-generated pull request, clicked merge, and moved on to the next task? When an incident occurs, where is the profound knowledge that once resided with the author?

This is precisely why extensive observability isn’t just a beneficial feature in the agent era; it’s the sole viable replacement for the absent human element. In the age of AI agents, we require instrumentation that captures both intent and business outcomes, not merely generic logs indicating an event happened. We need distributed traces and high-cardinality events rich enough to precisely answer what changed, its impact, and why it failed. Without this, we are attempting to operate a black box constructed by another black box.
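A rough sketch of what “capturing intent” could look like in practice: one wide, high-cardinality structured event per operation, carrying provenance and business context alongside the outcome. The field names (`code_author`, `spec_version`, `customer_tier`) are illustrative assumptions, not a standard schema; a real system would ship these events to an observability backend rather than stdout.

```python
import json
import time
import uuid

def emit_event(**fields):
    """Emit one wide structured event as JSON (sketch: prints to stdout)."""
    event = {"timestamp": time.time(), "trace_id": str(uuid.uuid4()), **fields}
    print(json.dumps(event))
    return event

# Provenance and intent ride along with the operational data, so an
# on-call engineer can answer "who wrote this path, and against what spec?"
evt = emit_event(
    service="checkout",
    operation="apply_discount",
    code_author="agent:pr-3042",      # assumption: provenance of the code path
    spec_version="discount-spec-v7",  # assumption: the intent it was built from
    customer_tier="enterprise",       # high-cardinality business dimension
    outcome="success",
    duration_ms=12,
)
```

The difference from a generic log line is the dimensionality: when no human remembers why the code exists, fields like `code_author` and `spec_version` are the closest thing to asking the author what they meant.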

Majors also provides crucial operational guidance: deployment freezes are fundamentally a quick fix. The common human reaction when change seems risky is to halt deployments. However, if you continue merging agent-generated code without deploying it, you’re simply accumulating risk, not reducing it. When you finally execute a deployment, you’ll have absolutely no idea which specific AI hallucination just disrupted your payment gateway. Therefore, if you must freeze anything, freeze merges. Better yet, make the merge and the deployment feel like a single, indivisible action. The faster that cycle operates, the less variance you encounter, and the simpler it becomes to pinpoint the exact cause of a breakdown.

Golden paths are the solution

The remedy for this impending chaos isn’t to depend on individual heroic engineers. As Majors emphasizes, resilient engineering demands a commitment to platform engineering and golden paths (a stance I’ve also advocated). Such golden paths make proper behavior incredibly straightforward and incorrect behavior exceedingly difficult. The most effective teams of the next decade won’t be those with the most freedom to adopt any framework an agent suggests, but rather those that operate securely within the most effective constraints.

So, how do we assess success in the agentic era?

The essential metrics remain the unglamorous ones, because they measure actual business results. The DORA metrics continue to be our most reliable sanity check as they directly link delivery speed to system stability. They quantify deployment frequency, lead time for changes, change failure rate, and time to restore service. None of these metrics are concerned with the number of commits your agents generated today. They only care whether your system can absorb changes without failing.
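The four DORA metrics fall out of delivery data you likely already have. Here is a minimal sketch; the record shape (`commit_time`, `deploy_time`, `failed`, `restore_minutes`) is an assumption about how your pipeline logs deployments, and the sample values are invented.

```python
from datetime import datetime

# Assumed record shape for illustration; real data would come from your CI/CD system.
deploys = [
    {"commit_time": datetime(2026, 1, 5, 9),  "deploy_time": datetime(2026, 1, 5, 11), "failed": False, "restore_minutes": 0},
    {"commit_time": datetime(2026, 1, 5, 10), "deploy_time": datetime(2026, 1, 6, 9),  "failed": True,  "restore_minutes": 45},
    {"commit_time": datetime(2026, 1, 7, 14), "deploy_time": datetime(2026, 1, 7, 15), "failed": False, "restore_minutes": 0},
    {"commit_time": datetime(2026, 1, 8, 8),  "deploy_time": datetime(2026, 1, 8, 9),  "failed": False, "restore_minutes": 0},
]
days_observed = 7

# 1. Deployment frequency: deploys per day over the window.
deploy_frequency = len(deploys) / days_observed

# 2. Lead time for changes: median commit-to-deploy delta.
lead_times = sorted(d["deploy_time"] - d["commit_time"] for d in deploys)
median_lead = lead_times[len(lead_times) // 2]

# 3. Change failure rate: fraction of deploys that caused a failure.
failures = [d for d in deploys if d["failed"]]
change_failure_rate = len(failures) / len(deploys)

# 4. Time to restore service: mean minutes to recover from failed deploys.
mttr = sum(d["restore_minutes"] for d in failures) / len(failures)

print(deploy_frequency, median_lead, change_failure_rate, mttr)
```

Note what is absent: commit counts and agent counts appear nowhere. An agent generating 1,800 commits a day moves none of these numbers unless those commits actually ship and survive.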

Therefore, absolutely utilize coding agents. Employ them vigorously! But do not conflate code generation with productivity. Productivity emerges after code generation, when code is properly constrained, validated, observed, deployed, rolled back, and understood. This is the cornerstone of enterprise safety and developer productivity.
