Jun 10, 2026

AI-Generated Code and Technical Debt: What 211 Million Lines Show

Two hard datasets, one uncomfortable conclusion: AI is shipping code faster and rotting codebases faster. Here's what the measurements actually say.

TL;DR

GitClear's 2025 study of 211M changed lines: copy/pasted code rose from 8.3% to 12.3% between 2021 and 2024.
Refactored "moved" lines collapsed from ~24% in 2020 to 9.5% in 2024. Blocks of 5+ duplicated lines grew 8x in 2024 alone.
A METR randomized trial found experienced devs were 19% slower with AI tools while feeling 20% faster.
Stripe's Developer Coefficient: devs already burn 17.3 hours a week (~42%) on bad code and debt.
Speed without supervision is a loan. The interest is paid in Q3 by the team that didn't write the code.

The Short Answer

Yes, AI-generated code is creating technical debt, and the only two rigorous datasets we have both point in the same direction. GitClear's analysis of 211 million changed code lines through 2024 shows duplication climbing and refactoring collapsing. A 2025 randomized controlled trial from METR found experienced developers were 19% slower on real tasks with AI assistance, even though they believed they were 20% faster. That's the headline. The rest of this piece is what those numbers actually mean for your P&L, and the supervision practices that keep the speed without the debt.

The conversation about AI-generated code and technical debt is mostly vibes right now. Boosters wave benchmark scores. Doomers wave horror stories. We're going to anchor on the measurements.

1. The Promise vs the Measurements

The sales pitch for AI coding tools is straightforward: more output per developer-hour, lower cost per feature, faster time to revenue. In demos, that pitch lands. The line of code appears. The test passes. The PR opens itself.

In production codebases over multiple quarters, the picture is different. Two studies published in 2025 are the only sources we'd cite to a board:

GitClear's AI Assistant Code Quality 2025 Research, covering 211 million changed lines across public and private repos.
METR's randomized trial of experienced open-source developers, published July 2025.

2. GitClear's Data: Duplication Up, Refactoring Down

The headline finding from GitClear is a scissors chart.

Between 2021 and 2024, the share of copy/pasted lines in commits rose from 8.3% to 12.3%. Over a slightly longer window, the share of "moved" lines (code relocated during refactoring rather than rewritten) fell from about 24% in 2020 to 9.5% in 2024. Blocks of 5 or more duplicated lines grew 8x during 2024.

3. Why Copy-Paste Is a Loan With Compounding Interest

Here's the operator version of why duplication matters.

When a function is written once and reused 12 times, fixing a bug means changing one place. When the same function is pasted 12 times with minor variations, fixing a bug means finding 12 places, then deciding which variations were intentional and which were accidents. The cost of the fix is roughly linear in the number of copies. The cost of knowing you fixed it everywhere is worse than linear.
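To put rough numbers on that, here is a minimal sketch in Python. The pairwise-comparison model is our assumption for illustration, not something taken from the GitClear data: it treats "knowing you fixed it everywhere" as checking every pair of copies against each other once. The function names are hypothetical.

```python
from math import comb


def places_to_edit(copies: int) -> int:
    """A fix has to be applied once per copy, so this grows linearly."""
    return copies


def pairs_to_reconcile(copies: int) -> int:
    """Assumed model: confirming the copies agree means comparing each pair once."""
    return comb(copies, 2)


for n in (1, 2, 6, 12):
    print(f"{n:>2} copies: {places_to_edit(n):>2} edits, "
          f"{pairs_to_reconcile(n):>2} pairwise comparisons")
```

Under this assumed model, the 12 pasted copies from the example above mean 12 edits and 66 pairwise comparisons. Real verification cost depends on test coverage and on how far the copies have drifted, so treat the curve as a shape, not a forecast.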

Stripe's Developer Coefficient report put a number on the baseline: developers were already spending about 42% of their week, 17.3 hours, on technical debt and bad code, an estimated $85B/year in global opportunity cost. That was before the duplication curve bent. If the GitClear trend holds, the 42% gets worse, not better.

4. The METR Surprise: 20% Faster Felt, 19% Slower Measured

METR ran a randomized controlled trial with experienced open-source developers on tasks in their own codebases. Half the tasks were done with AI assistance available, half without. The developers were timed.

The result, published in July 2025: developers using AI tools took 19% longer to complete tasks. The same developers, asked to estimate their speed afterward, believed they had been roughly 20% faster.

That 39-point gap between perceived and measured productivity is the most important number in the entire AI coding debate, and almost nobody is quoting it. It explains why every team you talk to says AI is making them faster while their release velocity hasn't moved. They are not lying. They feel faster. They are answering more prompts per hour, accepting more suggestions, watching more code appear. The feeling is real. The throughput, in this particular study, was not.

5. The Baseline Nobody Mentions

Before we get to fixes, the baseline matters. Stripe's Developer Coefficient found developers spend roughly 17.3 hours per week, about 42%, on technical debt, bad code, and maintenance. That study predates the current AI tooling wave.

So the honest framing is this: AI-assisted teams aren't introducing debt into a clean system. They're adding duplication on top of a codebase that was already eating 42% of the team's time. The METR result and the GitClear result are scary precisely because the starting point wasn't healthy.

If you're an operator, the math you care about is: net hours saved per week, after accounting for the future maintenance hours your team will spend on the duplicated code. Most teams we work with at Entropy have never run that calculation. They run the gross number ("PRs opened up 30%") and call it a win.
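The net calculation can be sketched in a few lines of Python. This is a minimal sketch: the field names and the sample figures are hypothetical placeholders, not measurements from any study cited here, and you would substitute your own team's estimates.

```python
from dataclasses import dataclass


@dataclass
class WeeklyEstimate:
    """Team-level estimates in hours per week (hypothetical inputs)."""
    gross_hours_saved: float         # time the team believes AI tooling saves
    future_maintenance_hours: float  # expected upkeep created by duplicated code


def net_hours_saved(estimate: WeeklyEstimate) -> float:
    """Gross savings minus the maintenance cost that the gross number hides."""
    return estimate.gross_hours_saved - estimate.future_maintenance_hours


# Illustrative numbers only: 10 hours saved up front, 12 hours of upkeep later.
example = WeeklyEstimate(gross_hours_saved=10.0, future_maintenance_hours=12.0)
print(f"net hours saved per week: {net_hours_saved(example):+.1f}")
```

A negative result means the team is borrowing time rather than saving it. The hard part is estimating the maintenance term honestly, which is why a measured duplication baseline matters.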

6. Supervision Practices That Keep the Speed

The takeaway isn't "turn off the AI." Used with supervision, these tools are genuinely useful for boilerplate, test scaffolding, well-known patterns, and getting a junior unstuck. The takeaway is that unsupervised AI coding behaves the way unsupervised anything behaves: it optimizes for the loop it can see (write code that compiles) and ignores the loop it can't (keep the codebase legible in 18 months).

In our client work building production systems, four practices keep the speed without the debt:

If you're thinking through how this applies to a specific stack, our vibe-coded app production checklist walks through the gates we use, and our notes on AI MVP development cost cover how supervision time factors into early-stage budgets.

FAQ

Does AI-generated code always create technical debt?

No. AI-generated code creates debt the same way human-generated code creates debt, when it's shipped without review, refactoring discipline, or tests. The GitClear data shows duplication trending up in aggregate, but individual teams with strong review gates can keep their codebases clean. The tool isn't the problem. The supervision around the tool is.

What's the most reliable signal that AI code is hurting my codebase?

Watch your duplicated-block count and your moved-lines ratio. If blocks of 5+ duplicated lines are growing quarter over quarter and the share of refactored ("moved") code is shrinking, you're following the GitClear trend. Most repos can be scanned for both in under an hour using open-source tooling.
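As a rough illustration of the duplicated-block signal, here is a minimal Python sketch. It is not GitClear's method and not any particular open-source tool: it hashes every sliding window of 5 non-blank lines, ignoring indentation, and reports windows that appear in more than one place. The function names, the `.py` suffix default, and the sample files are all assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

WINDOW = 5  # block size in lines, matching the "5+ duplicated lines" threshold


def find_duplicate_blocks(files, window=WINDOW):
    """Map a block hash to every (file, start_line) where that block occurs.

    `files` maps a file name to its text. Only blocks found in more than
    one location are returned.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep original line numbers, drop blank lines, ignore indentation.
        lines = [(n, s.strip())
                 for n, s in enumerate(text.splitlines(), start=1)
                 if s.strip()]
        for i in range(len(lines) - window + 1):
            block = "\n".join(s for _, s in lines[i:i + window])
            digest = hashlib.sha1(block.encode("utf-8")).hexdigest()
            seen[digest].append((name, lines[i][0]))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}


def load_tree(root, suffix=".py"):
    """Read every file under `root` with the given suffix (hypothetical helper)."""
    return {str(p): p.read_text(encoding="utf-8", errors="ignore")
            for p in Path(root).rglob(f"*{suffix}") if p.is_file()}


# Small in-memory demo; for a real repo, pass load_tree("path/to/repo") instead.
sample = {
    "a.py": "a = 1\nb = 2\nc = 3\nd = 4\ne = 5\nf = 6\n",
    "b.py": "x = 0\na = 1\nb = 2\nc = 3\nd = 4\ne = 5\ny = 9\n",
}
for locations in find_duplicate_blocks(sample).values():
    print(locations)
```

Because the windows overlap, one long duplicated region shows up as several hits, so the raw count overstates the number of distinct clones. That is fine for trending the number quarter over quarter, less so for comparing against published figures.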

Are AI coding tools actually making developers faster?

It depends on the developer and the task. METR's 2025 randomized trial found experienced open-source developers were 19% slower with AI assistance on real tasks in their own codebases, even though they felt 20% faster. Greenfield work and junior developers likely see real gains. Senior engineers in mature codebases may not.

How much technical debt do developers already deal with without AI?

Stripe's Developer Coefficient study estimated developers spend about 42% of their week, 17.3 hours, on bad code and maintenance, costing roughly $85B/year globally. That baseline existed before current AI tooling. Adding unsupervised AI code generation on top of an already-strained maintenance budget is what makes the GitClear duplication trend concerning.

What should we do this quarter if we already have a lot of AI-assisted code in production?

Run a duplication scan against your main repo and get a baseline number. Add a pre-merge check for blocks of 5+ duplicated lines. Reserve 15–20% of next sprint for refactoring the worst offenders. Don't try to fix everything at once. Get the metric visible, then trend it down over two quarters.
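One way to make the metric visible is a gate that compares the current duplicated-block count with a stored baseline. The sketch below assumes you already have a count from whatever scanner you use; the JSON key `duplicate_blocks` and the function name are hypothetical.

```python
import json


def duplication_gate(current_blocks: int, baseline_blocks: int) -> int:
    """Return a process exit code: 0 to allow the merge, 1 to block it."""
    if current_blocks > baseline_blocks:
        print(f"FAIL: duplicated blocks rose from {baseline_blocks} to {current_blocks}")
        return 1
    print(f"OK: duplicated blocks at {current_blocks} (baseline {baseline_blocks})")
    return 0


# Assumed baseline format: a small JSON document committed to the repo.
baseline = json.loads('{"duplicate_blocks": 40}')["duplicate_blocks"]

# Illustrative run: a change that adds two duplicated blocks is rejected.
exit_code = duplication_gate(current_blocks=42, baseline_blocks=baseline)
```

In CI, the returned code would be passed to `sys.exit` so the pipeline fails when duplication grows. Lowering the stored baseline after each refactoring sprint turns the gate into a ratchet: the number can go down, but not back up.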

What to Do This Week

Run a duplication scan against your main repo on Monday. Pick the three worst offenders. Refactor one of them on Friday. That's the whole exercise. If the number trends down two quarters in a row, your AI tooling is working with you. If it trends up, you've found your bottleneck.

If you want a second pair of eyes on the scan or the review gates, we're here.