Jun 9, 2026

AI-Generated Code and Technical Debt: What 211 Million Lines Show

Two hard datasets, one uncomfortable conclusion: AI is shipping code faster and rotting codebases faster. Here's what the measurements actually say.

TL;DR

  • GitClear's 2025 study of 211M changed lines: copy/pasted code rose from 8.3% to 12.3% between 2021 and 2024.

  • Refactored "moved" lines collapsed from ~24% in 2020 to 9.5% in 2024. Blocks of 5+ duplicated lines grew 8x in 2024 alone.

  • A METR randomized trial found experienced devs were 19% slower with AI tools while feeling 20% faster.

  • Stripe's Developer Coefficient: devs already burn 17.3 hours a week (~42%) on maintenance, bad code, and debt.

  • Speed without supervision is a loan. The interest is paid in Q3 by the team that didn't write the code.

The Short Answer

Yes, AI-generated code is creating technical debt, and the only two rigorous datasets we have both point in the same direction. GitClear's analysis of 211 million changed code lines through 2024 shows duplication climbing and refactoring collapsing. A 2025 randomized controlled trial from METR found experienced developers were 19% slower on real tasks with AI assistance, even though they believed they were 20% faster. That's the headline. The rest of this piece covers what those numbers mean for your P&L, and the supervision practices that keep the speed without the debt.

The conversation about AI-generated code and technical debt is mostly vibes right now. Boosters wave benchmark scores. Doomers wave horror stories. We're going to anchor on the measurements.

1. The Promise vs the Measurements

The sales pitch for AI coding tools is straightforward: more output per developer-hour, lower cost per feature, faster time to revenue. In demos, that pitch lands. The line of code appears. The test passes. The PR opens itself.

In production codebases over multiple quarters, the picture is different. Two studies published in 2025 are the only sources we'd cite to a board:

  1. GitClear's AI Assistant Code Quality 2025 Research, covering 211 million changed lines across public and private repos.

  2. METR's randomized trial of experienced open-source developers, published July 2025.

Both are imperfect — GitClear measures correlation across a corpus, METR measures a specific population on specific tasks. Neither is the whole story. But together they're the only signal that isn't a vendor case study or a Twitter thread.

2. GitClear's Data: Duplication Up, Refactoring Down

The headline finding from GitClear is a scissors chart.

Between 2021 and 2024, the share of copy/pasted lines in commits rose from 8.3% to 12.3%. Over roughly the same window, the share of "moved" lines — code being refactored rather than rewritten — fell from about 24% in 2020 to 9.5% in 2024. Blocks of 5 or more duplicated lines grew 8x during 2024.

GitClear, 2025: copy/pasted lines climbed while moved/refactored lines collapsed across 211M changed lines.

Why this matters: moved code is the fingerprint of a developer who read the existing system and decided to reuse it. Copy/pasted code is the fingerprint of a developer (or an assistant) who decided it was cheaper to start over locally. The first behavior compounds value. The second compounds debt.

GitClear's data doesn't prove that AI assistants cause the shift: the study shows adoption and duplication moving together, not a causal link. But the timing is hard to ignore. The duplication curve bends sharply upward in the same years Copilot, Cursor, and Claude moved into mainstream workflows.
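This article doesn't describe how GitClear classifies lines, so here is a deliberately simplified sketch of the moved versus copy/pasted distinction. It is not GitClear's method. It assumes an added line counts as "moved" when the same commit deletes an identical line (ignoring whitespace), and as "copy/pasted" when it duplicates a line that still exists elsewhere in the repo. The function `classify_added_lines` and its three inputs are hypothetical.

```python
from collections import Counter


def normalize(line: str) -> str:
    """Collapse runs of whitespace so re-indented lines still compare equal."""
    return " ".join(line.split())


def classify_added_lines(added: list[str], removed: list[str],
                         existing: list[str]) -> list[str]:
    """Label each non-blank added line as 'moved', 'copy_pasted', or 'new'.

    added    -- lines a commit adds
    removed  -- lines the same commit deletes
    existing -- lines already in the repo that the commit leaves untouched
    """
    deleted = Counter(normalize(ln) for ln in removed if ln.strip())
    untouched = {normalize(ln) for ln in existing if ln.strip()}
    labels = []
    for line in added:
        key = normalize(line)
        if not key:
            continue  # blank lines carry no signal
        if deleted[key] > 0:
            deleted[key] -= 1             # pair with one deletion: code relocated
            labels.append("moved")
        elif key in untouched:
            labels.append("copy_pasted")  # duplicate of code that still lives on
        else:
            labels.append("new")
    return labels


labels = classify_added_lines(
    added=["def total(xs):", "    return sum(xs)", "", "print(total([1, 2]))"],
    removed=["def total(xs):", "  return sum(xs)"],
    existing=["print(total([1, 2]))"],
)
print(labels)  # ['moved', 'moved', 'copy_pasted']
```

Summing these labels across many commits gives moved and copy/pasted shares like the ones in the chart. A real tool would also have to handle near-duplicates and much larger histories.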

3. Why Copy-Paste Is a Loan With Compounding Interest

Here's the operator version of why duplication matters.

When a function is written once and reused 12 times, fixing a bug means changing one place. When the same function is pasted 12 times with minor variations, fixing a bug means finding 12 places, then deciding which variations were intentional and which were accidents. The cost of the fix is roughly linear in the number of copies. The cost of knowing you fixed it everywhere is worse than linear.

This is what people mean when they say AI code-quality problems aren't really about the AI. They're about what the AI removes from the workflow: the pause where a developer searches the codebase, finds the existing helper, and reuses it. That pause is the entire refactoring discipline. Skip it and you ship faster this week, then pay for it every week after.
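To put rough numbers on the "linear" and "worse than linear" costs described earlier, the sketch below uses a simplifying model of our own. It assumes that fixing a bug across n pasted copies takes one edit per copy, plus one comparison for every pair of copies to decide which differences were intentional. Edits grow as n, while comparisons grow as n(n-1)/2. The function names are hypothetical. `difflib` and `itertools` are Python's standard library.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fix_cost(n_copies: int) -> tuple[int, int]:
    """Return (edits, pairwise comparisons) to fix and audit n pasted copies."""
    return n_copies, n_copies * (n_copies - 1) // 2


def variation_report(copies: list[str]) -> list[tuple[int, int, float]]:
    """Similarity of every pair of copies; anything below 1.0 needs a human decision."""
    return [
        (i, j, round(SequenceMatcher(None, a, b).ratio(), 2))
        for (i, a), (j, b) in combinations(enumerate(copies), 2)
    ]


print(fix_cost(1))   # (1, 0): one shared helper, one edit, nothing to reconcile
print(fix_cost(12))  # (12, 66): twelve edits, 66 pairs to compare

copies = [
    "price * 1.2",
    "price * 1.2",
    "round(price * 1.2, 2)",  # intentional rounding, or an accident?
]
for i, j, ratio in variation_report(copies):
    print(f"copy {i} vs copy {j}: similarity {ratio}")
```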

Stripe's Developer Coefficient report put a number on the baseline: developers were already spending about 42% of their week (17.3 hours) on maintenance, technical debt, and bad code, an estimated $85B/year in global opportunity cost. That was before the duplication curve bent. If the GitClear trend holds, that 42% is more likely to rise than fall.
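Taken together, the two Stripe figures imply a week of roughly 41 hours (17.3 / 0.42 ≈ 41.2). The back-of-the-envelope sketch below turns the per-developer number into a team-level figure. The 10-person team and the 48 working weeks per year are illustrative assumptions, not Stripe's figures.

```python
HOURS_ON_DEBT_PER_WEEK = 17.3  # Stripe Developer Coefficient, per developer
SHARE_OF_WEEK = 0.42           # Stripe Developer Coefficient


def implied_week_hours() -> float:
    """Week length implied by the two Stripe figures."""
    return HOURS_ON_DEBT_PER_WEEK / SHARE_OF_WEEK


def team_debt_hours_per_year(team_size: int, working_weeks: int = 48) -> float:
    """Hours a team spends on debt and maintenance per year (48 working weeks assumed)."""
    return team_size * working_weeks * HOURS_ON_DEBT_PER_WEEK


print(round(implied_week_hours(), 1))       # 41.2
print(round(team_debt_hours_per_year(10)))  # 8304
```

To get a dollar figure for your own team, multiply the yearly hours by a loaded hourly rate.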

4. The METR Surprise: 20% Faster Felt, 19% Slower Measured

METR ran a randomized controlled trial with experienced open-source developers working on real tasks in repositories they already knew well. Each task was randomly assigned to be done with AI tools either allowed or disallowed, and the developers were timed.

The result, published in July 2025: developers using AI tools took 19% longer to complete tasks. The same developers, asked to estimate their speed afterward, believed they had been roughly 20% faster.

That 39-point gap between perceived and measured productivity is the most important number in the entire AI coding debate, and almost nobody is quoting it. It may explain why so many teams say AI is making them faster while their release velocity hasn't moved. They are not lying. They feel faster. They are answering more prompts per hour, accepting more suggestions, and watching more code appear. The feeling is real. The throughput, in this particular study, was not.
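Percentages labeled "faster" and "slower" are easy to misread, so here is the gap in wall-clock terms. The 60-minute baseline task is hypothetical. The sketch also assumes that "20% faster" means 20% less time, which is one reasonable reading of the developers' self-reports.

```python
def perceived_vs_measured(baseline_min: float = 60.0) -> tuple[float, float, int]:
    """Minutes with AI as measured by METR vs. as perceived by the developers."""
    measured = baseline_min * 1.19    # tasks took 19% longer with AI allowed
    perceived = baseline_min * 0.80   # assumption: "20% faster" = 20% less time
    gap_points = 20 - (-19)           # +20% perceived vs. -19% measured
    return measured, perceived, gap_points


measured, perceived, gap = perceived_vs_measured()
print(f"measured {measured:.1f} min, perceived {perceived:.1f} min, "
      f"gap {gap} points, {measured - perceived:.1f} min per task")
# measured 71.4 min, perceived 48.0 min, gap 39 points, 23.4 min per task
```

Under that reading, a developer who felt they had saved 12 minutes on an hour-long task actually spent about 11 minutes more.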

METR is careful about scope: experienced developers, open-source repos, specific task types. The result shouldn't be assumed to generalize to junior developers writing greenfield code, where the speedup is probably real. But for the population most teams care about, senior engineers working in a mature codebase, the question of whether AI code creates technical debt has a second uncomfortable layer: the tools may not be making those engineers faster either.

5. The Baseline Nobody Mentions

Before we get to fixes, the baseline matters. Stripe's Developer Coefficient figure of roughly 17.3 hours per week (about 42%) spent on technical debt, bad code, and maintenance comes from a study that predates the current AI tooling wave.

So the honest framing is this: AI-assisted teams aren't introducing debt into a clean system. They're adding duplication on top of a codebase that was already eating 42% of the team's time. The METR result and the GitClear result are scary precisely because the starting point wasn't healthy.

If you're an operator, the math you care about is: net hours saved per week, after accounting for the future maintenance hours your team will spend on the duplicated code. Most teams we work with at Entropy have never run that calculation. They run the gross number ("PRs opened up 30%") and call it a win.
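Here is a minimal sketch of that net calculation. You supply every input as your own estimate. The parameter `upkeep_hours_per_block` is hypothetical and stands for the lifetime maintenance cost of one duplicated block. The example numbers are placeholders, not client data.

```python
def net_hours_saved(gross_saved_per_week: float,
                    dup_blocks_added_per_week: float,
                    upkeep_hours_per_block: float) -> float:
    """Hours saved this week minus the future maintenance this week's duplicates will cost."""
    future_cost = dup_blocks_added_per_week * upkeep_hours_per_block
    return gross_saved_per_week - future_cost


# Placeholder inputs: 20 hours saved, 8 new duplicated blocks, 3 hours of upkeep each.
print(net_hours_saved(20, 8, 3))  # -4: the "win" is a net loss
print(net_hours_saved(20, 2, 3))  # 14
```

The gross number only looks like a win because nobody writes down the second term.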

6. Supervision Practices That Keep the Speed

The takeaway isn't "turn off the AI." Used with supervision, these tools are genuinely useful for boilerplate, test scaffolding, well-known patterns, and getting a junior unstuck. The takeaway is that unsupervised AI coding behaves the way unsupervised anything behaves: it optimizes for the loop it can see (write code that compiles) and ignores the loop it can't (keep the codebase legible in 18 months).

In our client work building production systems, four practices keep the speed without the debt:

Review gates that flag duplication explicitly. Most code review focuses on logic and style. Add a pre-merge check that flags blocks of 5+ duplicated lines against the rest of the repo. GitClear and several open-source copy/paste detectors (PMD's CPD and jscpd, for example) can do this. The check doesn't have to block; it just has to make the duplication visible to the reviewer before the PR merges.
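For teams that want to see the mechanics before adopting a tool, here is a minimal sketch of such a gate. It assumes you can supply the PR's added lines and the current contents of the repo's files. It hashes every run of 5 non-blank, whitespace-trimmed lines and warns when a run in the new code already exists elsewhere. All function names are hypothetical, the sketch is not how any particular tool works, and it deliberately reports rather than blocks.

```python
import hashlib

WINDOW = 5  # flag duplicated runs of 5+ lines


def code_lines(lines: list[str]) -> list[str]:
    """Trim whitespace and drop blank lines so reformatting doesn't hide a copy."""
    return [ln.strip() for ln in lines if ln.strip()]


def windows(lines: list[str], size: int = WINDOW):
    """Yield (index, digest) for each run of `size` consecutive code lines."""
    for i in range(len(lines) - size + 1):
        chunk = "\n".join(lines[i:i + size]).encode("utf-8")
        yield i, hashlib.sha1(chunk).hexdigest()


def index_repo(files: dict[str, list[str]]) -> dict[str, str]:
    """Map each window digest to the first file that contains it."""
    index: dict[str, str] = {}
    for path, lines in files.items():
        for _, digest in windows(code_lines(lines)):
            index.setdefault(digest, path)
    return index


def flag_duplicates(added: list[str], index: dict[str, str]) -> list[str]:
    """Warnings for added windows already in the repo (positions count non-blank lines)."""
    return [
        f"added code lines {i + 1}-{i + WINDOW} duplicate a block in {index[d]}"
        for i, d in windows(code_lines(added))
        if d in index
    ]


repo = {"billing.py": ["def tax(x):", "    r = 0.2", "    y = x * r",
                       "    y = round(y, 2)", "    return y", ""]}
pr_added = ["# new helper", "def tax(x):", "  r = 0.2", "  y = x * r",
            "  y = round(y, 2)", "  return y"]
for warning in flag_duplicates(pr_added, index_repo(repo)):
    print("warning:", warning)  # report only; the gate does not fail the build
```

Because the windows overlap, a 7-line paste produces three warnings. A production gate would merge overlapping hits and map them back to file line numbers, which dedicated detectors already handle.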

A refactor budget every sprint. Reserve a fixed percentage of engineering time — we typically suggest 15–20% — for moving code rather than writing new code. Track "moved lines" as a metric alongside "new lines." If moved lines trend toward zero, the team is accumulating debt faster than it's resolving it.
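Tracking the metric can be as simple as the sketch below. It assumes your tooling already gives you per-sprint counts of moved lines and newly written lines. It raises a warning when the moved share has fallen across the last three sprints or drops under a floor. The 10% floor and the three-sprint rule are hypothetical choices, not published thresholds.

```python
def moved_share(moved: int, new: int) -> float:
    """Fraction of changed lines that were moved rather than newly written."""
    total = moved + new
    return moved / total if total else 0.0


def debt_warning(sprints: list[tuple[int, int]], floor: float = 0.10,
                 run: int = 3) -> bool:
    """True if the moved share strictly decreased across the last `run` sprints,
    or if the latest sprint's share is below `floor`.

    `sprints` holds (moved_lines, new_lines) per sprint, oldest first.
    """
    shares = [moved_share(m, n) for m, n in sprints]
    if not shares:
        return False
    recent = shares[-run:]
    falling = len(recent) == run and all(a > b for a, b in zip(recent, recent[1:]))
    return falling or shares[-1] < floor


print(debt_warning([(300, 700), (200, 800), (120, 880)]))  # True: 30% -> 20% -> 12%
print(debt_warning([(200, 800), (250, 750), (220, 780)]))  # False: holding at 20-25%
```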

Tests-first, not tests-after. AI assistants are very good at writing code that passes a test that already exists. They are less reliable at writing code in a vacuum and then writing tests that catch their own mistakes. Inverting the order — human writes the test, AI writes the implementation — keeps the human in the loop on what "correct" means.
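Here is that order of operations in miniature, using a hypothetical `normalize_email` function. The human writes the test first, which pins down what "correct" means, and the assistant's implementation then has to satisfy it. The normalization rule (trim, lowercase, exactly one '@') is an illustrative business rule, not a complete email validator.

```python
# Step 1 (human): define "correct" before any implementation exists.
def test_normalize_email():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"
    assert normalize_email("bob@example.com") == "bob@example.com"
    try:
        normalize_email("not-an-email")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for input without '@'")


# Step 2 (assistant): write the implementation the test demands.
def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase; reject strings without exactly one '@'."""
    email = raw.strip().lower()
    if email.count("@") != 1:
        raise ValueError(f"not an email address: {raw!r}")
    return email


test_normalize_email()
print("ok")
```

If the assistant's code fails the test, the human's definition of "correct" wins, not the assistant's.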

Per-PR cost-of-ownership note. A one-line field on every PR: who maintains this if it breaks at 2am six months from now? It sounds bureaucratic. It is. It also stops about 30% of the speculative "let's just paste this in" PRs we've seen on client teams, because the question forces the author to name a future cost.
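If you want the field enforced rather than remembered, a CI step can check the PR description. The minimal sketch below assumes two things. First, your template has a line labeled "Owner if this breaks:", which is a hypothetical label, so match whatever your template actually says. Second, your CI can pass the PR body to a script.

```python
import re
from typing import Optional

# Hypothetical template label; [ \t]* keeps the match on a single line.
OWNER_FIELD = re.compile(r"^Owner if this breaks:[ \t]*(\S.*)$", re.MULTILINE)


def owner_of(pr_body: str) -> Optional[str]:
    """Return the named owner, or None if the field is missing or left blank."""
    match = OWNER_FIELD.search(pr_body)
    return match.group(1).strip() if match else None


def check(pr_body: str) -> int:
    """Exit status for CI: 0 if an owner is named, 1 otherwise."""
    owner = owner_of(pr_body)
    if owner is None:
        print("PR is missing 'Owner if this breaks:'; name who maintains this at 2am.")
        return 1
    print(f"owner: {owner}")
    return 0


check("Summary: add retry\nOwner if this breaks: @payments-oncall\n")
```

In CI you would pass the return value to `sys.exit` so that a blank field fails the step.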

None of this is novel. It's the discipline good engineering teams used in 2018. The AI tooling didn't remove the need for it. It made the need less visible.

If you're thinking through how this applies to a specific stack, our vibe-coded app production checklist walks through the gates we use, and our notes on AI MVP development cost cover how supervision time factors into early-stage budgets.

7. Five Questions to Ask Any Team Shipping AI-Written Code for You

If you're hiring an agency, a contractor, or evaluating an in-house team that ships AI-assisted code, ask these five questions. The answers tell you more than any portfolio.

  1. What's your duplication metric, and where did it move over the last two quarters? A team that can't answer isn't measuring. A team that says "it went down" is worth a follow-up.

  2. What percentage of PRs include a refactoring change versus pure additions? Healthy teams keep this above 20%. Below 10% and the codebase is accumulating debt at the rate GitClear measured.

  3. What's your test-writing order — human-first or AI-first? Either can work. "We don't really track it" cannot.

  4. Who reviews AI-generated code, and what are they specifically looking for? "Senior eng reviews everything" is the wrong answer at scale. You want named review gates, not heroic individuals.

  5. What does your team do when the AI suggests pasting an existing helper instead of importing it? The answer reveals whether the team has thought about duplication at all.
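Question 2 is easy to answer if PRs are labeled. The minimal sketch below assumes each PR record carries a boolean `refactor` flag. That field is hypothetical; in practice it might come from labels or moved-line counts. The result is bucketed with the 20% and 10% lines from question 2.

```python
def refactor_share(prs: list[dict]) -> float:
    """Share of PRs that include at least one refactoring change."""
    if not prs:
        return 0.0
    return sum(1 for pr in prs if pr["refactor"]) / len(prs)


def health(share: float) -> str:
    """Bucket the share: above 20% healthy, below 10% accumulating debt."""
    if share > 0.20:
        return "healthy"
    if share < 0.10:
        return "accumulating debt"
    return "watch"


prs = [{"id": n, "refactor": n % 4 == 0} for n in range(20)]  # 5 of 20 are refactors
share = refactor_share(prs)
print(f"{share:.0%} -> {health(share)}")  # 25% -> healthy
```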

These aren't gotcha questions. They're the same questions a good CTO asks her own team. If you don't have a good CTO yet, they're the questions to ask instead.

FAQ

Does AI-generated code always create technical debt?

No. AI-generated code creates debt the same way human-generated code creates debt — when it's shipped without review, refactoring discipline, or tests. The GitClear data shows duplication trending up in aggregate, but individual teams with strong review gates can keep their codebases clean. The tool isn't the problem. The supervision around the tool is.

What's the most reliable signal that AI code is hurting my codebase?

Watch your duplicated-block count and your moved-lines ratio. If blocks of 5+ duplicated lines are growing quarter over quarter and the share of refactored ("moved") code is shrinking, you're following the GitClear trend. Most repos can be scanned for both in under an hour using open-source tooling.

Are AI coding tools actually making developers faster?

It depends on the developer and the task. METR's 2025 randomized trial found experienced open-source developers were 19% slower with AI assistance on real tasks in their own codebases, even though they felt 20% faster. Greenfield work and junior developers likely see real gains. Senior engineers in mature codebases may not.

How much technical debt do developers already deal with without AI?

Stripe's Developer Coefficient study estimated developers spend about 42% of their week — 17.3 hours — on bad code and maintenance, costing roughly $85B/year globally. That baseline existed before current AI tooling. Adding unsupervised AI code generation on top of an already-strained maintenance budget is what makes the GitClear duplication trend concerning.

What should we do this quarter if we already have a lot of AI-assisted code in production?

Run a duplication scan against your main repo and get a baseline number. Add a pre-merge check for blocks of 5+ duplicated lines. Reserve 15–20% of next sprint for refactoring the worst offenders. Don't try to fix everything at once. Get the metric visible, then trend it down over two quarters.

What to Do This Week

Run a duplication scan against your main repo on Monday. Pick the three worst offenders. Refactor one of them on Friday. That's the whole exercise. If the number trends down two quarters in a row, your AI tooling is working with you. If it trends up, you've found your bottleneck.
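If you haven't picked a tool yet, a crude baseline is enough to start the trend line. The sketch below counts every run of 5 non-blank lines that appears more than once across the repo, then lists the three most-repeated runs. It assumes Python source files (`*.py`) and exact matches after trimming whitespace. The function names are hypothetical, and the sketch is no substitute for a proper detector.

```python
from collections import defaultdict
from pathlib import Path

WINDOW = 5  # the same 5+ line threshold used for duplicated blocks


def duplicate_windows(files: dict[str, list[str]], window: int = WINDOW):
    """Map each repeated run of `window` code lines to every (file, line_no) it starts at."""
    seen = defaultdict(list)
    for path, lines in files.items():
        # keep real line numbers while skipping blank lines
        code = [(n, ln.strip()) for n, ln in enumerate(lines, start=1) if ln.strip()]
        for i in range(len(code) - window + 1):
            chunk = "\n".join(text for _, text in code[i:i + window])
            seen[chunk].append((path, code[i][0]))
    return {chunk: spots for chunk, spots in seen.items() if len(spots) > 1}


def scan(root: str, pattern: str = "*.py"):
    """Read every matching file under `root` and find repeated windows."""
    files = {
        str(p): p.read_text(encoding="utf-8", errors="ignore").splitlines()
        for p in Path(root).rglob(pattern) if p.is_file()
    }
    return duplicate_windows(files)


def worst_offenders(dupes, top: int = 3):
    """The `top` repeated windows, most copies first."""
    return sorted(dupes.values(), key=len, reverse=True)[:top]


demo = {
    "a.py": ["x = 1", "y = 2", "z = 3", "w = 4", "v = 5"],
    "b.py": ["", "x = 1", "y = 2", "z = 3", "w = 4", "v = 5"],
}
dupes = duplicate_windows(demo)
print("duplicated 5-line windows:", len(dupes))  # 1
print(worst_offenders(dupes))  # [[('a.py', 1), ('b.py', 2)]]
```

On a real repo, `len(scan("path/to/repo"))` gives you Monday's baseline, and `worst_offenders(...)` gives you the three candidates for Friday. Overlapping windows inflate the count, so a 7-line copy shows up as three windows. That is acceptable for a trend line, because you only ever compare the number against itself.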

If you want a second pair of eyes on the scan or the review gates, we're here.

© All rights reserved