I Let AI Review My Pull Requests for a Month: The Results Were Uncomfortable
Using AI to review code before clients or collaborators see it. The bugs it caught, the false positives that wasted time, and whether it's worth doing.
I've been writing code professionally for over a decade. I'm supposed to be the person who catches bugs, not the one making rookie mistakes. So when I decided to run every pull request through Claude for a month before submitting them to clients or collaborators, I expected to feel validated.
Instead, I felt exposed.
This isn't an article about how amazing AI code review is. It's about what happens when you let a tireless, non-judgmental reviewer look at your work every single day—and what that reveals about your habits, assumptions, and blind spots.
The Experiment Setup
For 30 days (October 2024), I followed one rule: before marking any PR as "ready for review," I'd paste the diff into Claude and ask for a code review. Not a cursory glance—a proper review covering:
- Potential bugs and edge cases
- Performance concerns
- Security issues
- Code clarity and maintainability
- Missing tests
I used Claude 3.5 Sonnet via the API, which cost about £12 for the month across 47 pull requests. The reviews took 2-5 minutes each.
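Here's roughly what each review call looked like, sketched with the official Anthropic Node SDK (@anthropic-ai/sdk). The prompt wording and the git command below are illustrative rather than verbatim:
import Anthropic from '@anthropic-ai/sdk';
import { execSync } from 'node:child_process';
const reviewDiff = async () => {
  // Grab the branch diff (illustrative; any way of producing the diff works).
  const diff = execSync('git diff main...HEAD', { encoding: 'utf8' });
  // The client reads ANTHROPIC_API_KEY from the environment.
  const client = new Anthropic();
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 2048,
    messages: [{
      role: 'user',
      content:
        'Review this diff for potential bugs and edge cases, performance concerns, ' +
        'security issues, code clarity and maintainability, and missing tests:\n\n' + diff,
    }],
  });
  // The review comes back as text content blocks.
  for (const block of response.content) {
    if (block.type === 'text') console.log(block.text);
  }
};
reviewDiff().catch(console.error);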
Here's what I learned.
The Bugs I Missed (And Should Have Caught)
Week 1: The Race Condition I Wrote Twice
I was adding real-time features to a React Native app we're building. The code looked fine:
const handleWebSocketMessage = async (message: Message) => {
  const data = JSON.parse(message.data);
  await updateLocalCache(data);
  setMessages(prev => [...prev, data]);
};
Claude's feedback:
"Race condition:
updateLocalCacheis async but you're updating state immediately after. IfupdateLocalCachetakes 100ms and another message arrives in 50ms, the second message will update state before the first cache update completes, potentially causing data inconsistency."
I stared at this for a full minute. I'd written this exact bug six months ago in a different project. We'd spent half a day debugging it in production.
Fix:
const handleWebSocketMessage = async (message: Message) => {
  const data = JSON.parse(message.data);
  const cached = await updateLocalCache(data);
  setMessages(prev => [...prev, cached]);
};
That single catch probably paid for the entire month of API costs.
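One honest caveat on my own fix: awaiting the cache write keeps state and cache consistent for a single message, but two messages arriving back-to-back can still interleave their awaits. If strict ordering ever matters, serializing the handler through a promise chain is one option. A rough sketch, reusing the same handler (the queue variable is new here, not part of the original code):
let queue: Promise<void> = Promise.resolve();
const handleWebSocketMessage = (message: Message) => {
  // Chain each message onto the previous one so cache updates apply in arrival order.
  // In a real component this would live in a ref or at module scope.
  queue = queue
    .then(async () => {
      const data = JSON.parse(message.data);
      const cached = await updateLocalCache(data);
      setMessages(prev => [...prev, cached]);
    })
    .catch(err => console.error('message handling failed', err)); // keep the chain alive
  return queue;
};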
Week 2: The TypeScript Type That Lied
I was adding a feature flag system. The types looked solid:
type FeatureFlag = {
  enabled: boolean;
  rolloutPercentage: number;
};
const checkFlag = (flag: FeatureFlag): boolean => {
  return flag.enabled && Math.random() * 100 < flag.rolloutPercentage;
};
Claude pointed out:
"If
enabledisfalse, you're still rolling the dice. This means a disabled feature withrolloutPercentage: 100behaves the same as an enabled feature withrolloutPercentage: 0once the random number is generated. Consider returning early if!enabled."
Technically, the code worked. But Claude was right—it was confusing. Anyone reading this would have to mentally parse the entire condition to understand behavior.
More importantly: nothing stops someone from shipping a contradictory flag like enabled: false, rolloutPercentage: 100, and the original code made it hard to see which field wins.
Fix:
const checkFlag = (flag: FeatureFlag): boolean => {
  if (!flag.enabled) return false;
  return Math.random() * 100 < flag.rolloutPercentage;
};
Week 3: The Security Hole That Wasn't Obvious
This one stung. I was building an API endpoint for our web app:
app.post('/api/user/update', async (req, res) => {
  const userId = req.session.userId;
  const updates = req.body;
  await db.users.update(userId, updates);
  res.json({ success: true });
});
Claude's review:
"This allows clients to update any field in the user record by including it in the request body. Can a user set their own
isAdminflag? Consider explicitly whitelisting allowed fields."
I actually said "oh shit" out loud.
I'd written this assuming our frontend would only send valid fields. But APIs don't work that way. Anyone with curl could send { "isAdmin": true }.
This was a production vulnerability waiting to happen.
Fix:
const ALLOWED_FIELDS = ['name', 'email', 'preferences'] as const;
app.post('/api/user/update', async (req, res) => {
  const userId = req.session.userId;
  // pick copies only the whitelisted keys (e.g. lodash's pick, or a small helper)
  const updates = pick(req.body, ALLOWED_FIELDS);
  await db.users.update(userId, updates);
  res.json({ success: true });
});
The False Positives (And What They Taught Me)
Not every AI suggestion was gold. Roughly 30% of Claude's recommendations fell into three categories:
1. Over-Engineering Simple Code
Claude suggested extracting a three-line conditional into a separate function "for clarity." It wasn't clearer—it was just more code.
I learned to ask: "Would this actually help the next person reading this?"
2. Performance Optimizations That Didn't Matter
Claude flagged a .map() that could be .forEach() to "avoid creating an intermediate array."
We were processing 12 items. The performance difference was unmeasurable. I ignored it.
3. Suggestions That Didn't Understand Context
Claude recommended adding error handling to a development-only script. Fair point in isolation, but this script ran once during setup and threw obvious errors if it failed.
I learned to distinguish between "technically correct" and "actually useful."
The Uncomfortable Patterns I Discovered
After a month, patterns emerged. These were the hardest to see:
I Skip Error Handling on Fridays
No joke. When I grouped the month's reviews by day of the week, Friday PRs had 2.5x more missing error handling flagged than Monday PRs.
I was rushing to finish features before the weekend. AI didn't care what day it was.
I Write Worse Code After Meetings
PRs created within 2 hours of a client call had more bugs. I was context-switching poorly and not letting my brain settle before coding.
I Cargo-Cult My Own Patterns
I kept writing try/catch blocks that just logged errors and continued—a pattern I'd copy-pasted from an old project. Claude kept asking: "Should this be rethrowing? Should the user see an error?"
I'd been doing it wrong for months.
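To make that concrete, the shape of it looked something like this. The names here are illustrative stand-ins, not code from a real PR: a catch block that logs and quietly carries on, versus one that logs and still surfaces the failure.
// Illustrative stand-ins for the sake of the example.
type Preferences = { theme: 'light' | 'dark' };
declare const api: { savePreferences: (prefs: Preferences) => Promise<void> };
// The pattern I kept copy-pasting: swallow the error and carry on.
const savePreferencesQuietly = async (prefs: Preferences) => {
  try {
    await api.savePreferences(prefs);
  } catch (err) {
    console.error('savePreferences failed', err); // the user never finds out
  }
};
// What Claude kept nudging me toward: log it, then let the caller decide.
const savePreferences = async (prefs: Preferences) => {
  try {
    await api.savePreferences(prefs);
  } catch (err) {
    console.error('savePreferences failed', err);
    throw err; // rethrow so the UI can show an error state
  }
};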
Should Junior Developers Use AI Code Review?
This is the question everyone asks. After a month, my answer is: yes, but with training wheels.
Do Use AI If You:
- Review the suggestions critically (don't blindly accept)
- Treat it as a learning tool, not a crutch
- Still get human code reviews afterward
- Want to catch obvious mistakes before submitting code
Don't Use AI If You:
- Accept every suggestion without understanding why
- Skip learning fundamentals because "AI will catch it"
- Use it to avoid asking questions
- Think it replaces understanding your code
AI code review works best as a learning tool—helping you understand why certain patterns are better, not just auto-fixing problems.
The Quality Shift
Here's something I didn't expect: collaborators and freelancers I work with noticed a quality improvement in my PRs.
"You're catching more stuff upfront," one collaborator said. "It's making reviews faster."
This created a positive feedback loop. Better initial code meant:
- Less back-and-forth
- More time for architectural discussions
- Fewer obvious bugs
But there was a downside: I started to rely on it. On day 28, I forgot to run Claude on a PR. A collaborator caught three bugs I would have normally spotted myself.
The tool had become a crutch.
What I'm Keeping (And What I'm Dropping)
After 30 days, here's my new workflow:
Still Using AI For:
- Complex logic I wrote late in the day
- Code touching authentication or security
- PRs I'm nervous about (trust your gut)
- Features with weird edge cases
Not Using AI For:
- Simple refactors or style changes
- Code I pair-programmed (already reviewed)
- Fixes that only change configuration
- Tiny PRs (1-10 lines)
The goal: use AI like you'd use a linter—helpful automation for mechanical checks, not a replacement for thinking.
The Bottom Line
AI code review caught bugs I should have caught myself. That's uncomfortable to admit. But it also made me a better developer by:
- Revealing my blind spots (Friday afternoon code)
- Teaching me new patterns (better error handling)
- Making me think harder about edge cases
- Catching mistakes before submitting to collaborators
The cost? £12 and about 90 minutes of my time across a month. The return? Probably 4-5 hours saved in review cycles and at least one production bug prevented.
Would I recommend it? Yes, with one caveat: don't let it replace understanding your code. Use it as a mirror that shows you what you missed, then learn from that.
Because the uncomfortable truth is: we all write bugs. The question is whether you want to catch them yourself or wait for production to find them.
Experimenting with code quality processes? I've been testing AI-assisted development workflows across client projects. If you're curious about implementing similar processes—or just want to talk about modern development practices—get in touch.