I Let AI Review My Pull Requests for a Month: The Results Were Uncomfortable
Using AI to review code before clients or collaborators see it. The bugs it caught, the false positives that wasted time, and whether it's worth doing.
I've been writing code professionally for over a decade. I'm supposed to be the person who catches bugs, not the one making rookie mistakes. So when I decided to run every pull request through Claude for a month before submitting them to clients or collaborators, I expected to feel validated.
Instead, I felt exposed.
This isn't an article about how amazing AI code review is. It's about what happens when you let a tireless, non-judgmental reviewer look at your work every single day—and what that reveals about your habits, assumptions, and blind spots.
The Experiment Setup
For 30 days (October 2024), I followed one rule: before marking any PR as "ready for review," I'd paste the diff into Claude and ask for a code review. Not a cursory glance—a proper review covering:
- Potential bugs and edge cases
- Performance concerns
- Security issues
- Code clarity and maintainability
- Missing tests
I used Claude 3.5 Sonnet via the API, which cost about £12 for the month across 47 pull requests. The reviews took 2-5 minutes each.
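Here's roughly what each review call looked like, sketched with the official Anthropic Node SDK (@anthropic-ai/sdk). The prompt wording and the git command below are illustrative rather than verbatim:
import Anthropic from '@anthropic-ai/sdk';
import { execSync } from 'node:child_process';
const reviewDiff = async () => {
  // Grab the branch diff (illustrative; any way of producing the diff works).
  const diff = execSync('git diff main...HEAD', { encoding: 'utf8' });
  // The client reads ANTHROPIC_API_KEY from the environment.
  const client = new Anthropic();
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 2048,
    messages: [{
      role: 'user',
      content:
        'Review this diff for potential bugs and edge cases, performance concerns, ' +
        'security issues, code clarity and maintainability, and missing tests:\n\n' + diff,
    }],
  });
  // The review comes back as text content blocks.
  for (const block of response.content) {
    if (block.type === 'text') console.log(block.text);
  }
};
reviewDiff().catch(console.error);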
Here's what I learned.
The Bugs I Missed (And Should Have Caught)
Week 1: The Race Condition I Wrote Twice
I was adding real-time features to a React Native app we're building. The code looked fine:
const handleWebSocketMessage = async (message: Message) => {
  const data = JSON.parse(message.data);
  await updateLocalCache(data);
  setMessages(prev => [...prev, data]);
};
Claude's feedback:
"Race condition:
updateLocalCacheis async but you're updating state immediately after. IfupdateLocalCachetakes 100ms and another message arrives in 50ms, the second message will update state before the first cache update completes, potentially causing data inconsistency."
I stared at this for a full minute. I'd written this exact bug six months ago in a different project. We'd spent half a day debugging it in production.
Fix:
const handleWebSocketMessage = async (message: Message) => {
  const data = JSON.parse(message.data);
  const cached = await updateLocalCache(data);
  setMessages(prev => [...prev, cached]);
};
That single catch probably paid for the entire month of API costs.
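One honest caveat on my own fix: awaiting the cache write keeps state and cache consistent for a single message, but two messages arriving back-to-back can still interleave their awaits. If strict ordering ever matters, serializing the handler through a promise chain is one option. A rough sketch, reusing the same handler (the queue variable is new here, not part of the original code):
let queue: Promise<void> = Promise.resolve();
const handleWebSocketMessage = (message: Message) => {
  // Chain each message onto the previous one so cache updates apply in arrival order.
  // In a real component this would live in a ref or at module scope.
  queue = queue
    .then(async () => {
      const data = JSON.parse(message.data);
      const cached = await updateLocalCache(data);
      setMessages(prev => [...prev, cached]);
    })
    .catch(err => console.error('message handling failed', err)); // keep the chain alive
  return queue;
};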
Week 2: The TypeScript Type That Lied
I was adding a feature flag system. The types looked solid:
type FeatureFlag = {
  enabled: boolean;
  rolloutPercentage: number;
};
const checkFlag = (flag: FeatureFlag): boolean => {
  return flag.enabled && Math.random() * 100 < flag.rolloutPercentage;
};
Claude pointed out:
"If
enabledisfalse, you're still rolling the dice. This means a disabled feature withrolloutPercentage: 100behaves the same as an enabled feature withrolloutPercentage: 0once the random number is generated. Consider returning early if!enabled."
Technically, the code worked. But Claude was right—it was confusing. Anyone reading this would have to mentally parse the entire condition to understand behavior.
More importantly: nothing stops someone from shipping a contradictory flag like enabled: false, rolloutPercentage: 100, and the original code made it hard to see which field wins.
Fix:
const checkFlag = (flag: FeatureFlag): boolean => {
  if (!flag.enabled) return false;
  return Math.random() * 100 < flag.rolloutPercentage;
};
Week 3: The Security Hole That Wasn't Obvious
This one stung. I was building an API endpoint for our web app:
app.post('/api/user/update', async (req, res) => {
  const userId = req.session.userId;
  const updates = req.body;
  await db.users.update(userId, updates);
  res.json({ success: true });
});
Claude's review:
"This allows clients to update any field in the user record by including it in the request body. Can a user set their own
isAdminflag? Consider explicitly whitelisting allowed fields."
I actually said "oh shit" out loud.
I'd written this assuming our frontend would only send valid fields. But APIs don't work that way. Anyone with curl could send { "isAdmin": true }.
This was a production vulnerability waiting to happen.
Fix:
const ALLOWED_FIELDS = ['name', 'email', 'preferences'] as const;
app.post('/api/user/update', async (req, res) => {
  const userId = req.session.userId;
  // pick copies only the whitelisted keys (e.g. lodash's pick, or a small helper)
  const updates = pick(req.body, ALLOWED_FIELDS);
  await db.users.update(userId, updates);
  res.json({ success: true });
});
The False Positives (And What They Taught Me)
Not every AI suggestion was gold. Roughly 30% of Claude's recommendations fell into three categories:
1. Over-Engineering Simple Code
Claude suggested extracting a three-line conditional into a separate function "for clarity." It wasn't clearer—it was just more code.
I learned to ask: "Would this actually help the next person reading this?"
2. Performance Optimizations That Didn't Matter
Claude flagged a .map() that could be .forEach() to "avoid creating an intermediate array."
We were processing 12 items. The performance difference was unmeasurable. I ignored it.
3. Suggestions That Didn't Understand Context
Claude recommended adding error handling to a development-only script. Fair point in isolation, but this script ran once during setup and threw obvious errors if it failed.
I learned to distinguish between "technically correct" and "actually useful."
The Uncomfortable Patterns I Discovered
After a month, patterns emerged. These were the hardest to see:
I Skip Error Handling on Fridays
No joke. When I grouped the month's reviews by day of the week, Friday PRs had 2.5x more missing error handling flagged than Monday PRs.
I was rushing to finish features before the weekend. AI didn't care what day it was.
I Write Worse Code After Meetings
PRs created within 2 hours of a client call had more bugs. I was context-switching poorly and not letting my brain settle before coding.
I Cargo-Cult My Own Patterns
I kept writing try/catch blocks that just logged errors and continued—a pattern I'd copy-pasted from an old project. Claude kept asking: "Should this be rethrowing? Should the user see an error?"
I'd been doing it wrong for months.
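To make that concrete, the shape of it looked something like this. The names here are illustrative stand-ins, not code from a real PR: a catch block that logs and quietly carries on, versus one that logs and still surfaces the failure.
// Illustrative stand-ins for the sake of the example.
type Preferences = { theme: 'light' | 'dark' };
declare const api: { savePreferences: (prefs: Preferences) => Promise<void> };
// The pattern I kept copy-pasting: swallow the error and carry on.
const savePreferencesQuietly = async (prefs: Preferences) => {
  try {
    await api.savePreferences(prefs);
  } catch (err) {
    console.error('savePreferences failed', err); // the user never finds out
  }
};
// What Claude kept nudging me toward: log it, then let the caller decide.
const savePreferences = async (prefs: Preferences) => {
  try {
    await api.savePreferences(prefs);
  } catch (err) {
    console.error('savePreferences failed', err);
    throw err; // rethrow so the UI can show an error state
  }
};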
Should Junior Developers Use AI Code Review?
This is the question everyone asks. After a month, my answer is: yes, but with training wheels.
Do Use AI If You:
- Review the suggestions critically (don't blindly accept)
- Treat it as a learning tool, not a crutch
- Still get human code reviews afterward
- Want to catch obvious mistakes before submitting code
Don't Use AI If You:
- Accept every suggestion without understanding why
- Skip learning fundamentals because "AI will catch it"
- Use it to avoid asking questions
- Think it replaces understanding your code
AI code review works best as a learning tool—helping you understand why certain patterns are better, not just auto-fixing problems.
The Quality Shift
Here's something I didn't expect: collaborators and freelancers I work with noticed a quality improvement in my PRs.
"You're catching more stuff upfront," one collaborator said. "It's making reviews faster."
This created a positive feedback loop. Better initial code meant:
- Less back-and-forth
- More time for architectural discussions
- Fewer obvious bugs
But there was a downside: I started to rely on it. On day 28, I forgot to run Claude on a PR. A collaborator caught three bugs I would have normally spotted myself.
The tool had become a crutch.
What I'm Keeping (And What I'm Dropping)
After 30 days, here's my new workflow:
Still Using AI For:
- Complex logic I wrote late in the day
- Code touching authentication or security
- PRs I'm nervous about (trust your gut)
- Features with weird edge cases
Not Using AI For:
- Simple refactors or style changes
- Code I pair-programmed (already reviewed)
- Fixes that only change configuration
- Tiny PRs (1-10 lines)
The goal: use AI like you'd use a linter—helpful automation for mechanical checks, not a replacement for thinking.
The Bottom Line
AI code review caught bugs I should have caught myself. That's uncomfortable to admit. But it also made me a better developer by:
- Revealing my blind spots (Friday afternoon code)
- Teaching me new patterns (better error handling)
- Making me think harder about edge cases
- Catching mistakes before submitting to collaborators
The cost? £12 and about 90 minutes of my time across a month. The return? Probably 4-5 hours saved in review cycles and at least one production bug prevented.
Would I recommend it? Yes, with one caveat: don't let it replace understanding your code. Use it as a mirror that shows you what you missed, then learn from that.
Because the uncomfortable truth is: we all write bugs. The question is whether you want to catch them yourself or wait for production to find them.
Experimenting with code quality processes? I've been testing AI-assisted development workflows across client projects. If you're curious about implementing similar processes—or just want to talk about modern development practices—get in touch.