ia-dev · May 05, 2026 · 7 min

How to Validate AI-Generated Code: A Practical Checklist Before Merging

AI-generated code can look perfect and still break production. This checklist covers the 6 points you need to verify before approving any LLM-generated PR.

By Tuurt Team


LLM-generated code can look flawless at first glance. It compiles, passes the linter, has descriptive names. And still breaks production three days later.

The problem isn't the AI; it's that most review workflows aren't calibrated for the kinds of errors LLMs make. This checklist closes that gap. It isn't theory: it's what you need to verify before approving a PR with generated code.


1. Tests passing — and tests that actually mean something

The first thing is obvious: tests need to pass. But there's a more important layer: do the passing tests actually cover the critical behavior?

LLMs tend to generate tests that validate the happy path and nothing else. Or worse: tests that "pass" because they're poorly written — empty assertions, mocks that don't verify what they should.

What to check:

  • Are there tests for business edge cases, not just the happy flow?
  • Do assertions verify real values or just that nothing threw an exception?
  • Are mocks configured correctly or do they just "not explode"?

If the tests look shallow, that's a warning sign — not a sign the code is fine.
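
As a concrete illustration, here's a minimal sketch of the difference in a Jest-style test file. The function applyDiscount and its 50% cap rule are hypothetical, invented for the example:

  // Hypothetical function under test: caps the discount at 50% of the total.
  function applyDiscount(order: { total: number }, coupon: { percentOff: number }) {
    const discount = Math.min((order.total * coupon.percentOff) / 100, order.total * 0.5);
    return { total: order.total - discount };
  }

  // Shallow test: passes as long as nothing throws. Says nothing about behavior.
  test('applies discount', () => {
    expect(applyDiscount({ total: 100 }, { percentOff: 80 })).toBeDefined();
  });

  // Meaningful test: pins the boundary rule down to a concrete value.
  test('caps the discount at 50% of the order total', () => {
    expect(applyDiscount({ total: 100 }, { percentOff: 80 }).total).toBe(50);
  });

Both tests go green, but only the second one would catch a regression in the cap logic.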


2. Edge cases considered

The LLM generates code for the case you described. Not necessarily for the cases you didn't describe.

Before merging, actively think about the boundaries:

  • What happens with empty, null, or zero-length inputs?
  • What happens with extreme values (very large numbers, very long strings)?
  • What happens if an external dependency fails or returns an unexpected format?
  • What happens with concurrency if the code can run in parallel?

It's not enough for the code to "look robust." You need to explicitly think through the edge cases and verify they're handled.
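
A minimal sketch of what explicit boundary handling looks like; averageOrderValue and its input shape are hypothetical, standing in for data that arrives from an external dependency:

  // Hypothetical helper: average order value from externally-sourced data,
  // so neither the array nor the entries are guaranteed to be well-formed.
  function averageOrderValue(
    orders: Array<{ total?: number } | null> | null | undefined
  ): number {
    // Empty, null, or undefined input: return a defined value instead of NaN.
    if (!orders || orders.length === 0) return 0;
    // Unexpected format from a dependency: skip entries without a finite total.
    const totals = orders
      .map((o) => o?.total)
      .filter((t): t is number => Number.isFinite(t));
    if (totals.length === 0) return 0;
    return totals.reduce((sum, t) => sum + t, 0) / totals.length;
  }

The happy-path version an LLM typically generates is one line of reduce-and-divide, and it returns NaN on an empty array.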


3. Consistency with the rest of the repo

The LLM doesn't know your codebase. It generates correct code in the abstract, but it may be inconsistent with the conventions, patterns, and abstractions that already exist.

What to check:

  • Does it use the same abstractions as the rest of the code (repositories, services, helpers)?
  • Does it follow the project's naming conventions?
  • Does it duplicate logic that already exists elsewhere?
  • Does it introduce a new dependency you'd already solved another way?

A PR that works but is incoherent with the repo is technical debt from day one.
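
A small example of the pattern to watch for, assuming a repo that already ships a money formatter; formatCents and the lib/money path are hypothetical:

  // Hypothetical: the repo already centralizes money formatting here.
  import { formatCents } from './lib/money';

  const amountInCents = 12345;

  // Inconsistent: the generated code re-implements formatting inline,
  // bypassing whatever locale and rounding rules the helper encodes.
  const inline = `$${(amountInCents / 100).toFixed(2)}`;

  // Consistent: reuse the abstraction the rest of the repo uses.
  const viaHelper = formatCents(amountInCents);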


4. Security: the three that hurt the most

LLMs can introduce security vulnerabilities without any warning. Not because they're careless — but because they optimize for code that works, not code that resists attacks.

The three most common in generated code:

SQL injection: if the code builds queries with string interpolation instead of parameterized queries (prepared statements). LLMs sometimes do this when the ORM isn't in context or when the prompt describes the raw query directly.
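
A minimal sketch of both variants, assuming a connected node-postgres (pg) client; findUser and the users table are invented for the example:

  import { Client } from 'pg';

  async function findUser(client: Client, email: string) {
    // Vulnerable: user input is spliced into the SQL string. An email like
    // "' OR '1'='1" changes the meaning of the query.
    // return client.query(`SELECT * FROM users WHERE email = '${email}'`);

    // Safe: a parameterized query; the driver sends the value separately
    // from the SQL text, so it can never be parsed as SQL.
    return client.query('SELECT * FROM users WHERE email = $1', [email]);
  }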

XSS (Cross-Site Scripting): if the code renders user content in the DOM without sanitizing it, or uses innerHTML where textContent should be used.
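
A sketch of the same content rendered both ways; the element id and the payload are invented for the example:

  const commentEl = document.getElementById('comment')!;
  const userComment = '<img src=x onerror="alert(1)">'; // attacker-controlled

  // Vulnerable: innerHTML parses the string as HTML and runs the handler.
  commentEl.innerHTML = userComment;

  // Safe for plain text: textContent never interprets markup.
  commentEl.textContent = userComment;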

Hardcoded secrets: the LLM may hardcode example values that look like placeholders but are actually real tokens you copied into the prompt. Check the diff for strings that look like keys, tokens, or passwords.
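
What that looks like in practice; the key value is shortened and the environment variable name (PAYMENTS_API_KEY) is invented for the example:

  // Red flag in a diff: a "placeholder" that is actually a live key
  // that was pasted into the prompt. (Value shortened here.)
  const apiKey = 'sk-live-4f9a...'; // do not merge this

  // Expected pattern: read the secret from the environment and fail fast.
  const paymentsKey = process.env.PAYMENTS_API_KEY;
  if (!paymentsKey) throw new Error('PAYMENTS_API_KEY is not set');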

In any of these cases: don't merge until fixed.


5. Plausible performance

The LLM can generate correct code that's algorithmically inefficient. It doesn't always matter, but in critical paths it does.

What to check:

  • Are there O(n²) loops where the data volume would make that unacceptable?
  • Are there database or API calls inside a loop that should be a single batch call?
  • Are large data structures being loaded completely when only a subset is needed?

You don't need to micro-optimize everything. But you do need to identify whether anything will blow up with real data.
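
A sketch of the N+1 pattern and its batched counterpart; fetchUser and fetchUsersByIds are hypothetical signatures, declared only for their shape:

  interface User { id: string; name: string }

  // Hypothetical data-access functions; only the shapes matter here.
  declare function fetchUser(id: string): Promise<User>;
  declare function fetchUsersByIds(ids: string[]): Promise<User[]>;

  // N+1 pattern: one network round trip per id. With 10,000 ids this is
  // 10,000 sequential calls.
  async function loadUsersSlow(ids: string[]): Promise<User[]> {
    const users: User[] = [];
    for (const id of ids) users.push(await fetchUser(id));
    return users;
  }

  // Batched: a single round trip, if the backend exposes a bulk endpoint.
  async function loadUsersFast(ids: string[]): Promise<User[]> {
    return fetchUsersByIds(ids);
  }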


6. Explicit error handling

LLM-generated code tends to handle the happy path very well and leave error handling vague or incomplete.

Warning signs:

  • Empty catch (e) {} or one with just a console.log.
  • Errors that are silenced instead of propagated or handled.
  • Missing input validation before processing.
  • No timeouts on external service calls.

A silenced error in production can take days to detect. Verify that every failure point has an explicit response.
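
A sketch of the difference, using fetch with AbortSignal.timeout (available in Node 17.3+ and modern browsers); notifyBilling, the URL, and the Invoice shape are invented for the example:

  interface Invoice { id: string; amountCents: number }

  // Red flag: the generated version swallows the failure entirely.
  // try { await notifyBilling(invoice); } catch (e) { console.log(e); }

  async function notifyBilling(invoice: Invoice): Promise<void> {
    // Timeout: don't hang indefinitely on the external service.
    const res = await fetch('https://billing.example.com/notify', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(invoice),
      signal: AbortSignal.timeout(5_000), // fail after 5s instead of hanging
    });
    // Propagate failures with context instead of silencing them.
    if (!res.ok) throw new Error(`billing notify failed: HTTP ${res.status}`);
  }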


The checklist, summarized

Before approving any PR with AI-generated code:

  • Tests passing — with real assertions about behavior
  • Edge cases considered: nulls, empty inputs, extremes, concurrency
  • Consistent with the repo's abstractions and conventions
  • No SQL injection, XSS, or hardcoded secrets
  • Acceptable performance for real data volumes
  • Explicit error handling at every failure point

A note on trust

This checklist doesn't exist because AI is bad at generating code. It exists because the standard human code review workflow doesn't always catch the specific errors LLMs tend to make.

Over time, you'll develop an eye for the most common error patterns in code generated by your model of choice. Until then, the checklist is the safety net that makes the process repeatable and reliable.

ai code-review llm security best-practices checklist