Version 2026-06 · CC-BY-4.0

AI Pre-Production Review Checklist

22 failure modes AI generators ship into production code, in 7 categories — with detection signals and mitigations.


How to use this checklist. Run it against any AI-assisted codebase before deploying to production. For each failure mode, run the Detection step against the actual code, not the spec. If Detection finds a hit, apply the Prevention pattern before launch, not as a follow-up. The catalog is compiled from 150+ engagements where AI-generated code passed initial review and then failed under production conditions. Each mode was observed in at least three independent client engagements before inclusion.

CAT-CONCUR

Concurrency

Failures that emerge under simultaneous operations the AI did not model.

FM-01

Optimistic locking absent

AI Generates: Read-modify-write code without version check or row-level lock. Passes single-threaded test.
Production Failure: Two concurrent writes overwrite each other silently. Lost update problem at moderate concurrency.
Detection: Code review for any read-modify-write on shared records. Load test with 10+ concurrent users on the same entity.
Prevention: Optimistic concurrency token (rowversion / ETag) or explicit pessimistic lock with clear timeout.
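A minimal sketch of the optimistic concurrency token, using Python's built-in sqlite3 module. The `account` table, its `version` column, and the `StaleWriteError` name are assumptions made for illustration; a real system might use a rowversion column or an HTTP ETag instead.

```python
import sqlite3


class StaleWriteError(Exception):
    """The row was modified by someone else after we read it."""


def read_account(conn, account_id):
    # Return (balance, version) so the caller can send the version back on write.
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def write_balance(conn, account_id, new_balance, expected_version):
    # The WHERE clause matches only if nobody bumped the version in between.
    with conn:
        cur = conn.execute(
            "UPDATE account SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
    if cur.rowcount == 0:
        raise StaleWriteError(f"account {account_id} changed since version {expected_version}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 1)")
conn.commit()

# Two writers read the same version, then both try to write.
balance_a, version_a = read_account(conn, 1)
balance_b, version_b = read_account(conn, 1)
write_balance(conn, 1, balance_a + 10, version_a)  # first writer wins
try:
    write_balance(conn, 1, balance_b - 30, version_b)  # stale version
    second_write_rejected = False
except StaleWriteError:
    second_write_rejected = True
```

The second writer gets an explicit error instead of silently overwriting the first update; the caller can then re-read and retry or surface a conflict.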

CAT-DATA

Data Integrity

Failures that corrupt or lose data under conditions outside happy-path.

FM-03

Missing transaction boundaries

AI Generates: Multi-step write sequence (order, payment, inventory) without explicit transaction.
Production Failure: Partial commit on crash mid-sequence leaves inconsistent state. Orders without inventory decrement.
Detection: Trace every multi-write operation. If failure between steps leaves bad state, transaction is missing.
Prevention: Explicit transaction scope around logical units. Outbox pattern for cross-service writes.
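As a sketch of an explicit transaction scope, the example below wraps an order insert and an inventory decrement in one sqlite3 transaction. The `orders` and `inventory` tables and the `crash_between_steps` flag are hypothetical, added only to simulate a failure between the two writes. It covers the single-database case, not the outbox pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def place_order(conn, sku, qty, crash_between_steps=False):
    # "with conn" commits if the body finishes and rolls back if it raises,
    # so the order row and the stock decrement succeed or fail together.
    with conn:
        conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
        if crash_between_steps:
            raise RuntimeError("simulated crash between steps")
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - ? WHERE sku = ? AND stock >= ?",
            (qty, sku, qty),
        )
        if cur.rowcount == 0:
            raise ValueError("insufficient stock")


def snapshot(conn):
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    stock = conn.execute("SELECT stock FROM inventory WHERE sku = 'widget'").fetchone()[0]
    return orders, stock


try:
    place_order(conn, "widget", 2, crash_between_steps=True)
except RuntimeError:
    pass
after_crash = snapshot(conn)    # no order row, stock untouched

place_order(conn, "widget", 2)
after_success = snapshot(conn)  # one order row, stock reduced by 2
```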

FM-12

Decimal precision loss in money math

AI Generates: Float / double for monetary amounts. Looks like a number type.
Production Failure: Cents disappear or appear over time. Reconciliation drift. Audit fail.
Detection: Any money field that is not Decimal / fixed-point is wrong. Code review rule.
Prevention: Decimal type everywhere for money. Database column with explicit precision. Unit-tested edge cases.
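The drift described above can be reproduced in a few lines. The sketch below contrasts float and Decimal arithmetic using Python's decimal module; the helper names `to_cents` and `split_evenly` are illustrative assumptions, not part of any library.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

CENT = Decimal("0.01")

# Binary floats cannot represent 0.10 exactly, so sums pick up error.
float_total = sum([0.10] * 3)
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))


def to_cents(amount: Decimal) -> Decimal:
    # Round to two places with an explicit, documented rounding mode.
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def split_evenly(total: Decimal, parts: int) -> list[Decimal]:
    # Split without losing a cent: round each share down, then hand the
    # leftover cents to the first shares.
    base = (total / parts).quantize(CENT, rounding=ROUND_DOWN)
    leftover_cents = int((total - base * parts) / CENT)
    return [base + CENT if i < leftover_cents else base for i in range(parts)]


shares = split_evenly(Decimal("100.00"), 3)
```

The split case is the kind of edge case worth a unit test: the shares must always sum back to the original total.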

FM-13

Timezone-unaware date handling

AI Generates: DateTime stored without timezone; client converts in JS arbitrarily.
Production Failure: Reports off by hours. Daily-rollup tasks miss data near midnight. Audit trail wrong.
Detection: Every datetime field — confirm UTC storage; every display — confirm explicit timezone conversion.
Prevention: Store UTC. Convert to the user's local time at the display edge. Never compare naive datetimes.
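A small sketch of the store-UTC, convert-at-the-edge rule using Python's datetime module. The function names `to_storage` and `for_display` are hypothetical. A fixed UTC-5 offset stands in for the user's zone so the example does not depend on a timezone database; real code would more likely use named zones via `zoneinfo.ZoneInfo`.

```python
from datetime import datetime, timedelta, timezone


def to_storage(dt: datetime) -> str:
    # Reject naive datetimes at the boundary instead of guessing their zone.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime rejected: attach a timezone first")
    return dt.astimezone(timezone.utc).isoformat()


def for_display(stored: str, user_tz: timezone) -> datetime:
    # Convert only at the display edge, with an explicit target zone.
    return datetime.fromisoformat(stored).astimezone(user_tz)


user_tz = timezone(timedelta(hours=-5))  # assumed user offset for the demo
local_event = datetime(2026, 3, 1, 23, 30, tzinfo=user_tz)

stored = to_storage(local_event)        # already the next day in UTC
shown = for_display(stored, user_tz)    # back to the user's wall clock

try:
    to_storage(datetime(2026, 3, 1, 23, 30))
    naive_rejected = False
except ValueError:
    naive_rejected = True
```

The event is 23:30 on March 1 for the user but falls on March 2 in UTC, which is exactly the near-midnight case where daily rollups go wrong.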

FM-14

Cache without invalidation

AI Generates: Lookup-cache around a slow query. No invalidation logic on the writer side.
Production Failure: Stale data served to users hours after the change. “Why does the dashboard not update?”
Detection: Every cache must have a documented invalidation trigger. If not — flag.
Prevention: Bounded TTL + explicit invalidation on write. Stale-while-revalidate where freshness is loose.
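One way to combine a bounded TTL with explicit invalidation on write is sketched below. `TTLCache`, `FakeClock`, `get_price`, and `set_price` are illustrative names, and the `prices` dict stands in for the slow query; stale-while-revalidate is not shown.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def get(self, key, loader):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]
        value = loader(key)  # miss or expired: reload from the source
        self._entries[key] = (value, now)
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)


class FakeClock:
    """Controllable clock so the TTL can be exercised without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


prices = {"sku-1": 100}  # stand-in for the slow query's backing store
clock = FakeClock()
cache = TTLCache(ttl_seconds=60, clock=clock)


def get_price(sku):
    return cache.get(sku, lambda key: prices[key])


def set_price(sku, value):
    # The writer owns invalidation: update the source, then drop the entry.
    prices[sku] = value
    cache.invalidate(sku)
```

The TTL is the safety net for writes that bypass `set_price`; the explicit invalidation is what keeps normal writes visible immediately.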

FM-17

Schema migration without backfill

AI Generates: ALTER TABLE adding a non-null column with a default but no backfill plan for existing rows.
Production Failure: Long-running migration locks production table for hours. Or worse: silent constraint break.
Detection: Every schema migration on a non-trivial table needs a backfill plan reviewed in advance.
Prevention: Expand-contract pattern. Nullable column first, backfill, then enforce non-null.

FM-22

Time-window vulnerability in promotional code

AI Generates: Coupon redemption that checks remaining count then decrements in a separate step.
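Production Failure: Two concurrent redemptions both see remaining=1. Both succeed. The coupon is over-redeemed and the remaining count goes negative.
Detection: Check whether the remaining-count check and the decrement run as one atomic operation. If they are separate steps, the race exists.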
Prevention: Atomic database operation (UPDATE ... WHERE remaining > 0). Or distributed lock with timeout.
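The atomic UPDATE mentioned above can be sketched with sqlite3 as follows. The `coupon` table and the `redeem` function are assumptions for illustration; the point is that the check and the decrement are a single statement, and the affected-row count tells the caller whether the redemption happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon (code TEXT PRIMARY KEY, remaining INTEGER NOT NULL)")
conn.execute("INSERT INTO coupon VALUES ('SPRING', 1)")
conn.commit()


def redeem(conn, code):
    # Check and decrement in one statement: the database applies the
    # "remaining > 0" condition and the update as a single operation.
    with conn:
        cur = conn.execute(
            "UPDATE coupon SET remaining = remaining - 1 "
            "WHERE code = ? AND remaining > 0",
            (code,),
        )
    return cur.rowcount == 1  # True only if this call took a redemption


first = redeem(conn, "SPRING")
second = redeem(conn, "SPRING")
remaining = conn.execute(
    "SELECT remaining FROM coupon WHERE code = 'SPRING'"
).fetchone()[0]
```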

CAT-LOAD

Behavior Under Load

Failures that manifest only at scale the AI did not test against.

FM-02

N+1 query pattern

AI Generates: ORM access in a loop that issues a separate query per item. Looks idiomatic.
Production Failure: Page load goes from 50ms to 5+ seconds as the collection grows. Database CPU spikes.
Detection: Profile real queries on a representative dataset. Watch for query count proportional to result size.
Prevention: Eager loading / join / batched query. Set a query-count budget per endpoint and assert in tests.
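To illustrate the query-count budget, the sketch below runs the same lookup in an N+1 style and in a batched style against sqlite3 and counts the statements issued. `CountingConnection`, the `book` table, and both function names are hypothetical; an ORM would expose eager loading through its own API.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts statements so tests can assert a budget."""

    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params)


def titles_n_plus_one(db, author_ids):
    # One query per author: cost grows with the size of the collection.
    result = {}
    for author_id in author_ids:
        rows = db.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[author_id] = [row[0] for row in rows]
    return result


def titles_batched(db, author_ids):
    # One query for all authors, grouped in memory.
    if not author_ids:
        return {}
    placeholders = ",".join("?" * len(author_ids))  # only "?" marks, no user data
    rows = db.execute(
        f"SELECT author_id, title FROM book WHERE author_id IN ({placeholders}) ORDER BY id",
        tuple(author_ids),
    ).fetchall()
    result = {author_id: [] for author_id in author_ids}
    for author_id, title in rows:
        result[author_id].append(title)
    return result


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
raw.executemany(
    "INSERT INTO book (author_id, title) VALUES (?, ?)",
    [(1, "A1"), (1, "A2"), (2, "B1")],
)
raw.commit()

slow_db = CountingConnection(raw)
slow_result = titles_n_plus_one(slow_db, [1, 2, 3])

fast_db = CountingConnection(raw)
fast_result = titles_batched(fast_db, [1, 2, 3])
```

Asserting on `query_count` in an endpoint test turns an accidental N+1 regression into a failing build rather than a production latency report.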

FM-05

Unbounded resource allocation

AI Generates: items = readAll(); foreach item in items ... No pagination, no cap.
Production Failure: Memory exhausted when dataset grows. OutOfMemoryException at 50K rows.
Detection: Identify every readAll-style call. Confirm result size is bounded by request or by paging.
Prevention: Streaming or pagination by default. Reject readAll on unbounded sources at code review.
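A sketch of pagination by default, written as a generator that fetches fixed-size pages with keyset pagination. The `event` table and the `iter_rows` name are assumptions, and the code assumes positive integer ids.

```python
import sqlite3
from typing import Iterator


def iter_rows(conn, page_size: int = 1000) -> Iterator[tuple]:
    # Keyset pagination: remember the last id seen and fetch the next page
    # after it, so at most page_size rows are held in memory at a time.
    last_id = 0
    while True:
        page = conn.execute(
            "SELECT id, payload FROM event WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield from page
        last_id = page[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO event (payload) VALUES (?)", [(f"e{i}",) for i in range(250)]
)
conn.commit()

rows = list(iter_rows(conn, page_size=100))  # list() only for the demo
```

Callers would normally loop over `iter_rows` directly; materializing the result with `list()` reintroduces the unbounded allocation and is done here only to inspect the output.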

FM-18

Sync work inside HTTP handler

AI Generates: Endpoint that does an external API call inline before responding.
Production Failure: p99 latency tracks the slowest vendor. Cascading failure when vendor slows.
Detection: Anything in a sync handler taking >100ms is a candidate for async / queue.
Prevention: Background job + status endpoint for slow work. Async pipeline for non-critical-path work.

CAT-SECURE

Security

Vulnerabilities the AI introduced because it does not threat-model your specific surface.

FM-07

SQL injection via string concatenation

AI Generates: Dynamic SQL with concatenated user input when parameterized query was awkward.
Production Failure: Trivial SQL injection. Data exfiltration or destruction by a malicious or fuzzed input.
Detection: Static analysis flag on string + sql. Code review for every dynamic query.
Prevention: Parameterized queries by default. Lint rule that flags string concatenation in query construction.
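The difference between concatenation and a parameterized query is easiest to see side by side. In this sqlite3 sketch the `users` table and both function names are illustrative; `find_user_unsafe` is included only to show the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
conn.commit()


def find_user_unsafe(conn, name):
    # VULNERABLE: user input becomes part of the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = '" + name + "'"
    ).fetchall()


def find_user(conn, name):
    # Safe: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
leaked = find_user_unsafe(conn, payload)  # condition is always true
safe = find_user(conn, payload)           # treated as a literal name
```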

FM-08

Authorization missing inside data access

AI Generates: Endpoint authenticates the user but the data query does not filter by ownership.
Production Failure: Authenticated user retrieves another tenant's records by guessing IDs. Cross-tenant data leak.
Detection: Every multi-tenant read must filter by tenant key in the query. Test with two users + IDOR probe.
Prevention: Row-level security in the database OR a tenant filter helper that wraps every query.
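A minimal sketch of the tenant filter helper option, assuming a hypothetical `invoice` table with a `tenant_id` column. The class name `TenantScopedInvoices` is an illustration; the idea is that handlers never see an unscoped connection, so the tenant condition cannot be forgotten. The last two lines act as the two-user IDOR probe.

```python
import sqlite3


class TenantScopedInvoices:
    """Data access bound to one tenant; every query carries the tenant key."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def get(self, invoice_id):
        # Ownership is enforced in the query itself, not after the fetch.
        return self._conn.execute(
            "SELECT id, amount_cents FROM invoice WHERE id = ? AND tenant_id = ?",
            (invoice_id, self._tenant_id),
        ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, "tenant-a", 5000), (2, "tenant-b", 7500)],
)
conn.commit()

tenant_a = TenantScopedInvoices(conn, "tenant-a")
tenant_b = TenantScopedInvoices(conn, "tenant-b")

own_invoice = tenant_a.get(1)      # tenant A reads its own record
probed_invoice = tenant_b.get(1)   # tenant B guesses tenant A's id
```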

FM-09

Token / API key in code

AI Generates: Hardcoded secret committed to repo while wiring an integration.
Production Failure: Public repo leaks key. Credential rotation required. Sometimes followed by bill shock.
Detection: Pre-commit hook scanning for high-entropy strings; periodic secret scan over history.
Prevention: Secret manager. Never accept a string literal that looks like a key in code review.

FM-20

Prompt injection through retrieved content

AI Generates: RAG handler treats retrieved chunks as trusted instructions to the LLM.
Production Failure: Malicious document poisons the response. AI assistant exfiltrates context or executes a tool unsafely.
Detection: Threat-model the corpus. If any document can be authored by untrusted parties, retrieval is an injection vector.
Prevention: Retrieved content as data, never as instruction. System prompt that forbids following retrieved instructions. Output validation.

CAT-COST

Cost at Scale

Patterns that are cheap at prototype scale and become unaffordable in production.

FM-10

Unbounded LLM context cost

AI Generates: RAG retrieval that always sends top-50 chunks to the LLM regardless of relevance.
Production Failure: Bill is 10-50x what was planned because most tokens are noise. Latency degrades too.
Detection: Measure tokens-per-query vs answered-with-citations rate. Anomalies in either are signal.
Prevention: Reranker before LLM, confidence threshold, top-k tuned per use case. Budget alarm on token spend.

FM-19

Missing rate limit on AI endpoint

AI Generates: Public AI endpoint with no per-user / per-IP throttle.
Production Failure: Abuse runs up the LLM bill. Single bad actor can exceed a month of budget in an hour.
Detection: Every LLM-backed endpoint must have a per-key rate limit. Alarm above threshold.
Prevention: Rate limiter with budget alerting. Tiered quotas. Authenticated-only AI endpoints by default.
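A per-key rate limit can be sketched as a token bucket. The `TokenBucketLimiter` class below is an in-memory illustration with assumed names; a production limiter would usually keep its state in a shared store and feed budget alerts, neither of which is shown.

```python
import time


class TokenBucketLimiter:
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self._capacity = float(capacity)
        self._refill = float(refill_per_second)
        self._clock = clock
        self._buckets = {}  # key -> (tokens, last_seen)

    def allow(self, key):
        now = self._clock()
        tokens, last_seen = self._buckets.get(key, (self._capacity, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self._capacity, tokens + (now - last_seen) * self._refill)
        if tokens >= 1:
            self._buckets[key] = (tokens - 1, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


class FakeClock:
    """Controllable clock so refill can be exercised without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
limiter = TokenBucketLimiter(capacity=2, refill_per_second=1, clock=clock)

burst = [limiter.allow("user-a") for _ in range(3)]  # third call is throttled
other_user = limiter.allow("user-b")                 # separate budget per key
clock.now = 1.0
after_refill = limiter.allow("user-a")               # one token has refilled
```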

CAT-RECOVER

Recovery and Failure Modes

Code that has no plan for partial failure, retry, or rollback.

FM-04

Idempotency missing on retry path

AI Generates: Webhook handler or job runner that assumes each message arrives exactly once. No dedupe.
Production Failure: Network retry sends duplicate event. Customer is charged twice, email sent twice.
Detection: Ask: “if this runs twice with the same input, what happens?” Verify dedupe key exists.
Prevention: Idempotency key on every external-side-effect operation. Persist seen-keys for retention window.
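The dedupe key can be persisted in the same transaction as the side effect, as in this sqlite3 sketch. The `processed_event` and `charge` tables and the `handle_payment_event` name are assumptions; the charge row stands in for the external side effect, and expiry of old keys after the retention window is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_event (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE charge (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)")
conn.commit()


def handle_payment_event(conn, event_id, customer, amount_cents):
    # Record the key and apply the effect in one transaction. A repeated
    # event_id violates the primary key, which rolls the whole unit back.
    try:
        with conn:
            conn.execute("INSERT INTO processed_event (event_id) VALUES (?)", (event_id,))
            conn.execute(
                "INSERT INTO charge (customer, amount_cents) VALUES (?, ?)",
                (customer, amount_cents),
            )
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: already handled, do nothing
    return True


first_delivery = handle_payment_event(conn, "evt-123", "cust-1", 4999)
retry_delivery = handle_payment_event(conn, "evt-123", "cust-1", 4999)
charge_count = conn.execute("SELECT COUNT(*) FROM charge").fetchone()[0]
```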

FM-06

Timeout-less external call

AI Generates: HttpClient.GetAsync(url) with no explicit timeout (.NET's HttpClient then waits up to its 100-second default). Looks clean.
Production Failure: Vendor outage hangs every dependent request. Thread pool exhausted; whole service down.
Detection: Grep for HTTP clients, message queues, DB calls without explicit timeouts.
Prevention: Explicit timeout on every IO call. Circuit breaker for repeated failures. Bulkhead pool isolation.
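A sketch of an explicit timeout around an IO call, using asyncio. `slow_vendor_call` simulates the vendor with a sleep and is purely hypothetical; with a real HTTP client the same idea is usually expressed through the client's own timeout parameter. Circuit breaking and bulkheads are not shown.

```python
import asyncio


async def slow_vendor_call(delay_seconds):
    # Stand-in for a network call whose latency we do not control.
    await asyncio.sleep(delay_seconds)
    return "vendor response"


async def call_with_timeout(delay_seconds, timeout_seconds):
    # Bound the wait: the caller gets control back after timeout_seconds
    # even if the vendor never answers.
    try:
        return await asyncio.wait_for(
            slow_vendor_call(delay_seconds), timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        return None  # caller decides: fallback, retry later, or error


fast_result = asyncio.run(call_with_timeout(delay_seconds=0.01, timeout_seconds=0.5))
slow_result = asyncio.run(call_with_timeout(delay_seconds=1.0, timeout_seconds=0.05))
```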

FM-11

Missing dead-letter handling

AI Generates: Message handler that retries on failure forever.
Production Failure: Poison message blocks the queue. Backlog grows; processing freezes.
Detection: Every retry policy needs a give-up condition and a destination for the give-up.
Prevention: Bounded retries, dead-letter queue, alerting on dead-letter count. Manual review path.
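Bounded retries with a dead-letter destination can be sketched with an in-memory queue. The `drain` function, the `(message, attempts)` tuple layout, and the list used as a dead-letter queue are assumptions for illustration; a real broker would provide its own retry counters and dead-letter queue.

```python
from collections import deque


def drain(queue, handler, dead_letters, max_attempts=3):
    # Process until the queue is empty. A failing message is requeued until
    # it has failed max_attempts times, then parked for manual review.
    processed = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append((message, repr(exc)))  # give-up destination
            else:
                queue.append((message, attempts))          # retry later
    return processed


handler_calls = []


def handler(message):
    handler_calls.append(message)
    if message == "poison":
        raise ValueError("cannot parse message")


queue = deque([("ok-1", 0), ("poison", 0), ("ok-2", 0)])
dead_letters = []
processed = drain(queue, handler, dead_letters)
```

The poison message is attempted three times and then moved aside, so the healthy messages behind it are still processed. Alerting on the length of `dead_letters` is the natural next step.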

FM-21

No rollback path for AI feature

AI Generates: Replaces a deterministic computation with an LLM call. No fallback.
Production Failure: LLM vendor outage takes the feature down. There is no degraded mode.
Detection: For every AI feature: what does the system do when the AI is unavailable? If “nothing” — fix.
Prevention: Feature flag + deterministic fallback. Graceful degradation. Health probes on the AI dependency.
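A sketch of a feature flag with a deterministic fallback, assuming a hypothetical ticket-classification feature. `classify_ticket`, `rule_based_label`, and the label set are illustrative; `llm_classify` is whatever callable wraps the model and is injected so it can fail in tests.

```python
ALLOWED_LABELS = {"billing", "technical", "other"}


def rule_based_label(text):
    # Deterministic degraded mode: cruder than the model, but always available.
    lowered = text.lower()
    if "invoice" in lowered or "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "other"


def classify_ticket(text, llm_classify, llm_enabled=True):
    # Returns (label, source) so callers and dashboards can see which path ran.
    if llm_enabled:
        try:
            label = llm_classify(text)
            if label in ALLOWED_LABELS:  # validate the output before trusting it
                return label, "llm"
        except Exception:
            pass  # real code would log the failure here
    return rule_based_label(text), "fallback"


def broken_llm(text):
    raise ConnectionError("vendor unavailable")


during_outage = classify_ticket("Please refund my invoice", broken_llm)
flag_off = classify_ticket("App crash on login", lambda t: "billing", llm_enabled=False)
healthy = classify_ticket("App crash on login", lambda t: "technical")
bad_output = classify_ticket("Hello", lambda t: "nonsense")
```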

CAT-EVOLVE

Long-Term Evolution

Code that is correct now and will be a refactoring blocker in 18 months.

FM-15

Cross-cutting logging coupled to business code

AI Generates: Log lines threaded through business methods, mixed with returns.
Production Failure: Refactoring drops critical observability silently. Incident response degrades.
Detection: Audit logging to confirm it is a cross-cutting concern, not inline copy-paste.
Prevention: Structured logging via middleware / aspect. Logging contract per layer, enforced in review.

FM-16

Tight coupling to LLM vendor

AI Generates: Direct vendor SDK calls scattered through business code.
Production Failure: Vendor price hike or deprecation forces touching every call site. Migration takes a quarter.
Detection: Grep for vendor SDK names. If they appear in business code, abstraction is missing.
Prevention: LLM gateway with versioned prompts and a stable internal interface. Vendor swap = one config change.
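The gateway idea can be sketched as a stable internal interface with vendor adapters behind it. Everything named here (`CompletionProvider`, `EchoProvider`, `LLMGateway`, the `PROMPTS` registry) is hypothetical; `EchoProvider` stands in for a real vendor adapter and makes no network calls.

```python
from typing import Protocol


class CompletionProvider(Protocol):
    """The only LLM interface business code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in for a vendor adapter; a real one would wrap the vendor SDK."""

    def __init__(self, name):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Versioned prompts live in one place instead of being scattered in call sites.
PROMPTS = {
    ("summarize", "v1"): "Summarize in one sentence: {text}",
}


class LLMGateway:
    def __init__(self, providers, active):
        self._providers = providers
        self._active = active  # would normally come from configuration

    def run(self, prompt_name, version, **variables):
        template = PROMPTS[(prompt_name, version)]
        return self._providers[self._active].complete(template.format(**variables))


providers = {"vendor_a": EchoProvider("a"), "vendor_b": EchoProvider("b")}
gateway_a = LLMGateway(providers, active="vendor_a")
gateway_b = LLMGateway(providers, active="vendor_b")  # the "one config change"

result_a = gateway_a.run("summarize", "v1", text="hi")
result_b = gateway_b.run("summarize", "v1", text="hi")
```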

Coverage Summary

Concurrency: 1
Data Integrity: 6
Behavior Under Load: 3
Security: 4
Cost at Scale: 2
Recovery and Failure Modes: 4
Long-Term Evolution: 2
Total: 22

About this catalog. Compiled from 150+ Slavin AI & SLAtech engagements 2022–2026 where AI-assisted code passed initial review and then failed under production conditions. Each failure mode has been observed in at least three independent client engagements before inclusion. Entries are documented as patterns, not as specific client incidents.

License. Creative Commons Attribution 4.0 International (CC-BY-4.0). Reuse with credit to “Slavin AI” and a link to slavin.ai/Checklist/AI-Pre-Production-Review.

Machine-readable version. The same catalog as JSON: slavin.ai/data/ai-failure-modes-catalog.json.


Slavin AI is a brand of SLAtech LTD. slatech.co.il