Show sample answer ▾▴
A payment webhook intermittently dropped events, but only in production. I added structured logging with request IDs and discovered a race condition in our Celery worker pool under high concurrency. I made the handler idempotent with a Redis-backed lock and added a replay endpoint. Dropped-event reports went to zero and we recovered the lost transactions.