The other day I came across this article about how Airbnb handles idempotency. It caught my interest immediately, since we are designing a system that needs idempotency and don't yet have anything in place.
The nice thing about Airbnb’s design is that they generalized idempotency into a three-phase process: Pre-RPC, RPC, and Post-RPC. In the first phase, they record the request along with its idempotency key, or fetch the existing record if that key was already seen. In the second phase, they make the request to the external services and make no database writes at all. In the final phase, they record the response in the database or handle the exception.
I recommend reading the full article, but the most beautiful part of their solution is that they packaged this generalization as a library, which avoids latency issues and keeps behavior consistent across teams. Beyond that, here are some concepts from their solution that I found interesting:
- Retryable and non-retryable exceptions: consumers should deliberately declare which exceptions are safe to retry and which are not, so the service knows how to handle each type.
- The client should be wise: it should send an idempotency key, must not change the request across retries, and should implement a good retry strategy, for instance exponential backoff or randomized wait times.
- Good idempotency key: the key can be scoped at the request level or at the entity level.
- The API request should hold an expiring lease on its key: this helps prevent the same event from firing multiple times, along with other race conditions. Ideally, this lease should be a row-level lock.
- Record the response: storing each response lets retries return the recorded result, so idempotent behavior can be both guaranteed and monitored.
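The client-side advice can be sketched as a small retry helper. This is a hypothetical Python example, not part of Airbnb's library: `call_with_retries`, `TransientError`, and the `send(key, payload)` callable are assumptions. It combines the two suggested strategies, using exponential backoff with a randomized ("full jitter") wait, and it generates the idempotency key once so every attempt reuses the same key and the same unchanged payload.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical: a failure the client is allowed to retry (e.g. HTTP 503)."""


def call_with_retries(send, payload, max_attempts=5, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    # Generate the idempotency key once so every attempt reuses it,
    # and send the same, unmodified payload each time.
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return send(key, payload)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff, capped at max_delay, with full jitter:
            # sleep a random time in [0, cap] so clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the helper can be exercised without real waiting; in production code it would keep its `time.sleep` default.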
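The expiring lease can also be sketched in Python with SQLite. SQLite has no row-level locks, so this hypothetical version (`create_lease_table`, `try_acquire_lease`, `release_lease` and the `leases` table are all assumptions) emulates the lease with an expiry timestamp and an atomic conditional `UPDATE`. In a database such as PostgreSQL, the row-level lock mentioned above would typically come from `SELECT ... FOR UPDATE` instead.

```python
import sqlite3
import time


def create_lease_table(conn):
    # One row per idempotency key; owner is NULL when nobody holds the lease.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leases ("
        " key TEXT PRIMARY KEY, owner TEXT, expires_at REAL NOT NULL)"
    )


def try_acquire_lease(conn, key, owner, ttl, now=None):
    now = time.time() if now is None else now
    with conn:
        # Create the row for a new key (no-op if it already exists).
        conn.execute(
            "INSERT OR IGNORE INTO leases (key, owner, expires_at) VALUES (?, NULL, 0)",
            (key,),
        )
        # Take the lease only if it is free or the previous holder's lease expired.
        cur = conn.execute(
            "UPDATE leases SET owner = ?, expires_at = ? "
            "WHERE key = ? AND (owner IS NULL OR expires_at <= ?)",
            (owner, now + ttl, key, now),
        )
    return cur.rowcount == 1  # True only if this caller now holds the lease


def release_lease(conn, key, owner):
    # Only the current holder can release the lease.
    with conn:
        conn.execute(
            "UPDATE leases SET owner = NULL WHERE key = ? AND owner = ?", (key, owner)
        )
```

The expiry is what keeps a crashed worker from blocking a key forever: once `expires_at` passes, another request with the same key can take over.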
Finally, this eventual consistency model led me to explore some conflict resolution techniques. I was somewhat familiar with their implementations, but I had never thought deeply about their nuances.