Understanding Distributed Transactions with a Real-World Example

Discussing how distributed transactions work when data is spread across various services

Nov 20, 2024

Imagine you’re booking that long-awaited vacation on a popular travel site. You spot the perfect deal: a luxury suite with a dreamy 20% discount. Excitedly, you add it to your cart, select your dates, and complete the payment, already envisioning yourself lounging in that suite. But then, reality hits. When you check your confirmation, you notice something’s off — the discount wasn’t applied, and you’ve been charged the full amount.

Dejected, you are left wondering whether to cancel the booking or go ahead with a soured experience. Meanwhile, once a customer care agent reports the issue internally, the tech team needs to investigate the problem.

Frustrated, you contact customer support, hoping for a quick fix, only to be told that the discount code expired just before your booking went through. Now you’re left with two less-than-ideal choices: keep the booking at the higher rate or cancel and start over. What was supposed to be a seamless experience is now a letdown.

So, let’s break down what happened in this scenario:

The customer selected a room and applied a limited-offer discount code.
Within minutes, the customer completed the booking expecting the discounted price, but once the payment was processed, they saw that the offer hadn’t been applied after all.

The issue here? By the time the booking went through, the limited number of discounts had run out, and there was no available offer to apply.

If you’re an engineer, this situation might look familiar: it’s a classic example of a distributed transaction failing across multiple services. The Inventory Service still had the room available, and initially, the Offer Service had the discount available too. But by the time the transaction completed, the last discount had already been claimed, leading to an inconsistent booking result.

In an ideal setup, the website should only confirm and charge the customer if:

The room is available, and
The discount is still available when the order is finalized.

So, as an engineer, how would you approach fixing this?

This is where the Two-Phase Commit (2PC) Protocol comes into play. Let’s explore how it works and how it can prevent issues like this by ensuring that all parts of a transaction succeed or fail together.

Use Case

When a user selects a room and applies a discount offer, the Booking Service coordinates with both the Inventory Service and the Offer Service to check availability. If the room is available and the offer is still valid, the services lock in both the room and the discount. This allows the user to proceed with the booking, seeing the confirmed price with the applied discount, ensuring a smooth and consistent experience.

Placing the order

When the customer places the order, the Order Service triggers the Booking Service to confirm the reservation. The Booking Service checks with the Inventory Service and finds the room still available, so it’s ready to proceed. However, when it tries to apply the discount, the Offer Service reports that no offers are left. As a result, the order could either be processed at the full price, or it may fail altogether — either outcome leads to a poor user experience.
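The gap between "the offer was available when I looked" and "the offer is available when I pay" can be reproduced in a few lines. The sketch below is only an illustration: `OfferService`, `is_available`, and `redeem` are hypothetical names for an in-memory stand-in, not the API of any real service.

```python
class OfferService:
    """Hypothetical in-memory stand-in for the Offer Service."""

    def __init__(self, offers_left: int) -> None:
        self.offers_left = offers_left

    def is_available(self) -> bool:
        # A plain check: nothing is reserved for the caller.
        return self.offers_left > 0

    def redeem(self) -> bool:
        """Consume one offer; return False if none are left."""
        if self.offers_left == 0:
            return False
        self.offers_left -= 1
        return True


offers = OfferService(offers_left=1)

# Our customer sees the discount when selecting the room...
shown_discount = offers.is_available()

# ...but another customer redeems the last offer before checkout.
other_customer_got_it = offers.redeem()

# At order time the discount is gone, even though it was shown earlier.
applied_at_checkout = offers.redeem()

print(shown_discount, other_customer_got_it, applied_at_checkout)
```

Because the availability check reserves nothing, nothing stops another booking from consuming the offer between the check and the order.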

2-Phase Commit

The Two-Phase Commit (2PC) protocol is a distributed algorithm that ensures a transaction is either fully completed or entirely rolled back across all participating nodes. This consistency is crucial in distributed systems, where multiple services need to stay in sync. As the name suggests, the protocol has two phases:

Phase 1 — Prepare Phase

In this phase, the coordinator (in our case, the Booking Service) sends a “prepare” message to all participating nodes — in this example, the Inventory Service and the Offer Service. Each service temporarily locks its resources (i.e., reserves the room and discount offer) and responds with either a “Yes” (ready to commit) or “No” (unable to commit) message. If both services respond with “Yes,” they are prepared to proceed.
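As a rough sketch of the prepare step, a participant could reserve one unit of its resource per transaction and report its vote as a boolean. The `Participant` class, its fields, and the transaction ids below are assumptions made for illustration; a real service would also persist its vote so it survives a restart.

```python
class Participant:
    """Hypothetical participant holding a counted resource (rooms or offers)."""

    def __init__(self, name: str, available: int) -> None:
        self.name = name
        self.available = available
        self.held = set()  # ids of transactions holding a temporary lock

    def prepare(self, tx_id: str) -> bool:
        """Vote "Yes" (True) by reserving one unit, or "No" (False)."""
        if tx_id in self.held:
            return True  # already prepared; a repeated prepare is harmless
        if self.available == 0:
            return False
        self.available -= 1
        self.held.add(tx_id)
        return True


inventory = Participant("inventory", available=3)
offer = Participant("offer", available=1)

# Both services can reserve for the first transaction.
votes = [p.prepare("tx-1") for p in (inventory, offer)]

# The last offer is now held, so a second transaction gets a "No".
second_customer_vote = offer.prepare("tx-2")
```

The important difference from a plain availability check is that a "Yes" vote takes the unit out of circulation until the transaction is resolved.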

Phase 2 — Commit Phase

After receiving all responses, the coordinator decides whether to commit or abort the transaction:

If all participants respond with “Yes,” the coordinator sends a commit message, and each service finalizes its part of the transaction.
If any service responds with “No,” the coordinator sends an abort message, and each service releases any resources it was holding, rolling back the transaction entirely.
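The two phases can be put together in a small in-memory sketch. Everything here (`Participant`, `two_phase_commit`, the transaction ids) is a hypothetical illustration that ignores networking, persistence, and crashes; it only shows the decision rule described above.

```python
class Participant:
    """Hypothetical participant holding a counted resource (rooms or offers)."""

    def __init__(self, name: str, available: int) -> None:
        self.name = name
        self.available = available
        self.held = set()  # ids of transactions holding a temporary lock

    def prepare(self, tx_id: str) -> bool:
        """Phase 1: reserve one unit and vote Yes (True), or vote No (False)."""
        if self.available == 0:
            return False
        self.available -= 1
        self.held.add(tx_id)
        return True

    def commit(self, tx_id: str) -> None:
        """Phase 2: make the reservation permanent (the unit stays consumed)."""
        self.held.discard(tx_id)

    def abort(self, tx_id: str) -> None:
        """Phase 2: release the reservation, if this participant holds one."""
        if tx_id in self.held:
            self.held.remove(tx_id)
            self.available += 1


def two_phase_commit(tx_id: str, participants: list) -> bool:
    """Coordinator: commit only if every participant votes Yes."""
    votes = [p.prepare(tx_id) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(tx_id)
        return True
    for p in participants:
        p.abort(tx_id)  # safe to call even on participants that voted No
    return False


inventory = Participant("inventory", available=2)
offer = Participant("offer", available=1)

first = two_phase_commit("tx-1", [inventory, offer])   # room and offer taken
second = two_phase_commit("tx-2", [inventory, offer])  # no offer left: abort
print(first, second, inventory.available, offer.available)
```

In the second transaction the Inventory Service votes "Yes" but the Offer Service votes "No", so the room reserved during prepare is handed back rather than sold at full price.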

Applying 2PC to Our Scenario

In this case, the Booking Service acts as the coordinator:

Phase 1: When the user selects the room and applies a discount, the Booking Service sends a prepare message to the Inventory and Offer Services, locking both the room and the discount offer. A timer can be set, giving the user a limited window (e.g., 5 minutes) to complete the booking. Each service replies with a “Yes” or “No.”

Phase 2: When the Order Service initiates the final transaction, the Booking Service coordinates with the Inventory and Offer Services to commit. If both services confirm success, the Booking Service responds with a success message to the Order Service. Otherwise, an abort message rolls back the transaction, and no charge is applied.
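The timed hold from Phase 1 could be sketched as a reservation with an expiry time. `HoldingParticipant`, `hold_seconds`, and the injected `clock` are assumptions made for this example; the clock is injected only so the expiry can be demonstrated without waiting five minutes. Note that letting a hold expire on the participant's own timer is the design described in this scenario, whereas in textbook 2PC a participant that has voted "Yes" waits for the coordinator's decision.

```python
import time


class HoldingParticipant:
    """Hypothetical participant whose reservations expire after a time window."""

    def __init__(self, available: int, hold_seconds: float = 300.0,
                 clock=time.monotonic) -> None:
        self.available = available
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.holds = {}  # tx_id -> time at which the hold expires

    def _release_expired(self) -> None:
        now = self.clock()
        for tx_id, expires_at in list(self.holds.items()):
            if expires_at <= now:
                del self.holds[tx_id]
                self.available += 1

    def prepare(self, tx_id: str) -> bool:
        """Reserve one unit for a limited window; False means a "No" vote."""
        self._release_expired()
        if self.available == 0:
            return False
        self.available -= 1
        self.holds[tx_id] = self.clock() + self.hold_seconds
        return True

    def commit(self, tx_id: str) -> bool:
        """Finalize the hold; False if it expired before the order arrived."""
        self._release_expired()
        return self.holds.pop(tx_id, None) is not None


# A fake clock stored in a list so the example can move time forward.
now = [0.0]
offer = HoldingParticipant(available=1, hold_seconds=300.0, clock=lambda: now[0])

held = offer.prepare("tx-1")        # the discount is locked for 5 minutes
now[0] = 301.0                      # the customer takes too long
late_commit = offer.commit("tx-1")  # the hold has expired, so this fails
retry = offer.prepare("tx-2")       # the released offer can be held again
```

With this shape, an abandoned checkout does not lock the last discount forever, and a late order fails cleanly instead of being charged at full price.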

Advantages of the Two-Phase Commit Protocol

Consistency: 2PC ensures that either all services commit the transaction or all of them roll it back, keeping the system’s state consistent.
Atomicity: Transactions under 2PC are atomic, meaning they either succeed fully or fail completely, leaving no partial states.
Simplicity: The protocol is straightforward, with a single coordinator and two rounds of messages, which makes it easy to reason about when coordinating distributed transactions.

Conclusion

Our hotel booking example shows how 2PC can make complex transactions across different services — like inventory, offers, and payments — work smoothly together. By locking resources and double-checking each step, 2PC gives users a reliable experience, which is especially important when timing and availability are key.

Of course, 2PC isn’t without its trade-offs. Resources stay locked from the prepare phase until the coordinator announces its decision, which can hurt performance, and if the coordinator fails in between, participants are left waiting with their locks held. Even so, it’s still a powerful tool for managing transactions in distributed systems. With 2PC, engineers can build systems that don’t just work well but also follow the “all-or-nothing” rule, making sure users get consistent results. In the end, 2PC helps maintain both system reliability and user trust, which are at the heart of any great digital experience.

Mayank’s Substack
