【Distributed System】Distributed Transactions - Saga

Posted by 西维蜀黍 on 2019-07-12, Last Modified on 2023-02-21

Context

You have applied the Database per Service pattern. Each service has its own database. Some business transactions, however, span multiple services, so you need a mechanism to implement transactions that span services. For example, let’s imagine that you are building an e-commerce store where customers have a credit limit. The application must ensure that a new order will not exceed the customer’s credit limit. Since Orders and Customers are in different databases owned by different services, the application cannot simply use a local ACID transaction.

Problem

How to implement transactions that span services?

Forces

  • 2PC is not an option

Definition

Saga, another compensating transaction model, is not a new concept. The original Sagas paper by Hector Garcia-Molina and Kenneth Salem was published in 1987, around the same period as the XA two-phase commit protocol specification.

Saga, like TCC (Try-Confirm-Cancel), is a compensating transaction model, but it does not include a try phase. Saga treats a distributed transaction as a transaction chain composed of a group of local transactions.

Each forward operation in the transaction chain has a corresponding compensating (reverse) operation. The Saga transaction coordinator executes the branch transactions in the chain in sequence. Once all branch transactions have executed, resources are released. If a branch transaction fails, however, compensating operations are performed in the opposite direction.

Assume that a Saga distributed transaction chain is composed of n branch transactions, that is, [T1, T2, …, Tn]. Then there are three possible execution sequences:

  • T1, T2, …, Tn: All n transactions are executed successfully.
  • T1, T2, …, Ti, Ci, …, C2, C1: Execution fails at the i-th (i ≤ n) transaction. Compensating operations are then called in reverse order, from i down to 1. A compensating operation that fails is retried until it succeeds. Compensating operations can be optimized to run in parallel.
  • T1, T2, …, Ti (failure), Ti (retry), Ti (retry), …, Tn: Applies to scenarios where transactions must succeed. If a failure occurs, the failed transaction is retried until it succeeds, and no compensating operation is performed.
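The second and third sequences above (backward recovery and forward retry) can be sketched in a few lines of Python. This is a minimal illustration, not a production coordinator: the `Step` tuple, the function names `run_backward` and `run_forward`, and the bounded `max_attempts` are all assumptions made for the example. A real coordinator would also persist its progress and back off between retries.

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (name, forward action T_i, compensation C_i).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_backward(steps: List[Step], log: List[str]) -> bool:
    """Backward recovery: if T_i fails, call C_i, ..., C_1 in reverse order."""
    attempted: List[Step] = []
    for step in steps:
        name, action, _ = step
        attempted.append(step)
        try:
            action()
            log.append(f"T:{name}")
        except Exception:
            log.append(f"T:{name}:failed")
            for c_name, _, compensate in reversed(attempted):
                while True:  # retry the compensation until it succeeds
                    try:
                        compensate()
                        log.append(f"C:{c_name}")
                        break
                    except Exception:
                        log.append(f"C:{c_name}:retry")
            return False
    return True


def run_forward(steps: List[Step], log: List[str], max_attempts: int = 5) -> None:
    """Forward recovery: retry a failed T_i instead of compensating."""
    for name, action, _ in steps:
        for _attempt in range(max_attempts):
            try:
                action()
                log.append(f"T:{name}")
                break
            except Exception:
                log.append(f"T:{name}:retry")
        else:
            # Assumption: give up after max_attempts rather than loop forever.
            raise RuntimeError(f"step {name} did not succeed")
```

Note that `run_backward` also calls the compensation of the step that failed, matching the sequence T1, …, Ti, Ci, …, C1, so that compensation must tolerate a forward operation that had no effect.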

Solution

Implement each business transaction that spans multiple services as a saga. A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule, then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.

There are two ways to coordinate sagas:

  • Choreography - each local transaction publishes domain events that trigger local transactions in other services
  • Orchestration - an orchestrator (object) tells the participants what local transactions to execute

Consideration

Saga transactions, like TCC transactions, require the business design and implementation to follow three policies:

  • Allow null rollbacks: Because of network exceptions, a transaction participant may receive the compensation request before it has performed the forward operation. In this case, the compensation must complete as a no-op (a null compensation).
  • Idempotence: Forward operations and compensating operations can both be triggered repeatedly, so both must be idempotent.
  • Prevent resource suspension: If the forward operation arrives later than the compensating operation because of network exceptions, the forward operation must be discarded. Otherwise, resources are held with nothing left to release them (resource suspension).
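The three policies above can be illustrated with a single participant that records a per-saga status. This is only a sketch under assumptions: the `InventoryParticipant` class, its `reserve`/`cancel` methods, and the in-memory dictionaries are hypothetical, and a real participant would keep this status in its own database, updated in the same local transaction as the business data.

```python
from typing import Dict


class InventoryParticipant:
    """Hypothetical participant that reserves stock on behalf of a saga."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.state: Dict[str, str] = {}     # saga_id -> "done" | "compensated"
        self.reserved: Dict[str, int] = {}  # saga_id -> reserved quantity

    def reserve(self, saga_id: str, qty: int) -> bool:
        """Forward operation. Returns False when the request is discarded."""
        status = self.state.get(saga_id)
        if status == "compensated":
            # Suspension prevention: the compensation already ran, so a late
            # forward request must not reserve anything.
            return False
        if status == "done":
            # Idempotence: a duplicate forward request changes nothing.
            return True
        if qty > self.stock:
            raise ValueError("insufficient stock")
        self.stock -= qty
        self.reserved[saga_id] = qty
        self.state[saga_id] = "done"
        return True

    def cancel(self, saga_id: str) -> None:
        """Compensating operation."""
        status = self.state.get(saga_id)
        if status == "compensated":
            return  # Idempotence: a duplicate compensation changes nothing.
        if status == "done":
            self.stock += self.reserved.pop(saga_id)
        # Null rollback: if no forward operation was seen, there is nothing
        # to undo, but the marker is still recorded so a late forward
        # request can be rejected.
        self.state[saga_id] = "compensated"
```

Recording the "compensated" marker even for a null rollback is the design choice that ties the first and third policies together.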

Example

Example of Saga

Assume that Xiao Ming wants to take a trip on the National Day holiday. He plans to depart from Beijing, spend three days in London, and then pay a three-day visit to Paris before returning to Beijing. The whole trip involves ticket reservations from different airlines and hotel reservations in London and Paris. Xiao Ming’s plan is to cancel the trip if any of the reservations fail. Assume that a comprehensive travel service platform can make all reservations with one click, which resembles a long transaction. If the service is arranged by using Saga, as shown in the following figure, the trip reservation will be canceled through compensating operations when any of the reservations fail.

Example: Choreography-based saga

Choreography is a way to coordinate sagas where participants exchange events without a centralized point of control. With choreography, each local transaction publishes domain events that trigger local transactions in other services.
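As a rough illustration of choreography, the sketch below applies it to the order and credit-limit scenario from the Context section. Everything here is assumed for the example: the synchronous in-memory `EventBus` stands in for a real message broker, and the event names (`OrderCreated`, `CreditReserved`, `CreditLimitExceeded`) and service classes are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Tiny synchronous in-memory bus standing in for a message broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in list(self.handlers[event_type]):
            handler(payload)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: Dict[str, str] = {}
        bus.subscribe("CreditReserved", self.on_credit_reserved)
        bus.subscribe("CreditLimitExceeded", self.on_credit_limit_exceeded)

    def create_order(self, order_id: str, customer_id: str, total: int) -> None:
        # Local transaction 1: create the order in a pending state.
        self.orders[order_id] = "PENDING"
        self.bus.publish(
            "OrderCreated",
            {"order_id": order_id, "customer_id": customer_id, "total": total},
        )

    def on_credit_reserved(self, event: dict) -> None:
        self.orders[event["order_id"]] = "APPROVED"

    def on_credit_limit_exceeded(self, event: dict) -> None:
        # Compensating transaction: reject the pending order.
        self.orders[event["order_id"]] = "REJECTED"


class CustomerService:
    def __init__(self, bus: EventBus, credit: Dict[str, int]) -> None:
        self.bus = bus
        self.available = dict(credit)
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        # Local transaction 2: try to reserve credit for the order.
        customer_id = event["customer_id"]
        if self.available.get(customer_id, 0) >= event["total"]:
            self.available[customer_id] -= event["total"]
            self.bus.publish("CreditReserved", {"order_id": event["order_id"]})
        else:
            self.bus.publish("CreditLimitExceeded", {"order_id": event["order_id"]})
```

No component in this sketch knows the whole workflow: each service only reacts to the events it subscribes to, which is what makes the flow hard to follow as more participants are added.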

Benefits

  • Good for simple workflows that require few participants and don’t need coordination logic.
  • Doesn’t require additional service implementation and maintenance.
  • Doesn’t introduce a single point of failure, since the responsibilities are distributed across the saga participants.

Drawbacks

  • Workflow can become confusing when adding new steps, as it’s difficult to track which saga participants listen to which commands.
  • There’s a risk of cyclic dependency between saga participants because they have to consume each other’s commands.
  • Integration testing is difficult because all services must be running to simulate a transaction.

Example: Orchestration-based saga

Orchestration is a way to coordinate sagas where a centralized controller tells the saga participants what local transactions to execute. The saga orchestrator handles all the transactions and tells the participants which operation to perform based on events. The orchestrator executes saga requests, stores and interprets the states of each task, and handles failure recovery with compensating transactions.
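A minimal orchestrator for the trip reservation example might look like the sketch below. It is an assumption-laden illustration: `BookingService`, `TripOrchestrator`, the `fail` flag used to simulate a rejected reservation, and the in-memory state history are all hypothetical, and a real orchestrator would persist its state so it can resume after a crash.

```python
from typing import Dict, List, Set


class BookingService:
    """Hypothetical participant (an airline or hotel reservation service)."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail  # simulate a reservation that cannot be made
        self.bookings: Set[str] = set()

    def execute(self, saga_id: str) -> bool:
        if self.fail:
            return False
        self.bookings.add(saga_id)
        return True

    def compensate(self, saga_id: str) -> None:
        self.bookings.discard(saga_id)  # idempotent cancellation


class TripOrchestrator:
    """Central controller: tells each participant what to do and records state."""

    def __init__(self, participants: List[BookingService]) -> None:
        self.participants = participants
        self.states: Dict[str, List[str]] = {}  # saga_id -> state history

    def book(self, saga_id: str) -> str:
        history = self.states.setdefault(saga_id, ["STARTED"])
        done: List[BookingService] = []
        for participant in self.participants:
            if participant.execute(saga_id):
                done.append(participant)
                history.append(f"{participant.name}:BOOKED")
                continue
            history.append(f"{participant.name}:FAILED")
            # Failure recovery: cancel completed bookings in reverse order.
            for booked in reversed(done):
                booked.compensate(saga_id)
                history.append(f"{booked.name}:CANCELLED")
            history.append("ABORTED")
            return "ABORTED"
        history.append("COMPLETED")
        return "COMPLETED"
```

Here the participants know nothing about each other; only the orchestrator holds the order of steps, which is why it also becomes the component whose availability the whole workflow depends on.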

Benefits

  • Good for complex workflows involving many participants or new participants added over time.
  • Suitable when there is control over every participant in the process, and control over the flow of activities.
  • Doesn’t introduce cyclical dependencies, because the orchestrator unilaterally depends on the saga participants.
  • Saga participants don’t need to know about commands for other participants. Clear separation of concerns simplifies business logic.

Drawbacks

  • Additional design complexity, because the coordination logic must be implemented.
  • There’s an additional point of failure, because the orchestrator manages the complete workflow.

Feature Analysis

Saga transactions guarantee three of the four ACID properties:

  • Atomicity: The Saga coordinator can ensure that local transactions in the transaction chain are all committed or all rolled back.
  • Consistency: Saga transactions ensure eventual consistency.
  • Durability: Durability can be ensured as Saga is based on local transactions.

However, Saga does not guarantee isolation. A local transaction is visible to other transactions as soon as it is committed. If another transaction changes data that a saga step has already committed, the compensating operation may fail. For example, a compensating deduction can fail because the money that was credited has already been spent. This scenario must be considered and handled in the business design.
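The isolation problem can be reproduced with a toy account. The `Account` class and the amounts below are made up for illustration: a saga step credits the account, an unrelated transaction spends the money, and the compensating deduction then has nothing to deduct.

```python
class Account:
    """Hypothetical account that refuses to go below zero."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def credit(self, amount: int) -> None:
        self.balance += amount

    def debit(self, amount: int) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


account = Account(0)
account.credit(100)  # T1 of the saga commits and is immediately visible
account.debit(100)   # an unrelated transaction spends the credited money

try:
    account.debit(100)  # C1: the compensation for T1
    compensated = True
except ValueError:
    compensated = False  # the compensation cannot undo T1 any more
```

In this run `compensated` ends up `False`, which is the kind of case the business design has to anticipate.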


Comparison

Although Saga and TCC are both compensating transaction models, they are different due to different commit phases.

  • Saga adopts imperfect compensation. The compensation operation will leave traces of the original transaction operations. Therefore, the impact on the business must be considered.
  • TCC adopts perfect compensation. The compensating operation will completely clean up the original transaction operations, and users will not be able to perceive the status information before the transaction is canceled.
  • TCC can better support asynchronous execution overall, whereas Saga is generally more suitable for asynchronous execution in the compensating phase.

The Saga mode is suitable for long transactions and microservices, as it is less intrusive to business code. At the same time, Saga uses a one-phase commit mode, which does not hold resource locks for a long time and avoids the “bucket effect” (the slowest participant limiting the whole transaction). Therefore, systems with this architecture can achieve high performance and high throughput.

Resulting Context

This pattern has the following benefits:

  • It enables an application to maintain data consistency across multiple services without using distributed transactions

This solution has the following drawbacks:

  • The programming model is more complex. For example, a developer must design compensating transactions that explicitly undo changes made earlier in a saga.

There are also the following issues to address:

  • In order to be reliable, a service must atomically update its database and publish a message/event. It cannot use the traditional mechanism of a distributed transaction that spans the database and the message broker. Instead, it must use a pattern designed for this purpose, such as a transactional outbox or event sourcing.
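One commonly used way to update the database and publish an event atomically is a transactional outbox: the event is written to an outbox table in the same local transaction as the business data, and a separate relay publishes it afterwards. The sketch below uses Python's standard `sqlite3` module. The table layout, the function names, and the `publish` callback are assumptions for the example, not part of any specific framework.

```python
import json
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT NOT NULL, "
        "payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def create_order(conn: sqlite3.Connection, order_id: str) -> None:
    # One local transaction: both rows are committed, or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PENDING')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay_outbox(
    conn: sqlite3.Connection, publish: Callable[[str, dict], None]
) -> int:
    """Publish pending outbox rows in order; returns how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        # Mark as published only after the publish call has returned.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run, so delivery is at-least-once and consumers need to be idempotent.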
