Background
- Operators of an e-commerce platform want to control the budget allocated to a specific promotion.
- E.g., big sellers in particular would like to set a special Shipping Fee rule/promotion (cap, time period, etc.) for certain days such as Super Brand Day or a local Super Brand Day. A budget cap applies during those big events.
- The goals are to
- Increase the adoption of promotions created by sellers, which may save the e-commerce platform's costs
- Provide a finer granularity of control over promotions, especially in terms of costs
- …
Budget System
- A system to manage the budget attributes of promotions, e.g.,
- used_budget
- total_budget
- is_limited_budget
- budget_usages
- And detect overused budgets (maybe not, if considering Data Reconciliation Platform (DRP))
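The attributes listed above could be modeled roughly as follows. This is a minimal sketch, not the actual schema: the field types, the `BudgetUsage` record, and the helper methods `remaining`, `is_overused`, and `usages_match` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class BudgetUsage:
    """One deduction against a promotion's budget (fields are assumed)."""
    order_id: str
    amount: Decimal


@dataclass
class PromotionBudget:
    promotion_id: str
    total_budget: Decimal
    is_limited_budget: bool = True
    used_budget: Decimal = Decimal("0")
    budget_usages: List[BudgetUsage] = field(default_factory=list)

    def remaining(self) -> Decimal:
        return self.total_budget - self.used_budget

    def is_overused(self) -> bool:
        # A promotion without a budget limit can never be overused.
        return self.is_limited_budget and self.used_budget > self.total_budget

    def usages_match(self) -> bool:
        # Reconciliation-style check: usage records should sum to used_budget.
        total = sum((u.amount for u in self.budget_usages), Decimal("0"))
        return total == self.used_budget
```

A check like `usages_match` is the kind of comparison that might instead live in DRP if overuse detection is moved out of the Budget System.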
Budget VS Quota
- A Budget is a monetary amount that caps the total value a promotion can give out across multiple orders, e.g., a promotion can be applied until S$100 has been used up
- A Quota is a count that caps the number of times a promotion can be applied across multiple orders, e.g., a Voucher can be used at most 100 times
- How do these differences affect the system design?
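To make the difference concrete, here is a minimal single-process sketch of both checks. The class and method names are hypothetical. The only point is that a Quota always consumes exactly one unit, while a Budget consumes a variable amount per order.

```python
from decimal import Decimal


class Quota:
    """Caps the number of applications: each success consumes exactly 1."""

    def __init__(self, max_times: int) -> None:
        self.max_times = max_times
        self.used_times = 0

    def try_apply(self) -> bool:
        if self.used_times >= self.max_times:
            return False
        self.used_times += 1
        return True


class Budget:
    """Caps the total money given out: each success consumes `amount`."""

    def __init__(self, total: Decimal) -> None:
        self.total = total
        self.used = Decimal("0")

    def try_apply(self, amount: Decimal) -> bool:
        if amount <= 0 or self.used + amount > self.total:
            return False
        self.used += amount
        return True
```

With a Budget, a large request can be rejected while a later, smaller one still succeeds, which cannot happen with a Quota. Whether a partially covered request should be granted in part is a product decision that this sketch leaves out.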
Product Capability
- Increase the adoption of promotions created by sellers, which may save the e-commerce platform's costs
- Provide a finer granularity of control over promotions, especially in terms of costs
- …
Scope
High-level Design
Architecture Solution
Solution 1 - Sync DB Write
Note
- DB is the single source of truth
Pros
- Simple, easy to understand/debug
Cons
- May fall over when a budget is hot
- One potential bottleneck is DB writes
- Not scalable
- E.g., for an extremely hot promotion, adding more physical DB machines doesn't help, since every request still does select_for_update on the same row of that promotion
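The synchronous flow can be sketched as below. This is an illustration under stated assumptions, not the production code: it uses Python's built-in `sqlite3`, which has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE` stands in for taking the lock before the read. The table name `promotion_budget`, its columns, and the use of integer cents are all hypothetical.

```python
import sqlite3


def init_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so the
    # code below controls BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE promotion_budget ("
        " promotion_id TEXT PRIMARY KEY,"
        " total_budget INTEGER NOT NULL,"
        " used_budget INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def deduct(conn: sqlite3.Connection, promotion_id: str, amount_cents: int) -> bool:
    """Lock, read, check, write, commit: all on the request path."""
    # Stand-in for SELECT ... FOR UPDATE: acquire the write lock up front.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT total_budget, used_budget FROM promotion_budget"
            " WHERE promotion_id = ?",
            (promotion_id,),
        ).fetchone()
        if row is None or row[1] + amount_cents > row[0]:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE promotion_budget SET used_budget = used_budget + ?"
            " WHERE promotion_id = ?",
            (amount_cents, promotion_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Every deduction for one promotion serializes on the same lock. That is the hot-row bottleneck described in the Cons, and it does not go away when more machines are added.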
Possible Evaluation Points
Is it possible to estimate the worst-case QPS to a single promotion?
- Consider the current place_order QPS, the promotion adoption rate, and the expected growth of both
From DBA
- QPS (write&read) > 1k/s on a single table
- Write QPS should be less than 2k/s
- The “2k/s write” capacity is per DB cluster: one DB cluster supports a max write QPS below 2k/s, no matter how many DBs it contains.
- Write+Read QPS on a master DB should be less than 3k/s
- The total Read QPS should be less than 6k/s
From the Item team’s observation
- The max select-for-update QPS per DB instance is 400
- We don't know which concurrency control mechanism their DBs are configured with; our guess is pessimistic concurrency control rather than MVCC
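A back-of-the-envelope version of this estimate might look like the sketch below. The two limits come from the figures above. The formula (place_order QPS × adoption rate × growth factor) and all input values in the usage note are assumptions for illustration, not measurements.

```python
# Limits quoted above.
SELECT_FOR_UPDATE_LIMIT_QPS = 400   # Item team's observation, per DB instance
CLUSTER_WRITE_LIMIT_QPS = 2000      # DBA guidance, per DB cluster


def worst_case_promotion_qps(
    place_order_qps: float,
    adoption_rate: float,
    growth_factor: float = 1.0,
) -> float:
    """Worst case: every adopting order hits the same hot promotion row."""
    return place_order_qps * adoption_rate * growth_factor


def fits_sync_db_write(qps: float) -> bool:
    # A single hot row is bounded by the tighter of the two limits.
    return qps < min(SELECT_FOR_UPDATE_LIMIT_QPS, CLUSTER_WRITE_LIMIT_QPS)
```

For example, with a hypothetical 1,200 place_order QPS and 25% adoption, the hot row sees 300 QPS and stays below the 400 limit. If traffic doubles, it reaches 600 QPS and exceeds it.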
Solution 2 - Async DB Write with Redis
Notes
- Redis is the single source of truth
Pros
- Much better performance than DBs, since reads and writes go to Redis on TCC
Cons
- If Redis (Budget Store) is completely down (both master and slaves), the budget data is inconsistent until we restore accurate budget data from the persistent system (in our case, the DBs)
- Potential inconsistency due to asynchronous master-slave replication in Redis (i.e., the budget read > the actual/correct budget, which leads to an oversold budget)
- Voucher Quota potentially has the same issue (discussed later)
- Can this be accepted or solved from the product's perspective?
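The shape of Solution 2 can be sketched without a real Redis. In the sketch below, an in-memory `BudgetStore` stands in for Redis and a plain dict stands in for the DB. All names are hypothetical. With a real Redis, the check-and-deduct step would also need to be atomic on the server, for example via a server-side Lua script.

```python
import queue
import threading
from typing import Dict


class BudgetStore:
    """In-memory stand-in for Redis: atomic check-and-deduct on the hot path."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total: Dict[str, int] = {}
        self._used: Dict[str, int] = {}
        # Deductions waiting to be written to the persistent store.
        self.pending: "queue.Queue[tuple]" = queue.Queue()

    def set_total(self, promotion_id: str, total_cents: int) -> None:
        with self._lock:
            self._total[promotion_id] = total_cents

    def try_deduct(self, promotion_id: str, amount_cents: int) -> bool:
        with self._lock:
            used = self._used.get(promotion_id, 0)
            if used + amount_cents > self._total.get(promotion_id, 0):
                return False
            self._used[promotion_id] = used + amount_cents
        # The DB write is not on the request path; it is queued for later.
        self.pending.put((promotion_id, amount_cents))
        return True


def flush_to_db(store: BudgetStore, db: Dict[str, int]) -> int:
    """Drain queued deductions into the persistent store (a dict here)."""
    flushed = 0
    while True:
        try:
            promotion_id, amount = store.pending.get_nowait()
        except queue.Empty:
            return flushed
        db[promotion_id] = db.get(promotion_id, 0) + amount
        flushed += 1
```

Until `flush_to_db` runs, the persistent store lags behind the in-memory value. That window is where the inconsistency listed in the Cons comes from if the store is lost.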
Problem
One problem arises from async master-slave sync.
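A toy simulation shows how asynchronous replication can lead to an oversold budget. It is a sketch under simplifying assumptions (one master, one replica, a hypothetical `ReplicatedCounter` class), not a model of Redis internals. The master acknowledges a deduction before the replica has it, and a failover then promotes the stale replica.

```python
from typing import List


class ReplicatedCounter:
    """One master and one replica; replication happens only on replicate()."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.master_used = 0
        self.replica_used = 0
        self.granted = 0  # what buyers actually received, across failovers
        self._unreplicated: List[int] = []

    def deduct(self, amount: int) -> bool:
        if self.master_used + amount > self.total:
            return False
        self.master_used += amount
        self.granted += amount
        self._unreplicated.append(amount)  # acknowledged before replication
        return True

    def replicate(self) -> None:
        self.replica_used += sum(self._unreplicated)
        self._unreplicated.clear()

    def failover(self) -> int:
        """Master is lost; the replica is promoted with whatever it has."""
        lost = sum(self._unreplicated)
        self.master_used = self.replica_used
        self._unreplicated.clear()
        return lost
```

For example, with a total of 100: deduct 60, replicate, deduct 30, then fail over. The new master believes only 60 is used, so a further deduction of 40 is accepted and 130 has been granted in total.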
Other Solutions
Voucher Quota
- A Quota system
- Redis data is the source of truth, and it is eventually synchronized to persistent storage (MySQL)
- Potential inconsistency due to asynchronous master-slave replication in Redis
- If Redis (Quota Count Store) is down, it results in inconsistency (i.e., the remain_quota read > the actual/correct remain_quota, which leads to an oversold voucher)
Stock
- A Quota system
- For hot stocks, Redis data is the source of truth and is eventually synchronized to IPS persistent storage (MySQL stock_tab); otherwise, DB data is the source of truth
- Overselling stock can still happen because of master-slave Redis
- If overselling occurs, it is handled on the operations side (i.e., informing buyers to cancel the order due to lack of stock)
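Alibaba Inventory System - Unitized Inventory
- Stock is stored under two concepts: central stock and unit stock
- When a unit's stock row is used up, the unit requests an allocation from the central stock row. If the central stock row is also used up, all unit stock rows are reclaimed, and this continues until the stock of every unit has been fully deducted
- The unit stock tables are synchronized bidirectionally between data centers. Although the tables are synchronized bidirectionally, each single row is synchronized in one direction only, because a given row must be written in only one unit
- Order placement in each unit's data center deducts only that unit's own stock row. The buyer's trade order and the stock deduction record are in the same unit, so no cross-unit calls are needed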
Refer to https://xie.infoq.cn/article/badf4fdf101fa2273961238ac
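The allocate-then-reclaim flow described above might be sketched as follows. This is one reading of the description, not Alibaba's implementation. The `UnitizedStock` class, the `batch` allocation size, and the choice to pool reclaimed stock in the central row are assumptions. Cross-data-center synchronization is not modeled.

```python
from typing import Dict


class UnitizedStock:
    def __init__(self, central: int, units: Dict[str, int], batch: int = 10) -> None:
        self.central = central
        self.units = dict(units)
        self.batch = batch  # assumed size of one allocation from central

    def deduct(self, unit: str, qty: int = 1) -> bool:
        """Orders placed in a unit only deduct that unit's own stock row."""
        if self.units[unit] < qty:
            self._refill(unit, qty)
        if self.units[unit] < qty:
            return False
        self.units[unit] -= qty
        return True

    def _refill(self, unit: str, qty: int) -> None:
        needed = qty - self.units[unit]
        if self.central < needed:
            # Central stock cannot cover the request: reclaim every unit's
            # remaining stock into the central row before allocating again.
            for name in self.units:
                self.central += self.units[name]
                self.units[name] = 0
            needed = qty
        moved = min(self.central, max(needed, self.batch))
        self.central -= moved
        self.units[unit] += moved
```

In the common case a deduction touches only the local unit row. The central row and the other units are involved only when a unit runs dry.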
Other Interesting Points
- How to invalidate (e.g., purge/update) cache within different layers (e.g., Share Services, BASS services)
- How to detect overused budget scenarios, e.g., due to bugs
- What if considering Redis Persistence?
- Choice
- RDB (Redis Database)
- AOF (Append Only File)
- RDB + AOF
- Difference between designing a Budget system (total $) and a Quota system (times)
- Similarity: limited resources, require (eventual) consistency
- Budgets are more “fragmented”: each deduction is a variable monetary amount rather than a single unit
- Use “buckets” to achieve horizontal scalability in Quota systems (credited to Chen Dong)
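The bucket idea can be illustrated with a minimal single-process sketch. The `BucketedQuota` class and its probing strategy are hypothetical. In a real deployment each bucket would presumably be a separate key or row, so that concurrent requests contend on different buckets rather than on one hot counter.

```python
import random
from typing import List, Optional


class BucketedQuota:
    """Splits one quota into N independent counters."""

    def __init__(self, total: int, num_buckets: int) -> None:
        base, extra = divmod(total, num_buckets)
        # Spread the remainder over the first `extra` buckets.
        self.buckets: List[int] = [
            base + (1 if i < extra else 0) for i in range(num_buckets)
        ]

    def try_consume(self, start: Optional[int] = None) -> bool:
        """Take one unit, starting at a random bucket and probing the rest."""
        n = len(self.buckets)
        first = random.randrange(n) if start is None else start % n
        for offset in range(n):
            i = (first + offset) % n
            if self.buckets[i] > 0:
                self.buckets[i] -= 1
                return True
        return False

    def remaining(self) -> int:
        return sum(self.buckets)
```

Because a Quota always consumes exactly one unit, any non-empty bucket can serve any request. A Budget deduction of variable size may not fit in a single bucket's leftover, which is one way the "fragmented" nature of budgets could complicate this approach.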