【Architecture】System Design - Budget/Quota System

Posted by 西维蜀黍 on 2023-10-10, Last Modified on 2023-10-11

Background

  • Operators of an E-commerce platform want to control the budget allocated to a specific promotion.
    • E.g., big sellers in particular would like to set a special Shipping Fee rule/promotion (cap, time period, etc.) for certain days such as Super Brand Day or a local Super Brand Day, and a budget cap applies during those big events.
  • The goals are to
    • Increase the adoption of promotions created by sellers, which may save the E-commerce platform’s costs
    • Provide a finer granularity of control over promotions, especially in terms of costs

Budget System

  • A system to manage the budget attributes of promotions, e.g.,
    • used_budget
    • total_budget
    • is_limited_budget
    • budget_usages
  • It may also detect overused budgets (possibly unnecessary if the Data Reconciliation Platform (DRP) already covers this)
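
A minimal sketch of how these attributes might fit together. It assumes amounts are stored as integer cents and that budget_usages holds one row per order; the class names, the record helper, and the is_overused check are hypothetical and not taken from the real system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BudgetUsage:
    # One deduction made by one order (assumed shape of a budget_usages row).
    order_id: str
    amount_cents: int


@dataclass
class PromotionBudget:
    promotion_id: str
    total_budget: int            # cap, in cents
    is_limited_budget: bool      # False means the promotion has no cap
    used_budget: int = 0         # running counter, in cents
    budget_usages: List[BudgetUsage] = field(default_factory=list)

    def record(self, order_id: str, amount_cents: int) -> None:
        # Append the usage row and bump the counter together.
        self.budget_usages.append(BudgetUsage(order_id, amount_cents))
        self.used_budget += amount_cents

    def is_overused(self) -> bool:
        # Recompute spend from the usage rows instead of trusting the counter.
        spent = sum(u.amount_cents for u in self.budget_usages)
        return self.is_limited_budget and spent > self.total_budget
```

Recomputing from the usage rows is one way a reconciliation job could catch a counter that drifted because of a bug.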

Budget VS Quota

  • A Budget is a monetary cap on the total amount a promotion can give out across multiple orders, e.g., the promotion can be applied until S$100 has been used up

  • A Quota is a cap on the number of times a promotion can be applied across multiple orders, e.g., a Voucher can be used only 100 times

  • How do these differences affect the system design?
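
One difference that seems relevant to the design: a quota always shrinks by exactly 1, while a budget shrinks by a different amount per order, so a budget can be non-empty and still too small for the next order. A minimal sketch, with hypothetical function names and amounts in cents:

```python
def deduct_quota(remaining_times: int) -> int:
    """Quota: every application costs exactly 1."""
    if remaining_times < 1:
        raise ValueError("quota exhausted")
    return remaining_times - 1


def deduct_budget(remaining_cents: int, discount_cents: int) -> int:
    """Budget: each order costs a different amount, so the last order may not fit."""
    if discount_cents > remaining_cents:
        raise ValueError("budget insufficient for this order")
    return remaining_cents - discount_cents
```

With S$5.00 left, deduct_budget(500, 800) is rejected even though the budget is not zero; a quota has no equivalent of this partial-remainder case.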

Product Capability

  • Increase the adoption of promotions created by sellers, which may save the E-commerce platform’s costs
  • Provide a finer granularity of control over promotions, especially in terms of costs

Scope

High-level Design

Architecture Solution

Solution 1 - Sync DB Write

Note

  • DB is the single source of truth

Pros

  • Simple, easy to understand/debug

Cons

  • May fail when a budget becomes hot
    • One potential bottleneck is DB writes
  • Not scalable
    • E.g., for an extremely hot promotion, adding more physical DB machines does not help, since every request still runs select_for_update on the same row of that promotion
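
A minimal sketch of the synchronous deduction, using SQLite as a stand-in so it runs anywhere. SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE (which takes the write lock up front) plays that role here; the table name budget_tab and the helper names are assumptions for illustration.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE budget_tab (promotion_id TEXT PRIMARY KEY,"
        " total_budget INTEGER NOT NULL, used_budget INTEGER NOT NULL)"
    )
    return conn


def deduct(conn: sqlite3.Connection, promotion_id: str, amount: int) -> bool:
    # In MySQL this would be BEGIN followed by SELECT ... FOR UPDATE on the row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT total_budget, used_budget FROM budget_tab WHERE promotion_id = ?",
            (promotion_id,),
        ).fetchone()
        if row is None or row[1] + amount > row[0]:
            conn.execute("ROLLBACK")   # unknown promotion or not enough budget
            return False
        conn.execute(
            "UPDATE budget_tab SET used_budget = used_budget + ? WHERE promotion_id = ?",
            (amount, promotion_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Every order for the same promotion serializes on the same lock, which is the hot-row bottleneck listed under Cons.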

Possible Evaluation Points

Is it possible to estimate the worst-case QPS against a single promotion?

  • Consider the current place_order QPS, the promotion adoption rate, and the expected future growth of both
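
A back-of-the-envelope sketch of that estimate. The formula (order QPS times adoption rate times a growth factor) and all three input numbers are assumptions for illustration, not measured values.

```python
def worst_case_promotion_qps(place_order_qps: float,
                             adoption_rate: float,
                             growth_factor: float) -> float:
    # Share of orders that hit this one promotion, scaled for future growth.
    return place_order_qps * adoption_rate * growth_factor


# Hypothetical inputs: 5,000 orders/s at peak, 10% use the promotion, 2x headroom.
estimate = worst_case_promotion_qps(5000, 0.10, 2.0)
```

With these made-up inputs the estimate comes out at roughly 1,000 writes per second against a single promotion row, which is the figure to compare against the DB limits.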

From DBA

  • QPS (write&read) > 1k/s on a single table
  • Write QPS should be less than 2k/s
    • The “2k/s write” capacity is per DB cluster: one cluster supports a maximum write QPS below 2k/s, no matter how many DBs it contains.
  • Write+Read QPS on a master DB should be less than 3k/s
  • The total Read QPS should be less than 6k/s

From the Item team’s observation

  • The max select-for-update QPS per DB instance is 400
    • We don’t know which concurrency control mechanism their DBs are configured with; presumably pessimistic concurrency control rather than MVCC?

Solution 2 - Async DB Write with Redis

Notes

  • Redis is the single source of truth

Pros

  • Much better performance than DBs, because reads and writes go to Redis on TCC

Cons

  • If Redis (Budget Store) is completely down (both master and slaves), the data becomes inconsistent until we restore accurate budget data from the persistent system (DBs, in our case)
  • Potential inconsistency due to asynchronous master-slave replication in Redis (i.e., the budget obtained > the actual/correct budget, which leads to an oversold-budget case)
    • Voucher Quota potentially has this issue (discussed later)
    • Is it possible to accept/solve this from the product’s perspective?
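
A minimal sketch of the flow, using an in-memory class as a stand-in for Redis so it runs without a server. The lock stands in for the atomicity that Redis could provide through, for example, a Lua script; BudgetStore, place_order, and flush_to_db are hypothetical names.

```python
import queue
import threading


class BudgetStore:
    """In-memory stand-in for Redis: an atomic check-and-deduct on one counter."""

    def __init__(self, total_budget: int) -> None:
        self._remaining = total_budget
        self._lock = threading.Lock()

    def try_deduct(self, amount: int) -> bool:
        with self._lock:
            if amount > self._remaining:
                return False
            self._remaining -= amount
            return True


def place_order(store: BudgetStore, pending: queue.Queue,
                order_id: str, amount: int) -> bool:
    # Hot path: only touches the in-memory store, then enqueues the DB write.
    if not store.try_deduct(amount):
        return False
    pending.put((order_id, amount))
    return True


def flush_to_db(pending: queue.Queue, db_rows: list) -> None:
    # Background worker body: drain queued usages into the persistent store.
    while True:
        try:
            db_rows.append(pending.get_nowait())
        except queue.Empty:
            return
```

The order is accepted before anything reaches the DB, which is where both the speed and the inconsistency window come from.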

Problem

One Problem due to async master-slave sync.

Other Solution

Voucher Quota

  • A Quota system
  • Redis data is the source of truth, and it is eventually synchronized to persistent storage (MySQL)
  • Potential inconsistency due to asynchronous master-slave replication in Redis
    • If Redis (Quota Count Store) is down, the data becomes inconsistent (i.e., the remain_quota obtained > the actual/correct remain_quota, which leads to an oversold-voucher case)
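
A toy simulation of how asynchronous replication can oversell a voucher. Two plain objects stand in for the Redis master and replica; the names and the quota of 2 are assumptions for illustration.

```python
class Node:
    def __init__(self, remain_quota: int) -> None:
        self.remain_quota = remain_quota


def claim(master: Node) -> bool:
    # Writes go to the master only.
    if master.remain_quota <= 0:
        return False
    master.remain_quota -= 1
    return True


def replicate(master: Node, replica: Node) -> None:
    # Async replication: runs some time after the write, not together with it.
    replica.remain_quota = master.remain_quota


master, replica = Node(2), Node(2)
granted = 0
granted += claim(master)      # 1st voucher
replicate(master, replica)    # replica now sees remain_quota == 1
granted += claim(master)      # 2nd voucher; master crashes before replicating
master = replica              # failover: the stale replica is promoted
granted += claim(master)      # 3rd voucher is granted although only 2 existed
```

The promoted replica never saw the second deduction, so its remain_quota is higher than the correct value and one extra voucher goes out.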

Stock

  • A Quota system
  • For hot stocks, Redis data is the source of truth and is eventually synchronized to IPS persistent storage (MySQL stock_tab); otherwise, DB data is the source of truth
  • Overselling stock can still happen because master-slave Redis is used
  • When overselling occurs, it is handled on the operations side (i.e., by asking buyers to cancel the order due to lack of stock)

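Alibaba Inventory System - Unitized Inventory

  • Stock is stored using two concepts: central-unit stock and unit stock
    • When a unit stock row has been fully deducted, it requests an allocation from the central stock row; if the central stock row is also used up, all unit stock rows are reclaimed, and this continues until all unit stock has been deducted
    • The unit stock table is synchronized bidirectionally between data centers; although the tables sync in both directions, each single row syncs in one direction only, because a given row must be written in only one unit
  • Order placement in each unit’s data center deducts only that unit’s own stock row; the user’s order and the stock deduction record live in the same unit, so no cross-unit call is needed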

Refer to https://xie.infoq.cn/article/badf4fdf101fa2273961238ac
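
A single-process sketch of the allocate-then-reclaim flow described above. It ignores cross-data-center synchronization entirely; the class name, the fixed allocation batch size, and the choice to reclaim into the central row are assumptions made for this sketch, not details from the referenced article.

```python
from typing import Dict


class UnitizedStock:
    def __init__(self, central: int, units: Dict[str, int], batch: int) -> None:
        self.central = central      # central stock row
        self.units = dict(units)    # unit name -> unit stock row
        self.batch = batch          # hypothetical allocation size per top-up

    def _allocate(self, unit: str) -> None:
        # Move up to one batch from the central row to the unit row.
        grant = min(self.batch, self.central)
        self.central -= grant
        self.units[unit] += grant

    def deduct(self, unit: str) -> bool:
        """Deduct one item for an order placed in the given unit."""
        if self.units[unit] < 1:
            if self.central == 0:
                # Central row is empty too: reclaim every unit row into it.
                self.central = sum(self.units.values())
                self.units = {name: 0 for name in self.units}
            self._allocate(unit)
        if self.units[unit] < 1:
            return False            # nothing left anywhere
        self.units[unit] -= 1       # normal path touches only the local row
        return True
```

For example, with central=1 and units {"A": 1, "B": 2}, unit A can serve four orders in a row: its own item, the central item, then the two items reclaimed from B.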

Other Interesting Points

  • How to invalidate (e.g., purge/update) cache within different layers (e.g., Share Services, BASS services)
  • How to detect overused budget scenarios, e.g., due to bugs
  • What if Redis persistence is taken into account?
    • Options
      • RDB (Redis Database)
      • AOF (Append Only File)
      • RDB + AOF
  • Difference between designing a Budget system (total $) and a Quota system (times)
    • Similarity: limited resources, require (eventual) consistency
    • Budgets are more “fragmented”
      • Use “buckets” to achieve horizontal scalability in Quota systems (credited to Chen Dong)
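
A minimal sketch of the bucket idea for a quota system: split the total into several counters so that concurrent requests spread across them instead of hitting one hot key. The function names, the random starting bucket, and the probe-the-rest fallback are assumptions for illustration.

```python
import random
from typing import List


def split_into_buckets(total_quota: int, n_buckets: int) -> List[int]:
    # Spread the quota as evenly as possible; the first buckets take the remainder.
    base, extra = divmod(total_quota, n_buckets)
    return [base + (1 if i < extra else 0) for i in range(n_buckets)]


def claim(buckets: List[int], rng: random.Random) -> bool:
    # Start at a random bucket, then probe the others so no quota is stranded.
    start = rng.randrange(len(buckets))
    for offset in range(len(buckets)):
        i = (start + offset) % len(buckets)
        if buckets[i] > 0:
            buckets[i] -= 1
            return True
    return False
```

This works cleanly for quotas because every claim costs 1. For budgets, the leftover money may end up scattered across buckets so that no single bucket can cover an order even though the sum could, which is one reading of budgets being more “fragmented”.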

Deep Dive

Reference