西维蜀黍

【Architecture】System Design - Key-value Store

Background

A key-value store, also referred to as a key-value database, is a non-relational database. Each unique identifier is stored as a key with its associated value. This data pairing is known as a “key-value” pair.

In a key-value pair, the key must be unique, and the value associated with the key can be accessed through the key. Keys can be plain text or hashed values. For performance reasons, a short key works better. What do keys look like? Here are a few examples:

  • Plain text key: “last_logged_in_at”
  • Hashed key: 253DDEC4

The value in a key-value pair can be strings, lists, objects, etc. The value is usually treated as an opaque object in key-value stores, such as Amazon dynamo, Memcached, Redis, etc.

  ...


【Architecture】System Design - Rate Limiter

Background

If the API request count exceeds the threshold defined by the rate limiter, all the excess calls are blocked. Here are a few examples:

  • A user can write no more than 2 posts per second.
  • You can create a maximum of 10 accounts per day from the same IP address.
  • You can claim rewards no more than 5 times per week from the same device.

The benefits of using an API rate limiter:

  • Prevent resource starvation caused by Denial of Service (DoS) attack. Almost all APIs published by large tech companies enforce some form of rate limiting. For example, Twitter limits the number of tweets to 300 per 3 hours. Google docs APIs have the following default limit: 300 per user per 60 seconds for read requests. A rate limiter prevents DoS attacks, either intentional or unintentional, by blocking the excess calls.
  • Reduce cost. Limiting excess requests means fewer servers and allocating more resources to high priority APIs. Rate limiting is extremely important for companies that use paid third party APIs. For example, you are charged on a per-call basis for the following external APIs: check credit, make a payment, retrieve health records, etc. Limiting the number of calls is essential to reduce costs.
  • Prevent servers from being overloaded. To reduce server load, a rate limiter is used to filter out excess requests caused by bots or users’ misbehavior.

限流(Rate Limit)

Rate limiting is a technique used to control the rate at which requests are made to a network, server, or other resource. It is used to prevent excessive or abusive use of a resource and to ensure that the resource is available to all users.

Rate limiting is often used to protect against denial-of-service (DoS) attacks, which are designed to overwhelm a network or server with a high volume of requests, rendering it unavailable to legitimate users. It can also be used to limit the number of requests made by individual users, to ensure that a resource is not monopolized by a single user or group of users.

Scope

Functional Requirements

- Candidate: What kind of rate limiter are we going to design? Is it a client-side rate limiter or server-side API rate limiter?
- Interviewer: Great question. We focus on the server-side API rate limiter.
- client-side may be bypassed
    • I could assume we use the rate limiter for one API or like different APIs in different services, i.e., the system work in a distributed environment
    • Candidate: Does the rate limiter throttle API requests based on IP, the user ID, or other properties?
    • Interviewer: The rate limiter should be flexible enough to support different sets of throttle rules.
    • Candidate: Do we need to inform users who are throttled? Interviewer: Yes.
  • Let us assume that we don’t wanna integrate the rate limiter into any existing middlewares

    • E.g., for service mesh, we need rate limiting during RPC calls
    • For gateways, normally we need the functionality of the rating limiter as well.
  • Let us assume the scope of authentication is not under this rate limit

  • Do we wanna control the limit in real-time? or just hard-coded?

  ...


【Database】Transactions - Consistent Anomalies - 幻读(Phantom Read)

Phantom Rows

The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates new rows (i.e., INSERT) or deletes existing rows (i.e., DELETE) (in the target table) which fulfill that WHERE clause.

In the diagram above, the flow of statements goes like this:

  1. Alice and Bob start two database transactions.
  2. Bob’s reads all the post_comment records associated with the post row with the identifier value of 1.
  3. Alice adds a new post_comment record which is associated with the post row having the identifier value of 1.
  4. Alice commits her database transaction.
  5. If Bob’s re-reads the post_comment records having the post_id column value equal to 1, he will observe a different version of this result set.

How to aviod Phantom Reads

If the highest level of isolation (i.e., SERIALIZABLE isolation level) were maintained, the same set of rows should be returned both times, and indeed that is what is mandated to occur in a database operating at the SERIALIZABLE isolation level. However, at the lesser isolation levels, a different set of rows may be returned the second time.

In the SERIALIZABLE isolation mode, Query 1 would result in all records with post_id =1 being locked, thus Query 2 would block until the first transaction was committed. In REPEATABLE READ mode, the range would not be locked, allowing the record to be inserted and the second execution of Query 1 to include the new row in its results.

How to prevent:

The 2PL-based Serializable isolation prevents Phantom Reads through the use of predicate locking while MVCC (Multi-Version Concurrency Control) database engines address the Phantom Read anomaly by returning consistent snapshots.

However, a concurrent transaction can still modify the range of records that was read previously. Even if the MVCC database engine introspects the transaction schedule, the outcome is not always the same as a 2PL-based implementation. One such example is when the second transaction issues an insert without reading the same range of records as the first transaction. In this particular use case, some MVCC database engines will not end up rolling back the first transaction.

  ...


【MySQL】设置隔离级别(Isolation Levels)

Show Transaction Isolation Level

check session transaction level (mysql8+)

SELECT @@transaction_ISOLATION;

check global transaction level (mysql8+)

SELECT @@global.transaction_ISOLATION;
  ...


【Concurrent Control】MVCC

Multiversion concurrency control (MVCC)

这里的版本号可以任何属性,只要当一次数据修改操作被执行后,这个属性一定会被改变即可,比如数据被修改的次数、版本号、时间戳(timestamp)。

  ...