西维蜀黍

【Architecture】System Design - ID Generator

Scope

  • QPS

  • SLA

  • Required Performance

    • The system should be able to generate 10,000 IDs per second.
  • Do IDs only contain numerical values?

    • Interviewer: Yes, that is correct.
  • What is the ID length requirement?

    • Interviewer: IDs should fit into 64-bit.
  • For each new record, does ID increment by 1?

    • Interviewer: The ID increments by time but not necessarily only increments by 1. IDs created in the evening are larger than those created in the morning on the same day.
  ...


【Architecture】System Design - Chat

Background

Functional Requirements

Candidate: What kind of chat app shall we design? 1 on 1 or group based? Interviewer: It should support both 1 on 1 and group chat.

Candidate: Is this a mobile app? Or a web app? Or both? Interviewer: Both.

Candidate: For group chat, what is the group member limit? Interviewer: A maximum of 100 people

Candidate: What features are important for the chat app? Can it support attachment? Interviewer: 1 on 1 chat, group chat, online indicator. The system only supports text messages.

Non-functional Requirements

Candidate: What is the scale of this app? A startup app or massive scale? Interviewer: It should support 50 million daily active users (DAU).

    • Candidate: Is there a message size limit?
    • Interviewer: Yes, text length should be less than 100,000 characters long.
    • Candidate: Is end-to-end encryption required?
    • Interviewer: Not required for now but we will discuss that if time allows.

Candidate: How long shall we store the chat history? Interviewer: Forever.

  ...


【Architecture】System Design - Consistent Hashing

Background

The rehashing problem

If you have n cache servers, a common way to balance the load is to use the following hash method: serverIndex = hash(key) % N, where N is the size of the server pool.

Let us use an example to illustrate how it works. As shown below, we have 4 servers and 8 string keys with their hashes.

  ...


【Architecture】System Design - News Feed

Background

What is news feed? According to the Facebook help page, “News feed is the constantly updating list of stories in the middle of your home page. News Feed includes status updates, photos, videos, links, app activity, and likes from people, pages, and groups that you follow on Facebook”.

Similar questions commonly asked are: design Facebook news feed, Instagram feed, Twitter timeline, etc.

  ...


【Architecture】System Design

Overall

  1. 理清需求(功能性需求和非功能性需求)
    1. 约束条件,如QPS、用户量、存储量,Page View,Daily Active User
  2. High-level Design
    1. 描述系统中的一些组件和组件之间的交互关系

High-level Design

Template - System Design Details

  • Acknowledgement

    • By other teams
  • Changelog

  • Deployment Sequence

    • Same deployment sequence means they can be deployed together (no strong requirement on the sequence)
  • Introductions

    • Background

      • A little background on the changes
        • why the feature is necessary
        • why do we take this approaches
        • Terminologies
    • Objectives

      • Provide the objective of the services to support the feature/changes
    • Requirements

      • Functional Requirements

      • Non-Functional Requirements

        • 可靠性(Reliability)

        • 安全性(Security)

        • 可扩展性(Scalability)

          • throughput
          • DAU
          • User size
        • 可用性(Availability)

        • 性能(Performance)

          • latency
  • Flows and Logics

  • APIs

    • logic changes
    • highlight the new fields that are added
    • how to use the API
    • potential errors
  • Databases

    • ER Diagram
  • Dependencies

    • API level dependency multiplier
  • Performance

    • QPS analysis and potential bottlenecks. Might not need very accurate numbers here but can have some kind of estimations or justifications why this is not a concern.
    • Can also include performance issues that might not show up now but might be an issue in the future
  • Backward Compatibility

    • Feature Toggle
    • Clean up code
  • Monitoring

  • Downgrade/Rollback Plan

    • E.g., Rate limit, circuit breaker
  • System Limitation

  • Implementation Details

  • Investigations

    • Put the investigation result here
      • Alternatives that were considered but were not chosen due to xxx
      • Supporting data
  • Clean-up Plan

    • If this feature/project, needs any cleanup phase after the release, add the details here. (It can also be in a separate document and add a link to the document here)

      Details including, but are not limited to,

      • Code Clean up details
      • DB Clean up details
      • Config Clean up details
  • Financial Risk

  • Result Analysis

  • Task breakdown

  ...


【Architecture】System Design - Key-value Store

Background

A key-value store, also referred to as a key-value database, is a non-relational database. Each unique identifier is stored as a key with its associated value. This data pairing is known as a “key-value” pair.

In a key-value pair, the key must be unique, and the value associated with the key can be accessed through the key. Keys can be plain text or hashed values. For performance reasons, a short key works better. What do keys look like? Here are a few examples:

  • Plain text key: “last_logged_in_at”
  • Hashed key: 253DDEC4

The value in a key-value pair can be strings, lists, objects, etc. The value is usually treated as an opaque object in key-value stores, such as Amazon dynamo, Memcached, Redis, etc.

  ...


【Architecture】System Design - Rate Limiter

Background

If the API request count exceeds the threshold defined by the rate limiter, all the excess calls are blocked. Here are a few examples:

  • A user can write no more than 2 posts per second.
  • You can create a maximum of 10 accounts per day from the same IP address.
  • You can claim rewards no more than 5 times per week from the same device.

The benefits of using an API rate limiter:

  • Prevent resource starvation caused by Denial of Service (DoS) attack. Almost all APIs published by large tech companies enforce some form of rate limiting. For example, Twitter limits the number of tweets to 300 per 3 hours. Google docs APIs have the following default limit: 300 per user per 60 seconds for read requests. A rate limiter prevents DoS attacks, either intentional or unintentional, by blocking the excess calls.
  • Reduce cost. Limiting excess requests means fewer servers and allocating more resources to high priority APIs. Rate limiting is extremely important for companies that use paid third party APIs. For example, you are charged on a per-call basis for the following external APIs: check credit, make a payment, retrieve health records, etc. Limiting the number of calls is essential to reduce costs.
  • Prevent servers from being overloaded. To reduce server load, a rate limiter is used to filter out excess requests caused by bots or users’ misbehavior.

限流(Rate Limit)

Rate limiting is a technique used to control the rate at which requests are made to a network, server, or other resource. It is used to prevent excessive or abusive use of a resource and to ensure that the resource is available to all users.

Rate limiting is often used to protect against denial-of-service (DoS) attacks, which are designed to overwhelm a network or server with a high volume of requests, rendering it unavailable to legitimate users. It can also be used to limit the number of requests made by individual users, to ensure that a resource is not monopolized by a single user or group of users.

Scope

Functional Requirements

- Candidate: What kind of rate limiter are we going to design? Is it a client-side rate limiter or server-side API rate limiter?
- Interviewer: Great question. We focus on the server-side API rate limiter.
- client-side may be bypassed
    • I could assume we use the rate limiter for one API or like different APIs in different services, i.e., the system work in a distributed environment
    • Candidate: Does the rate limiter throttle API requests based on IP, the user ID, or other properties?
    • Interviewer: The rate limiter should be flexible enough to support different sets of throttle rules.
    • Candidate: Do we need to inform users who are throttled? Interviewer: Yes.
  • Let us assume that we don’t wanna integrate the rate limiter into any existing middlewares

    • E.g., for service mesh, we need rate limiting during RPC calls
    • For gateways, normally we need the functionality of the rating limiter as well.
  • Let us assume the scope of authentication is not under this rate limit

  • Do we wanna control the limit in real-time? or just hard-coded?

  ...


【Database】Transactions - Consistent Anomalies - 幻读(Phantom Read)

Phantom Rows

The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT ... WHERE query and, between both operations, Transaction 2 creates new rows (i.e., INSERT) or deletes existing rows (i.e., DELETE) (in the target table) which fulfill that WHERE clause.

In the diagram above, the flow of statements goes like this:

  1. Alice and Bob start two database transactions.
  2. Bob’s reads all the post_comment records associated with the post row with the identifier value of 1.
  3. Alice adds a new post_comment record which is associated with the post row having the identifier value of 1.
  4. Alice commits her database transaction.
  5. If Bob’s re-reads the post_comment records having the post_id column value equal to 1, he will observe a different version of this result set.

How to aviod Phantom Reads

If the highest level of isolation (i.e., SERIALIZABLE isolation level) were maintained, the same set of rows should be returned both times, and indeed that is what is mandated to occur in a database operating at the SERIALIZABLE isolation level. However, at the lesser isolation levels, a different set of rows may be returned the second time.

In the SERIALIZABLE isolation mode, Query 1 would result in all records with post_id =1 being locked, thus Query 2 would block until the first transaction was committed. In REPEATABLE READ mode, the range would not be locked, allowing the record to be inserted and the second execution of Query 1 to include the new row in its results.

How to prevent:

The 2PL-based Serializable isolation prevents Phantom Reads through the use of predicate locking while MVCC (Multi-Version Concurrency Control) database engines address the Phantom Read anomaly by returning consistent snapshots.

However, a concurrent transaction can still modify the range of records that was read previously. Even if the MVCC database engine introspects the transaction schedule, the outcome is not always the same as a 2PL-based implementation. One such example is when the second transaction issues an insert without reading the same range of records as the first transaction. In this particular use case, some MVCC database engines will not end up rolling back the first transaction.

  ...


【MySQL】设置隔离级别(Isolation Levels)

Show Transaction Isolation Level

check session transaction level (mysql8+)

SELECT @@transaction_ISOLATION;

check global transaction level (mysql8+)

SELECT @@global.transaction_ISOLATION;
  ...


【Database】Transactions - Consistent Anomalies Demo

MySQL Demo

Show Transaction Isolation Level

check session transaction level (mysql8+)

SELECT @@transaction_ISOLATION;

check global transaction level (mysql8+)

SELECT @@global.transaction_ISOLATION;
  ...