【Architecture】System Design - Chat

Posted by 西维蜀黍 on 2023-07-31, Last Modified on 2023-09-22

Background

Candidate: What kind of chat app shall we design? 1 on 1 or group based?

Interviewer: It should support both 1 on 1 and group chat.

Candidate: Is this a mobile app? Or a web app? Or both?

Interviewer: Both.

Candidate: What is the scale of this app? A startup app or massive scale?

Interviewer: It should support 50 million daily active users (DAU).

Candidate: For group chat, what is the group member limit?

Interviewer: A maximum of 100 people.

Candidate: What features are important for the chat app? Can it support attachments?

Interviewer: 1 on 1 chat, group chat, and an online indicator. The system only supports text messages.

Candidate: Is there a message size limit?

Interviewer: Yes, text length should be less than 100,000 characters.

Candidate: Is end-to-end encryption required?

Interviewer: Not required for now, but we will discuss it if time allows.

Candidate: How long shall we store the chat history?

Interviewer: Forever.

Scope

High-level Design

Protocol

  • TCP?
  • HTTP
  • WebSocket

Other features

  • sign-up
  • login
  • user profile
  • push notification

Components

  • Chat services

Storage

Selecting the correct storage system that supports all of our use cases is crucial. We recommend key-value stores for the following reasons:

  • Key-value stores allow easy horizontal scaling.
  • Key-value stores provide very low latency to access data.
  • Relational databases do not handle the long tail [3] of data well. When indexes grow large, random access becomes expensive.
  • Key-value stores are adopted by other proven, reliable chat applications. For example, both Facebook Messenger and Discord use key-value stores: Facebook Messenger uses HBase [4], and Discord uses Cassandra [5].
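To make "easy horizontal scaling" concrete, here is a minimal sketch of consistent hashing, one common way Dynamo-style key-value stores (Cassandra among them) spread keys across nodes. It is an illustration under simplifying assumptions, not how HBase or Cassandra are actually implemented; `ConsistentHashRing`, `node_for`, and the node names are hypothetical. The point it shows: when a node is added, only the keys that now belong to the new node move, so capacity can grow without reshuffling the whole dataset.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys (e.g. channel IDs) to storage nodes on a hash ring."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes   # virtual nodes per physical node even out the load
        self._ring = []        # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys evenly, not for security
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["kv-1", "kv-2", "kv-3"])
keys = [f"channel:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("kv-4")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding kv-4")
```

With a plain `hash(key) % N` scheme, changing N would remap most keys; the ring limits movement to roughly the share the new node takes over.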

Data Models

The figure below shows the message table for 1 on 1 chat. The primary key is message_id, which determines the message order. We cannot rely on created_at to order messages because two messages can be created at the same time and share the same timestamp.
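The text requires message_id to be unique and to sort in message order, but does not say how the ID generator produces it. One widely used option is a Twitter Snowflake-style 64-bit ID; the sketch below assumes that layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence) and a custom epoch of 2023-01-01 UTC. `SnowflakeIdGenerator` and `EPOCH_MS` are hypothetical names. It shows why message_id can order two messages that share the same created_at millisecond.

```python
import threading
import time

EPOCH_MS = 1_672_531_200_000  # assumed custom epoch: 2023-01-01T00:00:00Z


class SnowflakeIdGenerator:
    """64-bit IDs laid out as: timestamp (41 bits) | worker (10 bits) | sequence (12 bits)."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock          # injectable so the example is deterministic
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


# Two messages created in the same millisecond still get distinct, ordered IDs
gen = SnowflakeIdGenerator(worker_id=1, clock=lambda: 1_690_000_000_000)
first, second = gen.next_id(), gen.next_id()
assert first < second
```

Note that strict ordering holds per generator; IDs from different workers are only ordered by their millisecond timestamps.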

The figure below shows the message table for group chat. The composite primary key is (channel_id, message_id). Channel and group mean the same thing here. channel_id is the partition key because all queries in a group chat operate within a single channel.
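The effect of the composite key can be sketched with an in-memory stand-in for the key-value table: channel_id selects the partition, and message_id orders rows inside it, so "recent messages in this channel" is a single-partition lookup. The figure is not reproduced here, so the user_id and content columns are assumptions, and `GroupMessage` and `GroupMessageTable` are hypothetical names.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupMessage:
    channel_id: int   # partition key: all rows of one channel live together
    message_id: int   # orders messages inside the partition
    user_id: int      # sender (assumed column)
    content: str


class GroupMessageTable:
    """In-memory stand-in for a table keyed by (channel_id, message_id)."""

    def __init__(self):
        self._partitions = {}  # channel_id -> list of messages sorted by message_id

    def put(self, msg):
        rows = self._partitions.setdefault(msg.channel_id, [])
        ids = [m.message_id for m in rows]  # linear scan is fine for a sketch
        rows.insert(bisect.bisect(ids, msg.message_id), msg)

    def latest(self, channel_id, limit=20):
        # One partition answers "most recent messages in this channel", oldest first
        return self._partitions.get(channel_id, [])[-limit:]

    def since(self, channel_id, after_message_id):
        # Messages newer than a given ID, still from a single partition
        rows = self._partitions.get(channel_id, [])
        ids = [m.message_id for m in rows]
        return rows[bisect.bisect(ids, after_message_id):]
```

Messages written out of order (for example 30, 10, 20) still come back as 10, 20, 30, because ordering comes from message_id rather than from arrival time.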

Deep Dive

Service Discovery

Find a Chat Server via API Server

  1. User A tries to log in to the app.

  2. The load balancer sends the login request to API servers.

  3. After the backend authenticates the user, service discovery finds the best chat server for User A. In this example, server 2 is chosen and the server info is returned to User A.

  4. User A connects to chat server 2 through WebSocket.
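Step 3 leaves "best chat server" open. In practice a registry such as Apache ZooKeeper often holds the list of live chat servers; the sketch below replaces it with an in-memory registry and assumes "best" means a server with spare capacity in the user's region first, then the lowest load ratio. `ChatServer`, `ServiceDiscovery`, `best_server`, and the host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChatServer:
    host: str          # hypothetical address returned to the client in step 3
    region: str
    connections: int   # currently open WebSocket connections
    capacity: int


class ServiceDiscovery:
    """Registry of live chat servers; picks one for a freshly authenticated user."""

    def __init__(self):
        self._servers = {}

    def register(self, server):
        self._servers[server.host] = server

    def deregister(self, host):
        # Called when a server shuts down or stops sending heartbeats
        self._servers.pop(host, None)

    def best_server(self, user_region):
        candidates = [s for s in self._servers.values() if s.connections < s.capacity]
        if not candidates:
            raise RuntimeError("no chat server has spare capacity")
        # False sorts before True, so same-region servers win; ties go to the least loaded
        return min(
            candidates,
            key=lambda s: (s.region != user_region, s.connections / s.capacity),
        )
```

After login, the API server would return `best_server(region).host` to User A, who then opens the WebSocket connection in step 4 directly to that host, bypassing the API servers.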

Message flows

  1. User A sends a chat message to Chat server 1.

  2. Chat server 1 obtains a message ID from the ID generator.

  3. Chat server 1 sends the message to the message sync queue.

  4. The message is stored in a key-value store.

  5. If User B is online, the message is forwarded to Chat server 2 where User B is connected.

  6. If User B is offline, a push notification is sent by the push notification (PN) servers.

  7. Chat server 2 forwards the message to User B. There is a persistent WebSocket connection between User B and Chat server 2.
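The seven steps above can be modeled in a single process to show where each decision happens. Everything here is an in-memory stand-in: `itertools.count` plays the ID generator, `queue.Queue` plays the message sync queue, a dict plays the key-value store, and the `presence` map (online user to chat server) is a hypothetical simplification of how the system knows where User B is connected. `ChatBackend`, `send`, and `drain` are illustrative names, not part of the original design.

```python
import itertools
import queue
from collections import defaultdict


class ChatBackend:
    """Toy, single-process model of the 1 on 1 message flow."""

    def __init__(self):
        self.ids = itertools.count(1)        # stand-in for the ID generator (step 2)
        self.sync_queue = queue.Queue()      # stand-in for the message sync queue (step 3)
        self.kv_store = {}                   # message_id -> message (step 4)
        self.presence = {}                   # online user -> chat server holding their WebSocket
        self.delivered = defaultdict(list)   # user -> [(chat server, message)] (steps 5 and 7)
        self.push_notifications = []         # [(user, message)] sent by PN servers (step 6)

    def send(self, sender, recipient, text):
        """Steps 1-3: the sender's chat server assigns an ID and enqueues the message."""
        msg = {"message_id": next(self.ids), "from": sender, "to": recipient, "content": text}
        self.sync_queue.put(msg)
        return msg["message_id"]

    def drain(self):
        """Steps 4-7, performed by a consumer of the sync queue."""
        while not self.sync_queue.empty():
            msg = self.sync_queue.get()
            self.kv_store[msg["message_id"]] = msg
            server = self.presence.get(msg["to"])
            if server is not None:
                # Recipient online: hand off to their chat server, which pushes over WebSocket
                self.delivered[msg["to"]].append((server, msg))
            else:
                # Recipient offline: fall back to a push notification
                self.push_notifications.append((msg["to"], msg))
```

In a real deployment the queue would be durable and `drain` would run continuously on separate workers; persisting before delivery (step 4 before steps 5 to 7) means an offline user can still fetch the message later from the key-value store.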

Message synchronization across multiple devices

Small group chat flow

Wrap up

Reference

  • System Design Interview – An insider’s guide