西维蜀黍

【Architecture】System Design - ID Mapping

Scope

Title

Bi-Directional Scalable Highly Available ID Mapping

Question Description

The e-commerce platform would like to maintain its own user id space so that it can support users from different businesses and products.

Design a highly scalable, highly available bi-directional user id mapping service to bind an external user id to an internal e-commerce platform user id.

  • map(external_uid) ==> internal_uid
  • reverse_map(internal_uid) ==> external_uid

Describe key components for your design, including database selection and design, cache design, logical flow etc.
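The two operations above can be sketched as a pair of lookup tables kept in sync. This is only a minimal in-memory illustration, not the design the question asks for: the class name `InMemoryIdMapper`, the sequential id allocation, and the single lock are all assumptions made for the example. A real service would replace the dictionaries with a replicated database plus a cache.

```python
import itertools
import threading


class InMemoryIdMapper:
    """Toy bi-directional mapper (hypothetical; stands in for DB + cache)."""

    def __init__(self, start=1):
        self._lock = threading.Lock()
        self._next_id = itertools.count(start)  # assumed sequential allocator
        self._ext_to_int = {}  # external_uid -> internal_uid
        self._int_to_ext = {}  # internal_uid -> external_uid

    def map(self, external_uid):
        """Return the internal id for external_uid, allocating one on first use."""
        with self._lock:
            internal_uid = self._ext_to_int.get(external_uid)
            if internal_uid is None:
                internal_uid = next(self._next_id)
                # Write both directions under the same lock so they stay consistent.
                self._ext_to_int[external_uid] = internal_uid
                self._int_to_ext[internal_uid] = external_uid
            return internal_uid

    def reverse_map(self, internal_uid):
        """Return the external id bound to internal_uid, or None if unknown."""
        with self._lock:
            return self._int_to_ext.get(internal_uid)
```

Calling `map` twice with the same external id returns the same internal id, which is the idempotency property a production design would also need to guarantee across replicas.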

  ...


【Architecture】System Design - Notification System

Scope

  • Candidate: What types of notifications does the system support? Interviewer: Push notification, SMS message, and email.

  • Candidate: Is it a real-time system? Interviewer: Let us say it is a soft real-time system. We want a user to receive notifications as soon as possible. However, if the system is under a high workload, a slight delay is acceptable.

  • Candidate: What are the supported devices? Interviewer: iOS devices, Android devices, and laptop/desktop.

  • Candidate: What triggers notifications? Interviewer: Notifications can be triggered by client applications. They can also be scheduled on the server-side.

  • Candidate: Will users be able to opt-out? Interviewer: Yes, users who choose to opt-out will no longer receive notifications.

  • Candidate: How many notifications are sent out each day? Interviewer: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails.
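The daily volumes above can be turned into rough average send rates with a quick back-of-the-envelope calculation. The sketch below assumes traffic is spread evenly over the day, which real traffic is not; peak rates would be higher, and the names used here are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as stated by the interviewer.
daily_volume = {
    "push": 10_000_000,
    "sms": 1_000_000,
    "email": 5_000_000,
}


def average_rate_per_second(per_day):
    """Average messages per second, assuming an even spread over the day."""
    return per_day / SECONDS_PER_DAY


rates = {channel: average_rate_per_second(n) for channel, n in daily_volume.items()}
total_rate = sum(rates.values())

for channel, rate in rates.items():
    print(f"{channel}: about {rate:.0f} per second on average")
print(f"total: about {total_rate:.0f} per second on average")
```

On average this works out to roughly 116 push notifications, 12 SMS messages, and 58 emails per second, or about 185 notifications per second in total.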

  ...


【Architecture】System Design - YouTube

Scope

  1. Candidate: What features are important? Interviewer: Ability to upload a video and watch a video.
  2. Candidate: What clients do we need to support? Interviewer: Mobile apps, web browsers, and smart TV.
  3. Candidate: How many daily active users do we have? Interviewer: 5 million.
  4. Candidate: What is the average daily time spent on the product? Interviewer: 30 minutes.
  5. Candidate: Do we need to support international users? Interviewer: Yes, a large percentage of users are international users.
  6. Candidate: What are the supported video resolutions? Interviewer: The system accepts most of the video resolutions and formats.
  7. Candidate: Is encryption required? Interviewer: Yes.
  8. Candidate: Any file size requirement for videos? Interviewer: Our platform focuses on small and medium-sized videos. The maximum allowed video size is 1GB.
  9. Candidate: Can we leverage some of the existing cloud infrastructures provided by Amazon, Google, or Microsoft? Interviewer: That is a great question. Building everything from scratch is unrealistic for most companies, so it is recommended to leverage some of the existing cloud services.
  ...


【Engineering】Data Warehouse

Data Warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, and that data is used to create analytical reports for workers throughout the enterprise. This benefits companies because it lets them interrogate their data, draw insights from it, and make decisions.

The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing to ensure data quality before it is used in the data warehouse for reporting.

Extract, transform, load (ETL) and extract, load, transform (ELT) are the two main approaches used to build a data warehouse system.

  ...


【Hadoop】Hadoop distributed file system (HDFS)

Hadoop distributed file system

The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. Some consider it to instead be a data store due to its lack of POSIX compliance, but it does provide shell commands and Java application programming interface (API) methods that are similar to other file systems. A Hadoop instance is divided into HDFS and MapReduce. HDFS is used for storing the data and MapReduce is used for processing data.

  ...


【Hadoop】HBase

HBase

HBase is an open-source, non-relational, distributed database.

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project’s goal is the hosting of very large tables – billions of rows X millions of columns – atop clusters of commodity hardware.

Apache HBase is an open-source, NoSQL, distributed big data store. It enables random, strictly consistent, real-time access to petabytes of data.

HBase is a column-oriented, non-relational database. This means that data is stored in individual columns, and indexed by a unique row key.
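The idea of values stored per column and indexed by a row key can be pictured as a sorted map of row keys to column cells. The sketch below is a toy in-memory model, not the HBase client API: the class `ToyWideColumnTable` and its methods are hypothetical, and it omits cell versions (timestamps), regions, and persistence.

```python
class ToyWideColumnTable:
    """Hypothetical in-memory model of a row-key-indexed, column-oriented table."""

    def __init__(self):
        # row_key -> {"family:qualifier": value}
        self._rows = {}

    def put(self, row_key, column, value):
        """Store one cell; rows may have entirely different sets of columns."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column=None):
        """Return one cell, or the whole row as a dict when column is None."""
        row = self._rows.get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column)

    def scan(self, start_row, stop_row):
        """Yield (row_key, row) in sorted key order; start inclusive, stop exclusive."""
        for row_key in sorted(self._rows):
            if start_row <= row_key < stop_row:
                yield row_key, dict(self._rows[row_key])


table = ToyWideColumnTable()
table.put("user#001", "info:name", "Alice")
table.put("user#001", "info:email", "alice@example.com")
table.put("user#002", "info:name", "Bob")  # no email column for this row
```

Because rows are only reachable by key or by key range, choosing the row key is the main schema design decision in this model.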

  ...


【Hadoop】HBase Shell

Commands using HBase Shell

Listing a Table

# Listing a Table
list
  ...


【Hadoop】Hive

Apache Hive

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.

Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.

Apache Hive supports the analysis of large datasets stored in Hadoop’s HDFS and compatible file systems such as Amazon S3 filesystem and Alluxio. It provides a SQL-like query language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez and Spark jobs.

  ...


【Hadoop】Learning

Hadoop

Hadoop allows the distributed processing of large data sets stored across clusters of computers.

The Hadoop framework consists of two main components:

  • Hadoop Distributed File System (HDFS)
    • HDFS is an open source variant of the Google File System (GFS)
  • MapReduce programming framework
    • Hadoop MapReduce is the open source variant of Google MapReduce
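The MapReduce programming model can be illustrated with the classic word count. The sketch below runs in a single Python process and only imitates the map, shuffle, and reduce phases; it does not use Hadoop, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big clusters", "data on clusters"])
```

In a real cluster the map and reduce functions run in parallel on different machines, with the input lines read from HDFS blocks; only the shape of the computation is the same here.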
  ...


【Database】Entity Relationship (E-R) Diagrams

ER Diagrams

An Entity Relationship (ER) Diagram is a type of flowchart that illustrates how “entities” such as people, objects or concepts relate to each other within a system. ER Diagrams are most often used to design or debug relational databases in the fields of software engineering, business information systems, education and research. Also known as ERDs or ER Models, they use a defined set of symbols such as rectangles, diamonds, ovals and connecting lines to depict the interconnectedness of entities, relationships and their attributes. They mirror grammatical structure, with entities as nouns and relationships as verbs.

  ...