西维蜀黍

【Database】索引 - Log-structured Merge-Tree(LSM-Tree)

Background

During the 1990s, disk bandwidth, processor speed and main memory capacity were increasing at a rapid rate.

With increase in memory capacity, more items could now be cached in memory for reads. As a result, read workloads were mostly absorbed by the operating system page cache. However, disk seek times were still high due to the seek and rotational latency of physical R/W head in a spinning disk. A spinning disk needs to move to a given track and sector to write the data. In the case of random I/O, with frequent read and write operations, the movement of physical disk head becomes more than the time it takes to write the data. From the LFS paper, traditional file systems utilizing a spinning disk, spends only 5-10% of disk’s raw bandwidth whereas LFS permits about 65-75% in writing a new data (rest is for compaction). Traditional file systems write data at multiple places: the data block, recovery log and in-place updates to any metadata. The only bottleneck in file systems now, were during writes. As a result, there was a need to reduce writes and do less random I/O in file systems. LFS came with the idea that why not write everythng in a single log (even the metadata) and treat that as a single source of truth.

Log structured file systems treat your whole disk as a log. Data blocks are written to disk in an append only manner along with their metadata (inodes). Before appending them to disk, the writes are buffered in memory to reduce the overhead of disk seeks on every write. On reaching a certain size, they are then appended to disk as a segment (64kB-1MB). A segment contains data blocks containing changes to multitude of files along with their inodes. At the same time on every write, an inode map (imap) is also updated to point to the newly written inode number. The imap is also then appended to the log on every such write so it’s a just single seek away.

We’re not going too deep on LFS, but you get the idea. LSM Tree steals the idea of append only style updates in a log file and write buffering and has been adopted for use as a storage backend for a lot of write intensive key value database systems. Now that we know of their existence, let’s look at them more closely.

Log-structured merge-trees (LSM trees)

A log-structured merge-tree (LSM tree) is a data structure typically used when dealing with write-heavy workloads. The write path is optimized by only performing sequential writes.

  ...


【Kafka】Kafka 架构的演化

在这个过程中,你会看到 Kafka 在处理请求的过程中会遇到哪些高性能和高并发问题,以及架构为什么要这样演进,从而理解 Kafka 这么设计的意义和精妙之处。

  ...


【Network】科学上网

  ...


【Kafka】分区(Partition)

Partition

A topic could have multiple partitions.

Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time. When a new event is published to a topic, it is actually appended to one of the topic’s partitions.

Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic-partition will always read that partition’s events in exactly the same order as they were written.

  ...


【Kafka】消费过程(Consume)

Consumer Models

通常来讲,消息模型可以分为两种, 队列和发布-订阅式。

  • 队列(queue)的处理方式是 一组消费者从服务器读取消息,一条消息只有其中的一个消费者来处理。
  • 在发布-订阅模型中,消息被广播给所有订阅该消息的消费者,接收到消息的消费者都可以处理此消息。

Kafka为这两种模型提供了单一的消费者抽象模型: 消费者组(consumer group)。

  ...