[Cache System] Cache Update Strategies - Cache Aside

Posted by 西维蜀黍 on 2021-08-05, Last Modified on 2021-10-19

Cache Aside Pattern / Lazy-load

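This is the most commonly used cache update strategy in applications. Its logic is as follows:

  • Miss: the application first tries to read the data from the cache; if it is not there, it reads the data from the database and, once that succeeds, puts it into the cache.

  • Hit: the application reads the data from the cache and returns it.

  • Update: write the data to the database first and, once that succeeds, invalidate the cache entry.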
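
The three rules above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class name CacheAside is hypothetical, and plain dicts stand in for the real cache and database.

```python
class CacheAside:
    """Hypothetical wrapper; `cache` and `db` are plain dicts standing in for real stores."""

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def read(self, key):
        value = self.cache.get(key)
        if value is not None:          # hit: return straight from the cache
            return value
        value = self.db.get(key)       # miss: fall back to the database
        if value is not None:
            self.cache[key] = value    # populate the cache after a successful read
        return value

    def update(self, key, value):
        self.db[key] = value           # write the database first
        self.cache.pop(key, None)      # then invalidate the cached copy


store = CacheAside(cache={}, db={"user:1": "Alice"})
first = store.read("user:1")    # miss: loaded from the db and cached
second = store.read("user:1")   # hit: served from the cache
store.update("user:1", "Bob")   # db updated, cache entry dropped
third = store.read("user:1")    # miss again: reloads the new value
```

Note that the update path never writes the new value into the cache; the next read repopulates it.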

Scenarios

Cache miss / hit

In this update strategy, the cache sits beside the data store, and the application talks to both the cache and the data store directly. It is also known as lazy loading. Application logic checks the cache first before hitting the database. It is mostly used by applications with read-heavy workloads.

Some caching systems provide read-through and write-through/write-behind operations. In these systems, an application retrieves data by referencing the cache. If the data isn’t in the cache, it’s retrieved from the data store and added to the cache. Any modifications to data held in the cache are automatically written back to the data store as well.

For caches that don’t provide this functionality, it’s the responsibility of the applications that use the cache to maintain the data. That is the cache-aside case.

Cache update

  • Update: write the data to the database first and, once that succeeds, invalidate the cache entry.

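Discussion - When Updating the Cache: Delete Cache First or Update DB First?

The conclusion: always update the DB first, and delete the cache entry only after the update has succeeded.

Otherwise, if the DB update fails, dirty (stale) data can be introduced into the cache.

Alternatively, even if the DB update eventually succeeds, consider a high-QPS case in which the cache delete has already completed: another request finds nothing in the cache, reads the not-yet-updated value from the DB, and writes it into the cache. This produces stale data, and the stale entry stays in the cache even after the DB update completes.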
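
The delete-cache-first interleaving can be replayed step by step. This is a minimal sketch, assuming plain dicts stand in for the cache and the DB and that the two requests interleave in exactly this order; the key and values are made up for illustration.

```python
cache = {"k": "old"}
db = {"k": "old"}

# Writer (wrong order), step 1: delete the cache entry before touching the DB.
cache.pop("k", None)

# Concurrent reader: misses the cache, reads the not-yet-updated DB,
# and writes that value back into the cache.
value = cache.get("k")
if value is None:
    value = db["k"]
    cache["k"] = value

# Writer, step 2: the DB update finally lands.
db["k"] = "new"

# The cache now disagrees with the DB, and nothing will delete the entry.
stale = cache["k"] != db["k"]
```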

Potential Problem

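Does this mean Cache Aside has no concurrency problems? No. For example, a read operation misses the cache and goes to the database to fetch the data. At that moment a write operation arrives, writes the database, and then invalidates the cache.

The read operation then writes the old value it read into the cache, which leaves dirty data in the cache.

This case is possible in theory, but in practice its probability is likely very low. It requires a cache miss on the read path with a concurrent write in flight. On top of that, a database write is usually much slower than a read and also involves locking the table, yet the read must reach the database before the write does and still update the cache after the write has invalidated it. The probability that all of these conditions hold together is small.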
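
The unlucky ordering looks like this when written out. It is a sketch under the same assumptions as the rest of this post's examples: plain dicts stand in for the cache and the DB, and the interleaving is forced by hand rather than produced by real threads.

```python
cache = {}
db = {"k": "old"}

# Reader: cache miss, so it reads the current value from the DB.
reader_value = db["k"]

# Writer: updates the DB, then invalidates the cache (nothing is cached yet).
db["k"] = "new"
cache.pop("k", None)

# Reader: only now writes the value it read earlier into the cache.
cache["k"] = reader_value
```

The reader's cache write has to land after the writer's invalidation, which is why the window is narrow in practice.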

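Of course, other situations can also lead to dirty data, for example:

  • With a MySQL master-slave setup and replication delay: after a write completes and triggers the cache invalidation, a read goes to the slave DB and gets the pre-update value (because of the replication delay), so the value written back into the cache is still the old one.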

Solution

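The delayed double delete strategy (deleting the cache entry twice) removes the dirty cache data produced in the cases above by deleting it again:

  1. Write the database

  2. Delete the cache entry

  3. Sleep for 1 second, then delete the cache entry again

    • This step can be implemented as follows:

      • Solution 1: after the first cache delete, start a thread that performs the second delete 1s later
      • Solution 2: drive the second delete from the DB binlog through a message queue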
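
Solution 1 can be sketched with threading.Timer from the Python standard library. The function name update_with_delayed_double_delete is hypothetical, plain dicts stand in for the DB and the cache, and the delay is shortened in the demo so it finishes quickly.

```python
import threading


def update_with_delayed_double_delete(db, cache, key, value, delay_seconds=1.0):
    db[key] = value                  # 1. write the database
    cache.pop(key, None)             # 2. delete the cache entry
    # 3. delete again after a delay, to evict anything a racing reader wrote back
    timer = threading.Timer(delay_seconds, cache.pop, args=(key, None))
    timer.daemon = True              # do not keep the process alive for the timer
    timer.start()
    return timer


db = {"k": "old"}
cache = {}
timer = update_with_delayed_double_delete(db, cache, "k", "new", delay_seconds=0.2)
cache["k"] = "old"   # simulate a racing reader writing a stale value back
timer.join()         # wait for the second delete (only needed for this demo)
```

After the timer fires, the stale entry is gone and the next read reloads the new value from the DB.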

Analysis

  • The single source of truth is the DB.
  • How long to sleep depends on the business scenario.
  • If a delete might fail, add a retry mechanism for failed deletes.
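
A retry around the cache delete might look like the sketch below. The helper delete_with_retry, the choice of ConnectionError as the failure type, and the backoff values are all assumptions for illustration; a real cache client raises its own exception types.

```python
import time


def delete_with_retry(delete, key, attempts=3, backoff_seconds=0.01):
    """Call `delete(key)` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            delete(key)
            return True
        except ConnectionError:
            if attempt < attempts:
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return False


# Demo: a delete that fails twice before succeeding.
calls = {"n": 0}
cache = {"k": "stale"}


def flaky_delete(key):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("cache unreachable")
    cache.pop(key, None)


ok = delete_with_retry(flaky_delete, "k")
```

If every attempt fails, the function returns False so the caller can log the key or hand it to a queue for a later delete.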

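Conclusion

So we still come back to CAP: either guarantee consistency with a protocol such as 2PC or Paxos (avoiding the problems above at the cost of availability), or give up consistency to preserve availability.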

Example

The following is a Python-style sketch of the lazy loading logic. Here cache and db are placeholders for the application's cache client and database handle.

# *****************************************
# Function that returns a customer's record.
# Attempts to retrieve the record from the cache.
# If it is retrieved, the record is returned to the application.
# If the record is not retrieved from the cache, it is
#    retrieved from the database,
#    added to the cache, and
#    returned to the application.
# *****************************************
def get_customer(customer_id):
    customer_record = cache.get(customer_id)
    if customer_record is None:
        # Cache miss: load from the database, then populate the cache.
        customer_record = db.query("SELECT * FROM Customers WHERE id = {0}", customer_id)
        cache.set(customer_id, customer_record)
    return customer_record

For this example, the application code that gets the data is the following.

customer_record = get_customer(12345)

Analysis

Advantages

Load Data on Demand

  • Data is loaded on demand rather than all at once. This suits cases where you know the application might not need to cache all the data from the data source in a particular category.

Node failures aren’t fatal for your application

  • When a node fails and is replaced by a new, empty node, your application continues to function, though with increased latency.
  • As requests are made to the new node, each cache miss results in a query of the database. At the same time, the data copy is added to the cache so that subsequent requests are retrieved from the cache.

Disadvantages

The disadvantages of lazy loading are as follows:

Cache Miss Penalty

Each cache miss results in three trips:

  1. Initial request for data from the cache
  2. Query of the database for the data
  3. Writing the data to the cache

These misses can cause a noticeable delay in data getting to the application.

Developers deal with this by warming (pre-heating) the cache or by using refresh-ahead caching.
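
Cache warming can be as simple as preloading a known set of hot keys before traffic arrives. This is a minimal sketch: the function warm_cache and the hot_keys list are hypothetical, and plain dicts stand in for the cache and the database.

```python
def warm_cache(cache, db, hot_keys):
    """Preload `hot_keys` from `db` into `cache`; return how many were loaded."""
    loaded = 0
    for key in hot_keys:
        if key in db:                # skip keys the database does not have
            cache[key] = db[key]
            loaded += 1
    return loaded


db = {"a": 1, "b": 2, "c": 3}
cache = {}
loaded = warm_cache(cache, db, hot_keys=["a", "b", "missing"])
```

How the hot keys are chosen (access logs, a fixed list, the previous node's contents) is a separate decision that this sketch leaves open.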

Stale Data

  • Since data is written to the cache only when there is a cache miss, data in the cache can become stale. This result occurs because there are no updates to the cache when data is changed in the database.
  • To address this issue, you can use a cache update mechanism (e.g., write-through), invalidate the cache entry on update, or add a TTL.
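
A TTL bounds how long a stale entry can survive. The class below is a minimal in-memory sketch, not the API of any particular cache: TTLCache and its clock parameter are hypothetical, and the injectable clock exists only so the demo can advance time without sleeping.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}               # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:   # expired: drop it and report a miss
            del self._entries[key]
            return None
        return value


now = [0.0]                              # fake clock, advanced by hand
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("k", "v")
fresh = cache.get("k")                   # still within the TTL
now[0] = 61.0
expired = cache.get("k")                 # past the TTL, treated as a miss
```

An expired entry behaves like any other miss, so the normal lazy-load path refreshes it from the database.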

Possible Low Cache Hit Rate

  • Because the cache is filled only on demand, data that has not been requested yet is never in the cache. The hit rate can therefore be low after a cold start, or when keys are rarely requested more than once.

Read-through VS cache-aside

While read-through and cache-aside are very similar, there are at least two key differences:

  1. In cache-aside, the application is responsible for fetching data from the database and populating the cache. In read-through, this logic is usually supported by the library or stand-alone cache provider (which means that the internal storage components are transparent to the clients/callers and what they can perceive is one single storage).
  2. Unlike cache-aside, the data model in a read-through cache cannot differ from that of the database.
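
The first difference is easiest to see from the caller's side. In the sketch below, the hypothetical ReadThroughCache owns the loader function, so the caller only ever calls get and never touches the database itself; plain dicts stand in for the real stores.

```python
class ReadThroughCache:
    """The cache owns the loader; callers never talk to the database directly."""

    def __init__(self, loader):
        self._loader = loader
        self._entries = {}

    def get(self, key):
        if key not in self._entries:
            # Miss: the cache itself fetches from the backing store.
            self._entries[key] = self._loader(key)
        return self._entries[key]


db = {"user:1": "Alice"}
loads = []                    # records every trip to the backing store


def load_from_db(key):
    loads.append(key)
    return db[key]


cache = ReadThroughCache(load_from_db)
a = cache.get("user:1")       # miss: the cache calls the loader
b = cache.get("user:1")       # hit: the loader is not called again
```

In cache-aside, the miss handling inside get would live in application code instead.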

Reference