# Cache Aside Pattern / Lazy-load

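This is the most commonly used cache update strategy in applications. Its logic is as follows:

• Miss: the application first tries to read the data from the cache. If it is not there, the application reads it from the database and, on success, puts it into the cache.

• Hit: the application reads the data from the cache and returns it.

• Update: the application writes the data to the database first and, once that succeeds, invalidates the cache entry.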

# Scenarios

## Cache Miss / Hit

In this update strategy, the cache sits aside and the application talks to both the cache and the data store directly. It is also known as lazy loading. Application logic checks the cache first before hitting the database. It is mostly used for applications with read-heavy workloads.

Some caching systems provide read-through and write-through operations. With these, an application retrieves data by referencing the cache. If the data isn’t in the cache, it’s retrieved from the data store and added to the cache. Any modifications to data held in the cache are automatically written back to the data store as well.

For caches that don’t provide this functionality, it’s the responsibility of the applications that use the cache to maintain the data. Cache-aside is the pattern for doing so.

## Cache Update

• Update: write the data to the database first and, once that succeeds, invalidate the cache entry.
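The update rule can be sketched in a few lines of Python. This is only an illustration: `db` and `cache` are plain dictionaries standing in for a real database and cache, and `get_value` / `update_value` are hypothetical names.

```python
# Hypothetical in-memory stand-ins for a real database and cache.
db = {}
cache = {}

def get_value(key):
    """Read path: try the cache, fall back to the database on a miss."""
    if key in cache:
        return cache[key]          # hit
    value = db.get(key)            # miss: read the source of truth
    if value is not None:
        cache[key] = value         # populate the cache for later reads
    return value

def update_value(key, value):
    """Update path: write the database first, then invalidate the cache."""
    db[key] = value                # 1. persist to the database
    cache.pop(key, None)           # 2. drop the cached copy, if any

update_value("user:1", "Alice")
first = get_value("user:1")        # miss, loaded from db and cached
update_value("user:1", "Bob")      # cached "Alice" is invalidated
second = get_value("user:1")       # miss again, reloads the new value
```

Because the cached copy is deleted rather than overwritten, the next read repopulates it from the database.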

Discussion: when updating, should you delete the cache first or update the DB first?

# Potential Problem

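• If MySQL master-slave replication is used and the slave lags behind, a read that arrives after the write has completed and invalidated the cache may be served by the slave DB. Because of the replication delay, the slave still returns the pre-update value, so the data written back into the cache is stale.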
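The race can be reproduced with a small deterministic simulation. Everything here is an assumption made for illustration: `primary`, `replica`, and `cache` are dictionaries, and `replicate()` stands in for the replication catching up.

```python
primary = {"k": "old"}   # hypothetical master database
replica = {"k": "old"}   # hypothetical slave, lags behind the master
cache = {"k": "old"}

def write(key, value):
    primary[key] = value      # writes go to the master
    cache.pop(key, None)      # then the cache entry is invalidated

def read(key):
    if key in cache:
        return cache[key]
    value = replica[key]      # reads are served by the slave
    cache[key] = value        # whatever the slave returned gets cached
    return value

def replicate():
    replica.update(primary)   # replication catches up

write("k", "new")
stale = read("k")    # slave has not caught up, so "old" is cached again
replicate()
after = read("k")    # still "old": the stale entry is served from the cache
```

Even after the slave catches up, the stale entry stays in the cache until something deletes it or it expires.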

## Solution

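1. Write to the database first.

2. Delete the cache entry.

3. Sleep for 1 second, then delete the cache entry again.

• This step can be implemented in either of the following ways:

• Solution 1: after the first cache delete, start a thread that performs the delete again after 1 s.
• Solution 2: perform the second delete by reading the DB's binlog and using a message queue.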
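Solution 1 might look like the following sketch, which uses `threading.Timer` from the Python standard library for the delayed second delete. The dictionaries `db` and `cache` and the function name `update_with_double_delete` are hypothetical; a real system would call its own database and cache clients.

```python
import threading

db = {}      # hypothetical stand-in for the database
cache = {}   # hypothetical stand-in for the cache

def update_with_double_delete(key, value, delay_seconds=1.0):
    """Write the DB, delete the cache entry, then delete it again after a delay."""
    db[key] = value                       # 1. write to the database
    cache.pop(key, None)                  # 2. first delete
    # 3. schedule the second delete on a background timer thread
    timer = threading.Timer(delay_seconds, cache.pop, args=(key, None))
    timer.daemon = True
    timer.start()
    return timer

db["k"] = "old"
cache["k"] = "old"
timer = update_with_double_delete("k", "new", delay_seconds=0.05)
cache["k"] = "old"   # simulate a reader re-caching stale data in the gap
timer.join()         # wait for the delayed delete (only needed for the demo)
```

The short delay is used only to keep the demo fast. The second delete removes whatever a lagging read put back, so the next read reloads from the database.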

Analysis

• The single source of truth is the DB.
• How long to sleep should be decided based on the business scenario.
• If a delete might fail, add a retry mechanism for failed deletes.
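A retry wrapper for the delete could be sketched as follows. The helper name `delete_with_retry`, its parameters, and the `flaky_delete` function that simulates a failing cache are all assumptions for illustration.

```python
import time

def delete_with_retry(delete_fn, key, attempts=3, backoff_seconds=0.1):
    """Call delete_fn(key), retrying on failure. Returns True on success."""
    for attempt in range(1, attempts + 1):
        try:
            delete_fn(key)
            return True
        except Exception:
            # Broad catch for the sketch; real code would catch the
            # cache client's specific error types.
            if attempt < attempts:
                time.sleep(backoff_seconds * attempt)  # linear backoff
    return False

# Demo: a delete that fails twice before succeeding.
cache = {"k": "v"}
calls = {"n": 0}

def flaky_delete(key):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("cache unavailable")
    cache.pop(key, None)

ok = delete_with_retry(flaky_delete, "k", backoff_seconds=0.01)
```

If all attempts fail, the function returns `False`, and the caller can decide what to do next, for example pushing the key onto a queue for a later delete.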

# Example

    // *****************************************
    // Function that returns a customer's record.
    // Attempts to retrieve the record from the cache.
    // If it is retrieved, the record is returned to the application.
    // If the record is not retrieved from the cache, it is
    //    retrieved from the database,
    //    added to the cache, and
    //    returned to the application.
    // *****************************************
    get_customer(customer_id)

        customer_record = cache.get(customer_id)
        if (customer_record == null)

            customer_record = db.query("SELECT * FROM Customers WHERE id = {0}", customer_id)
            cache.set(customer_id, customer_record)

        return customer_record


For this example, the application code that gets the data is the following.

    customer_record = get_customer(12345)
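For readers who want to run the example, here is one possible Python rendering of the pseudocode. It assumes an in-memory SQLite database via the standard `sqlite3` module and a plain dictionary as the cache. The table contents (`12345`, `"Ada"`) are made up for the demo.

```python
import sqlite3

# Hypothetical demo database with one customer row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Customers (id, name) VALUES (?, ?)", (12345, "Ada"))
conn.commit()

cache = {}  # dictionary standing in for a real cache

def get_customer(customer_id):
    """Return the customer's row, reading through the cache lazily."""
    record = cache.get(customer_id)
    if record is None:
        # Cache miss: query the database with a parameterized statement.
        record = conn.execute(
            "SELECT id, name FROM Customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if record is not None:
            cache[customer_id] = record   # populate the cache
    return record

customer_record = get_customer(12345)
```

Unlike the pseudocode, this version does not cache a missing row, so an unknown id queries the database on every call.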


# Analysis

• It does not load or hold all the data at once; data is loaded on demand. This suits cases where you know that your application might not need to cache all the data of a particular category from the data source.

### Node failures aren’t fatal for your application

• When a node fails and is replaced by a new, empty node, your application continues to function, though with increased latency.
• As requests are made to the new node, each cache miss results in a query of the database. At the same time, the data copy is added to the cache so that subsequent requests are retrieved from the cache.

### Cache Miss Penalty

Each cache miss results in three trips:

1. Initial request for data from the cache
2. Query of the database for the data
3. Writing the data to the cache

• These misses can cause a noticeable delay in data getting to the application.

Developers deal with this by warming (pre-heating) the cache or by using refresh-ahead caching.
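Cache warming can be as simple as loading a known set of hot keys before traffic arrives. The sketch below assumes dictionary stand-ins for the database and cache; `warm_cache` is a hypothetical helper name.

```python
db = {"a": 1, "b": 2, "c": 3}   # hypothetical database contents
cache = {}

def warm_cache(keys):
    """Preload the given keys from the database. Returns how many were loaded."""
    loaded = 0
    for key in keys:
        if key in db:
            cache[key] = db[key]   # copy the value before any request needs it
            loaded += 1
    return loaded

loaded = warm_cache(["a", "b", "missing"])
```

Requests for the preloaded keys then hit the cache on their first access and skip the three-trip miss path.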

### Stale Data

• Since data is written to the cache only when there is a cache miss, data in the cache can become stale. This happens because the cache is not updated when the data changes in the database.
• To address this issue, you can use cache update mechanisms (e.g., write-through), invalidate cache entries on update, or add a TTL to cached entries.
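As an illustration of the TTL option, here is a minimal sketch of a cache whose entries expire after a fixed time. The class `TTLCache` and its injectable `clock` parameter are assumptions made for the example; many cache servers offer expiry natively.

```python
import time

class TTLCache:
    """Dictionary-backed cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can control time
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None              # never cached
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]     # expired: treat as a miss
            return None
        return value

now = [0.0]                          # fake clock for a deterministic demo
ttl_cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
ttl_cache.set("k", "v")
fresh = ttl_cache.get("k")           # within the TTL
now[0] = 61.0
expired = ttl_cache.get("k")         # past the TTL, entry is dropped
```

An expired entry behaves like a miss, so the next read reloads it from the database. This bounds how long stale data can be served.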

### Possible Low Cache Hit Rate

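• Because most data is never requested, lazy loading avoids filling up the cache with data that isn’t requested. The flip side is that the first request for any item always misses, so the hit rate can be low while the cache is cold or when items are rarely requested more than once.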