[Redis] Redis Performance Analysis Insight

Posted by 西维蜀黍 on 2020-05-09, Last Modified on 2021-09-21

Factors impacting Redis performance

Several factors directly affect Redis performance. We mention them here because they can alter the result of any benchmark. Please note, however, that a typical Redis instance running on a low-end, untuned box usually provides good enough performance for most applications.

Network bandwidth and latency

Network bandwidth and latency usually have a direct impact on the performance.

It is good practice to use the ping program to quickly check that the latency between the client and server hosts is normal before launching the benchmark.

Regarding bandwidth, it is generally useful to estimate the throughput in Gbit/s and compare it to the theoretical bandwidth of the network. For instance, a benchmark setting 4 KB strings in Redis at 100000 q/s would actually consume 3.2 Gbit/s of bandwidth and would probably fit within a 10 Gbit/s link, but not a 1 Gbit/s one. In many real-world scenarios, Redis throughput is limited by the network well before being limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it is worth considering a 10 Gbit/s NIC or multiple 1 Gbit/s NICs with TCP/IP bonding.
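The arithmetic behind the 3.2 Gbit/s figure can be sketched in a few lines of Python. This is a rough, payload-only estimate: it assumes 4 KB means 4000 bytes (which is what reproduces 3.2 Gbit/s) and ignores protocol and TCP/IP overhead. The function names are hypothetical, chosen for illustration.

```python
def required_bandwidth_gbits(payload_bytes: int, requests_per_sec: float) -> float:
    """Payload-only throughput in Gbit/s; protocol overhead is ignored."""
    bits_per_sec = payload_bytes * 8 * requests_per_sec
    return bits_per_sec / 1e9


def fits_link(payload_bytes: int, requests_per_sec: float, link_gbits: float) -> bool:
    """True if the estimated payload throughput fits within the link capacity."""
    return required_bandwidth_gbits(payload_bytes, requests_per_sec) <= link_gbits


if __name__ == "__main__":
    # Assumption: "4 KB" is taken as 4000 bytes.
    needed = required_bandwidth_gbits(4000, 100_000)
    print(f"needed: {needed:.1f} Gbit/s")
    print("fits 10 Gbit/s link:", fits_link(4000, 100_000, 10.0))
    print("fits 1 Gbit/s link:", fits_link(4000, 100_000, 1.0))
```

Real traffic is somewhat higher than this estimate because each request also carries command framing and packet headers, so a link that only barely fits on paper is likely to be the bottleneck in practice.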

From another perspective, if the NIC is the bottleneck, there is no obvious bottleneck at the CPU, so upgrading the NIC could improve performance.

Sometimes, however, a 10 Gbit/s NIC is in use and the network throughput still cannot get close to 10 Gbit/s. In that case the performance analysis becomes more complex.

$ redis-benchmark -h 192.168.18.128 -c 100 -r 1 -l -t set -d 4000
Summary:
  throughput summary: 79113.92 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.736     0.256     0.743     0.791     0.863     7.535

SET: rps=104100.0 (overall: 100567.6) avg_msec=0.568 (overall: 0.585)
SET: rps=100318.7 (overall: 100467.0) avg_msec=0.583 (overall: 0.584)

CPU

CPU is another very important factor. Being single-threaded, Redis favors fast CPUs with large caches and not many cores. At this game, Intel CPUs are currently the winners. It is not uncommon to get only half the performance on an AMD Opteron CPU compared to similar Nehalem EP/Westmere EP/Sandy Bridge Intel CPUs with Redis. When client and server run on the same box, the CPU is the limiting factor with redis-benchmark.

RAM

  • Speed of RAM and memory bandwidth seem less critical for global performance especially for small objects. For large objects (>10 KB), it may become noticeable though. Usually, it is not really cost-effective to buy expensive fast memory modules to optimize Redis.

Misc

  • Redis runs slower on a VM than on the same hardware without virtualization. If you have the chance to run Redis on a physical machine, this is preferred. However, this does not mean that Redis is slow in virtualized environments: the delivered performance is still very good, and most of the serious performance issues you may incur in virtualized environments are due to over-provisioning, non-local disks with high latency, or old hypervisor software with a slow fork syscall implementation.
  • When the server and client benchmark programs run on the same box, both the TCP/IP loopback and unix domain sockets can be used. Depending on the platform, unix domain sockets can achieve around 50% more throughput than the TCP/IP loopback (on Linux for instance). The default behavior of redis-benchmark is to use the TCP/IP loopback.
  • The performance benefit of unix domain sockets compared to TCP/IP loopback tends to decrease when pipelining is heavily used (i.e. long pipelines).
  • When an ethernet network is used to access Redis, aggregating commands using pipelining is especially efficient when the size of the data is kept under the ethernet packet size (about 1500 bytes). In fact, processing 10-byte, 100-byte, or 1000-byte queries results in almost the same throughput. See the graph below.
  • On multi-socket servers, Redis performance becomes dependent on the NUMA configuration and process location. The most visible effect is that redis-benchmark results seem non-deterministic because client and server processes are distributed randomly across the cores. To get deterministic results, use process placement tools (on Linux: taskset or numactl). The most efficient combination is always to put the client and server on two different cores of the same CPU to benefit from the L3 cache. Here are some results of a 4 KB SET benchmark for 3 server CPUs (AMD Istanbul, Intel Nehalem EX, and Intel Westmere) with different relative placements. Please note that this benchmark is not meant to compare CPU models with one another (the exact CPU models and frequencies are therefore not disclosed).
  • With high-end configurations, the number of client connections is also an important factor. Being based on epoll/kqueue, the Redis event loop is quite scalable. Redis has already been benchmarked at more than 60000 connections, and was still able to sustain 50000 q/s in these conditions. As a rule of thumb, an instance with 30000 connections can only process half the throughput achievable with 100 connections. Here is an example showing the throughput of a Redis instance per number of connections:

Latency Monitoring

What is high latency for one use case is not high latency for another. There are applications where all the queries must be served in less than 1 millisecond and applications where from time to time a small percentage of clients experiencing a 2 second latency is acceptable.

So the first step to enable the latency monitor is to set a latency threshold in milliseconds. Only events that take at least as long as the specified threshold are logged as latency spikes. The user should set the threshold according to their needs. For example, if the maximum acceptable latency for the application based on Redis is 100 milliseconds, the threshold should be set to that value in order to log all the events blocking the server for a time equal to or greater than 100 milliseconds.

The latency monitor can easily be enabled at runtime in a production server with the following command:

CONFIG SET latency-monitor-threshold 100

By default, monitoring is disabled (threshold set to 0), even though the actual cost of latency monitoring is near zero. Although the memory requirements of latency monitoring are very small, there is no good reason to raise the baseline memory usage of a Redis instance that is working well.
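The threshold semantics described above (0 disables monitoring, and events lasting at least the threshold are logged) can be summarized in a small sketch. This is an illustration of the rule as stated in the text, not Redis's actual implementation; the function name and the sample durations are hypothetical.

```python
def is_logged_as_spike(duration_ms: int, threshold_ms: int) -> bool:
    """Return True if an event of this duration would count as a latency spike.

    A threshold of 0 means the latency monitor is disabled, so nothing is logged.
    """
    if threshold_ms <= 0:
        return False
    return duration_ms >= threshold_ms


if __name__ == "__main__":
    # Hypothetical event durations in milliseconds, for illustration only.
    durations = [3, 99, 100, 250]
    print([d for d in durations if is_logged_as_spike(d, 100)])
    print([d for d in durations if is_logged_as_spike(d, 0)])
```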

Information reporting with the LATENCY command

The user interface to the latency monitoring subsystem is the LATENCY command. Like many other Redis commands, LATENCY accepts subcommands that modify its behavior. These subcommands are:

  • LATENCY LATEST - returns the latest latency samples for all events.

  • LATENCY HISTORY - returns latency time series for a given event.

  • LATENCY RESET - resets latency time series data for one or more events.

  • LATENCY GRAPH - renders an ASCII-art graph of an event’s latency samples.

  • LATENCY DOCTOR - replies with a human-readable latency analysis report.

    127.0.0.1:6379> latency doctor
    
    Dave, I have observed latency spikes in this Redis instance.
    You don't mind talking about it, do you Dave?
    
    1. command: 5 latency spikes (average 300ms, mean deviation 120ms,
        period 73.40 sec). Worst all time event 500ms.
    
    I have a few advices for you:
    
    - Your current Slow Log configuration only logs events that are
        slower than your configured latency monitor threshold. Please
        use 'CONFIG SET slowlog-log-slower-than 1000'.
    - Check your Slow Log to understand what are the commands you are
        running which are too slow to execute. Please check
        http://redis.io/commands/slowlog for more information.
    - Deleting, expiring or evicting (because of maxmemory policy)
        large objects is a blocking operation. If you have very large
        objects that are often deleted, expired, or evicted, try to
        fragment those objects into multiple smaller objects.
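For programmatic use, the LATENCY LATEST reply is usually easier to consume than the DOCTOR report. As far as I know, each entry in that reply holds the event name, the Unix timestamp of the latest spike, the latest latency in milliseconds, and the all-time maximum latency in milliseconds. The sketch below parses a reply of that shape. The class and function names are hypothetical, and the sample reply is hand-written for illustration rather than captured from a real server.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class LatencyEvent:
    name: str
    last_spike_unix_ts: int
    latest_ms: int
    max_ms: int


def parse_latency_latest(
    reply: Sequence[Sequence[Union[bytes, str, int]]],
) -> List[LatencyEvent]:
    """Convert a nested LATENCY LATEST reply into LatencyEvent records."""
    events = []
    for entry in reply:
        # Only the first four fields are used; any extra fields are ignored.
        name, ts, latest, worst = entry[:4]
        if isinstance(name, bytes):
            name = name.decode()
        events.append(LatencyEvent(str(name), int(ts), int(latest), int(worst)))
    return events


def events_over(events: List[LatencyEvent], limit_ms: int) -> List[LatencyEvent]:
    """Keep the events whose all-time maximum latency reaches limit_ms."""
    return [e for e in events if e.max_ms >= limit_ms]


if __name__ == "__main__":
    # Hand-written sample reply (assumed shape), not real server output.
    sample_reply = [[b"command", 1405067976, 251, 1001]]
    for event in events_over(parse_latency_latest(sample_reply), 500):
        print(event)
```

A client library typically returns the reply as nested lists like this, with event names as bytes unless response decoding is enabled, which is why the parser accepts both.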
    

Redis slow log

// TODO

https://redis.io/commands/slowlog

Measuring latency

If you are experiencing latency problems, you probably know how to measure latency in the context of your application, or maybe your latency problem is very evident even macroscopically. However, redis-cli can also be used to measure the latency of a Redis server in milliseconds. Just try:

$ redis-cli --latency -h <host> -p <port>
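The same kind of measurement can be approximated from application code by timing repeated round trips. The sketch below is a minimal illustration and not how redis-cli is implemented: it times any zero-argument callable and reports min, max, and average in milliseconds. The names `measure_latency` and `LatencyStats` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class LatencyStats:
    samples: int
    min_ms: float
    max_ms: float
    avg_ms: float


def measure_latency(ping: Callable[[], object], samples: int = 100) -> LatencyStats:
    """Time `samples` calls of `ping` and summarize the round-trip times."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    durations_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()  # one round trip to the server
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return LatencyStats(
        samples=samples,
        min_ms=min(durations_ms),
        max_ms=max(durations_ms),
        avg_ms=sum(durations_ms) / samples,
    )


if __name__ == "__main__":
    # Stand-in for a real round trip; replace with a client's ping call.
    stats = measure_latency(lambda: time.sleep(0.001), samples=20)
    print(stats)
```

With the redis-py client installed, one could pass something like `redis.Redis(host=..., port=...).ping` as the callable. Note that numbers measured this way include client-side overhead, so they are an upper bound on the server's own latency.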

Reference