SIEVE is simpler than LRU

Caching stores temporary data for quick access and keeps the online world running smoothly. But with limited space comes a critical decision: what to keep and what to discard. This is where eviction algorithms come into play. Our team recently designed a new cache eviction algorithm called SIEVE: it is both effective and simple, requiring just one queue.

Updates: The original writing style attracted considerable attention, so we have revised the blog post for greater clarity and straightforwardness. We also include an easy-to-reproduce result at https://observablehq.com/@1a1a11a/sieve-miss-ratio-plots.

The Importance of Simplicity

It is important to keep cache eviction algorithms simple. Complex algorithms can bring headaches: they are tricky to debug and analyze when the miss ratio is higher than expected, and their complexity costs extra CPU cycles on every eviction decision.

On the flip side, simpler eviction methods usually provide good predictability and throughput. For example, Apache Traffic Server uses FIFO, MemC3 uses CLOCK, and Segcache uses FIFO-Merge. It is crucial to note that while these approaches excel in throughput, some compromise on cache efficiency.

Meet SIEVE

SIEVE is a cache eviction algorithm that decides what to keep in the cache and what to discard. It achieves both simplicity and efficiency.

Design

Data structure. SIEVE requires only one queue and one pointer called the “hand”. The queue maintains the insertion order between objects, and each object in the queue carries one bit that tracks its visited/non-visited status. The hand points to the next eviction candidate in the cache and moves from the tail of the queue toward the head.

SIEVE operations. A cache hit sets the visited bit of the accessed object to 1; for a popular object whose visited bit is already set, SIEVE performs no operation at all. On a cache miss, SIEVE examines the object pointed to by the hand. If that object has been visited, its visited bit is reset and the hand moves to the next position (the retained object stays in its original position in the queue). SIEVE continues this process until it encounters an object whose visited bit is 0 and evicts it. After the eviction, the hand points to the next position (the previous object in the queue). While the evicted object is in the middle of the queue most of the time, a new object is always inserted at the head of the queue.
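To make the hand’s movement concrete, here is a small worked example that runs the SieveCache implementation from the appendix (capacity 3, request sequence A, B, C, A, D):

# Worked example using the SieveCache from the appendix.
cache = SieveCache(3)
for key in ['A', 'B', 'C']:
    cache.access(key)   # queue (head -> tail): C, B, A
cache.access('A')       # cache hit: A's visited bit is set to 1
cache.access('D')       # cache miss on a full cache:
                        #   the hand starts at the tail (A); A was visited,
                        #   so its bit is reset and the hand moves to B;
                        #   B was not visited, so B is evicted;
                        #   the hand now rests on C, and D is inserted at the head
cache.show_cache()      # D (Visited: False) -> C (Visited: False) -> A (Visited: False)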

[Figure: an illustration of SIEVE operations on a queue of objects with visited bits]

SIEVE vs. CLOCK. At first glance, SIEVE is similar to CLOCK/Second Chance/FIFO-Reinsertion (note that these are different implementations of the same eviction algorithm). Each maintains a queue in which every object carries a visited bit tracking its access status, and visited objects are retained (they “survive”) during an eviction. Notably, new objects are inserted at the head of the queue in both SIEVE and FIFO-Reinsertion. However, the hand in SIEVE moves from the tail to the head over time, whereas the hand in FIFO-Reinsertion stays at the tail. The key difference is where a retained object is kept: SIEVE keeps it in its old position, while FIFO-Reinsertion inserts it at the head, together with newly inserted objects.
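To make the contrast concrete, here is a minimal sketch (ours, not taken from any particular system) of FIFO-Reinsertion’s eviction step, written against the Node and linked-list helpers of the SieveCache in the appendix; compare it with SieveCache._evict, which leaves survivors in place:

# A minimal, illustrative sketch of FIFO-Reinsertion's eviction step.
# Unlike SIEVE, a surviving object is moved back to the head of the queue.
def _evict_fifo_reinsertion(self):
    while True:
        obj = self.tail                # the "hand" always stays at the tail
        self._remove_node(obj)
        if obj.visited:
            obj.visited = False
            self._add_to_head(obj)     # reinsert the survivor at the head
        else:
            del self.cache[obj.value]  # evict the first non-visited object
            self.size -= 1
            return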

See the SieveCache implementation at the end of this blog post for a detailed example.

Performance

We evaluate SIEVE to answer the following questions:

  • Does SIEVE have higher efficiency than state-of-the-art cache eviction algorithms?
  • Is SIEVE simpler than other algorithms?
  • Can SIEVE improve a cache’s throughput and scalability?

In this blog post, we only show part of our evaluation results. If you’re interested in the full evaluation, please refer to our paper.

Efficiency

Our evaluation, involving 1559 traces from diverse datasets that together contain 247,017 million requests to 14,852 million objects, shows that SIEVE outperforms all state-of-the-art eviction algorithms on more than 45% of the traces.

The following figure shows the miss ratio reduction (from FIFO) of different algorithms across traces. The whiskers on the boxplots are defined using p10 and p90, allowing us to disregard extreme data and concentrate on typical cases. SIEVE demonstrates the most significant reductions across nearly all percentiles. For example, SIEVE reduces FIFO’s miss ratio by more than 42% on 10% of the traces (top whisker), with a mean of 21% on one of the largest CDN datasets; all other algorithms show smaller reductions on this dataset. Compared to advanced algorithms such as ARC, SIEVE reduces ARC’s miss ratio by up to 63.2%, with a mean of 1.5%.

Simplicity

SIEVE is very simple. We delved into the most popular cache libraries and systems across five diverse programming languages: C++, Go, JavaScript, Python, and Rust.

Despite the varied ways LRU is implemented across these libraries (some opt for doubly linked lists, others for arrays), integrating SIEVE was easy. As illustrated in the table below, the code changes required to replace LRU with SIEVE were minimal: in all cases, no more than 21 lines (tests not included).

Cache library   Language     Lines changed   Hours of work
groupcache      Golang       21              <1
mnemonist       JavaScript   12              1
lru-rs          Rust         16              1
lru-dict        Python + C   21              <1

Throughput

Besides efficiency, throughput is the other important metric for caching systems. Although we have implemented SIEVE in five different libraries, we focus on Cachelib’s results.

Compared to LRU-based algorithms, SIEVE does not require “promotion” at each cache hit, so it is faster and more scalable. With a single thread, SIEVE is 16% faster than the optimized LRU and 17% faster than TwoQ on the tested traces. At 16 threads, SIEVE shows more than 2× higher throughput than both the optimized LRU and TwoQ.
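The difference is easy to see in code. Below is a rough sketch of the two hit paths; the lock is illustrative and not how Cachelib actually synchronizes:

import threading

list_lock = threading.Lock()

def lru_on_hit(cache, node):
    # LRU "promotes" on every hit, which mutates the queue and typically
    # requires serialization across threads.
    with list_lock:
        cache._remove_node(node)
        cache._add_to_head(node)

def sieve_on_hit(node):
    # SIEVE performs a single flag write: no queue mutation, no lock.
    node.visited = True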

SIEVE is more than an eviction algorithm

Beyond being a cache eviction algorithm, SIEVE can serve as a cache primitive that facilitates the design of more advanced eviction policies.

We’ve plugged SIEVE into LeCaR, TwoQ, ARC, and S3-FIFO, swapping out their LRU or FIFO queue for a SIEVE one.
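As one hypothetical illustration (not the exact designs evaluated in the paper), a TwoQ-style policy can keep a small FIFO probation queue for new objects and let SIEVE manage the main queue:

from collections import deque

class TwoQWithSieve:
    # A sketch of a TwoQ-style policy whose main queue is the SieveCache
    # from the appendix; the probation ratio is an arbitrary choice here.
    def __init__(self, capacity, probation_ratio=0.25):
        self.probation_cap = max(1, int(capacity * probation_ratio))
        self.probation = deque()  # FIFO of keys seen exactly once
        self.main = SieveCache(capacity - self.probation_cap)

    def access(self, x):
        if x in self.main.cache:      # hit in the SIEVE-managed main queue
            self.main.access(x)
        elif x in self.probation:     # re-referenced on probation: promote
            self.probation.remove(x)
            self.main.access(x)
        else:                         # first access: enter probation
            if len(self.probation) >= self.probation_cap:
                self.probation.popleft()  # drop the oldest newcomer
            self.probation.append(x)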

Compared to SIEVE, LeCaR has much lower efficiency; however, replacing the LRU inside LeCaR with SIEVE reduces LeCaR’s miss ratio by 4.5% on average. TwoQ and ARC achieve efficiency close to SIEVE’s; replacing their LRU with SIEVE boosts the efficiency of both. These results highlight the potential of SIEVE as a powerful cache primitive for designing advanced cache eviction algorithms.

SIEVE is not scan-resistant

Besides web cache workloads, we evaluated SIEVE on some older block cache workloads, where SIEVE sometimes shows a higher miss ratio than LRU. The primary reason for this discrepancy is that SIEVE is not scan-resistant: block cache workloads frequently feature scans, so popular objects often intermingle with objects from scans, and both are rapidly evicted after insertion. Note that we used “scan-resistant” here to mean something other than the traditional sense; for scans in the traditional sense (larger than the cache size), SIEVE is scan-resistant.

Since SIEVE does not use a ghost cache (a shadow cache that keeps track of recently evicted items to make smarter future eviction decisions), it cannot recognize popular objects when they are requested again. This problem is less severe at large cache sizes, but at small cache sizes we observe that having a ghost is critical for scan resistance.
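For illustration, a ghost can be as simple as a bounded, insertion-ordered set of recently evicted keys (this class and its use are our sketch, not part of SIEVE):

from collections import OrderedDict

class Ghost:
    # A bounded record of recently evicted keys. A scan-resistant variant
    # could consult it on a miss and, for example, insert a re-requested
    # key with its visited bit already set.
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = OrderedDict()

    def remember(self, key):
        self.keys.pop(key, None)           # re-remembering refreshes recency
        self.keys[key] = True
        if len(self.keys) > self.capacity:
            self.keys.popitem(last=False)  # forget the oldest entry

    def was_recently_evicted(self, key):
        return self.keys.pop(key, None) is not None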

Marc’s latest blog post explored making SIEVE scan-resistant by adding a small counter to each item; it shows some wins and some losses across different workloads.
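We have not verified the details of that design, but the general shape might look like the following (the counter width and the decrement behavior are our guesses, not necessarily Marc’s):

MAX_FREQ = 3  # a small saturating counter instead of a single visited bit

def on_hit(node):
    node.freq = min(node.freq + 1, MAX_FREQ)  # saturating increment

def hand_examines(node):
    # The hand decrements rather than resets, so an object survives as many
    # sweeps as it has accumulated hits (up to MAX_FREQ).
    if node.freq > 0:
        node.freq -= 1
        return False   # keep the object; the hand moves on
    return True        # counter exhausted: evict this object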

We’d Love to Hear from You

As we wrap up this blog post, we would like to give a big shoutout to the people and organizations that open-sourced and shared their traces. SIEVE presents an intriguing opportunity to explore and enhance the efficiency of web caching. If you have questions or thoughts, or have given SIEVE a try, we’re eager to hear from you! Don’t hesitate to get in touch :-)

Appendix

class Node:
    def __init__(self, value):
        self.value = value
        self.visited = False
        self.prev = None
        self.next = None

class SieveCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}  # To store cache items as {value: node}
        self.head = None
        self.tail = None
        self.hand = None
        self.size = 0

    def _add_to_head(self, node):
        node.next = self.head
        node.prev = None
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def _remove_node(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def _evict(self):
        # Start scanning from the hand; if the hand is unset, start at the tail.
        obj = self.hand if self.hand else self.tail
        # Visited objects survive in place: reset their bit and move the hand
        # toward the head, wrapping back to the tail when the head is passed.
        while obj and obj.visited:
            obj.visited = False
            obj = obj.prev if obj.prev else self.tail
        # Rest the hand on the object before the victim (None wraps to the tail).
        self.hand = obj.prev
        del self.cache[obj.value]
        self._remove_node(obj)
        self.size -= 1

    def access(self, x):
        if x in self.cache:  # Cache Hit
            node = self.cache[x]
            node.visited = True
        else:  # Cache Miss
            if self.size == self.capacity:  # Cache Full
                self._evict()  # Eviction
            new_node = Node(x)  # Insertion: the visited bit starts as False
            self._add_to_head(new_node)
            self.cache[x] = new_node
            self.size += 1

    def show_cache(self):
        current = self.head
        while current:
            print(f'{current.value} (Visited: {current.visited})', end=' -> ' if current.next else '\n')
            current = current.next

# Example usage
cache = SieveCache(3)
cache.access('A')
cache.access('B')
cache.access('C')
cache.access('D')
cache.show_cache()