Motivation

In-memory caches such as memcached and Redis accelerate requests to the largest web sites. They work by storing commonly requested items, such as the results of database queries, in the main memory of nodes in a cluster.

More memory capacity, e.g. through more servers or more RAM per server, increases the fraction of repeated requests that can be served from RAM rather than requiring computation by a database.

Ideally, the hit rate (the fraction of requests served directly from the cache) should be high to help a popular website scale, but a high hit rate comes at the cost of more memory resources for the cache tier. Cache operators often find themselves in the dark as to how to provision their cache servers.

Approach

MIMIR is a framework for determining how much cache memory an in-memory cache tier needs. It hooks into the LRU cache replacement policy of your cache server and periodically produces a curve of hit rate as a function of cache size for each server.

This so-called Hit Rate Curve (HRC) tells you what would have happened if you had run the current cache workload with a different memory allocation.
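To make the HRC concept concrete, the following is a minimal, unoptimized sketch (in Python, purely for illustration; the function name and interface are ours, not part of MIMIR) of the classic exact approach: compute the LRU stack distance of every request and accumulate hits per distance.

    def exact_hit_rate_curve(trace, max_size):
        # Naive exact LRU Hit Rate Curve via stack distances.
        # 'trace' is a sequence of request keys; 'max_size' is the largest
        # cache size to evaluate. Runs in O(len(trace) * max_size) time.
        stack = []                       # LRU stack: most recently used key last
        hits_at = [0] * (max_size + 1)   # hits_at[d] counts hits at stack distance d
        for key in trace:
            if key in stack:
                d = len(stack) - stack.index(key)   # 1 = most recently used
                if d <= max_size:
                    hits_at[d] += 1
                stack.remove(key)
            stack.append(key)
        n = max(len(trace), 1)
        hrc, running = [], 0
        for d in range(1, max_size + 1):
            running += hits_at[d]
            hrc.append(running / n)      # hit rate with a cache of d items
        return hrc

    # Example: hit rates for cache sizes 1, 2 and 3 on a short trace.
    print(exact_hit_rate_curve(["a", "b", "a", "c", "b", "a"], 3))
    # [0.0, 0.16666666666666666, 0.5]

Exact stack-distance bookkeeping like this is too expensive to run inside a busy cache server, which is precisely the overhead MIMIR's approximation is designed to avoid.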

In short, MIMIR tracks the keys of items that have recently been evicted from the cache, but not their data ("ghosts"). Maintaining this metadata about what was recently cached gives us enough information to estimate what would have happened with a larger cache than the one currently allocated.
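As an illustration of the ghost idea, here is a minimal sketch of a ghost list that a cache could update on every eviction and consult on every miss. The class, the method names, and the fixed max_ghosts bound are illustrative assumptions, not MIMIR's API.

    from collections import OrderedDict

    class GhostList:
        # Keys of recently evicted items, kept without their data.
        def __init__(self, max_ghosts):
            self.max_ghosts = max_ghosts
            self.ghosts = OrderedDict()          # key -> None, oldest ghost first

        def record_eviction(self, key):
            # Called by the cache whenever it evicts an item.
            self.ghosts[key] = None
            self.ghosts.move_to_end(key)         # re-evicted keys become youngest again
            if len(self.ghosts) > self.max_ghosts:
                self.ghosts.popitem(last=False)  # forget the oldest ghost

        def would_have_hit(self, key):
            # Called on a cache miss: a hit in the ghost list means the
            # request would have been a hit with a larger allocation.
            if key in self.ghosts:
                del self.ghosts[key]
                return True
            return False

A miss that hits the ghost list is evidence that a somewhat larger allocation would have turned that miss into a hit, which is exactly the information needed to extend the Hit Rate Curve beyond the currently allocated cache size.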

Inside MIMIR is an algorithm that views the LRU queue as a list of variable-sized buckets. Each item in the cache knows the identity of the bucket to which it belongs. The buckets themselves keep the same relative order as the LRU list, but the ordering of items within a bucket is not necessarily maintained, which gives us a performance advantage over most other methods used to calculate Hit Rate Curves.

When an item is hit, we must therefore estimate where it sits in the LRU order, since we traded exact ordering for performance. The key insight of our method is to treat the item as equally likely to occupy any position within its bucket, and to add this uniform estimate to a density function over stack positions. Summing up the density function yields the Hit Rate Curve shown in the figures. One remaining concern is how items migrate between buckets; those details are explained in the paper we published about the system.
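To make the bucket bookkeeping concrete, the following is a minimal sketch in Python. The number of buckets, the overflow-based aging rule, and all names are our own simplifications for illustration; a production implementation would maintain per-bucket counters instead of rescanning the items, and MIMIR's own migration scheme is the one described in the paper.

    from collections import OrderedDict

    class BucketedHRCProfiler:
        # Sketch of the bucket idea: each cached key remembers only its bucket,
        # and hits are spread uniformly over the stack positions of that bucket.
        def __init__(self, capacity, num_buckets=8):
            self.capacity = capacity
            self.num_buckets = num_buckets
            self.items = OrderedDict()              # key -> bucket id (0 = youngest)
            self.hit_density = [0.0] * capacity     # estimated hits per stack position
            self.requests = 0

        def _bucket_sizes(self):
            sizes = [0] * self.num_buckets          # a real implementation keeps counters
            for b in self.items.values():
                sizes[b] += 1
            return sizes

        def access(self, key):
            self.requests += 1
            if key in self.items:
                b = self.items.pop(key)
                sizes = self._bucket_sizes()
                # Positions covered by bucket b start after all younger buckets;
                # spread the hit uniformly across them (including the key itself).
                start = sum(sizes[:b])
                width = sizes[b] + 1
                for pos in range(start, min(start + width, self.capacity)):
                    self.hit_density[pos] += 1.0 / width
            elif len(self.items) >= self.capacity:
                self.items.popitem(last=False)      # evict the least recently used key
            self.items[key] = 0                     # (re)insert into the youngest bucket
            # Simple aging: if the youngest bucket overflows, push every other item
            # one bucket older so bucket order keeps matching the LRU order.
            if self._bucket_sizes()[0] > max(1, self.capacity // self.num_buckets):
                for k, b in self.items.items():
                    if k != key:
                        self.items[k] = min(b + 1, self.num_buckets - 1)

        def hit_rate_curve(self):
            # Cumulative sum of the density: estimated hit rate for every cache
            # size from 1 up to the current allocation.
            hrc, running = [], 0.0
            for d in self.hit_density:
                running += d
                hrc.append(running / max(self.requests, 1))
            return hrc

Feeding the same short trace as above through access() on a BucketedHRCProfiler(3) and then calling hit_rate_curve() produces an estimate of the same curve at a fraction of the bookkeeping cost; combining it with a ghost list extends the estimate past the allocated size.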

Results

We ran benchmarks to assess the system in terms of both accuracy and performance. MIMIR obtained high accuracy in microbenchmarks.

When integrated with memcached, MIMIR caused only low throughput degradation and maintained high accuracy on the YCSB b2 workload.

People

Trausti Sæmundsson (email: trausti [at] mimircache.com)

Hjörtur Björnsson

Gregory Chockler

Ýmir Vigfússon (email: ymir [at] mimircache.com)

Ingi Vífill Guðmundsson (design)

Paper

Published at ACM Symposium on Cloud Computing (SoCC) 2014

Poster (updated version)

First published at ACM Symposium on Cloud Computing (SoCC) 2013

Slides

Presented at ACM Symposium on Cloud Computing (SoCC) 2014 in Seattle, WA
