Clear Sky Science

Tierra: multi-tiered arrays and recency-aware hot data decision


Why some data deserve the fast lane

Every time you stream a movie, hail a ride, or check your bank balance, computers quietly decide which pieces of information should stay close at hand and which can be pushed to the back shelves. This split between “hot” (often-used) and “cold” (rarely-used) data is vital for making modern apps feel instant. As storage hardware grows more complex and data volumes explode, those decisions are getting harder and more important. This paper introduces Tierra, a new way to spot hot data quickly and accurately, helping future storage systems run faster and last longer.

The challenge of finding hot spots in oceans of data

Behind the scenes, big services rely on layers of memory and storage, from tiny on-chip caches to solid-state drives and emerging non-volatile memories. Keeping frequently used data in the fastest layer can cut waiting time dramatically, and in flash-based devices it can even extend hardware lifetime by steering repeated writes to the right places. But figuring out what is truly hot is tricky. Earlier methods often tracked how many times each block of data was accessed, while mostly ignoring how recently those accesses occurred. Newer techniques tried to combine both recency and frequency using structures called Bloom filters, which are efficient but probabilistic. As workloads grew larger and more varied, these approaches either misclassified too much data, consumed too much memory and compute time, or both.

Reading patterns instead of every single step

Tierra takes a different route: instead of inspecting each data block in full detail, it first looks for patterns in how requests arrive over time. A key idea is “stack distance,” a measure of how many distinct items were touched between two visits to the same piece of data. Small distances mean an item tends to come back soon and is likely hot; large distances point to cold data. Computing this metric exactly is expensive, so the authors refine an earlier approximation method. They cap the size of the history they keep, discarding very old references so that estimates do not drift over time. This “capacity-fixed” design keeps the quality of the approximation high while limiting memory and lookup costs, even when there are millions of unique requests.
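To make the stack-distance idea concrete, here is a minimal Python sketch. It computes the distance exactly over a bounded history, so it illustrates the metric and the capacity cap but is not the authors' approximation method. The function name `stack_distances` and the `capacity` parameter are illustrative assumptions, not names from the paper.

```python
from collections import OrderedDict

def stack_distances(trace, capacity=None):
    """Return one stack distance per access.

    The distance is the number of distinct items touched since the
    previous access to the same item. None means the item has no
    remembered previous access (first visit, or already forgotten).
    `capacity` (hypothetical parameter) caps how many items are remembered.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for item in trace:
        if item in stack:
            keys = list(stack.keys())
            position = keys.index(item)
            # Everything after `position` was touched more recently.
            distances.append(len(keys) - 1 - position)
            stack.move_to_end(item)
        else:
            distances.append(None)
            stack[item] = True
            if capacity is not None and len(stack) > capacity:
                stack.popitem(last=False)  # forget the oldest reference
    return distances

trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
print(stack_distances(trace))
# [None, None, None, 2, 2, 0, None, 2]
print(stack_distances(trace, capacity=2))
# [None, None, None, None, None, 0, None, None]
```

The second call shows the trade-off of a fixed capacity: memory stays bounded, but any item whose previous visit has fallen out of the history looks like a first-time request.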

Letting a smart gatekeeper filter the crowd

Armed with stack distance, Tierra’s second stage acts as a gatekeeper for incoming requests. If a request’s distance is above a chosen threshold, the request is almost certainly cold and is filtered out immediately. If it looks promising, the request is passed along as a hot-data candidate. Crucially, this screening layer does more than just say yes or no: it also assigns each candidate an initial “heat score” based on how recently the request and its previous appearance occurred. That way, even when some requests are discarded, their timing still informs later decisions. Experiments show that this recency-aware screening removes about one and a half times more cold data than older filters while wrongly throwing away nearly twenty times fewer hot items.
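The screening step can be sketched as a small function. This summary does not give the paper's threshold or its heat-score formula, so both are placeholders here: the default `threshold=64` is arbitrary, and the score `1 / (1 + gap)` is an assumed stand-in that simply gives higher heat to requests whose previous appearance was more recent. The function name `screen` is also hypothetical.

```python
def screen(distance, now, last_seen, threshold=64):
    """Return (is_candidate, initial_heat) for one request.

    distance  : stack distance of the request, or None if unknown
    now       : logical time of this request
    last_seen : logical time of the previous request for the same data
    threshold : assumed cut-off; larger distances are treated as cold
    """
    if distance is None or distance > threshold:
        return (False, 0.0)  # filtered out as cold
    gap = now - last_seen
    # Placeholder scoring rule (an assumption, not the paper's formula):
    # a shorter gap between the two appearances means more initial heat.
    heat = 1.0 / (1.0 + gap)
    return (True, heat)

print(screen(distance=3, now=10, last_seen=8))    # passes the gate
print(screen(distance=100, now=10, last_seen=8))  # (False, 0.0)
```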

Tiered shelves that respect freshness

Requests that survive the gatekeeper enter Tierra’s core structure: four arrays of different sizes that act like tiered shelves. Each entry records a reference to the data and two compact timestamps describing when it was last seen. Recent, frequently accessed items naturally linger in the upper tiers, while older, less active ones sink into smaller, lower tiers and are eventually evicted. When a request comes in, Tierra checks whether it is already on one of these shelves. If so, it updates the timestamps and sums its stored heat scores, including those from up to three earlier touches, to decide whether the data should be considered hot right now. By organizing the arrays asymmetrically, with larger arrays at the top and smaller ones below, Tierra sharply cuts down on internal shuffling, reducing data movement by roughly a factor of three compared with evenly sized tiers.
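The tiered shelves described above can be sketched roughly as follows. Much of this is assumed rather than taken from the paper: the tier sizes, the rule that a hit moves an entry back to the top tier, the rule that an overflowing tier demotes its oldest entry, and the `hot_threshold` value are all illustrative choices. The class name `TieredArrays` and its methods are hypothetical.

```python
from collections import deque

class TieredArrays:
    def __init__(self, sizes=(8, 4, 2, 1), hot_threshold=1.0):
        self.sizes = sizes                  # larger at the top, smaller below
        self.tiers = [[] for _ in sizes]    # each tier holds entries, oldest first
        self.hot_threshold = hot_threshold  # assumed cut-off on summed heat

    def _find(self, key):
        for t, tier in enumerate(self.tiers):
            for i, entry in enumerate(tier):
                if entry["key"] == key:
                    return t, i
        return None

    def _push(self, t, entry):
        # Insert into tier t; an overflowing tier demotes its oldest entry
        # one tier down. An entry pushed past the last tier is evicted.
        while entry is not None and t < len(self.tiers):
            tier = self.tiers[t]
            tier.append(entry)
            entry = tier.pop(0) if len(tier) > self.sizes[t] else None
            t += 1

    def access(self, key, now, heat):
        """Record one access and return True if the data looks hot now."""
        pos = self._find(key)
        if pos is None:
            # New entry: current heat score plus room for three earlier ones.
            entry = {"key": key, "prev": None, "last": now,
                     "heats": deque([heat], maxlen=4)}
        else:
            t, i = pos
            entry = self.tiers[t].pop(i)
            entry["prev"], entry["last"] = entry["last"], now  # two timestamps
            entry["heats"].append(heat)
        self._push(0, entry)  # assumed policy: touched entries go to the top
        return sum(entry["heats"]) >= self.hot_threshold

shelves = TieredArrays(sizes=(2, 1))
print(shelves.access("a", now=1, heat=0.5))  # False: only one touch so far
print(shelves.access("a", now=2, heat=0.6))  # True: summed heat reaches 1.0
```

Keeping the top tier largest means most touched entries are reinserted without forcing a chain of demotions, which is the intuition behind the reduced shuffling the article mentions.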

How Tierra stacks up in the real world

The authors test Tierra using sixteen real storage traces from cloud services, smartphones, enterprise desktops, and laptops. They compare it with several prominent baselines, including traditional counting within a sliding window, hash-based schemes, and the latest Bloom-filter-based hot-data detectors. Across these diverse workloads, Tierra’s share of data marked as hot closely matches that of the trusted window-based baseline, but with far fewer mistakes: its overall misclassification rate averages just 0.6 percent. That is roughly 31 times lower than one classic scheme, 13 times lower than an improved dual-layer Bloom-filter design, and five times lower than Multigrain, the prior state of the art. At the same time, Tierra is faster, cutting execution time by 1.4–1.7× versus competing methods, thanks to its early screening and coarse-grained handling of requests.
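As a rough illustration of how such a comparison might be set up, the sketch below labels accesses with a simple sliding-window counter and then measures how often another detector disagrees with it. The window size, the `min_count` rule, and the exact definition of the misclassification rate are assumptions; this summary does not specify how the paper defines them. The function names are hypothetical.

```python
from collections import Counter, deque

def window_hot_labels(trace, window=4, min_count=2):
    """Label an access hot if its key occurs at least `min_count` times
    among the last `window` accesses, counting the current one."""
    recent = deque()
    counts = Counter()
    labels = []
    for key in trace:
        recent.append(key)
        counts[key] += 1
        if len(recent) > window:
            counts[recent.popleft()] -= 1  # slide the window forward
        labels.append(counts[key] >= min_count)
    return labels

def misclassification_rate(predicted, reference):
    """Fraction of accesses where a detector disagrees with the reference."""
    wrong = sum(p != r for p, r in zip(predicted, reference))
    return wrong / len(reference)

trace = ["a", "b", "a", "c", "d", "a"]
reference = window_hot_labels(trace)
print(reference)  # [False, False, True, False, False, True]

# Made-up detector output, used only to exercise the function.
predicted = [False, False, True, True, False, False]
print(misclassification_rate(predicted, reference))  # two of six disagree
```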

Why this matters for the systems you rely on

In plain terms, Tierra gives computers a sharper eye for what data they truly need to keep close. By combining a smart, bounded look at access history, a recency-aware screening gate, and a carefully tiered set of arrays, it balances speed, memory cost, and accuracy in a way older approaches could not. For cloud providers and device makers, that means more responsive services, better use of expensive fast memory, and longer-lived storage hardware. For everyday users, it means that the apps and services they depend on can keep pace with ever-growing data without bogging down.

Figure 1. Visual guide: big picture.

Figure 2. Visual guide: how Tierra works inside.

Citation: Lee, H., Park, D. Tierra: multi-tiered arrays and recency-aware hot data decision. Sci Rep 16, 13733 (2026). https://doi.org/10.1038/s41598-026-44185-1

Keywords: hot data identification, storage systems, non-volatile memory, cache locality, performance optimization