NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM

Motivation

Fast snapshotting to persistent memory faces two challenges:

  • tracking, at low cost, the “delta” data that belongs in each snapshot;
  • creating the snapshot without write amplification.

Prior software approaches suffer from write amplification because they rely on logging.
Prior hardware approaches also rely on logging and scale poorly.

Design

To overcome these challenges, NVOverlay proposes a novel design that can be divided into two components:

  • COHERENT SNAPSHOT TRACKING (CST)
  • MULTI-SNAPSHOT NVM MAPPING (MNM)

COHERENT SNAPSHOT TRACKING (CST)

CST is based on the MESI protocol, but it can be implemented on other protocols as well. NVOverlay divides the cache hierarchy above the LLC into version domains (VDs), each with its own epoch. Each cache line is tagged with an extra version field, the OID (Overlay ID), which indicates the epoch in which the line was last written. CST relaxes consistency requirements and lets multiple versions of a line exist in a VD at the same time. The Overlay Memory Controller (OMC) is responsible for translating addresses from a VD to shadow addresses on the NVM.
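The version tagging can be pictured with a small model. This is only a minimal sketch, not the hardware design: the names `CacheLine`, `VersionDomain`, `store`, and `versions_of` are hypothetical, and a Python dictionary stands in for the cache arrays of one VD.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLine:
    addr: int
    data: bytes
    oid: int  # epoch in which this line was last written


@dataclass
class VersionDomain:
    epoch: int = 0
    # Hypothetical storage: (addr, oid) -> CacheLine, so several
    # versions of the same address can coexist inside one VD.
    lines: dict = field(default_factory=dict)

    def store(self, addr: int, data: bytes) -> CacheLine:
        # A store tags the line with the VD's current epoch.
        line = CacheLine(addr, data, self.epoch)
        self.lines[(addr, line.oid)] = line
        return line

    def versions_of(self, addr: int) -> list:
        # All OIDs currently held for this address, oldest first.
        return sorted(oid for (a, oid) in self.lines if a == addr)


vd = VersionDomain(epoch=1)
vd.store(0x40, b"old")
vd.epoch = 2            # the VD advances to the next epoch
vd.store(0x40, b"new")  # same address, new version
print(vd.versions_of(0x40))
```

After the second store, the model holds two versions of address `0x40`, one tagged with epoch 1 and one with epoch 2, which is the coexistence that CST permits.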

Store-eviction: when the L1 cache sends a PUTX for a line to the L2 cache and the version held in L2 is older than the one from L1, the L2 cache evicts the old version to the LLC and the OMC, which stores it as an old version. A VD updates its local epoch only when it observes an epoch from a remote VD that is larger than its own.
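The store-eviction rule and the epoch update rule can be sketched as two small functions. This is an illustrative model under stated assumptions: `putx_to_l2` and `observe_remote_epoch` are hypothetical names, the L2 cache is a plain dictionary, every L2 entry is assumed to be a dirty version, and the list `evicted` stands in for what is sent to the LLC and the OMC.

```python
def putx_to_l2(l2: dict, evicted: list, addr: int, data: bytes, oid: int) -> None:
    """Handle a PUTX from L1: install (data, oid) for addr in L2.

    If L2 already holds an older version (smaller OID) of the line,
    that version is pushed out first so it is not overwritten.
    """
    old = l2.get(addr)
    if old is not None and old[1] < oid:
        # Older version goes down to the LLC and the OMC.
        evicted.append((addr, old[0], old[1]))
    l2[addr] = (data, oid)


def observe_remote_epoch(local_epoch: int, remote_epoch: int) -> int:
    """A local epoch only moves forward, and only when the remote one is larger."""
    return remote_epoch if remote_epoch > local_epoch else local_epoch


l2 = {0x40: (b"v1", 1)}   # L2 holds the epoch-1 version
evicted = []
putx_to_l2(l2, evicted, 0x40, b"v2", 2)  # L1 writes back an epoch-2 version
print(evicted)
print(l2[0x40])
print(observe_remote_epoch(3, 5), observe_remote_epoch(5, 3))
```

In this run the epoch-1 version ends up in `evicted` while L2 keeps the epoch-2 version, and a local epoch of 3 advances to 5 while a local epoch of 5 is left unchanged by a remote epoch of 3.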

Snapshot: the captured snapshot only reflects the real-time memory states of VD0, VD1, and VD2 at times t5, t7, and t8, respectively.

[Figures: inter-VD and intra-VD cases]

When a cache line is evicted from the L2 cache, its data is sent to both the LLC and the OMC, and the OMC persists it.

MULTI-SNAPSHOT NVM MAPPING (MNM)

Data evicted from the caches is managed by MNM. An epoch E can be treated as a complete snapshot only after all VDs have (1) advanced their local epochs past E and (2) written back all dirty versions produced in E. MNM maps different versions of the same data to different physical addresses. For each epoch E, the OMC maintains a per-epoch volatile mapping table M_E. The mapping table is implemented as a four-level radix tree in DRAM, similar to x86-64 page tables.
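A per-epoch mapping table and the completeness condition can be sketched as follows. This is a rough model, not the paper's implementation: `EpochTable`, `indices`, and `epoch_complete` are hypothetical names, nested dictionaries stand in for radix-tree nodes, and the split into 9 index bits per level over 64-byte lines is an assumption borrowed from the x86-64 page-table analogy.

```python
LEVELS = 4       # four-level radix tree, as in the text
BITS = 9         # assumption: 9 index bits per level
LINE_SHIFT = 6   # assumption: 64-byte cache lines


def indices(addr: int) -> list:
    """Split a line address into per-level indices, top level first."""
    n = addr >> LINE_SHIFT
    mask = (1 << BITS) - 1
    return [(n >> (BITS * i)) & mask for i in reversed(range(LEVELS))]


class EpochTable:
    """Mapping table for one epoch: line address -> NVM address."""

    def __init__(self):
        self.root = {}

    def insert(self, addr: int, nvm_addr: int) -> None:
        *inner, leaf = indices(addr)
        node = self.root
        for i in inner:
            node = node.setdefault(i, {})  # allocate inner nodes lazily
        node[leaf] = nvm_addr

    def lookup(self, addr: int):
        *inner, leaf = indices(addr)
        node = self.root
        for i in inner:
            node = node.get(i)
            if node is None:
                return None  # no version of this line in this epoch
        return node.get(leaf)


def epoch_complete(e: int, vds: list) -> bool:
    """vds: list of (local_epoch, set of epochs with dirty versions not yet written back)."""
    return all(local > e and e not in dirty for local, dirty in vds)


tables = {1: EpochTable(), 2: EpochTable()}
tables[1].insert(0x1000, 0xA000)  # epoch-1 version of line 0x1000
tables[2].insert(0x1000, 0xB000)  # epoch-2 version lives at a different NVM address
print(tables[1].lookup(0x1000), tables[2].lookup(0x1000))
print(epoch_complete(1, [(2, set()), (3, {2})]))
print(epoch_complete(2, [(2, set()), (3, {2})]))
```

Keeping one table per epoch means two versions of the same line never share a physical address, and `epoch_complete` is just the two-part condition from the paragraph above checked across every VD.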

Through CST and MNM, NVOverlay can take fast snapshots without logging, and its write amplification (1.5x and up) is lower than that of logging (2x and up).

References

  • MESI https://zhuanlan.zhihu.com/p/508315407
  • https://asplos.dev/wordpress/2021/06/27/nvoverlay-enabling-efficient-and-scalable-highfreq/