HawkEye demonstrates fine-grained OS support for huge pages, not fine-grained huge pages. The focus of this paper is when, where, and how to promote huge pages.
For dynamically sized huge pages, see: Diverse Contiguity of Memory Mapping.
There are two main memory issues:
Memory leakage is one of the primary causes of memory bloat. Data fragmentation can be another cause, even when an application has no leaks.
To demonstrate this issue, the authors designed a three-phase workload:
When the dataset first reaches 45GB, the resident set size (RSS), i.e., physical memory usage, also reaches ~45GB. When the dataset reaches 45GB for the second time, if Redis is implemented correctly (and of course it is), there should be enough physical memory. However, both Linux and Ingens run into out-of-memory errors. This is caused by data fragmentation: since both designs employ huge pages, the fragmented memory footprint occupies even more physical memory.
Memory bloat caused by huge pages is an interesting finding.
Another well-known problem with huge pages is high allocation overhead. More specifically, zeroing a huge page is considerably more expensive than zeroing a regular page.
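A quick back-of-the-envelope illustration (my sketch, not from the paper): a 2MB x86-64 huge page spans 512 base pages, so zeroing it on the page-fault path writes 512 times more bytes. The micro-benchmark below crudely measures that gap in Python; the absolute numbers are meaningless, only the ratio of work matters.

```python
import time

BASE_PAGE = 4 * 1024          # 4KB base page
HUGE_PAGE = 2 * 1024 * 1024   # 2MB x86-64 huge page

# A huge page covers 512 base pages, so fault-time zeroing
# must touch 512x more memory.
ratio = HUGE_PAGE // BASE_PAGE

def zero_cost(size, reps=50):
    """Rough wall-clock cost of zeroing `size` bytes, `reps` times."""
    buf = bytearray(size)
    start = time.perf_counter()
    for _ in range(reps):
        for i in range(0, size, 4096):
            buf[i:i + 4096] = b"\x00" * 4096
    return time.perf_counter() - start

print("base pages per huge page:", ratio)
print("4KB zeroing cost:", zero_cost(BASE_PAGE))
print("2MB zeroing cost:", zero_cost(HUGE_PAGE))
```

This cost lands on the critical path of the first page fault, which is why huge-page allocation latency is so much worse than that of base pages.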
The fairness of huge-page allocation across multiple processes is another often-overlooked problem.
The benefits of using huge pages include lower overall page-fault time and better address-translation performance.
Asynchronous page pre-zeroing aims to solve the initialization delay. It initializes huge pages in a background kernel thread and uses non-temporal store hints to avoid cache pollution, thereby significantly reducing both cache contention and the double cache-miss problem.
Although pre-zeroing does not necessarily improve performance with 4KB pages, it enables non-negligible performance improvements with huge pages.
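The mechanism can be sketched as a producer/consumer pattern: a background thread keeps a small pool of already-zeroed huge-page-sized buffers, so the allocation fast path never zeroes inline. This is an illustrative Python model (the class name, pool depth, and use of `bytearray` in place of non-temporal stores are all my assumptions, not HawkEye's actual kernel code):

```python
import queue
import threading

HUGE_PAGE = 2 * 1024 * 1024  # 2MB

class PreZeroPool:
    """Toy model of a background pre-zeroing thread; the real kernel
    thread is rate-limited and zeroes physical frames, not buffers."""

    def __init__(self, depth=4):
        self.zeroed = queue.Queue(maxsize=depth)
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            # The kernel would zero with non-temporal (movnt) stores to
            # avoid polluting the cache; bytearray() is a stand-in here.
            self.zeroed.put(bytearray(HUGE_PAGE))

    def alloc(self):
        try:
            return self.zeroed.get_nowait()   # fast path: pre-zeroed
        except queue.Empty:
            return bytearray(HUGE_PAGE)       # slow path: zero inline

pool = PreZeroPool()
page = pool.alloc()  # returns a zero-filled huge-page-sized buffer
```

The fault handler pops a pre-zeroed page when one is available and only falls back to synchronous zeroing when the pool is empty, moving the zeroing cost off the critical path.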
To mitigate memory bloat, HawkEye scans for zero-filled base pages and demotes huge pages that contain many of them back to regular pages. HawkEye then maps the zero-filled pages to a shared zero page with copy-on-write (CoW).
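The recovery pass can be sketched as follows. This is a simplified model under my own assumptions (the function name, the 50% threshold, and representing frames as byte strings are illustrative; the real scanner works on page tables and physical frames):

```python
BASE = 4096
SUBPAGES = 512  # base pages per 2MB huge page

ZERO_BASE = bytes(BASE)  # the single shared zero page (CoW target)

def demote_if_bloated(huge_page, threshold=0.5):
    """If enough 4KB subpages are zero-filled, break the huge page into
    base pages and point every zero subpage at one shared copy-on-write
    zero page. Returns None when the huge page is kept intact."""
    subpages = [huge_page[i * BASE:(i + 1) * BASE] for i in range(SUBPAGES)]
    zero_count = sum(1 for s in subpages if s == ZERO_BASE)
    if zero_count / SUBPAGES < threshold:
        return None  # mostly live data: keep the huge mapping
    # Demote: non-zero subpages keep private frames; zero subpages all
    # share ZERO_BASE and would be copied on the first write (CoW).
    return [s if s != ZERO_BASE else ZERO_BASE for s in subpages]

# Example: a huge page where only the first subpage holds live data.
page = bytearray(SUBPAGES * BASE)
page[0:8] = b"LIVEDATA"
frames = demote_if_bloated(page)
```

The key property is that demotion reclaims the physical memory behind zero-filled subpages without changing what the application observes: any later write to a shared zero page triggers a CoW copy.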
HawkEye promotes huge-page-sized regions that are accessed more frequently. It implements access-coverage-based promotion using a per-process data structure, where access-coverage denotes the number of base pages within a region that are accessed. It searches for promotion candidates across multiple processes.
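A minimal sketch of this bookkeeping, under my own assumptions (the class and function names are illustrative; the paper's per-process structure, access_map, sorts regions into buckets by access-coverage rather than tracking exact sets):

```python
from collections import defaultdict

SUBPAGES = 512  # base pages per 2MB huge-page-sized region

class AccessTracker:
    """Per-process access-coverage bookkeeping: for each huge-page-sized
    region, remember which base pages inside it were accessed."""

    def __init__(self):
        # region index -> set of accessed base-page offsets within it
        self.regions = defaultdict(set)

    def touch(self, base_page_number):
        region, offset = divmod(base_page_number, SUBPAGES)
        self.regions[region].add(offset)

    def coverage(self, region):
        return len(self.regions[region])

def best_candidate(trackers):
    """Pick the (process, region) pair with the highest access-coverage
    across all processes: the next promotion candidate."""
    return max(
        ((pid, region, t.coverage(region))
         for pid, t in trackers.items()
         for region in t.regions),
        key=lambda c: c[2],
    )

# Example: proc_a densely touches region 0; proc_b barely touches region 1.
trackers = {"proc_a": AccessTracker(), "proc_b": AccessTracker()}
for p in range(300):
    trackers["proc_a"].touch(p)           # 300 base pages in region 0
for p in range(512, 522):
    trackers["proc_b"].touch(p)           # 10 base pages in region 1
```

Ranking by coverage rather than raw access counts favors regions whose promotion saves the most TLB entries, and comparing across processes is what makes the scheme fair system-wide.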