Reduce Garbage Collection Interference in SSDs Through Workload Isolation

"Towards SLO Complying SSDs Through OPS Isolation", FAST 2015.

The paper starts from the observation that running two or more workloads concurrently incurs an additional performance penalty on SSDs:

We observe from the results, however, that concurrent execution performs markedly worse than executing each VM individually. With concurrent execution, each VM performance is roughly a third of each individual execution with some deviation among the individual VMs. We also observe that bandwidth is not being consumed in full with the total bandwidth consumed by the three concurrent workloads being roughly 270MB/s.

That is reasonable, because the writes from multiple concurrent workloads become interleaved and make the I/O pattern more complex. Interleaved writes can fragment data unnecessarily inside the SSD, which reduces both read and write performance. For reads, the I/Os become more random because data belonging to different applications are mixed together, so the FTL has to work harder on mapping-table lookups. For garbage collection, it becomes harder to find a good candidate block to collect. The following figure shows how data from concurrent workloads are mixed within a block and thus cause garbage collection interference in SSDs:

Data layout of concurrent workload
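
To make the interference concrete, here is a toy sketch in Python. It is not the paper's simulator. It assumes two hypothetical workloads, "hot" (whose pages are all overwritten later) and "cold" (whose pages stay valid), a block size of 4 pages, and a garbage collector that must copy every valid page out of a block before erasing it. The names `fill_blocks` and `gc_copy_cost` are made up for illustration.

```python
PAGES_PER_BLOCK = 4  # assumed block size, chosen only to keep the example small


def fill_blocks(writes):
    """Pack a stream of (workload, page_id) writes into blocks in arrival order."""
    return [writes[i:i + PAGES_PER_BLOCK]
            for i in range(0, len(writes), PAGES_PER_BLOCK)]


def gc_copy_cost(blocks, invalidated_workload):
    """Count valid pages that must be copied to reclaim every block
    that contains at least one invalidated page."""
    cost = 0
    for block in blocks:
        invalid = sum(1 for workload, _ in block if workload == invalidated_workload)
        if invalid:  # only blocks holding garbage are worth collecting
            cost += len(block) - invalid
    return cost


hot = [("hot", i) for i in range(8)]
cold = [("cold", i) for i in range(8)]

# Concurrent execution: writes from both workloads interleave in each block.
mixed = fill_blocks([page for pair in zip(hot, cold) for page in pair])
# Isolation: each workload fills its own blocks.
isolated = fill_blocks(hot) + fill_blocks(cold)

mixed_cost = gc_copy_cost(mixed, "hot")        # every block is half valid
isolated_cost = gc_copy_cost(isolated, "hot")  # hot blocks are fully invalid
print(mixed_cost, isolated_cost)
```

In this toy setup the mixed layout forces the collector to copy 8 valid pages to reclaim the hot workload's garbage, while the isolated layout needs no copies at all, because the hot blocks contain nothing but invalid pages.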

The solution is isolation: data from different VMs are placed on different blocks. The storage interface must change to pass two pieces of information to the SSD: a tag identifying the workload and a weight determining the size allocated to that workload. The weights are determined dynamically from the Write Amplification Factor of each VM, which in turn determines the IOPS of each VM.
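
A minimal sketch of the tag-and-weight idea, written in Python. This is an illustration under assumptions, not the paper's FTL: the class `IsolatingAllocator`, the function `split_ops`, the 4-page block size, and the simple proportional split are all hypothetical, and the paper's actual rule for updating weights is not reproduced here.

```python
PAGES_PER_BLOCK = 4  # assumed block size for illustration


class IsolatingAllocator:
    """Keeps one open block per tag so that a block never mixes workloads."""

    def __init__(self):
        self.blocks = []      # each block: {"tag": tag, "pages": [...]}
        self.open_block = {}  # tag -> index of the block currently accepting writes

    def write(self, tag, page):
        idx = self.open_block.get(tag)
        # Open a fresh block when this tag has none yet or its block is full.
        if idx is None or len(self.blocks[idx]["pages"]) == PAGES_PER_BLOCK:
            self.blocks.append({"tag": tag, "pages": []})
            idx = len(self.blocks) - 1
            self.open_block[tag] = idx
        self.blocks[idx]["pages"].append(page)


def split_ops(total_ops_blocks, weights):
    """Divide the over-provisioning space among tags in proportion to their weights
    (assumed policy: a plain proportional split)."""
    total = sum(weights.values())
    return {tag: total_ops_blocks * w / total for tag, w in weights.items()}


alloc = IsolatingAllocator()
stream = [("vm1", 0), ("vm2", 0), ("vm1", 1), ("vm1", 2),
          ("vm2", 1), ("vm1", 3), ("vm1", 4), ("vm1", 5)]
for tag, page in stream:
    alloc.write(tag, page)

block_tags = [b["tag"] for b in alloc.blocks]
shares = split_ops(100, {"vm1": 3, "vm2": 1})
print(block_tags, shares)
```

Even though the two VMs' writes arrive interleaved, each block ends up holding pages from a single tag. Recomputing `split_ops` whenever the weights change is one simple way to picture the dynamic adjustment described above.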

The results show that using OPS isolation and dynamically adjusting the OPS size based on u (Write Amplification Factor) results in quite accurate proportionality of IO bandwidth. However, static OPS isolation is not effective as there is no leeway to adjust the OPS size according to the workload characteristics.

Workload Isolation

It is worth noting that the three applications/workloads studied in this paper are all write intensive:

Workload     Request Total    Write Ratio    Average Write Size
Financial    7.1GB            0.76           14KB
MSN          14.6GB           0.96           27KB
Exchange     9.8GB            0.67           17KB

Actually, workload isolation is an old topic in the CS world. System software is often optimized around simple assumptions, such as large sequential reads and writes. Mixed workloads are harder to optimize without semantic information, such as which I/O belongs to which application. Besides, users also require resource isolation from their own point of view.

There are many real-world examples of isolation. Linux administrators choose a partitioning scheme based on how the machine is used and select appropriate file systems for the different partitions (partitioning is a simpler solution, right?). There are also many production-level solutions, such as database isolation and VMware isolation. These solutions are not elegant from a researcher's point of view: they were not designed by studying workload behavior, analyzing the root causes of conflicts, and then pursuing the best performance in the design.