High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads
Dipti Shankar, Xiaoyi Lu and Dhabaleswar Panda
The Ohio State University, The Ohio State University, The Ohio State University

Distributed key-value store-based caching solutions are increasingly used to accelerate Big Data applications on modern HPC clusters. This has necessitated incorporating fault-tolerance capabilities into high-performance key-value stores such as Memcached, which are otherwise volatile in nature. In-memory replication is currently the primary mechanism for ensuring resilient data operations. However, it incurs increased network I/O and high remote memory requirements. Erasure coding, on the other hand, is being extensively explored as a way to provide data resilience with better storage efficiency. In this paper, we first perform an in-depth modeling-based analysis of the performance trade-offs between In-Memory Replication and Erasure Coding schemes for key-value stores, and explore the possibilities of employing Online Erasure Coding to enable resilience in high-performance key-value stores for HPC clusters. We then design a non-blocking API-based engine that performs efficient Set/Get operations by overlapping the encoding/decoding required for Erasure Coding-based resilience with the request/response phases, leveraging RDMA on high-performance interconnects. Performance evaluations show that the proposed designs can outperform synchronous RDMA-based replication by about 2.8x, and can improve YCSB throughput and average read/write latencies by about 1.34x to 2.6x over asynchronous replication for larger key-value pair sizes (>16 KB). We also demonstrate the benefits of the proposed design by incorporating it into a hybrid and resilient key-value store-based burst-buffer system over Lustre for accelerating Big Data I/O on HPC clusters.
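The storage-efficiency argument above can be illustrated with a minimal sketch. It is not the engine described in this paper: the function names are hypothetical, and a single XOR parity chunk stands in for a real erasure code (a production store would more likely use a Reed-Solomon-style code with several parity chunks, which is an assumption here). The sketch compares the total bytes held in remote memory under full replication and under a (k data, m parity) coding layout, then shows how one lost chunk is rebuilt from the survivors.

```python
def replication_bytes(value_size: int, failures: int) -> int:
    """Total bytes stored when tolerating `failures` node losses via full copies."""
    return value_size * (failures + 1)


def erasure_coded_bytes(value_size: int, k: int, m: int) -> int:
    """Total bytes stored with k data chunks and m parity chunks (tolerates m losses)."""
    chunk = -(-value_size // k)  # ceiling division: chunks are padded to equal size
    return chunk * (k + m)


def xor_encode(value: bytes, k: int) -> list:
    """Split `value` into k equal chunks (zero-padded) and append one XOR parity chunk.

    Illustrative only: XOR parity tolerates the loss of exactly one chunk.
    """
    chunk = -(-len(value) // k)
    padded = value.ljust(chunk * k, b"\0")
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(k)]
    parity = bytes(chunk)  # all-zero starting point for the running XOR
    for c in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, c))
    return chunks + [parity]


def xor_recover(chunks: list, missing: int) -> bytes:
    """Rebuild the chunk at index `missing` by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(chunks) if i != missing]
    out = bytes(len(survivors[0]))
    for c in survivors:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out


if __name__ == "__main__":
    size = 15360  # a 15 KB value, chosen so it divides evenly into 3 chunks
    print("3-way replication:", replication_bytes(size, failures=2), "bytes")
    print("coding with k=3, m=2:", erasure_coded_bytes(size, k=3, m=2), "bytes")

    stored = xor_encode(b"hello world!", k=3)
    print("rebuilt chunk 1:", xor_recover(stored, missing=1))
```

Both layouts in the usage example tolerate two lost servers in principle, but the coded layout stores (k + m) / k times the value size rather than three full copies. The price is the encode and decode computation, which is the cost the non-blocking design described above aims to overlap with the request/response phases.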