Foundations and Trends® in Communications and Information Theory > Vol 19 > Issue 4

Codes for Distributed Storage

By Vinayak Ramkumar, Indian Institute of Science, Bengaluru, India, vinram93@gmail.com | S. B. Balaji, Qualcomm, Bengaluru, India, balaji.profess@gmail.com | Birenjith Sasidharan, Govt. Engineering College, Barton Hill, Trivandrum, India, birenjith@gmail.com | Myna Vajha, Qualcomm, Bengaluru, India, mynaramana@gmail.com | M. Nikhil Krishnan, International Institute of Information Technology Bangalore, India, nikhilkrishnan.m@gmail.com | P. Vijay Kumar, Indian Institute of Science, Bengaluru, India, pvk1729@gmail.com

 
Suggested Citation
Vinayak Ramkumar, S. B. Balaji, Birenjith Sasidharan, Myna Vajha, M. Nikhil Krishnan and P. Vijay Kumar (2022), "Codes for Distributed Storage", Foundations and TrendsĀ® in Communications and Information Theory: Vol. 19: No. 4, pp 547-813. http://dx.doi.org/10.1561/0100000115

Publication Date: 30 May 2022
© 2022 V. Ramkumar et al.
 
Subjects
Coding theory and practice,  Storage and recording codes,  Information theory and computer science,  Network coding
 

Free Preview:

Download extract

Share

Download article
In this article:
1. Introduction
2. Maximum Distance Separable Codes
3. Regenerating Codes
4. MBR Codes
5. MSR Codes
6. Storage-Repair-Bandwidth Tradeoff
7. Interior-Point ER Codes
8. Lower Bounds on Sub-Packetization Level of MSR Codes
9. Variants of Regenerating Codes
10. Locally Recoverable Codes
11. Codes with Availability
12. LRCs with Sequential Recovery
13. Hierarchical Locality
14. Maximally Recoverable Codes
15. Codes with Combined Locality and Regeneration
16. Repair of Reed-Solomon Codes
17. Codes in Practice
Acknowledgements
References

Abstract

In distributed data storage, information pertaining to a given data file is stored across multiple storage units or nodes in redundant fashion to protect against the principal concern, namely, the possibility of data loss arising from the failure of individual nodes. The simplest form of such protection is replication. The explosive growth in the amount of data generated on a daily basis brought up a second major concern, namely minimization of the overhead associated with such redundant storage. This concern led to the adoption by the storage industry of erasure-recovery codes such as Reed- Solomon (RS) codes and more generally, maximum distance separable codes, as these codes offer the lowest-possible storage overhead for a given level of reliability.

In the setting of a large data center, where the amount of stored data can run into several exabytes, a third concern arises, namely the need for efficient recovery from a commonplace occurrence, the failure of a single storage unit. One measure of efficiency in node repair is how small one can make the amount of data download needed to repair a failed unit, termed the repair bandwidth. This was the subject of the seminal paper by Dimakis et al. [50] in which an entirely new class of codes called regenerating codes was introduced, that within a certain repair framework, had the minimum-possible repair bandwidth. A second measure relates to the number of helper nodes contacted for node repair, termed the repair degree. A low repair degree is desirable as this means that a smaller number of nodes are impacted by the failure of a given node. The landmark paper by Gopalan et al. [72] focuses on this second measure, leading to the development of the theory of locally recoverable codes. The two events also led to the creation of a third class of codes known as locally regenerating codes, where the aim is to simultaneously achieve reduced repair bandwidth and low repair degree. Research in a different direction led researchers to take a fresh look at the challenge of efficient RS-code repair, and led to the identification of improved repair schemes for RS codes that have significantly reduced repair bandwidth.

This monograph introduces the reader to these different approaches towards efficient node repair and presents many of the fundamental bounds and code constructions that have since emerged. Several open problems are identified, and many of the sections have a notes subsection at the end that provides additional background.

DOI:10.1561/0100000115
ISBN: 978-1-63828-024-8
290 pp. $99.00
Buy book (pb)
 
ISBN: 978-1-63828-025-5
290 pp. $145.00
Buy E-book (.pdf)
Table of contents:
1. Introduction
2. Maximum Distance Separable Codes
3. Regenerating Codes
4. MBR Codes
5. MSR Codes
6. Storage-Repair-Bandwidth Tradeoff
7. Interior-Point ER Codes
8. Lower Bounds on Sub-Packetization Level of MSR Codes
9. Variants of Regenerating Codes
10. Locally Recoverable Codes
11. Codes with Availability
12. LRCs with Sequential Recovery
13. Hierarchical Locality
14. Maximally Recoverable Codes
15. Codes with Combined Locality and Regeneration
16. Repair of Reed-Solomon Codes
17. Codes in Practice
Acknowledgements
References

Codes for Distributed Storage

Data storage has grown such that distributed storage over a number of systems is now commonplace. This has given rise to an increase in the complexity of ensuring data loss does not occur, particularly where failure is due to the failure of individual nodes within the storage system. Redundancy was the main tool to combat this, but with huge increases in data, minimization of the overhead associated with this technique caused major concern. In a large data center, a third concern arose, namely the need for efficient recovery from the failure of a single storage unit.

In this monograph, the authors give a comprehensive overview of the role of differing types of codes in addressing the issues in large distributed storage systems. They introduce the reader to regenerative codes, locally recoverable codes and locally regenerative codes; the three main classes of codes used in such systems. They give an exhaustive overview of how these codes were created, their uses and the developments and improvements of the codes in the last decade.

This in-depth review gives the reader an accessible and complete overview of the modern codes used in distributed storage systems today. It is a one-stop source for students, researchers and practitioners working on any such system.

 
CIT-115