
Recent Advances in Testing Techniques for AI Hardware Accelerators

By Arjun Chaudhuri, Duke University, USA, arjun.chaudhuri@duke.edu | Ching-Yuan Chen, Duke University, USA, chingyuan.chen@duke.edu | Krishnendu Chakrabarty, Arizona State University, USA, krishnendu.chakrabarty@asu.edu

 
Suggested Citation
Arjun Chaudhuri, Ching-Yuan Chen and Krishnendu Chakrabarty (2023), "Recent Advances in Testing Techniques for AI Hardware Accelerators", Foundations and Trends® in Integrated Circuits and Systems: Vol. 2: No. 4, pp 244-380. http://dx.doi.org/10.1561/3500000011

Publication Date: 21 Jun 2023
© 2023 A. Chaudhuri et al.
 
Subjects
CMOS technology,  Circuit design methods,  Emerging technologies,  Test,  Classification and prediction,  Optimization,  Robustness,  Deep learning
 

In this article:
1. Introduction
2. Advances in Robustness Analysis of Von-Neumann Systolic Array-based Accelerators
3. Graph Convolutional Network (GCN) for Criticality Evaluation
4. Neural Twin-driven Robustness Analysis
5. Advances in Testing of Von-Neumann Systolic Array-based Accelerators
6. Robustness of Near-Memory Computing Paradigm
7. Testing and Robustness for Compute-in-Memory AI Accelerators
8. Conclusion
Acknowledgements
References

Abstract

Emerging device technologies such as silicon photonics, nonvolatile memories, and heterogeneous monolithic 3D (M3D) integration are being explored as post-Moore’s law alternatives for achieving high-density integration of many-core AI accelerators. In addition to innovations at the device level, architectural optimizations are also being carried out to achieve high-performance processing of large AI workloads with custom accelerator hardware. Systolic array-based inferencing accelerators achieve higher throughput and improved energy efficiency compared to CPUs and GPUs because of the homogeneous and regular data flow in systolic arrays. However, the performance of such emerging AI accelerators can be adversely affected by faults due to process variations, manufacturing defects, and aging. In this monograph, we analyze the performance of several emerging AI accelerators in the presence of different uncertainties and present low-cost methods to assess the significance of faults and mitigate their effects. We show that across all technologies, the functional criticality of faults can vary significantly based on the fault type, fault location, and the application workload. The fault criticality assessment and mitigation techniques presented in this monograph are necessary for enabling low-cost test, diagnosis, and design of robust AI accelerators.

DOI:10.1561/3500000011
ISBN: 978-1-63828-240-2 (paperback), 148 pp., $95.00
ISBN: 978-1-63828-241-9 (e-book, PDF), 148 pp., $150.00

Recent Advances in Testing Techniques for AI Hardware Accelerators

The rapid growth in big data from mobile, Internet of things (IoT), and edge devices, and the continued demand for higher computing power, have established deep learning as the cornerstone of most artificial intelligence (AI) applications today. Recent years have seen a push towards deep learning implemented on domain-specific AI accelerators that support custom memory hierarchies, variable precision, and optimized matrix multiplication. Commercial AI accelerators have shown superior energy and footprint efficiency compared to GPUs for a variety of inference tasks.

This monograph discusses the roadblocks that must be understood and analyzed to ensure functional robustness in emerging AI accelerators. It presents state-of-the-art practices for structural and functional testing of these accelerators, along with methodologies for assessing the functional criticality of hardware faults so that test time can be reduced by targeting the functionally critical faults.

This monograph highlights recent research on improving the test and reliability of neuromorphic computing systems built using non-volatile memory (NVM) devices such as spin-transfer-torque magnetic RAM (STT-MRAM) and resistive RAM (ReRAM). Also covered are the robustness of silicon-photonic neural networks and the reliability concerns arising from manufacturing defects and process variations in monolithic 3D (M3D)-based near-memory computing systems.

 