APSIPA Transactions on Signal and Information Processing > Vol 12 > Issue 2

Deep Learning for Human Action Recognition: A Comprehensive Review

Duc-Quang Vu, Department of Computer Science and Information Engineering, National Central University, Taiwan, and Thai Nguyen University of Education, Vietnam
Trang Phung Thi Thu, Thai Nguyen University, Vietnam
Ngan Le, Department of Computer Science and Computer Engineering, University of Arkansas, USA
Jia-Ching Wang, Department of Computer Science and Information Engineering, National Central University, Taiwan, jcw@csie.ncu.edu.tw
 
Suggested Citation
Duc-Quang Vu, Trang Phung Thi Thu, Ngan Le and Jia-Ching Wang (2023), "Deep Learning for Human Action Recognition: A Comprehensive Review", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 2, e12. http://dx.doi.org/10.1561/116.00000068

Publication Date: 24 Apr 2023
© 2023 D.Q. Vu, T.P.T. Thu, N. Le and J.C. Wang
 
Keywords
Action recognition, supervised learning, self-supervised learning, deep learning, deep neural networks
 

Open Access

This is published under the terms of CC BY-NC.

In this article:
Introduction 
Human Action Recognition: Problem Definition and Challenges 
Background 
Action Recognition Techniques 
Datasets and Metrics 
Discussion 
Conclusion 
References 

Abstract

Over the past several years, computer vision has made remarkable progress across numerous applications, particularly in human activity analysis. Human action recognition, which aims to automatically analyze and recognize the actions taking place in a video, has been widely applied. This paper presents a comprehensive survey of deep learning-based approaches and techniques for human activity analysis. First, we define the action recognition problem and describe its challenges. Second, we survey feature representation methods. Third, we categorize human action recognition methodologies and discuss their advantages and limitations. In particular, we divide the methods into three main categories according to their training mechanism: supervised learning, semi-supervised learning, and self-supervised learning. For each category, we further analyze the existing network architectures, their performance, and source code availability. Fourth, we provide a detailed analysis of existing publicly available datasets for human action recognition, both small-scale and large-scale. Finally, we discuss open issues and future research directions.

DOI:10.1561/116.00000068

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Learning, Security, AIoT for Emerging Communication/Networking Systems