Scene Understanding by Fused Hu’s Invariant Moments and Deep Learning Features

Michael Nachipyangu, Northwestern Polytechnical University, China, michael.nachipyangu@mail.nwpu.edu.cn; Jiangbin Zheng, Northwestern Polytechnical University, China
 
Suggested Citation
Michael Nachipyangu and Jiangbin Zheng (2025), "Scene Understanding by Fused Hu’s Invariant Moments and Deep Learning Features", APSIPA Transactions on Signal and Information Processing: Vol. 14: No. 1, e21. http://dx.doi.org/10.1561/116.20250007

Publication Date: 13 Aug 2025
© 2025 M. Nachipyangu and J. Zheng
 
Subjects
Object and scene recognition, Classification and prediction, Deep learning
 

Open Access

This article is published under the terms of the CC BY-NC license.

In this article:
Introduction 
Related Work 
Proposed Method 
Experiments and Results Discussion 
Conclusion 
References 

Abstract

Convolutional neural networks (CNNs) are widely used for scene-image recognition and classification due to their effectiveness in this task. However, their performance degrades when the input undergoes geometric variations such as rotation, scaling, and translation. To overcome this drawback, this study presents a feature-fusion technique that combines Hu's invariant moments with deep features derived from a CNN model. Hu's moments are statistical values computed from an image's pixel intensities that remain invariant under geometric transformations. These moments are concatenated with the features of the fully connected layer of the CNN model, making the proposed method more accurate and robust. The study also applies data augmentation, specifically geometric transformations such as rotation, scaling, flipping, and translation, to balance the class distribution in the training datasets and reduce the inter-class bias caused by unequal numbers of images per class. The fused feature representation was evaluated on three benchmark datasets: MIT67, AID, and Scene15. Detailed experiments with different CNN models were conducted, and Inception-ResNetV2 as the deep feature extractor combined with Hu's moments proved most effective, delivering significant improvements in accuracy: 96.4% on Scene15, 94.1% on AID, and 87.1% on MIT67. These results offer a novel approach for enhancing the robustness and accuracy of scene understanding.
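To make the fusion pipeline concrete, the sketch below (not the authors' released code) shows one plausible way to combine Hu's invariant moments with deep CNN features, assuming OpenCV and Keras with a pretrained Inception-ResNetV2 backbone. Where the paper takes features from the fully connected layer, this sketch substitutes global-average-pooled backbone features for brevity; the function names are illustrative only.

```python
# Minimal sketch of Hu-moment + deep-feature fusion (illustrative, not the
# authors' implementation). Assumes opencv-python and tensorflow installed.
import cv2
import numpy as np
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input

def hu_moments(image_bgr):
    """Seven Hu moments of the grayscale image, log-scaled for stability."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    # The raw moments span many orders of magnitude; a signed log transform
    # is a common normalization before fusing them with other features.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

# Pretrained backbone used purely as a feature extractor;
# pooling='avg' yields a 1536-d global feature vector per image.
backbone = InceptionResNetV2(include_top=False, weights="imagenet",
                             pooling="avg")

def fused_features(image_bgr):
    """Concatenate geometric-invariant Hu moments with deep features."""
    resized = cv2.resize(image_bgr, (299, 299))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32)
    deep = backbone.predict(preprocess_input(rgb[np.newaxis]), verbose=0)[0]
    return np.concatenate([hu_moments(image_bgr), deep])
```

The fused vector would then feed a downstream classifier (e.g., a softmax layer or an SVM) trained on the augmented, class-balanced training set described above.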

DOI: 10.1561/116.20250007