
DSNet: an efficient CNN for road scene segmentation

Ping-Rong Chen, National Chiao Tung University, Taiwan, Hsueh-Ming Hang, National Chiao Tung University, Taiwan, hmhang@nctu.edu.tw , Sheng-Wei Chan, Industrial Technology Research Institute, Taiwan, Jing-Jhih Lin, Industrial Technology Research Institute, Taiwan
 
Suggested Citation
Ping-Rong Chen, Hsueh-Ming Hang, Sheng-Wei Chan and Jing-Jhih Lin (2020), "DSNet: an efficient CNN for road scene segmentation", APSIPA Transactions on Signal and Information Processing: Vol. 9: No. 1, e27. http://dx.doi.org/10.1017/ATSIP.2020.25

Publication Date: 26 Nov 2020
© 2020 Ping-Rong Chen, Hsueh-Ming Hang, Sheng-Wei Chan and Jing-Jhih Lin
 
Keywords
Semantic segmentation, Real-time CNN segmentation, CNN architecture, Road scene segmentation
 

Open Access

This is published under the terms of the Creative Commons Attribution licence.


In this article:
I. INTRODUCTION 
II. PROPOSED NETWORK OVERVIEW 
III. STRUCTURE/COMPONENT SELECTION 
IV. PERFORMANCE OF DSNET 
V. CONCLUSION 

Abstract

Road scene understanding is a critical component of an autonomous driving system. Although deep learning-based road scene segmentation can achieve very high accuracy, its computational complexity is too high for real-time applications. It is challenging to design a neural network with both high accuracy and low computational complexity. To address this issue, we investigate the advantages and disadvantages of several popular convolutional neural network (CNN) architectures in terms of speed, storage, and segmentation accuracy. We start from the fully convolutional network with VGG, and then study ResNet and DenseNet. Through detailed experiments, we select the favorable components from the existing architectures and finally construct a lightweight network architecture based on DenseNet. Our proposed network, called DSNet, demonstrates real-time inference on a popular GPU platform while maintaining an accuracy comparable with most previous systems. We test our system on several datasets, including the challenging Cityscapes dataset (resolution of 1024 × 512), achieving a mean Intersection over Union (mIoU) of about 69.1% and a runtime of 0.0147 s/image on a single GTX 1080Ti. We also design a more accurate but slower model, which achieves an mIoU of about 72.6% on the CamVid dataset.
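For readers unfamiliar with the mIoU metric quoted above: it averages, over all classes, the ratio of intersection to union between the predicted and ground-truth masks. The sketch below is a minimal NumPy illustration of that definition, not the authors' evaluation code; the function name `mean_iou` and the `ignore_index` convention (255 for unlabeled pixels, as commonly used with Cityscapes) are assumptions for illustration.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean Intersection over Union across classes (illustrative sketch).

    pred, gt: integer label maps of the same shape.
    Classes absent from both prediction and ground truth are skipped.
    """
    valid = gt != ignore_index  # exclude unlabeled pixels
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        g = (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class not present; do not penalize
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Tiny example: two classes on a 2x2 label map.
pred = np.array([[0, 1], [1, 1]])
gt = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2 = 7/12
```

In a benchmark like Cityscapes, the intersections and unions are accumulated over the whole test set before dividing, rather than averaged per image.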

DOI:10.1017/ATSIP.2020.25