DSNet: an efficient CNN for road scene segmentation

Ping-Rong Chen, National Chiao Tung University, Taiwan; Hsueh-Ming Hang, National Chiao Tung University, Taiwan (hmhang@nctu.edu.tw); Sheng-Wei Chan, Industrial Technology Research Institute, Taiwan; Jing-Jhih Lin, Industrial Technology Research Institute, Taiwan
 
Suggested Citation
Ping-Rong Chen, Hsueh-Ming Hang, Sheng-Wei Chan and Jing-Jhih Lin (2020), "DSNet: an efficient CNN for road scene segmentation", APSIPA Transactions on Signal and Information Processing: Vol. 9: No. 1, e27. http://dx.doi.org/10.1017/ATSIP.2020.25

Publication Date: 26 Nov 2020
© 2020 Ping-Rong Chen, Hsueh-Ming Hang, Sheng-Wei Chan and Jing-Jhih Lin
 
Keywords
Semantic segmentation, Real-time CNN segmentation, CNN architecture, Road scene segmentation
 

Open Access

This article is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. PROPOSED NETWORK OVERVIEW 
III. STRUCTURE/COMPONENT SELECTION 
IV. PERFORMANCE OF DSNET 
V. CONCLUSION 

Abstract

Road scene understanding is a critical component of an autonomous driving system. Although deep learning-based road scene segmentation can achieve very high accuracy, its computational complexity is also very high, which hinders real-time applications. Designing a neural network with both high accuracy and low computational complexity is challenging. To address this issue, we investigate the advantages and disadvantages of several popular convolutional neural network (CNN) architectures in terms of speed, storage, and segmentation accuracy. We start from the fully convolutional network (FCN) with VGG, and then study ResNet and DenseNet. Through detailed experiments, we select the favorable components from the existing architectures and, in the end, construct a light-weight network architecture based on DenseNet. Our proposed network, called DSNet, demonstrates real-time inference on a popular GPU platform while maintaining accuracy comparable with most previous systems. We test our system on several datasets, including the challenging Cityscapes dataset (resolution of 1024 × 512), where it achieves a mean Intersection over Union (mIoU) of about 69.1% with a runtime of 0.0147 s/image on a single GTX 1080Ti. We also design a more accurate model, at the price of slower speed, which achieves an mIoU of about 72.6% on the CamVid dataset.
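The abstract reports results in terms of mean Intersection over Union (mIoU). As a reference point, the sketch below shows one common way this metric is computed from predicted and ground-truth label maps; the function name, the confusion-matrix approach, and the ignore-label convention are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Compute mean IoU from two integer label maps of the same shape.

    Pixels labeled `ignore_index` in `target` are excluded, as is common
    in Cityscapes-style evaluation.
    """
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Accumulate a num_classes x num_classes confusion matrix.
    conf = np.bincount(
        num_classes * target.astype(np.int64) + pred.astype(np.int64),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

    # Per-class IoU: intersection / (prediction area + target area - intersection).
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    iou = intersection / np.maximum(union, 1)

    # Average only over classes that actually occur.
    return iou[union > 0].mean()

# Example usage with the 19 Cityscapes evaluation classes (random labels here).
pred = np.random.randint(0, 19, size=(512, 1024))
gt = np.random.randint(0, 19, size=(512, 1024))
print(mean_iou(pred, gt, num_classes=19))
```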

DOI:10.1017/ATSIP.2020.25