Road Scene Semantic Segmentation Network Based on Multi-Scale Transformer Features
Author: Peng Yang, Wu Wenhuan, Zhang Haokun
Affiliation:

School of Intelligent and Connected Vehicle, Hubei University of Automotive Technology, Shiyan 442002, China
CLC Number:

TP391.41; U491.1
    Abstract:

    Image content in road scenes is typically complex: objects differ markedly in scale and shape, and lighting and shadows can make scenes hard to recognize. Existing semantic segmentation methods often fail to effectively extract and fully integrate multi-scale semantic features, resulting in poor generalization and robustness. To address these issues, this study proposes a semantic segmentation network that fuses multi-scale Transformer features. First, the CSWin Transformer is employed to extract semantic features at multiple scales, and a feature refinement module (FRM) is introduced to enhance the semantic discrimination of deep, fine-grained features. Second, an attention aggregation module (AAM) aggregates the features at each scale separately. Finally, the enhanced multi-scale features are integrated to strengthen their semantic expressiveness and thereby improve segmentation performance. Experimental results show that the network achieves an accuracy of 82.3% on the Cityscapes dataset, outperforming SegNeXt and ConvNeXt by 2.2 and 1.2 percentage points, respectively, and 47.4% on the more challenging ADE20K dataset, surpassing SegNeXt and ConvNeXt by 3.2 and 2.8 percentage points, respectively. The proposed model not only attains high segmentation accuracy, correctly predicting pixel-level semantic categories in road scene images, but also exhibits strong generalization and robustness.
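    The abstract's "aggregate features across scales, then fuse" pipeline can be sketched in minimal NumPy form. This is an illustrative stand-in, not the paper's implementation: `upsample_nearest`, the scalar per-scale attention weights, and all tensor shapes are assumptions made for the sketch; the actual FRM and AAM modules are learned Transformer components.

    ```python
    import numpy as np

    def upsample_nearest(feat, out_h, out_w):
        # feat: (C, H, W) -> (C, out_h, out_w) via nearest-neighbor indexing
        c, h, w = feat.shape
        ri = np.arange(out_h) * h // out_h
        ci = np.arange(out_w) * w // out_w
        return feat[:, ri][:, :, ci]

    def fuse_multiscale(features):
        """Toy aggregation of multi-scale feature maps with softmax-normalized
        per-scale weights -- a simplified stand-in for attention aggregation.
        features: list of (C, H_i, W_i) arrays, finest scale first."""
        out_h, out_w = features[0].shape[1:]
        up = [upsample_nearest(f, out_h, out_w) for f in features]
        # scalar score per scale from global average pooling (assumption)
        scores = np.array([f.mean() for f in up])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # weighted sum fuses the scales into one (C, out_h, out_w) map
        return sum(w * f for w, f in zip(weights, up))
    ```

    In the real network the fused map would feed a segmentation head that predicts a per-pixel class; here the sketch only shows how features from different resolutions can be brought to a common grid and combined.
    
    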

Get Citation

PENG Yang, WU Wenhuan, ZHANG Haokun. Road scene semantic segmentation network based on multi-scale Transformer features[J]. Journal of East China Jiaotong University, 2025, 42(2): 110-118.

History
  • Received: September 14, 2024
  • Online: May 16, 2025