Global self-attention remote sensing building extraction network combined with edge enhancement

1. College of Resource Environment and Tourism, Capital Normal University, Beijing 100048, China

2. Beijing Geo-Vision Information Technology Co., Ltd., Beijing 100070, China

3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

4. Beijing Polytechnic College, Beijing 100144, China

5. North China University of Water Resources and Electric Power, Zhengzhou 450045, China

6. Zhongguancun Smart City Co., Ltd., Beijing 100081, China

Li Zhen, PhD candidate; research interest: intelligent interpretation of remote sensing imagery; e-mail: sdlz123@126.com
Liu Xianlin (corresponding author), research fellow, academician of the Chinese Academy of Engineering; research interests: theory and instrumentation of surveying and mapping, photogrammetry, and remote sensing; e-mail: liuxl@cae.cn

Received date: 2024-01-03

Revised date: 2024-09-26

Online published: 2025-01-07

Funding

National Key Research and Development Program of China (2022YFB3903602);

Institutional Research Project of Beijing Polytechnic College (BGY2022KY-06QT)



Cite this article

Li Zhen, Zhang Zhenxin, Wang Tao, Peng Xueli, Yue Guijie, Zhang Deyu, Liu Xianlin, Li Jianhua. Global self-attention remote sensing building extraction network combined with edge enhancement[J]. Science & Technology Review, 0: 1. DOI: 10.3981/j.issn.1000-7857.2024.01.00025

Abstract

The accurate and efficient extraction of buildings from remote sensing images is fundamental for applications such as fine urban management, high-precision mapping, and land resource investigation, so it is essential to investigate how to leverage image characteristics for intelligent interpretation. This study introduces a global self-attention network with edge enhancement (E-GSANet) for remote sensing building extraction. The network integrates an edge enhancement module into the encoder backbone, providing the network with a priori knowledge about boundaries, and then establishes long-distance dependency relationships between features using a global self-attention feature expression module, enabling the fusion of salient features with edge-enhanced features. A stepwise up-sampling decoding module is designed to fuse shallow features rich in spatial detail with deep features carrying high-order semantic information, yielding accurate building extraction results. Comparison experiments between E-GSANet and current mainstream methods are conducted on two open-source remote sensing building datasets. Quantitative analysis and qualitative demonstrations show that E-GSANet achieves the best results across all evaluation metrics, producing more complete building extractions, more precise edges, and higher accuracy. Additionally, ablation experiments on the network structure demonstrate the effectiveness of each module.
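At a high level, the architecture described in the abstract consists of three building blocks: an edge enhancement module inserted into the encoder, a global self-attention module that models long-range dependencies over encoder features, and a stepwise up-sampling decoder that fuses shallow and deep features. The following minimal PyTorch sketch illustrates one plausible form of each block; all module names, kernel choices, channel sizes, and wiring are assumptions made for illustration and do not reproduce the authors' E-GSANet implementation.

```python
# Illustrative sketch only: module names, kernels, and channel sizes are
# assumptions, not the authors' E-GSANet implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EdgeEnhancement(nn.Module):
    """Adds a boundary prior by fusing Sobel-style gradient-magnitude features."""

    def __init__(self, channels):
        super().__init__()
        kx = torch.tensor([[-1.0, 0.0, 1.0],
                           [-2.0, 0.0, 2.0],
                           [-1.0, 0.0, 1.0]])
        # One fixed gradient kernel per channel (depthwise convolution).
        self.register_buffer("kx", kx.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.register_buffer("ky", kx.t().contiguous().view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        c = x.shape[1]
        gx = F.conv2d(x, self.kx, padding=1, groups=c)
        gy = F.conv2d(x, self.ky, padding=1, groups=c)
        edge = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
        return x + self.fuse(edge)  # edge-enhanced feature map


class GlobalSelfAttention(nn.Module):
    """Self-attention over all spatial positions to capture long-range dependencies."""

    def __init__(self, channels, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        q = self.norm(tokens)
        attended, _ = self.attn(q, q, q)
        return x + attended.transpose(1, 2).reshape(b, c, h, w)


class UpBlock(nn.Module):
    """Stepwise up-sampling: fuse upsampled deep features with a shallow skip feature."""

    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(deep_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, skip):
        deep = F.interpolate(deep, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([deep, skip], dim=1))
```

A full network in this style would attach EdgeEnhancement after early encoder stages, apply GlobalSelfAttention at the bottleneck, and chain several UpBlock instances back to input resolution before a 1×1 convolution produces the building mask; these wiring details are likewise assumptions.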
