Cross-View Image Geo-localization Based on Attention Weight Masks
DOI:
https://doi.org/10.5755/j01.itc.54.3.41802
Keywords:
Cross-view image geo-localization, Field of view, Attention mechanism, Feature alignment
Abstract
Cross-view image geo-localization determines the geographic location of a ground-view query image by matching it against geotagged images captured by satellites or unmanned aerial vehicles (UAVs). When the ground image has a limited field of view (FoV), it covers a smaller area, contains less scene content, and has an unknown imaging direction. In addition, the reference satellite image of the same location may contain substantial redundant features that have no counterpart in the query. As a result, existing methods achieve low localization accuracy on ground images with a limited FoV. We propose a cross-view image geo-localization method based on attention weight mask alignment. A Coordinate Attention (CA) mechanism, embedded in a lightweight ResNet18 network, generates weight masks that precisely align limited-FoV ground images with satellite image feature maps. This alignment removes redundant regions from the satellite images and thereby improves localization accuracy. Because feature maps at different levels capture image content at different granularities, we also introduce a multi-scale feature fusion strategy that combines features from different convolutional layers into more representative image descriptors. Experimental results on the CVUSA and CVACT_val benchmark datasets show that, when the ground query images have an FoV of 70° or 90° and a random imaging direction, the proposed method significantly improves localization accuracy.
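The two ideas in the abstract, masking a feature map with attention weights and fusing descriptors from several layers, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`coordinate_mask`, `apply_mask`, `fuse_descriptor`) are hypothetical, the mask is a heavily simplified stand-in for Coordinate Attention (it collapses channels and has no learned convolutions), and the feature maps are plain nested lists rather than ResNet18 outputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_mask(fmap):
    """Build an H x W weight mask from a C x H x W feature map.

    Simplified assumption: activations are averaged along each spatial
    axis (and over channels), squashed with a sigmoid, and combined as
    an outer product. Real Coordinate Attention uses learned layers.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    row = [sum(fmap[c][i][j] for c in range(C) for j in range(W)) / (C * W)
           for i in range(H)]
    col = [sum(fmap[c][i][j] for c in range(C) for i in range(H)) / (C * H)
           for j in range(W)]
    return [[sigmoid(row[i]) * sigmoid(col[j]) for j in range(W)]
            for i in range(H)]


def apply_mask(fmap, mask):
    """Down-weight low-attention positions in every channel."""
    return [[[v * m for v, m in zip(frow, mrow)]
             for frow, mrow in zip(channel, mask)]
            for channel in fmap]


def global_avg_pool(fmap):
    """Reduce a C x H x W feature map to a length-C vector."""
    return [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_descriptor(fmaps):
    """Concatenate normalized pooled features from several layers."""
    desc = []
    for fmap in fmaps:
        desc.extend(l2_normalize(global_avg_pool(fmap)))
    return l2_normalize(desc)
```

In this sketch, a satellite feature map would be passed through `coordinate_mask` and `apply_mask` before pooling, so that regions with weak responses contribute less to the final descriptor; how the paper derives and aligns its masks is not specified in the abstract.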
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.


