A GF-3 SAR Image Dataset of Road Segmentation

We constructed a GF-3 SAR image dataset for road segmentation to boost the development of GF-3 synthetic aperture radar (SAR) image road segmentation technology and to promote the practical application of GF-3 SAR images. We selected 23 scenes of GF-3 SAR images in Shaanxi, China, cut them into road chips of 512 × 512 pixels, and then labeled the dataset using the LabelMe labeling tool. The dataset consists of 10026 road chips, and these road images come from different GF-3 imaging modes, so there is diversity in resolution and polarization. Three segmentation algorithms, Multi-task Network Cascades (MNC), Fully Convolutional Instance-aware Semantic Segmentation (FCIS), and Mask Region Convolutional Neural Networks (Mask R-CNN), are trained using the dataset. The experimental results, measured by Average Precision (AP) and Intersection over Union (IoU), show that segmentation algorithms work well with this dataset and that the segmentation accuracy of Mask R-CNN is the best, which demonstrates the validity of the dataset we constructed.


Introduction
Synthetic aperture radar (SAR) is a microwave sensor with an active mode of operation: the radar emits pulses of energy toward the ground and simultaneously receives the signals scattered back from the surface. Because the emitted microwaves can penetrate clouds and fog, SAR can acquire ground information all day and in all weather. Road segmentation in SAR images is very important for the national economy and people's livelihood, including transport systems, urban development, residential life, and industrial distribution [19], so it has been a research hotspot in the field of SAR image interpretation. Yang et al. addressed the fusion of image and point cloud data for road detection [45], An proposed a method for extracting roads in complex scenes using an optimized Hough algorithm [1], Fu et al. proposed road extraction from high-resolution remote sensing images based on wavelet transform and Hough transform [15], Geman et al. presented a new approach to tracking roads from satellite images [17], Tsutsui et al. presented an approach to road segmentation that only requires image-level annotations at training time [39], and Cheng et al. proposed a novel method of fusing geometric and appearance cues for road surface segmentation [5]. However, SAR imaging differs from optical imaging, and its characterization is not intuitive. Phenomena such as layover during imaging interfere with target interpretation, and speckle produces bright and dark grainy noise in the images, which seriously affects image interpretation and the extraction of road features [47]. In deep learning, there is less research on road segmentation because it is much more difficult than road detection.
The aim of road detection is to draw a border around the road and predict its label, outputting only the presence or absence of the road and its position; road segmentation, in contrast, outputs the position of the road, predicts its contour and area, and further outputs the shape of the road.
With the development of deep learning technology, many segmentation algorithms using deep neural network models have appeared. The commonly used network models are convolutional neural networks. Basic networks such as AlexNet [24], VGG [35], GoogleNet [38], and ResNet [21] have successively appeared, followed by segmentation models based on these structures, including Multi-task Network Cascades (MNC) [10], Fully Convolutional Instance-aware Semantic Segmentation (FCIS) [28], and Mask Region Convolutional Neural Networks (Mask R-CNN) [20]. MNC is mainly an application of multi-task learning to instance-aware segmentation. FCIS is the first fully convolutional, end-to-end solution for the image instance segmentation task. Without any additional tricks, Mask R-CNN outperforms all existing single-model networks, and it is currently recognized as a flexible, efficient, and universal segmentation architecture [20]. Deep learning methods often require a large amount of training sample data as support. There are many public sample datasets in computer vision, such as ImageNet [11], PASCAL VOC [14], and COCO [27], whose sizes reach thousands of object classes and millions of chips. In the field of optical remote sensing, the public datasets mainly include NWPU-RESISC45 [3], DIOR [25], NWPU VHR-10 [2,4,6], DOTA [43], HRRSD [31], and RSOD [30,44]. In the research field of SAR images, there are mainly the AIR-SARShip-1.0 dataset [36] and the dataset mentioned in [41]. The above datasets have facilitated numerous studies. However, due to the extremely high accuracy required for sample labeling in road segmentation, public and free road segmentation datasets for SAR images are very scarce, which seriously affects the development of deep learning technology for road segmentation in GF-3 SAR images.
Only by constructing a set of road sample databases suitable for road segmentation can we promote research on SAR road segmentation and make GF-3 SAR images better serve the national road planning, urban construction and other aspects [29,42].
A GF-3 SAR image dataset for road segmentation, named SARroad, is constructed in this paper. This dataset contains 10026 image chips of 512 × 512 pixels each, taken from 23 scenes of GF-3 SAR images. The imaging modes include Spotlight (SL), Ultra-Fine Strip (UFS), Fine Strip I (FSI), and Fine Strip II (FSII), with corresponding resolutions of 1 m, 3 m, 5 m, and 10 m, respectively. The three segmentation algorithms MNC, FCIS, and Mask R-CNN are trained and benchmarked on this dataset.

Methodology
As shown in Figure 1, we firstly selected 23 original GF-3 SAR images and cut them into 10026 road chips of 512 × 512 pixels; then we used the LabelMe labeling tool to segment and label the roads, and finally we constructed the dataset. We used the training dataset to train MNC, FCIS, and Mask R-CNN, and then used the testing dataset to test these three networks. We used a combination of quantitative and qualitative analysis to analyze the experimental results in detail. Furthermore, we evaluated the accuracy of the three models, and the segmentation accuracy of Mask R-CNN is the best. Finally, we formed a benchmark through experimental comparison analysis.

Figure 1
Methodological steps.

Table 1 gives the detailed information of the GF-3 SAR image dataset. The images have resolutions of 1 m, 3 m, 5 m, and 10 m, so there are big differences among the images. Figure 2 shows a GF-3 SAR image in SL mode containing various terrains: road, lake, farmland, and town. Lakes usually have lower gray values due to specular scattering, towns usually have higher gray values due to dihedral or trihedral scattering, and roads have gray values in between.

Construction Strategy of Road Segmentation Dataset
The construction process of the dataset is shown in Figure 3.
Firstly, the 23 scenes of original GF-3 SAR images have sizes of around 13200 × 24300 pixels. Sub-images containing roads are selected from the GF-3 SAR images. The sub-images should contain clear road edges and moderate background complexity, covering various imaging modes.
Secondly, we developed image cropping software with Python and the Open Source Computer Vision Library (OpenCV). Given an original SAR image as input, the software crops it to the predefined size without distortion, and the crop function is not affected by the resolution and size of the input image.

Figure 2
GF-3 SAR image in SL mode.

Figure 3
The construction process of the dataset, which is proposed as an open public repository.

Figure 4
Road labeling details: (a) example of road image; (b) labeled image; (c) JSON file after labeling.

Since we have set the size of the cropped image chip to 512 × 512 pixels in the software, a 512 × 512 pixels window is slid along the sub-image's rows with a step of 256 pixels, and the road chips are cut out at 512 × 512 pixels. The chips contain roads with various shapes as well as related background information such as farmland, houses, airports, green belts, and cities, which meets the requirements of actual road segmentation. Images of this size allow the type, shape, and background information of the roads to be extracted clearly.
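The sliding-window cropping step can be sketched as follows (a minimal illustration of the scheme described above, not the authors' actual software; the dummy sub-image stands in for a real GF-3 sub-image loaded with OpenCV):

```python
import numpy as np

def crop_chips(image, chip=512, step=256):
    """Slide a chip x chip window over the image with the given step
    and return the full-size crops (partial windows at the border
    are skipped)."""
    chips = []
    h, w = image.shape[:2]
    for y in range(0, h - chip + 1, step):
        for x in range(0, w - chip + 1, step):
            chips.append(image[y:y + chip, x:x + chip])
    return chips

# A 1024 x 1024 dummy "sub-image" yields a 3 x 3 grid of overlapping chips.
sub_image = np.zeros((1024, 1024), dtype=np.uint8)
print(len(crop_chips(sub_image)))  # 9
```

With a 256-pixel step, adjacent chips overlap by half a window, which increases the number of chips obtained from each sub-image.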
Thirdly, we used the LabelMe labeling tool to segment and label the roads. In the labeling process, the road edge is labeled with a series of points. After labeling, each road image corresponds to a JSON file, as in Figure 4(c). Since the shapes of the roads in our dataset are rich, the average time to label a road chip is about 1 minute. We spent a total of about 167 working hours labeling the 10026 road chips with the LabelMe labeling tool. The operating system of the labeling computer is Windows 10, with an Intel Core i7-9750H CPU.
As shown in Figure 4(b), there are seven marked points, denoted 1 through 7. Taking the upper left corner of the image as the coordinate origin, the horizontal axis is the X axis and the vertical axis is the Y axis. The coordinate value of each labeled point is its actual pixel position in the image.
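Each annotation can then be read back from its JSON file. The sketch below assumes LabelMe's standard `shapes`/`points` layout; the seven-point polygon is invented for illustration:

```python
import json

# A made-up annotation in LabelMe's format, with seven labeled points
# as in Figure 4(b).
labelme_json = """{
  "shapes": [
    {"label": "road",
     "points": [[10, 200], [120, 180], [250, 190], [380, 230],
                [460, 300], [300, 330], [60, 280]],
     "shape_type": "polygon"}
  ],
  "imageHeight": 512,
  "imageWidth": 512
}"""

ann = json.loads(labelme_json)
for shape in ann["shapes"]:
    # Each point is its actual pixel position: origin at the upper-left
    # corner, x rightward, y downward.
    print(shape["label"], len(shape["points"]))  # road 7
```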
Finally, the entire dataset is randomly divided into a training dataset (70%), a validation dataset (20%), and a test dataset (10%). Table 2 gives the allocation details. Since our dataset contains tens of thousands of chips, we used the 7:2:1 ratio to divide it. We need to ensure that enough data (70%) is used as the training dataset to avoid overfitting of road segmentation in deep learning. We selected a smaller portion of data (20%) as the validation dataset to adjust the model, and the fewest data (10%) as the test dataset to evaluate the model indicators.
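The 7:2:1 split can be reproduced with a simple shuffle (a sketch; the chip file names and the seed are hypothetical):

```python
import random

# Hypothetical chip identifiers for the 10026 road chips.
chips = [f"chip_{i:05d}.png" for i in range(10026)]

random.seed(0)          # fix the seed so the split is reproducible
random.shuffle(chips)

n_train = int(0.7 * len(chips))       # 70% for training
n_val = int(0.2 * len(chips))         # 20% for validation
train = chips[:n_train]
val = chips[n_train:n_train + n_val]
test = chips[n_train + n_val:]        # remaining ~10% for testing

print(len(train), len(val), len(test))  # 7018 2005 1003
```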


Details of Dataset
Due to the various shapes and backgrounds of roads in the SARroad dataset, overfitting of road segmentation in deep learning can be effectively avoided. Table 1 provides details on the road chips in different imaging modes. Figure 5 shows the road chips in different imaging modes, and Figure 6 shows roads with different shapes in the same imaging mode (SL), including airport runway, highway, country road, fork road, cross road, and curved road. The selection criteria for the road segment types used in our research are that the roads have a high probability of appearance, different shapes, and a large degree of difference. The backgrounds of roads in the dataset are also very diverse, as shown in Figure 7.

Figure 6
Roads with different shapes in SL mode (resolution of 1 m): (a) airport runway; (b) highway; (c) country road; (d) fork road; (e) cross road; (f) curved road.

Segmentation Algorithms
There are three representative segmentation algorithms: MNC, FCIS, and Mask R-CNN [16], the most famous of which is Mask R-CNN. These segmentation algorithms in computer vision have similar basic networks, such as FPN [26] and ResNet. In this paper, ResNet50 is selected as the basic network for road feature extraction. MNC, FCIS, and Mask R-CNN are briefly introduced in the following sections.

MNC
Figure 8 shows the network structure of MNC. MNC is mainly composed of three phases [10]. The first stage determines the road region of interest (ROI) based on the road features extracted by the convolutional neural network. The second stage uses ROI Warping and ROI Pooling and forms a preliminary road mask through two fully-connected layers. The third stage uses a two-layer fully-connected network to further refine the road mask based on the second-stage segmentation. Different from the FCIS and Mask R-CNN methods, the biggest feature of MNC is its segmentation method: instead of performing each task simultaneously and independently on top of shared features, each task depends on the previous task as well as the shared features, forming a hierarchical multi-tasking structure.

FCIS
Figure 9 shows the network structure of FCIS. FCIS is mainly composed of three parts [28]. One is the convolutional neural network part for extracting feature maps. The second is the ROI generation network part, which uses ROI Warping and ROI Pooling to extract the corresponding features and separate road from background. The third is the fully-connected layer part, which performs the road and background division to generate the final mask. The preliminary features of the road image are extracted by the convolution layers, and these features are then used to extract ROI regions through the RPN (Region Proposal Network) and to generate a feature map through the convolution layers. After the generated ROI regions and the feature map are aggregated, each ROI and its corresponding feature map are sent to the final segmentation discriminator for classification to obtain an accurate road mask.

Information Technology and Control 2021/1/50


Mask R-CNN
Figure 10 shows the network structure of Mask R-CNN. Mask R-CNN is mainly composed of three network parts: the RPN part, which uses the convolutional neural network to extract the feature map; the part that generates the target classification using regional proposals; and the part for semantic segmentation and mask generation [20]. Firstly, images are fed into a deep convolutional network to obtain a feature map, and then a set of rectangular target frames and their corresponding target scores are obtained using the RPN. After that, the regions of interest are further processed using the ROI method. Finally, these converted proposal regions are passed to the classifier to output the bounding boxes of the corresponding roads, while the semantic segmentation network part generates road masks in parallel.

Figure 10
Mask R-CNN network structure.

Analysis of Experimental Results
Based on our dataset, we compare the performance of the three aforementioned deep learning segmentation algorithms to verify the effect of deep learning methods on road segmentation and give a specific analysis. The operating system of the experimental machine is Ubuntu 16.04, and the GPU is an NVIDIA 2080ti.
In deep learning, precision (p) is often used to reflect the correct rate of a category being correctly predicted, and recall (r) is used to reflect the proportion of correctly predicted samples among all true samples. The calculation methods are shown in (1) and (2):

p = TP / (TP + FP), (1)

r = TP / (TP + FN), (2)

where TP represents the number of detection frames whose detection result is true and the true value is true, FP represents the number of detection frames whose detection result is true but the true value is false, and FN represents the number of detection frames whose detection result is false but the true value is true [36]. AP reflects the trade-off between recognition accuracy and recall ability, and it is defined as the area under the precision-recall curve (PR curve), whose x-axis is recall and whose y-axis is precision. Because such an integration is relatively difficult, AP is often calculated by (3), where p(r) is the measured precision at recall r [14,30]:

AP = (1/11) Σ_{r ∈ {0, 0.1, ..., 1}} p(r). (3)

The higher the AP value, the better the algorithm performance. IoU (Intersection over Union) reflects the degree of area overlap between the mask predicted by the algorithm and the label frame [22], and it is calculated by (4):

IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt), (4)

where B_p is the predicted road mask and B_gt is the road label frame; the numerator is the area of their intersection and the denominator is the area of their union. If the prediction is perfect, IoU = 1, and if it completely misses, IoU = 0. The higher the IoU value, the better the algorithm performance.
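Written out as code, the metrics look as follows (a minimal sketch using the 11-point interpolated AP of PASCAL VOC; the precision-recall measurements below are invented for illustration):

```python
def precision(tp, fp):
    # (1): correct predictions among all predicted positives
    return tp / (tp + fp)

def recall(tp, fn):
    # (2): correct predictions among all actual positives
    return tp / (tp + fn)

def average_precision(pr_pairs):
    """(3): 11-point interpolated AP. pr_pairs is a list of
    (recall, precision) measurements; at each of the 11 recall levels
    we take the maximum precision at any recall >= that level."""
    total = 0.0
    for level in [i / 10 for i in range(11)]:
        candidates = [p for r, p in pr_pairs if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11

# Invented measurements: precision decays as recall grows.
pairs = [(0.0, 1.0), (0.25, 0.9), (0.5, 0.8), (0.75, 0.7), (1.0, 0.5)]
print(round(average_precision(pairs), 3))  # 0.736
```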
When IoU is greater than 0.5, the detection is considered successful and is marked as TP; when IoU is smaller than 0.5, a false alarm is considered to appear and is marked as FP. Since this dataset only contains roads, mAP (the average of AP over all classes) is the same as AP itself.
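Equation (4) and the 0.5 threshold rule can be illustrated with toy binary masks (a sketch, not the evaluation code used in the paper):

```python
import numpy as np

def mask_iou(pred, gt):
    # (4): intersection area over union area of two binary masks
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

pred = np.zeros((4, 4), dtype=bool)
gt = np.zeros((4, 4), dtype=bool)
pred[:, :3] = True   # predicted mask covers the left 3 columns
gt[:, 1:] = True     # ground-truth mask covers the right 3 columns

iou = mask_iou(pred, gt)           # 8 shared cells / 16 total = 0.5
print("TP" if iou > 0.5 else "FP")  # FP (IoU is not greater than 0.5)
```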
We train the models using the images from the training dataset and adjust them using the images from the validation dataset. After these two steps are finished, we test the performance of the models using the images from the test dataset. The testing results of each segmentation model are shown in Table 3, where each algorithm is measured by AP and IoU. When IoU is greater than 0.5, the classification is considered correct. The input image size is 512 × 512 pixels. It can be seen that the performance of Mask R-CNN is the best, while the performance of MNC is the worst. One of the reasons is that, in Mask R-CNN, RoI Pooling is replaced by RoIAlign, leading to more accurately extracted features [20].

Table 3
Performance benchmarking of three deep learning algorithms for road segmentation.

Algorithms AP IoU
Mask R-CNN 86.5% 88.2% added into the training data. The network model after the last training is used as the initial network model for the next training. It can be seen that with the increase of training samples, Mask R-CNN segmentation is more accurate. Overall, the Mask R-CNN algorithm performs relatively well. However, there are still some problems of insufficient semantic segmentation, and the performance can be further improved due to use of more training samples [7,8,12,23,40,48].
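The AP values reported above follow the interpolated calculation in (3). A minimal sketch, assuming the classic 11-point interpolation (averaging the best precision achievable at recall ≥ r for r in {0, 0.1, ..., 1}); the exact variant used by [14,30] may differ:

```python
import numpy as np

def ap_11point(recalls, precisions):
    """11-point interpolated AP: average over r in {0.0, 0.1, ..., 1.0}
    of the maximum precision measured at recall >= r."""
    recalls = np.asarray(recalls)
    precisions = np.asarray(precisions)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11.0

# A perfect detector keeps precision 1.0 at every recall level -> AP = 1.0
print(ap_11point([0.0, 0.5, 1.0], [1.0, 1.0, 1.0]))  # 1.0
```

Interpolating with the running maximum makes AP insensitive to local wiggles in the PR curve, which is why it is preferred over direct numerical integration.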

Figure 11. Segmentation results of Mask R-CNN for the straight road.
Figure 11, Figure 12, and Figure 13 show the segmentation results of the Mask R-CNN algorithm for three testing images: a straight road, a fork road, and a "V"-type road, respectively. Each time, 500 training samples are randomly selected from the training set and added to the training data, and the network model from the previous training round is used as the initial model for the next round. It can be seen that Mask R-CNN segmentation becomes more accurate as the number of training samples increases. Overall, the Mask R-CNN algorithm performs relatively well. However, some deficiencies in semantic segmentation remain, and the performance can be further improved by using more training samples [7,8,12,23,40,48].
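The incremental training scheme described above (add 500 randomly chosen samples per round, warm-starting from the previous round's model) can be sketched as a runnable toy. Here `train_model` is a stand-in placeholder for actual Mask R-CNN training, and the training-set size of 8000 chips is hypothetical, not the paper's actual split:

```python
import random

def train_model(initial_model, samples):
    # Placeholder for one round of Mask R-CNN training: this toy "model"
    # only records how many samples it has been trained on in total.
    seen = 0 if initial_model is None else initial_model["seen"]
    return {"seen": seen + len(samples)}

random.seed(0)
n_train = 8000                      # hypothetical training-set size
remaining = set(range(n_train))     # chip indices not yet used
model = None
round_sizes = []
while remaining:
    # Draw up to 500 fresh samples and add them to the training data.
    batch = random.sample(sorted(remaining), min(500, len(remaining)))
    remaining -= set(batch)
    # Warm-start: last round's model initializes the next round.
    model = train_model(model, batch)
    round_sizes.append(model["seen"])

print(round_sizes[:3], round_sizes[-1])  # [500, 1000, 1500] 8000
```

The warm-start loop mirrors the paper's observation: each round sees strictly more data than the last, so segmentation accuracy can be tracked as a function of training-set size.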

Conclusions
We have constructed a GF-3 SAR image dataset for road segmentation named SARroad.

Conclusions
We have constructed a GF-3 SAR image dataset fo road segmentation named SARroad. Three deep learning based segmentation algorithms of MNC

Conclusions
We have constructed a GF-3 SAR image dataset for road segmentation named SARroad. Three deep-show that the Mask R-CNN performs best. The three algorithms are trained by using the SARroad dataset, and the testing results demonstrate the validity of the SARroad dataset. The experimental results establish a performance benchmark for the SARroad dataset, which is convenient for other researchers to carry out related research on road segmentation of GF-3 SAR images. It should be noted that the current segmentation results are not accurate enough. This is because the accuracy of road labeling is not enough and there is not enough training data. In future research, we will improve the accuracy of road labeling, increase the capacity of the dataset, and enrich the shape of roads. We believe that these further improvements make the constructed SARroad dataset more efficient for road segmentation of GF-3 SAR images.

Conclusions
We have constructed a GF-3 SAR image dataset for road segmentation named SARroad. Three deep-show that the Mask R-CNN performs best. The three algorithms are trained by using the SARroad dataset, and the testing results demonstrate the validity of the SARroad dataset. The experimental results establish a performance benchmark for the SARroad dataset, which is convenient for other researchers to carry out related research on road segmentation of GF-3 SAR images. It should be noted that the current segmentation results are not accurate enough. This is because the accuracy of road labeling is not enough and there is not enough training data. In future research, we will improve the accuracy of road labeling, increase the capacity of the dataset, and enrich the shape of roads. We believe that these further improvements make the constructed SARroad dataset more efficient for road segmentation of GF-3 SAR images.

Conclusions
We have constructed a GF-3 SAR image dataset for road segmentation named SARroad. Three deeplearning based segmentation algorithms of MNC, FCIS and Mask R-CNN are used to perform experiments on the SARroad dataset. The results dataset, whic researchers to road segmenta should be segmentation r This is because is not enough training data. improve the increase the c enrich the sha these further constructed SA for road segme (e)

Figure 13
Segmentation results of Mask R-CNN for "V"

Conclusions
We have constructed a GF-3 SAR image dataset fo road segmentation named SARroad. Three deep learning based segmentation algorithms of MNC FCIS and Mask R-CNN are used to perform experiments on the SARroad dataset. The result algorithms are trained by using the SARroad dataset, and the testing results demonstrate the validity of the SARroad dataset. The experimental results establish a performance benchmark for the SARroad dataset, which is convenient for other researchers to carry out related research on road segmentation of GF-3 SAR images. It should be noted that the current segmentation results are not accurate enough. This is because the accuracy of road labeling is not enough and there is not enough training data. In future research, we will improve the accuracy of road labeling, increase the capacity of the dataset, and enrich the shape of roads. We believe that these further improvements make the constructed SARroad dataset more efficient for road segmentation of GF-3 SAR images.