Parallel Convolutional Neural Networks and Transfer Learning for Classifying Landforms in Satellite Images

The use of remote sensing has great potential for detecting many natural changes, such as disasters, climate change, and urban development. Owing to technological advances in imaging, remote sensing has become an increasingly popular topic, and one of the significant benefits of this advancement is the ease with which remote sensing data can now be accessed. Remote sensing can be described as the process of identifying distinctive physical and spatial characteristics of an environment. Resolution is one of the most important factors influencing the success of detection: when the resolution falls below the necessary level, the features of the objects to be differentiated become incomprehensible and constitute a significant barrier to classification. In recent years, deep learning methods have become prevalent and successful for classifying remote sensing data. This study classified satellite images using deep learning and machine learning methods. Based on a transfer learning strategy, a parallel convolutional neural network (CNN) was designed. To improve the feature mapping of an image, the convolutional branches use the pre-trained knowledge of the transferred network. Using the offline augmentation method, the raw data set was balanced to overcome its unbalanced class distribution, which increased network performance. A total of 35 classes of landforms were studied in the experiments. The developed model achieved an accuracy of 97.84% in classifying landforms. According to the experimental results, the proposed method provides high classification accuracy in detecting landforms and outperforms existing studies.


Introduction
Remote sensing is a technique introduced in the early 1960s. Compared to other images, remote sensing images have high temporal frequency and cover wide geographical areas. As a result of rapid technological developments in recent years, remote sensing has become one of the most popular and important topics today, and the data used in the field have become much more accessible [17], [4]. Remote sensing can basically be described as the process of detecting distinctive features, such as physical or spatial information about an environment, from images obtained with tools such as satellites, imaging-capable aircraft, and remote-imaging ground vehicles. In this detection process, obtaining the distinctive features of the surface or object in an accurate and workable manner is very important for applications such as analysis and object detection [34,11,33]. The resolution of the images obtained by remote sensing tools matters: low resolution causes features to become incomprehensible and indistinguishable, while resolution above what is required increases the processing load of subsequent analyses. Thanks to the increasing resolution of modern imaging technology, the objects in remote sensing images have become much more distinguishable and much more useful for classification [10,22]. This ease of image access attracts experts working on classification studies, and such studies have made great progress with the data now accessible, in both modern applications and traditional pixel-based examinations [40,32]. Many classification tasks, such as industrial structure detection, land cover classification, natural area classification, and climate change detection, can be carried out using remote sensing and satellite image data sets [6].
Deep learning methods have proven quite successful at classifying remote sensing data, achieving impressive results in learning image features and extracting features suitable for classification.
One of the most common deep learning methods is the Convolutional Neural Network (CNN). CNN architectures have proven their suitability for classification and have been widely used in image classification and object detection in recent years [21]. In practice, CNNs require small, fixed-size inputs to keep processing times reasonable, so operations such as resizing may be necessary. In ordinary images, these operations can preserve the important features. In satellite images, however, the situation is different, because the objects and environments to be detected, such as an airplane on an airport runway, can be much larger than those in ordinary images [37].
Remote sensing and satellite image classification are applicable in many areas such as agriculture, climate change, disaster response, and urban change. Remote sensing is therefore of high importance for problems that require critical intervention, and it plays an important role in quickly detecting situations such as disasters in hard-to-reach areas far from transportation [24]. There are many studies on satellite image classification in the literature; some are summarized in Table 1. These studies show that deep learning models have provided high classification performance in recent years.
In summary, the novelty and main contributions of this study are as follows:
- This study conducts classification studies on satellite images used for remote sensing. Two different approaches, deep learning networks and machine learning algorithms applied to features extracted by deep learning networks, are compared with the aim of improving classification success despite the increasing number of classes.
- An effective parallel CNN model is proposed to obtain robust, high classification performance for landforms in satellite images. The proposed method combines a purpose-designed parallel structure with a transfer learning strategy.

Dai et al. [8] designed a two-layer sparse coding (TSC) model to discover the real neighbors of images and skip the intensive learning stage. The K-nearest neighbor (KNN), Support Vector Machine (SVM), and TSC methods were applied to a data set of 12 class labels collected from the Google Earth system. The highest success rate, 84.7%, was obtained with the SVM method; the TSC method achieved 84.2%.

Moorthi et al. [25] aimed to compare the success of SVM against traditional classification methods on Resourcesat- imagery.

Duarte et al. [11] used a CNN structure that considers both manned and unmanned image samples, collected from manned and UAV platforms. The benchmark, benchmark_ft, mresA, mresB, and mresC networks were used in the analysis; the highest classification success, 94.40%, was obtained with the mresC network.

Pritt et al. [27] discuss how detecting large geographical areas is more successful with deep learning methods than with traditional methods. A CNN structure was used to classify 63 class labels over 57,000 images of the fMoW dataset, achieving 83% classification success overall and 95% on 15 of the classes.

- The method was first tested on a 35-class data set. To test the robustness of the model, machine learning algorithms were then applied to separately created 10-class and 20-class datasets, using both deep learning networks directly and features extracted by deep learning networks.
- The purpose of this application is to observe both how classification success changes as the number of classes increases and how the success rate differs between the pre-trained model and the proposed model.
To contribute to previous research, this study aims to classify satellite images with CNN algorithms. The study is organized as follows: the first section discusses the purpose, importance, and subject of the study; the second section, Material and Method, covers the layers of convolutional neural networks and the performance indicators of the model; the proposed CNN model and the results are presented in the third section; and in the final section, the landform classification results are summarized and suggestions for future studies are given.

Material and Method
In this study, a classification study was carried out for landforms consisting of satellite images [18]. The method has three main steps: offline data augmentation for network improvement, transfer learning-based network training, and testing [36,31]. The dataset was randomly divided into 70% training, 15% validation, and 15% test data. Because each class contains a different number of images, class imbalance was avoided using the offline augmentation method, which is also used when splitting the data set. The training and validation sets are used as inputs to training, while the test set is used for testing. After preprocessing, the transfer learning-based network architecture is used to determine feature maps. The pre-trained AlexNet architecture was enhanced with additional layers for transfer learning [29]. This approach learns to extract high-level features using the weight parameters of the pre-trained network and has strong predictive ability. Figure 1 illustrates the flow chart of this classification study.
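The 70/15/15 split described above can be sketched as follows. This is an illustrative Python sketch (the study itself used MATLAB), and the function name and fixed seed are assumptions:

```python
import random

def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Randomly shuffle the samples and split them into
    train/validation/test partitions (the remainder goes to test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Applied to the balanced 6,930-image set used later in the paper, this split yields 4,851 training images, matching the count reported in the experiments.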

Figure 1
The general structure of the proposed classification method.

Dataset Description
The RSI-CB256 dataset, which contains publicly available satellite images, was used in the study [18]. It contains a total of 24,570 images divided into 35 classes. In the analysis, after offline augmentation, the data set was separated into 70% training, 15% validation, and 15% test data. In the offline augmentation method, a data reduction process was applied to create a balanced data set. Sample images from the data set are shown in Figure 2, and the number of images belonging to each class is given in Table 2 [18].

Proposed Parallel CNN Model
The proposed cascade CNN model is shown in Figure 3 [19]. The distribution of classes and the number of images in the dataset are given in Table 2 [18].
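As a minimal sketch of the parallel idea (not the authors' exact architecture), the snippet below runs the same feature map through several convolutional branches and concatenates the ReLU-activated results along the channel axis; the kernel values and sizes are placeholders:

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2-D cross-correlation of a single-channel map x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def parallel_branches(x, kernels):
    """Apply each branch's kernel to the same input and stack the
    ReLU-activated outputs as channels (the kernels must share one
    size here so the spatial dimensions match)."""
    maps = [np.maximum(conv2d(x, k), 0.0) for k in kernels]
    return np.stack(maps)  # shape: (branches, H', W')
```

In the real model the branches start from pre-trained AlexNet weights rather than arbitrary kernels; the sketch only shows how parallel branches enrich the feature mapping of a single input.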

Determination of Hyperparameter Value Intervals
The optimum hyperparameter values for CNN training depend on the data set, its size, and the model. According to the literature, some implications regarding the training hyperparameters are given below.
The mini-batch size determines how many samples the model processes simultaneously, so that all data are processed in small batches to improve network performance and memory use. A larger mini-batch size requires more memory, while a smaller mini-batch size causes more noise in the error calculation [20].
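The mini-batch mechanics can be illustrated with a simple chunking helper; this is a generic sketch (any real training framework does this internally), not code from the study:

```python
def minibatches(samples, batch_size=64):
    """Yield successive mini-batches; the last batch may be smaller.
    Larger batches need more memory per step, while smaller ones
    give noisier gradient estimates."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```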

Experiments
In this part of the study, the statistical validity of the proposed method was analyzed experimentally. The experiments were carried out in the MATLAB® R2020b environment on an x64 machine with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, an NVIDIA Quadro P620 GPU, and 16 GB RAM. The following section defines the evaluation criteria; subsequent sections present the experimental and improvement studies; finally, the performance of the proposed method is compared with state-of-the-art techniques.

Evaluation Metrics
Machine learning uses a confusion matrix for performance measurement in classification studies. The confusion matrix is a table containing the four combinations of predicted and actual values. An example confusion matrix is given in Table 4.
In Table 4, for m given classes, the entry CM_i,j of the confusion matrix represents the number of tuples of class i labeled as class j by the classifier. For a classifier with good accuracy, ideally most tuples lie along the diagonal of the confusion matrix, from entry CM_1,1 to entry CM_m,m, with the remaining entries zero or near zero. Several metrics can be calculated from these terms [38,2,13].
Accuracy (Acc) is the percentage of samples correctly classified. Sensitivity (Sn), or recall, shows how many of the actual positive samples were predicted as positive. Precision shows how many of the samples predicted as positive are actually positive. Specificity (Sp), the true negative rate, corresponds to the proportion of negative samples correctly identified as negative. The F1 score (f1) measures the accuracy of a test as the harmonic mean of precision and sensitivity. The formulas expressing these relationships are given in Equations (1)-(5) [15].
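For reference, the standard textbook formulas behind Equations (1)-(5) can be written out directly; this is a generic sketch, not code from the study:

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from the four confusion-matrix cells:
    accuracy, sensitivity (recall), specificity, precision, F1."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)           # sensitivity / recall
    sp = tn / (tn + fp)           # specificity (true negative rate)
    pr = tp / (tp + fp)           # precision
    f1 = 2 * pr * sn / (pr + sn)  # harmonic mean of precision and recall
    return acc, sn, sp, pr, f1
```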
If the learning rate is much smaller than the optimum value, it will take a long time to reach the ideal; if it is too large, the optimum may be overshot and the model may never reach good weights. A learning rate should be chosen that can reach the optimum without getting stuck in a local minimum.
The mini-batch size can be set to 32, 64, 128, or 256 [12,5]. In all analyses, the drop factor was set to 0.7, the drop period to 10, and the weight decay to 0.0001.
Stochastic gradient descent with momentum (SGDM) was used as the optimization algorithm. SGDM accelerates the gradient vectors in the right directions during training and searches for the minimum iteratively.
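The SGDM update rule can be sketched in a few lines; the quadratic objective below is only a stand-in for the network loss, and the learning rate and momentum values are illustrative:

```python
def sgdm_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGDM update: the velocity accumulates past gradients,
    accelerating movement along consistently downhill directions."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = sgdm_step(w, 2 * (w - 3), v)
# w is now very close to the minimizer 3.
```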
An epoch is one complete pass of the model over the training data. Too few epochs may cause the model to under-fit, while too many may cause it to over-fit. The dropout layer is used to prevent the network from memorizing during training [16,1]. The most appropriate hyperparameter ranges for this study, determined by considering the dataset, its size, and the CNN architecture, are given in Table 5.
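Dropout's behavior during training versus inference can be sketched as follows (inverted dropout, a common formulation; the drop probability and example values are arbitrary):

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: during training, zero each unit with probability p
    and rescale survivors by 1/(1-p) so the expected activation is
    unchanged; at inference time the layer is the identity."""
    if not training or p == 0.0:
        return list(activations)
    scale = 1.0 / (1.0 - p)
    return [0.0 if random.random() < p else a * scale for a in activations]
```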


Experimental Results
This study comprises 35 classes and 24,570 satellite images [18]. Each image is 256 by 256 pixels and is resized to match the input dimensions of each model. The data are divided into a training set of 70%, a validation set of 15%, and a test set of 15%.
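Resizing the 256 x 256 images to a network's input size can be sketched with nearest-neighbour index mapping; real pipelines typically use bilinear interpolation, and 227 x 227 (AlexNet's usual input size) is used here only as an example:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize: map each output pixel back to the
    source pixel it covers, using integer index arithmetic."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]
```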
In classification studies with machine learning, an unbalanced class distribution in the data set negatively affects model performance, so the data set was balanced with the offline augmentation method: the class with the fewest images was determined, and that many images were taken randomly from every class.
The Pipeline class, with 198 images, has the fewest images among the 35 classes in the training phase. In the analysis, 198 images were therefore taken randomly from each class, creating a balanced data set of 6,930 images in total. 70% of this data set was used for training, 15% for validation, and 15% as test data. A total of 4,851 training images were given to the network, the feature maps of the images were extracted, and the network learned the dataset. The validation set prevents the network from memorizing, and network performance was evaluated with test data not previously shown to the network.
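The balancing step described above amounts to random undersampling to the smallest class. A sketch, where the class names and all counts other than the 198-image minimum are placeholders:

```python
import random

def balance_by_undersampling(images_by_class, seed=0):
    """Randomly keep min-class-size images from every class so that
    all classes end up equally represented."""
    rng = random.Random(seed)
    n_min = min(len(v) for v in images_by_class.values())
    return {c: rng.sample(v, n_min) for c, v in images_by_class.items()}
```

With 35 classes and a smallest class of 198 images, this yields the 6,930-image balanced set reported in the paper.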
While determining the hyperparameters, the model was fine-tuned to enhance performance. Based on the results of the experiments, the hyperparameters listed in Table 5 were preferred.
The accuracy and training-loss curves of the proposed model and the AlexNet model during training are given in Figure 4. The high accuracy of the model indicates that the network has learned the properties of the images well. The training time varies depending on the number of epochs.
A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the actual values are known. Figure 5 shows the confusion matrix of the proposed model on the test data. In the classification study, 29 test images were used for each class. Analysis of the confusion matrix showed that the proposed model could not correctly predict the class of 22 images. All 29 images were predicted correctly for the Airplane, Airport_Runway, Artificial_Grassland, Hirst, Container, Dry_Farm, Forest, Mangrove, Marina, Mountain, Parking_Lot, Pipeline, River_Protection_Forest, Sandbeach, Sapling, Sea, Shrubwood, Snow_Mountain, Sparse_Forest, Storage_Room, Stream, and Town classes. Only one image was incorrectly predicted in each of the Avenue, Crossroads, Dam, Green_Farmland, Highway, and River classes, while two test images were misclassified in each of the Lakeshore, Bare_Land, Bridge, City_Building, Coastline, Residents, and Desert classes. While the analysis with the AlexNet architecture took 76 minutes, the proposed parallel CNN model took 78 minutes.
One of the methods used to evaluate the classification performance of a model is the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate against the false positive rate. The ROC curve for the 35-class dataset of the proposed model is given in Figure 6. A perfect ROC curve passes through the points (0,0), (0,1), and (1,1); a poor ROC curve lies on the diagonal from (0,0) to (1,1). Analysis results are evaluated between these two extremes.

Activations of the proposed method in different convolutions are given in Figure 7. Conv-1 and Conv-3 represent the first two convolutions, while Conv-6, Conv-8, and Conv-10 represent the parallel branches. By comparing the original input image with the activation fields, the features the proposed method has learned can easily be observed. White pixels indicate strong positive activations, and black pixels indicate strong negative activations; white pixels in a channel identify where that channel is strongly activated. When the clusters in the SoftMax layer, the final layer of the proposed method, are analyzed, it is seen that the classes are well separated.

Figure 8 shows the t-SNE view of the features extracted from the convolution layers of the proposed method.
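The one-vs-rest ROC construction described above can be sketched as follows; sweeping a decision threshold over descending scores traces the curve from (0,0) to (1,1). This is a generic sketch, not the study's implementation:

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points for binary labels (1 = positive),
    sweeping the decision threshold from high to low."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfect classifier hugs the (0,0)-(0,1)-(1,1) corner and gives an AUC of 1.0, matching the "perfect curve" description in the text.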

Performance Testing of the Proposed Model with Different Datasets
To test the performance of the proposed model with different data sets, balanced data sets of 10 classes and 20 classes of landforms were created. The data sets were balanced to ensure correct analysis: 70% of the images were reserved for training, 15% for validation, and 15% as test data. In the analyses, the hyperparameter values were the same as those given in Table 5.
Confusion matrices obtained from the analysis of the proposed model with the 10-class and 20-class data sets are given in Figure 9 and Figure 10, respectively. The performance metric values obtained from these analyses are shown in Table 6.
In future studies, optimization algorithms can be developed to improve the classification performance of the CNN model, and the model can be applied to landform data sets with different numbers of classes.

Figure 2
A sample image of each class for the satellite image data set RSI-CB256 [18].

Figure 3
The proposed cascade CNN model.

Figure 4
Training progress of the AlexNet and proposed parallel CNN experiments.


Figure 5
Proposed model confusion matrix results for 35 classes.


Figure 6
ROC curve for the 35-class dataset of the proposed model.

Figure 7
Activations from different convolutions.

Figure 8
The t-SNE view of the features extracted from the convolution layers of the proposed method. t-distributed stochastic neighbor embedding (t-SNE) is an unsupervised, non-linear technique used primarily for the exploration and visualization of high-dimensional data. It computes the probability that pairs of data points in the high-dimensional space are related and then chooses a low-dimensional embedding that produces a similar distribution.

Figure 10
Proposed model confusion matrix results for 20 classes.

Figure 9
Proposed model confusion matrix results for 10 classes.

Table 1
Studies on satellite image classification

Table 3
Relationship between the proposed model and AlexNet architecture [12,5].

Table 5
Hyperparameters for CNN models


Table 6
Performance comparison of different datasets (%)

As a result of the 35-class dataset analyses, the accuracy, specificity, sensitivity, precision, and f1-score values of the proposed parallel CNN model were 97.84%, 99.94%, 97.83%, 97.93%, and 97.82%, respectively. Analyses were also performed for the pre-trained AlexNet using the same hyperparameters and the same data to evaluate model performance; the accuracy, specificity, sensitivity, precision, and f1-score values of the AlexNet model were 96.45%, 99.90%, 96.45%, 96.57%, and 96.43%, respectively. Accuracy values for the 10-class and 20-class data sets were 97.91% and 98.97%, respectively. The specificity, sensitivity, precision, and f1-score values for the 10-class dataset were 99.77%, 97.91%, 97.95%, and 97.91%, and for the 20-class dataset 99.95%, 98.97%, 98.98%, and 98.96%, respectively. When the 10-class and 20-class results are compared with the 35-class results, the best performance is obtained with the 20-class dataset. The proposed model and AlexNet, which is important for our study, were tested on all three data sets, and the proposed model showed higher performance than AlexNet in each case.

Conclusions

A powerful deep learning model is proposed in this study. The study categorizes 35 classes of landforms from satellite images. Because some landform classes contain an excess of images, the offline augmentation method was used to balance the data set, which increased model performance. The developed parallel CNN model was compared with the pre-trained AlexNet model. The AlexNet model obtained accuracy, specificity, sensitivity, precision, and f1-score values of 96.45%, 99.90%, 96.45%, 96.57%, and 96.43%, respectively, while the proposed CNN model obtained 97.84%, 99.94%, 97.83%, 97.93%, and 97.82%. The proposed model thus improves accuracy, specificity, sensitivity, precision, and f1-score by 1.43%, 0.04%, 1.43%, 1.41%, and 1.46%, respectively.