Efficient Guided Grad-CAM Tuned Patch Neural Network for Accurate Anomaly Detection in Full Images

Deep learning-based anomaly detection in images has recently gained popularity as a research field, attracting many contributions worldwide. To simplify complex data analysis, researchers in the deep learning subfield of machine learning employ Artificial Neural Networks (ANNs) with many hidden layers. The primary goal of anomaly detection is to find data instances that deviate significantly from the patterns shared by most of the data. Many medical imaging applications use convolutional neural networks (CNNs) to examine anomalies automatically. While CNN structures are reliable feature extractors, they encounter challenges when simultaneously classifying and segmenting regions that need to be examined in scans. We propose a separation-and-integration scheme to address these issues, divided into two distinct branches: classification and segmentation. Initially, separate network architectures are trained independently for each abnormality, and the main components of these networks are then combined. A shared component of the branched structure serves all abnormalities. The final structure has two branches: one contains distinct sub-networks, each intended to classify a particular abnormality, and the other segments the various abnormalities. Training deep CNNs directly on high-resolution images necessitates image compression at the input layer, which loses information necessary for detecting medical abnormalities. A guided Grad-CAM (GCAM) tuned patch neural network is therefore applied to full-size images for anomaly localization. The suggested approach merges pre-trained deep CNNs with class activation mappings and region proposal systems to construct abnormality detectors, and then fine-tunes the CNNs on image patches centered on medical abnormalities instead of training on whole images. A mammogram data set was used to test the deep patch classifier, which achieved a 99% overall classification accuracy.
A brain tumor image data set was used to test the integrated detector's ability to detect abnormalities, which it did with an average precision of 0.99.


Introduction
The detection of anomalies is a significant machine-learning challenge. Rather than assuming a static and closed system, as most current machine learning methods do, this study explores how machine learning models may manage unknown and unpredictable input in an open and changing context. Learning systems designed for anomaly detection typically assume an open environment and use that knowledge to infer the unexpected, i.e., non-typical, out-of-the-ordinary patterns. To find unusual or novel patterns in data, anomaly detection methods typically first extract, describe, and model the trends present in the average data at hand. Visual or image anomaly detection is used when the data being examined is an image.
Medical images with poor contrast and excessive noise make computer-aided detection (CADe) of abnormalities a difficult research topic despite its clinical importance. In particular, one of the challenges specific to medical image analysis is that abnormalities tend to manifest in tiny local regions within a high-resolution image. In mammography, for instance, benign calcification is typically quite extensive, whereas suspicious calcification is generally relatively small and requires the study of magnified images for characterization [10]. It is well known that the efficiency and accuracy of traditional machine learning approaches, which rely on constructing sliding window-based detectors, leave room for improvement [6,15]. Additionally, they employ labor-intensive and fallible manual feature engineering. New methods for pinpointing abnormalities in medical images have become available thanks to recent advancements in deep learning. On ImageNet, deep CNNs outperformed humans on image categorization tasks [25]. However, medical anomaly detection remains challenging due to a lack of adequate training data. Transfer learning has been implemented to solve this issue and prevent over-fitting [38]. Deep features acquired from large-scale labeled image datasets, such as ImageNet, are then utilized for feature extraction on a target data set with an inadequate number of training images.
Anomaly identification, also known as outlier detection [26], seeks to identify cases that deviate significantly from the norm and are out of the ordinary or unexpected. Recently, anomaly detection in images has become a popular research area due to its wide range of potential applications, from video surveillance to medicine [35], [32]. Anomalies can occur for several reasons, including data errors or noise, but they can also reveal a hitherto unseen process. As a result, detecting anomalies is an essential endeavor, especially in medical image processing. Because deep neural networks have become so popular and produced remarkable results in many different contexts, many researchers have begun using them to spot anomalies in images. By looking at each pixel, such networks can also handle complex features like ROIs [22], [4]. Deep learning-based anomaly detection has become widespread. It has been applied to many tasks, with these technologies becoming increasingly commonplace in the medical sector [32], [29], [5], [37], [33]. This is because deep learning circumvents the problem of data imbalance, which can lead to a positive-case bias. We conclude that anomaly detection is preferable to binary classification [33] because negative medical images outnumber positive ones by about five to one. This work offers a deep learning method for detecting local abnormalities by training deep patch CNNs and merging them with CAM and RPN [24] to recognize regions in medical images. When deep CNNs are trained on high-resolution images, the input layer must perform image compression, which results in the loss of information necessary for medical anomaly detection. Consequently, our method fine-tunes the deep CNNs on image patches centered on medical irregularities rather than training on full-size medical images. The deep patch CNNs are trained to distinguish between the predetermined medical abnormalities.
The research questions are the following:
1. How does the use of guided Grad-CAM (GCAM) tuned patch neural networks impact the localization of anomalies in high-resolution medical images, and what is the trade-off between accuracy and computational efficiency in this context?
2. What are the specific challenges and limitations of training deep convolutional neural networks (CNNs) directly on high-resolution medical images, and how does image compression affect the detection of medical abnormalities?
In this study, we provide a new technique for segmenting and categorizing medical anomalies, which addresses the issues above. The proposed split neural network uses one branch for image segmentation and the other for classification. Because several anomalies share similar low-level properties, the proposed network's central component is designed to be identical in both branches so that they may share computation. The remaining sections of this paper are structured as follows. Multiple categorization schemes for anomalies are discussed in Section 2. Section 3 describes the GCAM-equipped network proposed for spotting numerous anomalies. The methods and outcomes of the research are discussed in Section 4. Section 5 contains the discussion, followed by the closing thoughts in Section 6.

Related Works
In computer vision and image analysis, abnormality detection using deep learning is a hot topic. Multiple studies have proposed deep learning-based methods for anomaly detection across a wide range of fields and industries, from medical imaging and industrial inspection to surveillance and beyond.
In the past few years, CNNs have made great strides toward automating the derivation of generic characteristics from wireless capsule endoscopy (WCE) images. Jia et al. [12] produced one of the earliest publications on CNN-based WCE image classification. They implemented a 5-layer convolutional neural network (CNN) to extract features from WCE images and detect bleeding. Since then, various CNN architectures for detecting anomalies have been proposed [7]. To learn the features of WCE images, Segui et al. [30] used three parallel CNNs fed with the Hessian and Laplacian of the images in addition to the original RGB inputs. Finally, features from all three CNNs are combined before being sent to fully connected layers for image classification. Iakovidis et al. [11] have also used a 5-layer CNN to detect and pinpoint anomalies. By applying this algorithm to the feature maps, salient features that may be found in out-of-the-ordinary areas can be located. Goel et al. [8] used a CNN with two forks, one serving as the network's backbone and the other performing a dilated convolution operation that preserves image resolution. The primary CNN uses multiple scales to extract features. The dilated convolution branch broadens the system's receptive field and facilitates the separation of key features. To obtain the dominant global features, the elements from the backbone CNN and the dilated CNN are finally combined [2].
Performance evaluation measures were also calculated using three distinct pattern classifiers: Support Vector Machine [28], [13], [34], Linear Discriminant Analysis, and the Bayes Linear Classifier. Another contribution dimensionally reduces the obtained features with Principal Component Analysis and then classifies them with the Support Vector Machine technique. The abnormal breast area can be pinpointed with the help of a mammogram. The abnormality in mammograms was detected using a variety of pattern recognition algorithms and decision-making systems. Textural and geometric features are used for feature extraction. A Correlation-based Feature Selection (CFS) scheme selects features for further classification into masses or non-masses. Breast disorders such as tumors and microcalcification may be detected using image processing methods such as shape derivation and delineation of the group or lesion boundary.
One indicator of breast abnormality is the appearance of the mass's periphery. Breast masses that are round and well-defined on a mammogram are typically benign, while those with an uncertain border indicate cancer. The Digital Database for Screening Mammography (DDSM) is the source for these images, and Verma et al. [3] have developed a computer-aided diagnosis scheme that establishes an outline for computerized mammograms, prioritizing a neural-genetic algorithm with an achieved precision of 85%. According to Ahuja et al. [23], benign neoplasms are well-marginated and have regular shapes, while malignant neoplasms have indistinct and irregular borders that become more spiculated over time. Masses and benign glandular tissue differ only slightly in X-ray attenuation, so a lack of contrast and blurriness characterize their visual presentation. Micro-calcifications, or mammographic "bright spots," are tiny calcium deposits.
Reconstruction-based or image generation and completion methods are another option for training on one-class data. The benefit of these methods, often based on neural networks, is that they can detect anomalies pixel-by-pixel. After an input has been processed through a bottleneck layer, for instance, autoencoders can reconstruct it [36], [14]. Compressing the images prevents the network from simply copying the input without learning a meaningful representation. Denoising variants address this issue by introducing noise into the input image, forcing the denoising network to reconstruct the original query [19]. However, because such manipulation is typically performed at the pixel level, the denoising network does not need to learn much semantic data. A recent image creation method relies on a GAN trained to produce normal samples.
Yuan et al. [39] implemented a similarity constraint in the loss function when training a DenseNet to handle intra-class variations. To achieve low intra-class variation and high inter-class variation, Yuan et al. [40] proposed a densely connected CNN with unbalanced discriminant loss and category-sensitive loss.
To reduce intra-class variance in polyp detection, Guo et al. [9] developed a triple ANET construction using angular contrastive loss and an attention mechanism.
Many existing studies concentrate on intra-class variations, yet a diverse range of anomalies makes it challenging to achieve accuracy and reduce the error rate. The popular systems described in the literature focus on intra-class variation caused by a single abnormality type. Unique mechanisms need to be designed to capture the underlying characteristics of various anomalies in order to handle them and classify them as abnormal. This research presents a parallel CNN framework that utilizes a unique meta-feature retrieval approach to differentiate among multiple anomalies using the underlying statistical trends in the feature maps. Combined with traditional feature maps, these meta-features boosted the accuracy of the classifier. Table 1 shows a comparison of conventional models with their unique characteristics.

Materials and Methods
A deep CNN pre-trained on normal images is fine-tuned using extracted image patches centered on medical aberrations. This allows the CNN to acquire image attributes for local abnormalities.
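This patch-extraction step can be sketched as follows (a minimal NumPy example; the function name, the patch size of 224, and the zero-padding policy are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def extract_patch(image, center, size=224):
    """Crop a square patch centered on an annotated abnormality.

    The image is zero-padded so that patches near the border keep
    the requested size (an illustrative choice; reflection padding
    would work as well).
    """
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = center[0] + half, center[1] + half  # shift coordinates for padding
    return padded[r - half:r + half, c - half:c + half]

# Example: a 512x512 mammogram-like array with a lesion near the border.
image = np.zeros((512, 512), dtype=np.float32)
patch = extract_patch(image, center=(10, 500), size=224)
```

The resulting patches, rather than the compressed full-size image, are what the pre-trained CNN is fine-tuned on.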

Data Collection
The DDSM is a free resource for researchers interested in using digital mammograms to study breast cancer.
Once the sensor has been trained, it reports irregularities as rectangles with attached likelihoods. Both methods are proven effective and improve performance when finding local abnormalities in mammograms and ultrasound liver images. Figure 1 depicts the proposed method.

Level 1 Classifier
Each anomaly can have its own network trained explicitly on its dataset of abnormal images. Each trained network successfully segments the training-set abnormality. As a result, the architecture of each network can be fine-tuned to address a specific abnormality. There is no coordination between the different constituents of the networks and no use of the shared characteristics of the abnormalities, all of which contribute to the overall complexity. Although these systems are appropriately specific, they involve considerable duplication of work. They are suitable only for segmentation, not for classification. When segmenting and classifying multiple abnormalities, it is essential to focus on individual abnormalities as required for segmentation while maintaining the broader understanding needed for classification.

Figure 1
Outline of the modeling proposal.

Figure 2
The level 1 classifier architecture (GAN structure).


Common abnormalities in medical images have been linked to severe illnesses. Segmentation and classification of abnormal regions are both necessary for an accurate diagnosis of significant diseases.

The primary objective of the first classification stage is to identify aberrant images. The model is trained on normal samples only and attempts to produce consistent reconstructions of regular inputs with low anomaly scores and inconsistent reconstructions of aberrant images with high anomaly scores. The suggested model's design is depicted in Figure 2. It incorporates semi-supervised learning via a memory-augmented deep auto-encoder (AE).

To learn representations in which the input and the output are identical, an AE makes use of feedforward neural networks. The sole components of a standard AE are the encoding and decoding units. The encoder is symbolized by the function f_en: A → B, where A and B are the domains of the original data and the latent manifold, respectively. This function maps the original image a onto a latent vector b of lower dimensionality (d_b < d_a) by a learned mapping

b = f_en(a; θ_en),

where θ_en is the collection of encoder parameters. For producing the reconstructed image from b, the decoder is specified by the function f_de, which maps b back to the original space:

â = f_de(f_en(a; θ_en); θ_de),

where θ_de is the decoder's parameter set. The goal of the AE is to use the latent-space representation b to compress the input data a and then use the encoded version to reconstruct an image â that is similar to a.

A storage record containing a matrix M ∈ R^(N×d_b) is the basis of the memory system, where N characterizes the number of prototypical items. Given a latent variable b, the memory component produces a recovered latent representation

b̂ = Σ_(n=1..N) w_n m_n,

where m_n is the nth memory element in the database and w = (w_1, …, w_N) is a weight vector with w_n > 0 and Σ_n w_n = 1, describing the importance of each prototype vector in generating b̂. The memory components and the latent query vector b are used to construct the weight vector w.

There are two parts to the GAN: the generator G and the discriminator D. D seeks to categorize the data as real or artificial, whereas G is meant to develop synthetic samples that are comparable to the real data. As G and D compete, the GAN optimizes the objective function and learns the underlying data distribution, allowing for the generation of high-quality artificial samples.

During the inference phase, an anomaly score s(a) is used to determine the level of abnormality present in a fresh query image a. It measures how far G's pattern extraction from normal images deviates from the actual image. When processing anomalous images, the system is expected to yield high abnormality scores. However, if s(a) is small, it means that the test image a has patterns that are typical of normal images, and so it will be classified as normal by the suggested technique.
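The memory read and the anomaly score can be sketched as follows (a minimal NumPy illustration; the softmax dot-product addressing and the squared-error score are common choices assumed here, not necessarily the paper's exact formulation):

```python
import numpy as np

def memory_read(b, memory):
    """Recover a latent vector as a convex combination of memory items.

    memory: (N, d_b) matrix of prototypical normal patterns m_n.
    The weights w_n are a softmax over similarities, so w_n > 0
    and they sum to 1, as required of the weight vector w.
    """
    sims = memory @ b                  # dot-product similarity to each prototype
    w = np.exp(sims - sims.max())      # numerically stable softmax
    w = w / w.sum()
    b_hat = w @ memory                 # weighted sum of prototypes
    return b_hat, w

def anomaly_score(a, a_hat):
    """Per-image reconstruction error s(a) = ||a - a_hat||^2."""
    return float(np.sum((a - a_hat) ** 2))

rng = np.random.default_rng(0)
memory = rng.normal(size=(10, 16))     # 10 prototypes, latent dimension d_b = 16
b = rng.normal(size=16)
b_hat, w = memory_read(b, memory)
```

Because b̂ is forced to be a combination of normal prototypes, reconstructions of anomalous inputs degrade, which is exactly what drives s(a) upward for abnormal images.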

Segmentation
The U-Net was designed as a CNN for use in the medical imaging field. The network has an encoder and a decoder with skip links to keep spatial data intact. Class imbalance, limited sample numbers, and complicated anatomical structures are only some of the difficulties this architecture is built to overcome. Figure 3 shows the technical details of the U-Net architecture for segmenting medical images.
To extract information from an input image, the encoder network passes it over a chain of convolutional layers. The construction is built from stacked units of two 3×3 convolutional layers, a rectified linear unit (ReLU) activation function, and a downsampling layer of 2×2 max pooling. Each pooling operation doubles the number of filters used in the convolutional layers. Batch normalization after each convolutional layer stabilizes training, and overfitting is mitigated. The encoder network's output is sent into the decoder network, where it is upsampled to the input size and combined with the encoder network's feature maps through skip links. Two 3×3 convolutional layers with ReLU activations and one 2×2 transposed convolutional layer for upsampling are repeated in blocks throughout the design. After every block, the number of filters in the transposed convolutional layer is halved.
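The halving of resolution and doubling of filters down the encoder can be illustrated with a small sketch (NumPy only; the base filter count of 64 and the four levels are illustrative assumptions, since the paper's 3D variant starts at 32):

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling over an (H, W) feature map with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def encoder_schedule(input_size, base_filters=64, levels=4):
    """Spatial size and filter count at each encoder level:
    each pooling halves the resolution while the filter count doubles."""
    sizes, filters = [input_size], [base_filters]
    for _ in range(levels - 1):
        sizes.append(sizes[-1] // 2)
        filters.append(filters[-1] * 2)
    return list(zip(sizes, filters))
```

The decoder mirrors this schedule in reverse, halving the filter count after each transposed convolution.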

Figure 3
The U-Net architecture for image segmentation.


Algorithm 1: Segmenting using U-Net

Step 2: Initialize a U-Net model with the following architecture.
Step 3: Encoder operation
i. Input the image with one channel.
ii. Apply a sequence of convolutional blocks, each consisting of: two 3×3×3 convolutions with ReLU activation; a 2×2×2 max-pooling operation to reduce the spatial dimensions; and a doubling of the number of filters, starting with 32 and progressing through 64, 128, and 256.
iii. Perform batch normalization after each convolutional layer to stabilize learning and improve convergence.
Step 4: Bottleneck. At the bottom of the U-Net, the image patch is processed without pooling by two 3×3×3 convolutions, each followed by ReLU activation and batch normalization.
Step 5: Decoder operation
i. Up-sample the image using 2×2×2 transposed convolutions to increase the spatial dimensions.
ii. Concatenate the up-sampled output with the corresponding encoder feature maps (skip connections).
iii. Apply two 3×3×3 convolutions with ReLU activation for image segmentation.
iv. Halve the number of filters after each up-sampling step (starting from 512 and reducing through 256, 128, and 64).
Step 6: Output layer. A 1×1×1 convolution maps the feature maps to the number of segmentation labels.
Step 7: Compute the loss using softmax cross-entropy for multi-class voxel classification and Dice loss for segmentation accuracy. Optimize the model using Adam with an initial learning rate and evaluate the overall measures.
Step 8: Model training. (a) Train the U-Net model using the preprocessed and augmented dataset I_d. (b) Validate with early stopping and best-model checkpointing. (c) Adjust the learning rate and evaluate performance.
Step 9: Execute the test set to evaluate performance, using the performance metrics defined earlier for quantitative assessment.
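Step 7's two loss terms can be sketched as follows (a minimal NumPy illustration of softmax cross-entropy and Dice loss over flattened voxels; any class weighting the paper applies is not reproduced here):

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis (classes), numerically stabilized."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, onehot):
    """Mean softmax cross-entropy over voxels; last axis holds the classes."""
    p = softmax(logits)
    return float(-np.mean(np.sum(onehot * np.log(p + 1e-8), axis=-1)))

def dice_loss(probs, onehot, eps=1e-8):
    """1 minus the Dice coefficient, averaged over classes.

    probs, onehot: (num_voxels, num_classes) arrays.
    """
    inter = np.sum(probs * onehot, axis=0)
    denom = np.sum(probs, axis=0) + np.sum(onehot, axis=0)
    dice = (2 * inter + eps) / (denom + eps)
    return float(1 - dice.mean())

# Uniform logits over 2 classes give a cross-entropy of about log(2).
ce_uniform = cross_entropy(np.zeros((4, 2)), np.eye(2)[[0, 1, 0, 1]])
```

In practice the two terms are summed (possibly weighted) and minimized with Adam, as Step 7 states.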
Slices taken at various angles comprise the three-dimensional images used in biomedical imaging. The study of biomedical images requires the processing of massive volumes of information. Segmentation-labeled data may be challenging to annotate since computers can only show the data as two-dimensional slices. Thus, typical 2D image models often lose performance and context when processing 3D images. The solution is a 3D U-Net based on the existing U-Net framework, with a contracting encoder component for full-image analysis and a progressively expanding decoder section for high-resolution segmentation generation. There are numerous structural similarities between 2D and 3D U-Nets; however, 3D U-Nets use 3D convolution, 3D pooling, and 3D upsampling in place of all 2D operations. To avoid slowdowns in the network, batch normalization (BN) is implemented.
There are four resolution steps in the encoding path, the same as in the regular U-Net, and the same number in the decoding path. Each layer consists of two 3×3×3 convolutions, each followed by a rectified linear unit (ReLU), and a 2×2×2 max pooling layer with a stride of two. In the synthesis path, each ReLU activation is sandwiched between two 3×3×3 convolutions and a 2×2×2 up-convolution with a stride of two in each dimension. The skip links from the equal-resolution feature maps provide high-resolution information for decoding. The number of output channels is reduced to the number of labels, which in this case is 3, by the 1×1×1 convolution in the last layer. The total number of parameters in the structure is 19,069,955.
Before training, smooth dense deformation fields are applied to the data and ground-truth labels, in addition to rotation, scaling, and gray-value augmentation. For this, B-spline interpolation is applied to random vectors drawn from a normal distribution with a standard deviation of 4, on a grid with a spacing of 32 voxels in each direction. The network output is compared with the ground-truth labels using a softmax with weighted cross-entropy loss, to strike a better balance between the influence of tiny blood vessels and that of background voxels on the loss. This end-to-end learning approach can employ fully or partially automated techniques to segment 3D targets from sparse annotations. The network generalizes well from a limited amount of labeled data thanks to its well-designed structure and data augmentation. With suitable rigid transformations and modest elastic deformations applied, realistic images can be produced, the preprocessing pipeline can be simplified, and the architecture can be extended to 3D data collections of any size.
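The class-balancing idea behind the weighted cross-entropy loss can be sketched with a minimal numpy version; the example probabilities and weights below are illustrative, not values from the paper.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Softmax probabilities `probs` (N, C), integer `labels` (N,), and
    per-class weights (C,). Up-weighting rare foreground classes balances
    tiny structures (e.g. small vessels) against the far more numerous
    background voxels in the loss."""
    n = probs.shape[0]
    p_true = probs[np.arange(n), labels]    # probability of each voxel's true class
    w = class_weights[labels]               # weight taken from the voxel's label
    return float(-(w * np.log(p_true)).sum() / w.sum())

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 1])
print(weighted_cross_entropy(probs, labels, np.array([1.0, 1.0])))
```

With equal weights this reduces to the ordinary cross-entropy; raising the weight of the foreground class makes its voxels dominate the average instead of being drowned out by background.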

Level 2 Classifier
Numerous medical procedures, such as endoscopy, dermoscopy, and funduscopy, rely on imaging that can be analyzed automatically. Breast cancer is prevalent among women, and early detection is crucial for improving patient outcomes. Digital mammography is an imaging technique that captures X-ray images of the breast, which computer algorithms can analyze to assist radiologists in identifying potential cancerous regions [18], [41]. Meta-heuristic optimization algorithms are techniques inspired by natural processes such as evolution, swarm intelligence, and simulated annealing; they search for optimal solutions in complex, high-dimensional problem spaces [16]. As shown in Figure 4, the Guided Grad-CAM tuned patch neural network is applied to full-size mammograms for anomaly localization. Traditional methods employ a sliding window to let the classifier scan the entire image, which is time-consuming and inefficient. In contrast, our process allows anomaly localization with a single forward computing run: when the full-size mammogram is fed into the patch classifier and the CAM is computed, a heat map of anomalies is generated. Guided Grad-CAM (Gradient-weighted Class Activation Mapping) is a method used in deep learning to visualize the regions of an input image that contribute the most to the prediction made by a CNN. It combines the concepts of Grad-CAM and guided backpropagation to generate more accurate and detailed visualizations. The Guided Grad-CAM (GCAM) computation can be defined as follows. First, the gradient of the desired class score relative to the output feature maps of the final convolutional layer is computed:
∂y^c / ∂A^k_mn    (8)

Here, y^c represents the target class score, A^k_mn represents the activation at position (m, n) in the k-th feature map of the final convolutional layer, and Z is the spatial size of the feature maps. The weights α^c_k are calculated by taking the global average pooling of the gradients obtained in the previous step:

α^c_k = (1/Z) Σ_m Σ_n ∂y^c / ∂A^k_mn    (9)

A ReLU activation is applied to discard any negative contributions, and the Grad-CAM activation map is the weighted sum of the feature maps:

L^c_Grad-CAM = ReLU( Σ_k α^c_k A^k )    (10)

Here, L^c_Grad-CAM represents the Grad-CAM activation map for the target class c, and A^k characterizes the k-th feature map of the final convolutional layer. The guided backpropagation gradient is then calculated for each pixel:

G = ∂y^c / ∂X    (11)

where X represents the input image. The guided Grad-CAM is obtained by element-wise multiplication of the Grad-CAM activation map with the guided backpropagation gradient:

L^c_GGC = L^c_Grad-CAM ⊙ G    (12)

The symbol ⊙ denotes element-wise multiplication. The resulting map L^c_GGC is the guided Grad-CAM activation map for the target class c, highlighting the input image regions that contribute the most to the prediction made by the CNN.
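The steps above (global average pooling of gradients, ReLU of the weighted sum of feature maps, and element-wise gating with the guided gradients) can be sketched in numpy from precomputed gradients; for this sketch the guided gradient is assumed to be already resized to the feature-map resolution, whereas in practice the Grad-CAM map is upsampled to the input size first.

```python
import numpy as np

def guided_grad_cam(fmap_grads, fmaps, guided_grads):
    """fmap_grads, fmaps: (H, W, K) gradients of the class score w.r.t. the
    last conv layer's activations, and those activations themselves;
    guided_grads: (H, W) guided-backpropagation gradient w.r.t. the input.

    Pools the gradients into one weight per feature map, takes the ReLU of
    the weighted sum of feature maps, then gates the result with the guided
    gradients via element-wise multiplication."""
    alpha = fmap_grads.mean(axis=(0, 1))                  # one weight per map
    cam = np.maximum((fmaps * alpha).sum(axis=-1), 0.0)   # ReLU of weighted sum
    return cam * guided_grads                             # element-wise product

fmaps = np.array([[[1.0], [-2.0]],
                  [[3.0], [0.0]]])        # (2, 2, 1) toy feature map
grads = np.ones((2, 2, 1))                # toy gradients -> alpha = 1
print(guided_grad_cam(grads, fmaps, np.ones((2, 2))))
```

The negative activation at position (0, 1) is zeroed by the ReLU, so only positively contributing regions survive into the heat map.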

In step 1, we use image patches from calcification and mass instances to train a two-class classifier using transfer learning and state-of-the-art deep CNN architectures. The data flow is illustrated with AlexNet's architecture; VGGNet, GoogLeNet, and ResNet are the other models used for this purpose. The last convolutional layer of the deep patch CNN is kept, and the original top layers are replaced with a global average pooling layer and a fully connected layer. The new model is retrained to acquire the output-layer CAM weights w_k (k = 1, 2, ..., n). Of the four deep CNN designs, ResNet is the only one ready to compute the CAM after a complete mammogram is fed into the input layer. The feature maps output by the last convolutional layer are denoted F_k (k = 1, 2, ..., n). By projecting the convolutional feature maps onto their output-layer weights, we can determine which parts of the image are most significant.
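The projection step can be sketched in numpy: given the feature maps F_k produced on a full-size image and the CAM weights w_k learned behind the global average pooling layer, the heat map is a single channel-wise dot product; the toy shapes below are illustrative.

```python
import numpy as np

def class_activation_map(feature_maps, cam_weights):
    """feature_maps F_k: (H, W, K) output of the last convolutional layer for
    a full-size image; cam_weights w_k: (K,) output-layer weights learned
    behind global average pooling. Projecting the feature maps onto these
    weights yields the anomaly heat map in a single forward pass."""
    return feature_maps @ cam_weights      # (H, W) heat map

fmaps = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[2.0, 2.0], [0.0, 0.0]]])   # (2, 2, 2) toy feature maps
w = np.array([1.0, 2.0])                       # toy CAM weights
print(class_activation_map(fmaps, w))
```

Because the convolutional trunk is fully convolutional, the same weights apply at every spatial location, which is why no sliding window is needed.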
Experimental Setup
Linux was used for the experiments, and they were run using the Keras framework. Table 2 displays the experiment-specific values for our model's hyper-parameters. The validation data set was used to determine these hyper-parameters and also served to establish the normal/abnormal cut-off. When applied to the proposed image abnormality detection method, the effectiveness metrics of accuracy, sensitivity, specificity, Receiver Operating Characteristic (ROC), and Area Under the Curve (AUC) validate the quality of the findings and outcomes from the actual case.
The input image was scaled down to 64×64 pixels so that the model could be systematically evaluated. In the first round of testing, we use a 64×64 input with learning rates of 0.1, 0.01, 0.001, and 0.0001. After the database was split into testing and training sets, the generated algorithm was evaluated on the performance metrics. With a learning rate of 0.0001, performance reached 99.12% accuracy, 98.21% sensitivity, and 98.36% specificity on the CBSI-DDSM dataset, and 99.35% accuracy, 98.10% sensitivity, and 98.87% specificity on the Brain MRI dataset, as in Table 3 and Figure 5. Table 4 shows the standard performance measures of precision, recall, and F1-score.
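For reference, the three figures reported per learning rate follow directly from the confusion-matrix counts; the counts in the example are illustrative, not the paper's.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # true-positive rate on abnormal cases
    specificity = tn / (tn + fp)        # true-negative rate on normal cases
    return accuracy, sensitivity, specificity

print(binary_metrics(90, 90, 10, 10))   # -> (0.9, 0.9, 0.9)
```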
We construct class activation mapping using the fine-tuned ResNet to pinpoint anomalies. ResNet was chosen for computing the GCAM because it requires no further training. Without sacrificing generalizability, we load a single calcification-class complete mammogram into ResNet and calculate the GCAM. It is challenging to train classifiers on calcification and mass instances because of mammography images' poor contrast and noise. Input images to these deep neural networks can only be 224×224 or 227×227 pixels in size, so fine features essential for categorization will likely be lost if mammograms are resized to these dimensions. To distinguish between calcification and mass instances, we therefore suggest training classifiers on cropped image patches and then applying the resulting deep CNN models to full-size mammograms. We effectively apply the patch classifier to the task of pinpointing anomalies in complete mammography images using the method termed GCAM.

Result and Discussion

This section shows how well the suggested technique determines whether an image is aberrant. The obtained performance metrics are then evaluated against competing, cutting-edge strategies.

The total accuracy indicates how well the training data was classified, but val_acc is essential for testing correctness on the unseen test data. As shown in Figure 6, a neural network model performs well when val_loss falls and val_acc rises; the model learns batch by batch over the training epochs, and the validation loss is determined per sample. val_loss is an excellent indicator of the model's performance on unseen data: overfitting is not an issue if val_loss is small, whereas an increase in val_loss after the model has been trained extensively on the data indicates overfitting. Line charts for these central terms, shown in Figures 6-7, depict an overall assessment of the proposed model: as more epochs pass, the accuracy metrics improve and the loss values drop, and a falling val_loss means the model has been trained correctly. After the outliers have been filtered out, the dataset of images is classified as normal or abnormal using a basic neural network design.

In Figure 8, the projected result is presented as a confusion matrix. Of the total input images, 99% were correctly labeled as normal, and 100% of the abnormal images were labeled as abnormal. The suggested approach has limits, as any expert system would, and the training data plays a crucial role in the model's performance. Because of this, the algorithm flagged a new input (a typical, healthy cardiac image) as suspicious; this may be an issue when the corpus lacks domain-specific examples. The temporal components of the image are stripped away during the cleaning process, since the suggested model only considers spatial features. As a result, features are occasionally misrepresented because edges and differences in the local binary pattern of the images are not accounted for.

Interestingly, only a small amount of training brings the minimal loss score down to 0.01. This helped reduce the input data length and illustrate the latent space model. Most activations also show a sparse hierarchical architecture, mainly when focusing on the spatial aspects. In addition, the encoder reduces the dimensionality of the representation before the decoder reconstructs the original image, and the model may overfit during this adjustment by memorizing the training data. The suggested approach typically sets the number of neurons in the core layer to half the number of input variables, which helps guarantee that the model picks up the most relevant and valuable aspects of the data. It is important to emphasize that the model is unsupervised, making no use of training labels. The system is data-dependent, since the hidden layers' neurons are all driven by the available data; when the input characteristics change, the activation function causes different neurons to fire, producing varied network outputs. Our technique achieved excellent overall classification performance but was less accurate on a small percentage of the data.
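The unsupervised decision rule discussed above (reconstruction error as the anomaly score, thresholded at a cut-off chosen on the validation set) can be sketched as follows; the arrays and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def flag_anomalies(images, reconstructions, threshold):
    """Per-image mean squared reconstruction error as the anomaly score.
    A model trained only on normal samples reconstructs normal inputs with
    low error, so scores above the validation-chosen threshold are flagged
    as abnormal."""
    errors = ((images - reconstructions) ** 2).mean(axis=(1, 2))
    return errors > threshold

imgs = np.zeros((2, 4, 4))                              # two toy "input" images
recons = np.stack([np.zeros((4, 4)), np.ones((4, 4))])  # one good, one bad reconstruction
print(flag_anomalies(imgs, recons, threshold=0.5))      # -> [False  True]
```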
The table summarizes related studies along with their respective datasets and accuracy rates. Montaha et al. [20] conducted a study using the CBSI-DDSM dataset and achieved an accuracy of 98%. Zebari [41] also utilized the CBSI-DDSM dataset but achieved a lower accuracy of 79.36%. Panambur et al. [21] obtained an accuracy of 87% using the same CBSI-DDSM dataset. Sarker et al. [27] and Shen et al. [31] used the CBSI-DDSM dataset and achieved accuracies of 98%. Mallick et al. [20] employed the Kaggle Brain MRI dataset and achieved an accuracy of 93%. Similarly, Alsaif et al. [1] used the Kaggle Brain MRI dataset and achieved an accuracy of 97%. Finally, the proposed methodology utilized both the CBSI-DDSM and Kaggle Brain MRI datasets and achieved an accuracy of 99%. Overall, the table highlights various studies, datasets, and their corresponding accuracy rates, providing an overview of the performance of different approaches in the field.

Conclusion and Future Work
We offer a deep learning-based method to find anomalies in medical images. In the near future, genetic algorithms will be used to improve the deep convolutional neural network further, and the properties of the sperm images will be used to enhance feature analysis. An algorithm is also being developed to speed up the processing of massive datasets.

Figure 1
Figure 1 Outline of the modeling proposal, where A and B are the domains of the original data and the latent manifold, respectively; the encoder maps the original image x onto a latent vector b of lower dimensionality.

Common abnormalities in medical images have been linked to severe illnesses, and both segmentation and classification of abnormal regions are essential for a precise assessment and accurate diagnosis of significant diseases. The primary objective of the first classification stage is to identify aberrant images. The model is trained on normal samples, so that it produces consistent reconstructions of regular inputs with low anomaly scores and inconsistent reconstructions of aberrant images with high anomaly scores. The suggested model's design, depicted in Figure 2, incorporates semi-supervised learning via a memory-augmented deep auto-encoder. An AE uses feedforward neural networks to learn representations in which the input and the output are identical; the sole components of a standard AE are the encoding and decoding units. The encoder is symbolized by the function f_en, which maps the original image x onto a latent vector b of lower dimensionality, and the decoder is specified by the function f_de, which maps b back to its original space. Given a latent query vector b, the memory component produces a recovered latent representation from its prototype vectors.
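The mapping pair f_en : A → B and f_de : B → A can be illustrated with a minimal linear sketch; the weights are untrained random placeholders and the 64/8 dimensions are assumptions, so this only demonstrates the shapes of the mappings, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_linear_maps(n_in, n_latent):
    """Minimal linear stand-ins for the encoder f_en and decoder f_de:
    f_en maps an image x onto a lower-dimensional latent vector b, and
    f_de maps b back toward the original space."""
    w_en = rng.standard_normal((n_in, n_latent))
    w_de = rng.standard_normal((n_latent, n_in))
    f_en = lambda x: x @ w_en
    f_de = lambda b: b @ w_de
    return f_en, f_de

f_en, f_de = make_linear_maps(64, 8)
x = rng.standard_normal(64)
b = f_en(x)           # latent vector in domain B
x_hat = f_de(b)       # reconstruction back in domain A
print(b.shape, x_hat.shape)   # latent is lower-dimensional; output matches input size
```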

Figure 2
Figure 2 The memory module, in which a weight vector, constructed from the memory items and the latent query vector, gives the importance of each prototype vector in generating the recovered representation

Figure 3
Figure 3 The UNet architecture for the image segmentation


Figure 5
Figure 5 Accuracy of the DACN model for Alzheimer's Disease Detection

Figure 6
Figure 6 (a) Accuracy and (b) Loss of the proposed model on the dataset CBSI-DDSM



Figure 7
Figure 7 (a) Accuracy and (b) Loss of the proposed model on the dataset Brain MRI

Figure 8
Figure 8 The Confusion matrix of the proposed model on (a) CBSI-DDSM dataset and (b) Brain MRI

Table 1
Comparison of characteristics on existing work

Table 2
Hyperparameter tuning for the proposed model

Table 3
Performance measures in terms of Learning rate

Table 4
Standard performance measures using various Learning rate

Table 3
Performance comparison with related work

Transfer learning is used for effective feature learning without over-fitting when available training data is scarce. The deep CNN is taught to recognize specific anomalies in an image and is combined with GCAMs to construct anomaly detectors. Experimental data show that the integrated strategy is superior to conventional and alternative deep learning strategies. The suggested technique has been successfully validated on mammography and ultrasound images, indicating great promise for helping clinicians discover local problems in medical imaging. Classifier performance is measured in terms of accuracy, F-score, precision, and computation time, acquired through a testing procedure that follows training. The experimental analysis results are superior to those of the current methods.