Unsupervised Anomaly Detection of Industrial Images Based on Dual Generator Reconstruction Networks

At present, deep learning techniques are increasingly utilized in computer vision and anomaly detection. To address the inadequate reconstruction capability and subpar performance of reconstruction-based anomaly detection, this study enhances existing algorithms and introduces DGRNet, an unsupervised industrial-image anomaly detection algorithm based on a dual-generator reconstruction network. The network consists of two generators and a discriminator: a widely recognized denoising diffusion probabilistic model (DDPM) serves as one generator, an autoencoder (AE) as the other, and a decoder acts as the discriminator. The model is tested on the MVTec AD dataset; without additional training data, the anomaly detection AUC of DGRNet exceeds the reconstruction-based baseline method by 19.6 percentage points. The experimental results show that DGRNet improves detection performance for unsupervised, reconstruction-based anomaly detection algorithms.


Introduction
Anomaly detection is an important branch of machine learning: a technology for detecting abnormal situations and mining illogical data, it is widely applied in fields such as credit card fraud, insurance and healthcare, cybersecurity intrusions, safety-critical systems, industrial big data, abnormal behavior analysis, and image and video processing. In image anomaly detection, there are mainly three application areas: defect detection, medical image analysis, and hyperspectral image processing. Early anomaly detection was mostly applied in data mining using relatively traditional algorithms. Traditional algorithms are generally supervised learning methods, which can be divided into three categories: similarity-measure-based, statistical and probability-based, and linear-model-based [9]. In recent years, with the rapid development of deep learning, anomaly detection has been introduced into image processing to solve problems such as object detection with rare samples [11]. In anomaly detection tasks it is usually difficult to obtain well-labeled anomalous samples, so research on unsupervised anomaly detection in deep learning has received increasing attention. Among unsupervised anomaly detection algorithms, the reconstruction-based approach is a natural idea. This class of algorithms generally includes methods based on Autoencoders (AE) [16] and Generative Adversarial Networks (GAN) [7], as well as methods that combine AE and GAN. In AE-based algorithms, only normal samples are used during training; during inference, the difference between a normal image and its reconstruction is small while the difference between an abnormal image and its reconstruction is large, so the size of the difference can be used to judge whether an image is abnormal. Mei et al.
[12] reconstructed images by dividing them into blocks and used a denoising AE to locate anomalies in texture images. GAN, as a generative network, can reconstruct clearer images in GAN-based algorithms. AnoGAN, proposed by Schlegl et al. [19], directly uses GAN iterative optimization to reconstruct images; however, its main problem is time-consuming model inference. To solve this issue, EBGAN [23] and other models were proposed. Xiao Du et al. introduced a correction branch that modifies the original reconstruction results at test time, addressing the difficulty of distinguishing small anomalies in reconstruction-based networks [5]. Zhou et al. proposed reconstructing the image by leveraging the structure-texture correspondence [24]. Farady et al. proposed Hierarchical Image Transformation and Multi-level Features (HIT-MiLF) modules so that an anomaly detection network can adapt to perturbations from novelties in testing images [6]. In algorithms combining AE and GAN, a discriminator is added after the AE to distinguish between the reconstructed image and the input image; during training, adversarial training is adopted to enhance the reconstruction ability.
The ALOOC model proposed by Sabokrou et al. [17] consists of two parts, a denoising autoencoder and a convolutional neural network classifier, which work together to improve anomaly detection performance. The GANomaly [1] network includes a generator and a discriminator, where the generator takes an encoder-decoder-encoder form; it judges whether the input image is an abnormal sample by comparing the latent information of the input image and the reconstructed image. Building on GANomaly, the Skip-GANomaly [2] model adds a skip-connection structure to enhance reconstruction ability. However, this model can still miss detections: in surface defect detection, some small-scale defects are easily reconstructed, resulting in small anomaly scores and thus missed detections. In recent years, the diffusion model (Denoising Diffusion Probabilistic Model, DDPM) has emerged as an unsupervised generative model. It was first proposed in 2015, introducing the idea of adding noise to data through a Markov chain during the diffusion process and learning how to reconstruct data samples from noise [20]. In 2020, DDPM was implemented and gradually became a new hotspot in the generation field [8], demonstrating powerful generation capabilities, and Dhariwal showed that DDPM can surpass GAN [4]. In the recent wave of AI painting in computer vision, DDPM played a crucial role. Among popular AI painting models, OpenAI's DALL·E 2 [13] can generate high-definition images from text input; Google's Imagen [18] generates images that meet requirements from text descriptions; Stability AI's Stable Diffusion [14] is a similar model whose generated images can already be used commercially. These models all demonstrate the powerful image generation capabilities of DDPM. In the field of anomaly detection, AnoDDPM [22]
improved DDPM by replacing Gaussian noise with multiscale simplex noise to capture abnormal areas, without requiring stable training on large datasets. Although DDPM-based algorithms are not yet perfect for anomaly detection tasks, DDPM's powerful generative ability is helpful in this field: it can be used to improve the reconstruction ability of reconstruction-based anomaly detection networks. The algorithm proposed in this paper is an unsupervised, reconstruction-based anomaly detection algorithm. To enhance the reconstruction capability of the network model, this paper proposes DGRNet, an unsupervised image anomaly detection algorithm based on a dual-generator reconstruction network, and experiments on industrial images. The algorithm includes two generators, a DDPM generator and an AE generator, and improvements are made to both. Each generator reconstructs the input image, and the final reconstructed image is obtained by fusing the features of the two reconstructions. Finally, the fused result is input into a discriminator network for anomaly detection.

Overview
The method proposed in this paper aims to enhance the performance of reconstruction-based anomaly detection algorithms. Compared with existing reconstruction-based networks, the proposed network has two generators. The overall structure includes two generators and a discriminator: one generator is a DDPM, the other is an AE, and a decoder serves as the discriminator. The generation quality of a plain AE on industrial datasets is not very good, especially for surface defect detection: some small-scale defects are easily reconstructed, resulting in small anomaly scores and thus missed detections. This paper improves the AE by adding attention modules in its encoder and adding a Skip Connection structure between the last layer of the encoder and the decoder. The popular DDPM is taken as the other generation branch; its addition improves the generation ability of the network, and the number of noise-addition steps in the DDPM diffusion process is parameterized. The overall network structure is shown in Figure 1. The input image $x_0$ is fed into the two generators.

DDPM Generator
DDPM is a powerful generative model that consists of two processes: a forward diffusion process and an inverse reconstruction process. The diffusion process adds noise to images, while the reconstruction process denoises noisy images back to clear original images. In the diffusion process, given the initial data distribution $x_0 \sim q(x)$, noise is continuously added to the distribution; the standard deviation of the noise is a fixed value, and the mean is determined by that fixed value and the current data at time $t$. This process forms a Markov chain, and as $t$ increases, the final data distribution $x_T$ becomes an isotropic Gaussian distribution. The reconstruction process is the inverse of diffusion, recovering the original data from Gaussian noise. Since this distribution cannot be fitted directly, a parameterized distribution is constructed for estimation. The reconstruction process remains a Markov chain, and a U-Net network [15] can be trained to predict the denoising at each step. As shown in Figure 2, noise is added to the input image $x_0$; this can be completed in one step, that is, $T$ steps of noise addition are applied at once. A U-Net network is then trained to predict the noise.
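The one-step noising described above follows the standard DDPM closed form. Below is a minimal sketch, assuming the linear $\beta_t$ schedule from $10^{-4}$ to $0.02$ reported later in the paper; it is illustrative, not the authors' implementation:

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative products used by DDPM."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # abar_t = prod_{s<=t} (1 - beta_s)
    return betas, alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Closed-form noising: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
_, alpha_bars = make_schedule()
x0 = rng.standard_normal((3, 8, 8))   # toy stand-in for an image tensor
xt, eps = forward_diffuse(x0, t=500, alpha_bars=alpha_bars, rng=rng)
```

Because $\bar{\alpha}_t$ shrinks toward zero as $t$ grows, $x_T$ approaches isotropic Gaussian noise, which is exactly the behavior the text describes.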

Figure 3
Reconstruction result of the DDPM generator on a leather image

AE Generator
When reconstructing abnormal image samples with an Autoencoder (AE), the abnormal parts are easily reconstructed as well. To enhance the AE's reconstruction capability, a CBAM module [21], shown in Figure 4, is incorporated into the encoder.

Figure 4
CBAM module added to the encoder

The CBAM module combines channel attention and spatial attention, enabling the network to focus more on normal image information. In the inference stage, an image with defective parts can then be better reconstructed into a normal image. The decoder can reduce the difficulty of reconstruction through Skip Connections; however, if a Skip Connection is added at every layer, all defects will be reconstructed. To avoid this, this study adds a Skip Connection only at the bottom layer, making defects less likely to be reconstructed during the reconstruction process.
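A minimal sketch of the CBAM idea, channel attention followed by spatial attention. The shared-MLP weights here are random stand-ins, and the 7x7 convolution of the original spatial-attention branch is replaced by a simple weighted sum of the pooled maps, so this is illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, W1, W2):
    """x: (C, H, W). Shared MLP (W1, W2) on avg- and max-pooled vectors."""
    avg = x.mean(axis=(1, 2))                         # (C,)
    mx = x.max(axis=(1, 2))                           # (C,)
    a = sigmoid(W2 @ np.maximum(W1 @ avg, 0) + W2 @ np.maximum(W1 @ mx, 0))
    return x * a[:, None, None]

def spatial_attention(x, w_avg=1.0, w_max=1.0):
    """Simplified stand-in for CBAM's 7x7 conv over [avg; max] maps."""
    avg = x.mean(axis=0)                              # (H, W)
    mx = x.max(axis=0)                                # (H, W)
    a = sigmoid(w_avg * avg + w_max * mx)
    return x * a[None, :, :]

rng = np.random.default_rng(1)
C, H, W, r = 8, 16, 16, 2
W1 = rng.standard_normal((C // r, C)) * 0.1           # reduction MLP (hypothetical)
W2 = rng.standard_normal((C, C // r)) * 0.1
x = rng.standard_normal((C, H, W))
y = spatial_attention(channel_attention(x, W1, W2))   # channel then spatial, as in CBAM
```

The sequential order (channel first, then spatial) matches the CBAM design; both attention maps lie in (0, 1), so the module reweights rather than replaces features.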

Feature Fusion
The network structure proposed in this paper includes two generators. During inference, the reconstructed results of the two generators are feature-fused and then input into the discriminator.
Since the semantic information of corresponding channels in the two generators' outputs is similar, direct feature addition is chosen for fusion; this retains more information from the generated images before they are input into the discriminator.
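The fusion step can be sketched as simple element-wise addition of the two reconstructions (shapes here are illustrative):

```python
import numpy as np

def fuse(recon_ddpm, recon_ae):
    """Element-wise (feature) addition of the two generators' outputs.

    Channels are assumed to carry similar semantics, so direct addition
    retains information from both generated images.
    """
    assert recon_ddpm.shape == recon_ae.shape
    return recon_ddpm + recon_ae

a = np.full((3, 4, 4), 0.2)   # stand-in DDPM reconstruction
b = np.full((3, 4, 4), 0.3)   # stand-in AE reconstruction
fused = fuse(a, b)
```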

Training
In this paper, the DDPM generator is trained on normal data samples. During training, the input images are all normal images, and a partial-length Markov chain is used for the forward diffusion process. Noise is added by defining a linear schedule for $\beta_t$, from $10^{-4}$ to $0.02$. The reconstruction process, parameterized by $\theta$, takes noise-added images as input and samples according to the following formula:

$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2 I\big),$

where $\mu_\theta$ can be implemented by a structure similar to U-Net, so that the DDPM generator learns how to denoise a noisy image into a normal image. For the objective function, the simplified objective $L_s$ [8] is used:

$L_s = \mathbb{E}_{t, x_0, \epsilon}\Big[\big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\, t\big) \big\|^2\Big],$

where the parameters of $\epsilon_\theta$ are learned through the network.
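The simplified objective can be sketched as follows; the linear schedule matches the paper, while the ε-predictor here is a zero placeholder standing in for the U-Net:

```python
import numpy as np

def simplified_loss(x0, t, alpha_bars, eps_model, rng):
    """L_s = E || eps - eps_theta(x_t, t) ||^2, with x_t from the closed form."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - eps_model(xt, t)) ** 2)

betas = np.linspace(1e-4, 0.02, 1000)        # linear schedule from the paper
alpha_bars = np.cumprod(1.0 - betas)
rng = np.random.default_rng(2)
x0 = rng.standard_normal((3, 8, 8))          # toy normal sample

zero_model = lambda xt, t: np.zeros_like(xt)  # placeholder for the U-Net
loss = simplified_loss(x0, t=100, alpha_bars=alpha_bars,
                       eps_model=zero_model, rng=rng)
```

With the zero placeholder the loss is just the mean squared noise (close to 1); training the real network drives it toward zero.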

When training the AE generator, it is likewise trained on normal data samples, followed by adversarial training with the discriminator.
Both training processes are unsupervised. Since the adversarial loss, context loss, and latent loss all contribute to the training objective, the three loss values are weighted and combined during training.
To achieve the best reconstruction capability, the adversarial loss introduced by Goodfellow et al. [7] is utilized to ensure that the generative network $G$ reconstructs the input image $x$ as close to the real sample as possible, while the discriminator $D$ is trained to distinguish between real samples and generated samples. This loss is denoted $L_{adv}$, as shown in Formula (4):

$L_{adv} = \mathbb{E}_{x}\big[\log D(x)\big] + \mathbb{E}_{x}\big[\log\big(1 - D(\hat{x})\big)\big], \qquad (4)$

where $\hat{x} = G(x)$ is the reconstructed image.
To learn the context information of normal samples, the input image and the reconstructed image are compared under the $L_1$ norm, ensuring that the reconstruction is contextually similar to normal samples. This loss is denoted $L_{con}$:

$L_{con} = \left\| x - \hat{x} \right\|_1. \qquad (5)$

$L_{adv}$ and $L_{con}$ enable the model to generate realistic and context-similar images. To make the latent distribution reconstructed from the input image $x$ consistent with that of the normal image, this paper uses the last convolutional layer of the discriminator $D$, denoted $f(\cdot)$, to extract features of $x$ and $\hat{x}$ as their latent representations. This loss is denoted $L_{lat}$:

$L_{lat} = \left\| f(x) - f(\hat{x}) \right\|_2. \qquad (6)$

The weighted sum of the three losses is taken as the total training objective $L$, as shown in Formula (7), where $\lambda_{adv}$, $\lambda_{con}$ and $\lambda_{lat}$ are the weight parameters:

$L = \lambda_{adv} L_{adv} + \lambda_{con} L_{con} + \lambda_{lat} L_{lat}. \qquad (7)$
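The weighted combination can be sketched as below. The weights 1, 40 and 1 follow the values reported later in the paper; the discriminator outputs and feature vectors are illustrative stand-ins:

```python
import numpy as np

def total_loss(x, x_hat, f_x, f_x_hat, d_real, d_fake,
               lam_adv=1.0, lam_con=40.0, lam_lat=1.0, eps=1e-8):
    """L = lam_adv*L_adv + lam_con*L_con + lam_lat*L_lat."""
    # Adversarial loss (negated log-likelihood form for minimization)
    l_adv = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    l_con = np.mean(np.abs(x - x_hat))      # L1 context loss
    l_lat = np.mean((f_x - f_x_hat) ** 2)   # latent-feature loss
    return lam_adv * l_adv + lam_con * l_con + lam_lat * l_lat

rng = np.random.default_rng(3)
x = rng.random((3, 8, 8))
x_hat = x + 0.01                            # a near-perfect reconstruction
f_x = rng.random(64)
L = total_loss(x, x_hat, f_x, f_x_hat=f_x,
               d_real=np.array([0.9]), d_fake=np.array([0.1]))
```

The large weight on the context term reflects that pixel-level fidelity dominates the training signal in this kind of reconstruction network.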

Inference
In the inference stage, the image is input into both the DDPM generator and the AE generator. In the DDPM generator, a partial Markov chain is used for noise addition during the diffusion process; in the reconstruction process, the noisy image is denoised so that the original image is reconstructed with the same effect, and defective images are reconstructed to be defect-free. Similarly, the AE generator produces its own reconstructed image. The two reconstructed images are fused by feature-wise addition and input into the discriminator, which identifies the image by computing the difference between the latent representations and the difference between the reconstructed image and the original image.
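The scoring step can be sketched as follows. The weighting `w` and the stand-in reconstructions and features are illustrative assumptions, not the paper's exact score:

```python
import numpy as np

def anomaly_score(x, recon_ddpm, recon_ae, f_x, f_fused, w=0.9):
    """Score from the fused reconstruction: weighted sum of the image-space
    residual and the latent-feature residual (weighting w is illustrative)."""
    fused = recon_ddpm + recon_ae                 # feature-wise addition
    r_img = np.mean(np.abs(x - fused))            # reconstruction difference
    r_lat = np.mean((f_x - f_fused) ** 2)         # latent difference
    return w * r_img + (1.0 - w) * r_lat

rng = np.random.default_rng(4)
x = rng.random((3, 8, 8))
# A normal image: both generators jointly reproduce it, features match.
normal_score = anomaly_score(x, 0.5 * x, 0.5 * x, np.zeros(32), np.zeros(32))
# A defective image: the generators still output the normal appearance,
# so the residual at the defect drives the score up.
defect = x.copy()
defect[:, :3, :3] += 1.0                          # simulate a bright defect patch
defect_score = anomaly_score(defect, 0.5 * x, 0.5 * x, np.zeros(32), np.zeros(32))
```

The key property is that reconstructions stay "normal" while defective inputs do not, so defective images receive higher scores.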

Dataset
The network proposed in this paper is evaluated on the MVTec AD dataset [3], which contains images from industrial production: 5354 high-resolution color images, five categories of texture images from different domains, and ten categories of object structure images. Each category consists of a training set and a test set; the training set contains only normal samples, while the test set contains both normal and defective samples. Image resolutions range from 700×700 to 1024×1024 pixels and are reduced to 256×256 in the experiments. Figure 5 shows an abnormal sample of each image category: the first row shows some abnormal texture image samples, and the second and third rows show some abnormal object structure image samples.

Implementation Details
The AE generator includes an encoder and a decoder. In the encoder, this paper adds the CBAM attention module to improve performance, so that the network pays more attention to normal image information. To retain global and local image information, the Skip Connection structure from the U-Net network is introduced to enhance reconstruction capability; to keep the network from reconstructing defects, the Skip Connection is only added in the last layer. During training, the epoch count is set to 50 and the batch size to 100, and the Adam optimizer [10] is used to optimize the objective L. The results are shown in Tables 1-2: Table 1 shows the results for texture images and Table 2 for object structure images.
The evaluation metric is the area under the ROC curve (AUC). Table 1 reports the AUC for texture images and Table 2 the AUC for object structure images; both show the improvement brought by the AE generator.

Experiments Results
The model was evaluated using AUC, a standard performance metric for anomaly detection; its value ranges from 0 to 1, and a higher value indicates better performance. In this study, DGRNet is compared with the AnoGAN, GANomaly and Skip-GANomaly models. The results are shown in Tables 3-4. The dataset is divided into two categories, texture and object structure: Table 3 gives the results for texture images and Table 4 for object structure images. Taking the leather dataset as an example, Figure 6 displays the reconstruction results of the DGRNet model.
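AUC can be read as the probability that a randomly chosen abnormal sample scores higher than a randomly chosen normal one. A minimal pairwise computation of this quantity (ties counted as 0.5):

```python
import numpy as np

def auc(scores_normal, scores_abnormal):
    """AUC = P(score_abnormal > score_normal), ties counted as 0.5."""
    s_n = np.asarray(scores_normal, dtype=float)[:, None]
    s_a = np.asarray(scores_abnormal, dtype=float)[None, :]
    # Broadcast to all (normal, abnormal) pairs and average.
    return float(np.mean((s_a > s_n) + 0.5 * (s_a == s_n)))

score = auc([0.1, 0.2, 0.3], [0.7, 0.8])   # perfectly separated scores
```

For large test sets a rank-based computation is cheaper, but the pairwise form makes the definition explicit.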

Conclusion
The dual-generator reconstruction network proposed in this paper is a novel reconstruction-based algorithm, evaluated on industrial images. It is an unsupervised method consisting of two generators and a discriminator. One generator utilizes the powerful generation capability of the DDPM to enhance the reconstruction ability of the network, while the other is an AE with Skip Connection and attention mechanisms to improve its generation capability. The DDPM has recently attracted attention but has not been widely applied to industrial image anomaly detection; this network applies it to anomaly detection, leveraging its powerful generation capability. The two generators each reconstruct the input image, and their reconstructions are fused to improve the generating ability of the network. The algorithm has been verified on the MVTec AD dataset, showing significant improvements over the baseline network's detection results on the same dataset.
Due to limited computing power in the experimental environment, the number of iterations and the training batch size for the DDPM generator were small, so the generation ability of DDPM may not be fully realized, and there is still a gap to the best industrial image anomaly detection algorithms. In future work, the DDPM generator will be further improved to give full play to its generation ability and to improve its training speed. Experiments were also conducted on other datasets, such as a breast ultrasound dataset, but the results were not very satisfactory, which may be related to the low quality of ultrasound images themselves; image preprocessing and network structure optimization may improve detection performance on such data. On fabric texture datasets the performance is very good, so the method can be applied to fabric defect detection in the future, but the detection performance on object structure images still needs improvement, which will also be addressed in future work.

Figure 1
Figure 1 Overall structure diagram of DGRNet

From the noisy image $x_t$, the reverse process is started, and ultimately the reconstructed image $x_0$ is generated. At the training stage, the DDPM generator is trained on normal samples, using a partial-length Markov chain for the diffusion process. In the inference stage, images with added noise are reconstructed: abnormal samples are reconstructed into normal images, while normal samples are reconstructed into their original images. Figure 3 shows the reconstruction results of the DDPM generator on the defective leather dataset: Figure 3(a) is the input defective original image, and Figure 3(b) is its reconstruction by the DDPM generator.
Figure 3(c) is the residual image of Figures 3(a) and 3(b).
Figure 3(d) is the residual heatmap, and Figure 3(e) is the ground truth. The defective parts can be observed in Figures 3(c)-(d), but they are not very prominent.

Figure 4
Figure 4 CBAM module added to the encoder

The weight parameters $\lambda_{adv}$, $\lambda_{con}$ and $\lambda_{lat}$ of $L$ are set to 1, 40 and 1, respectively, after experiments. For training the DDPM generator, the improved U-Net of Dhariwal et al. [4] is used as the denoising model, trained with the Adam optimizer; the learning rate is set to $10^{-4}$, the batch size to 32, and the number of epochs to 100. Since full-length Markov-chain diffusion is not required for reconstruction-based anomaly detection, the number of steps is parameterized and a partial-length Markov chain is used for diffusion.

Figure 5
Figure 5 Display of defect images in MVTec AD dataset

Figure 6(a) is the input defective original image, Figure 6(b) is the reconstructed image for Figure 6(a), and Figure 6(c) is the residual image of Figures 6(a)-(b).
Figure 6(d) is the residual heatmap, and Figure 6(e) is the ground truth. The defective parts can be observed in Figures 6(c)-(d); compared with Figure 3, they are more prominent.

Figure 6
Figure 6 Reconstruction results of the leather dataset through the DGRNet model: (a) input image (b) reconstruction image (c) residual image (d) residual heatmap (e) ground truth

Information Technology and Control 2024/2/53 340

Table 1
Comparison of AUC results of texture images in ablation experiments

Table 2
Comparison of AUC results of object structure images in ablation experiments

Table 3
AUC results of GANomaly, Skip-GANomaly, and the proposed DGRNet on texture images in the MVTec AD dataset

Table 4
AUC results of GANomaly, Skip-GANomaly, and the proposed DGRNet on object images in the MVTec AD dataset