A Two-Step Unsupervised Learning Approach to Diagnose Machine Fault Using Big Data

The modern industrial sector requires an intelligent fault diagnosis system to ensure reliable and safe processing, since traditional methods rely on expert diagnosis, which is time-consuming and labor-intensive. Furthermore, diagnostic results are influenced by the expert's experience and in-depth knowledge of the machine. The objective of this paper is to remove the need for manual intervention and improve fault diagnosis. We propose a novel two-stage unsupervised learning algorithm based on artificial intelligence (AI) that learns fault features efficiently from raw vibration signals. To this end, we combine two learning techniques: sparse filtering and a Rectified Linear Unit (ReLU) regression function. In the first step, a two-layer neural network trained by sparse filtering extracts features from the vibration signals. In the second step, ReLU regression determines the health condition of the machine from these features. ReLU is a piecewise-linear activation function that improves the performance of neural network training. We compare the performance of ReLU against sigmoid and softmax regression functions; the sigmoid function works well for binary classification, whereas softmax works well for multiclass classification. The method is evaluated on a database of motor-bearing vibration signals covering four health conditions: normal, inner race fault (IF), outer race fault (OF), and rolling fault (RF). Sparse filtering is evaluated with different input and output dimensions, and tuning these dimensions significantly increases the learning accuracy. Classifying the health conditions with ReLU achieves 93.8% accuracy, which is higher than with sigmoid and softmax. The two-step learning process thus enhances machine fault diagnosis and handles big data effectively.


Introduction
Advances in technology have enabled big data, making it possible to collect huge amounts of data such as medical data and software information. Modern machines are more automated and efficient than ever before, which makes monitoring their health status more difficult. To avoid disastrous accidents and their consequences, such as environmental pollution and economic losses, effective diagnosis of machine faults [13] is necessary. In the traditional approach, faults occurring in machines are diagnosed manually, which is time-consuming and prone to errors of human observation. Research subsequently moved to the Condition Monitoring System (CMS), which collects real-time data from sensors over long-term operation to build up big data. Machine parameters (temperature, vibration, flow, frequency, status, etc.) are monitored continuously by various sensors, every second, to produce this data. Artificial Intelligence (AI) has demonstrated many breakthroughs in building smart technology across numerous applications, and intelligent fault diagnosis based on AI is a promising tool for handling mechanical big data. It consists of three steps: signal acquisition, feature extraction, and fault classification. However, this approach yields insensitive information, which strongly affects computational efficiency and diagnosis results. The proposed two-step unsupervised learning approach improves fault diagnosis by improving fault detection accuracy.

Related Work
Zhang et al. [21] addressed fault diagnosis of machine bearings using three algorithms: Principal Component Analysis (PCA), the Hidden Markov Model (HMM), and an adaptive fault prediction model. PCA extracts the principal signal features from the raw data, which are then processed by the HMM for health status assessment. In addition, the adaptive prognostic algorithm measures the degradation index of the HMM to reduce component replacement. Although it was evaluated on a real bearing dataset, the algorithm yields inefficient results.
Chen et al. [3] proposed a method to determine bearing faults using data fusion with multiple sensors. Online sensing technologies recognize the incipient fault, which is evaluated by PCA and a Gaussian Mixture Model (GMM). The method identifies the main variable that causes the fault, but it was not evaluated on a real-time bearing dataset.
Fan et al. [6] detected bearing faults using an SVM and the self-regulating particle swarm method. They discuss the fundamentals of multi-kernel least-squares support vector machines (MK-LS-SVMs) with the aim of identifying a classifier with high generalization that can fuse multi-dimensional features from empirical mode decomposition (EMD). The accuracy of SVM classification is, however, limited by the kernel parameters.
Zhang et al. [20] used manifold learning for fault diagnosis, monitoring machine condition and rotational speed under stable loading. Self-Organizing Map (SOM) and Neighborhood Preserving Embedding (NPE) methods are used to measure bearing performance degradation, and NPE is adopted to perform dimension reduction and classification of faults under varying working conditions. Although NPE and SOM measure the bearing degradation, the approach still faces accuracy problems.
Devendiran et al. [5] examined methods that detect the root of faults and their severity levels. Continuous machine monitoring requires detecting and diagnosing the faults present in the system, determining the most probable cause of each fault, and assessing its severity. Bearing and gear component faults are analyzed and monitored progressively. According to this survey, fault diagnosis still needs improvement.
Li et al. [12] analyzed bearing faults by monitoring spectrum images of the vibration signal. To diagnose the fault, the spectrum images are obtained via the Fourier transform and processed with 2D PCA and a minimum distance method. The diagnosis efficiency was assessed with only a limited number of training samples.
Gligorijevic et al. [7] implemented fault diagnosis in the packing industry by monitoring the bearing condition in order to improve machine reliability. Vibration signals, statistical pattern recognition, and the wavelet transform are used to obtain an efficient result. However, the process was not implemented in a real environment.
The authors of [4] proposed a method to learn features from unlabeled data of the CIFAR-10 and NORB datasets. The k-means clustering algorithm is used to learn single-layer dense features. K-means clustering has the limitation that it cannot deal with noisy data.
Wu et al. [17] proposed a fault diagnosis method using a feature vector extracted from a vibration signal decomposed by Ensemble Empirical Mode Decomposition (EEMD). EEMD can accurately diagnose both single faults and coupling faults, but it has trouble classifying multiple faults.
Chegini et al. [2] introduced an autocorrelation-function-based filtering algorithm for vibration signals, which are classified by the energy variation among bearing data. Using the Empirical Wavelet Transform (EWT), Pearson's correlation coefficient is used to select the relevant and appropriate modes. However, the method suffers from high time complexity.
Ranzato et al. [15] presented an unsupervised learning method for invariant sparse features with few-layered training samples that provides reliable feature detection. It is applicable to small datasets only.
Kuncan [10] combined one-dimensional local binary patterns (1D-LBP) with a gray relational model for feature extraction and classification of bearing faults. Statistical features are obtained from the signals in the 1D-LBP plane, and the vibration signals are then classified using a gray relational analysis (GRA) model. However, the method has not yet been implemented in a real-time environment.
Sohaib et al. [16] presented a hybrid feature model which classifies faults based on their classes. A combination of sparse stacked autoencoders (SAE) and Deep Neural Networks (DNNs) is used to diagnose fault severity. The scheme performs better than SVMs and BPNNs, which take longer to compute.
Hamadache et al. [8] presented a method for detecting and diagnosing bearing faults of a rotating machine under normal and load conditions. The technique uses Absolute Value Principal Component Analysis (AVPCA) and ProbPlot via Image Recognition using the AVPCA (IR-AVPCA). AVPCA is used to extract eigenfaces, and the bases of the SSE are generated to detect and diagnose three kinds of bearing faults: outer-race fault (ORF), inner-race fault (IRF), and ball fault (BBF). Fault detection performance still needs improvement.
Lei et al. [11] presented a distance estimation technique that uses six important features of the raw signal. These features are fed into adaptive neuro-fuzzy inference systems (ANFISs) to classify the faulty bearing. The system divides the large problem into multiple classes, where multiple outputs are adopted using empirical mode decomposition. The fault diagnosis accuracy is higher than that of individual ANFISs, because the combined ANFISs identify both bearing abnormalities and the severity of faults, which is not possible with individual ANFISs. Time complexity, however, remains a problem for the system.
Yu [18] proposed a technique to identify bearing faults based on Local and Nonlocal Preserving Projection (LNPP), which discovers both the local and the nonlocal structure of the manifold. It readily detects hidden low-dimensional information in a high-dimensional feature set. However, LNPP was not applied to other machine components, so its performance on them remains unverified.
Yu et al. [19] presented a method to learn about image representation through sparse coding. Sparse coding encodes local patches independently, accounting for high-order dependencies among patterns in a local image neighborhood. However, it faces difficulties when it comes to handling high-noise images.
Amar et al. [1] proposed a highly robust fault classifier for low-SNR conditions using spectrum images. To achieve high-precision bearing fault classification by combining neural networks and empirical features, a modified neural network structure (LiftingNet) is proposed that enables adaptive extraction of hidden features from specific objects. LiftingNet has not been tested under different working conditions.
We analyzed the various fault diagnosis techniques to understand their advantages and disadvantages. According to this analysis, the previous works have drawbacks due to limited data capacity or inefficient results. Based on these studies and an analysis of various feature learning algorithms, we present a novel framework for an intelligent fault identification system, depicted in Figure 1. In the first step, sparse filtering learns features from the raw vibration signals. During the second step, ReLU regression is used as the two-layer network, and we train it to classify the machine health conditions. The complete diagnosis is carried out by the neural network without human intervention. The features extracted by this technique are superior to those of conventional techniques because they make efficient use of the data for new fault prediction. The physical interpretation of the sparse filter in feature learning is also explored, thereby improving the system's reliability.

Figure 1
An overview of a novel framework for the intelligent diagnosis of faults

Information Technology and Control 2022/1/51

The rest of the paper is organized as follows: Section 3 elaborates the proposed two-step learning methodology. Section 4 presents the results and discussion. Finally, the conclusion and future work are given in Section 5.

Methodology
Unsupervised feature learning has been extensively employed in a variety of fields including instance progression, image classification, object detection, and speech recognition, among others. Figure 2 depicts the block diagram of the proposed two-step unsupervised learning approach, which overcomes the drawbacks of traditional approaches by training the artificial intelligence framework with unlabeled data. Our system implements a two-phase learning process, using a sparse filter to learn features from vibration signals and ReLU regression to classify health conditions.

Sparse Filter
Sparse filtering is a simple algorithm that bypasses the problem of learning the data distribution. It is simple to tune, since it has only one hyper-parameter, as opposed to the numerous parameters that must otherwise be tuned to get good results. Figure 3 depicts the architecture of sparse filtering, which is implemented in MATLAB to optimize a cost function over normalized features [14]. This method reduces feature density and provides a more accurate representation of the signal. To achieve good performance, the features need to satisfy three principles: population sparsity, lifetime sparsity, and high dispersal. Population sparsity means that each sample is represented by only a small number of active features. Lifetime sparsity is a measure of the number of valid features used in the data to extract the discriminative information that assists in fault diagnosis. High dispersal characterizes the statistics of the data, increasing a feature's ability to generalize.
First, sparse filtering is trained to obtain its weight matrix. The trained sparse filter is then used to capture the local features of each sample. Finally, these local features are averaged to obtain the learned feature of each sample. The sparse filter mapping is given in Equation (1): sparse filtering maps samples onto their features using a weight matrix, yielding the feature set of the i-th sample, and each row of the feature matrix is normalized by its L2-norm.
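The normalization and cost described above can be sketched in a few lines. This is a minimal illustration of the standard sparse filtering objective, not the MATLAB implementation used in this paper: the function name `sparse_filtering_cost`, the smoothing constant `eps`, and the soft-absolute activation are assumptions that the text does not specify.

```python
import math

def sparse_filtering_cost(W, X, eps=1e-8):
    """Hypothetical helper: W is an n_out x n_in weight matrix (list of rows),
    X is a list of samples, each of length n_in."""
    n_samples = len(X)
    # Soft-absolute activation of the linear features (assumed, not from the paper):
    # F[j][i] is feature j of sample i.
    F = [[math.sqrt(sum(w * x for w, x in zip(w_row, sample)) ** 2 + eps)
          for sample in X]
         for w_row in W]
    # Normalize each feature (row) by its L2-norm across all samples.
    F_rows = []
    for row in F:
        norm = math.sqrt(sum(v * v for v in row))
        F_rows.append([v / norm for v in row])
    # Normalize each sample (column) by its L2-norm across all features.
    col_norms = [math.sqrt(sum(row[i] ** 2 for row in F_rows))
                 for i in range(n_samples)]
    F_hat = [[row[i] / col_norms[i] for i in range(n_samples)] for row in F_rows]
    # The cost is the L1-norm of the normalized features; sparser codes cost less.
    return sum(sum(row) for row in F_hat), F_hat

# Toy data: each sample activates one feature (sparse) versus both features (dense).
X = [[1.0, 0.0], [0.0, 1.0]]
sparse_cost, F_hat = sparse_filtering_cost([[1.0, 0.0], [0.0, 1.0]], X)
dense_cost, _ = sparse_filtering_cost([[1.0, 1.0], [1.0, -1.0]], X)
```

In this toy case the weight matrix that gives each sample a single active feature has the lower cost, which is the behavior that population sparsity asks for. In practice the cost would be minimized over the weight matrix with a gradient-based optimizer.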

Figure 2
The block diagram of two-step learning

Figure 3
The architecture of sparse filtering

Figure 4
Diagnosis results of sparse filtering for various output dimensions

ReLU Regression
A ReLU maps each input to an output based on its value: it outputs zero if the input is below zero, and passes the input through unchanged if it is greater than zero. The ReLU function is given in Equation (2):

$$f(x) = \max(0, x) \qquad (2)$$

The ReLU function is non-linear, yet its gradient is constant for positive inputs, so errors backpropagate easily through multiple ReLU-activated layers of neurons. For ReLU regression, we have a training set with a corresponding label set, where each label takes a value in {1, 2, ..., n}. For each input, the model estimates each label's probability. The hypothesis of ReLU regression therefore outputs, for each input sample, a vector of estimated probabilities, one per label. ReLU offers numerous benefits compared with other activation functions, such as computational simplicity, representational sparsity, near-linear behavior, and the ability to train deep networks. ReLU is a simpler function that makes optimization easier and, in our experiments, provides the best accuracy when compared to softmax and sigmoid. It mitigates the vanishing gradient problem and uses simpler mathematical operations than softmax and sigmoid.
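To make the comparison between the three functions concrete, the sketch below implements ReLU, sigmoid, and softmax together with the derivatives of the first two. The function names are illustrative, and using softmax to turn scores into label probabilities is an assumption made for the example; the text does not state how the ReLU regression normalizes its outputs.

```python
import math

def relu(x):
    # Outputs zero for negative inputs and the input itself otherwise.
    return max(0.0, x)

def relu_grad(x):
    # Constant gradient of 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Shrinks towards zero for large |x|, the source of vanishing gradients.
    s = sigmoid(x)
    return s * (1.0 - s)

def softmax(scores):
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # one estimated probability per label
```

For a large positive input such as 10, the ReLU gradient stays at 1 while the sigmoid gradient is already below 1e-4, which illustrates why stacked sigmoid layers are more prone to vanishing gradients.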

The Two-Step Learning Technique
This section explains the two-step learning technique for fault identification. First, sparse filtering extracts features from the raw signals in three stages. The sparse filter is trained to obtain its weight matrix. Because the raw data provide an enormous number of samples, the sparse filter is trained to extract local features. The local features of each sample are then averaged to derive its learned feature. The sparse filter is parameterized by its input dimension and its output dimension. Second, machine health is estimated by a ReLU regression model trained on a randomly selected training set.
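The feature-averaging stage of the first step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names, the evenly spaced segmentation scheme, and the absolute-value activation are hypothetical choices, since the text only says that local features are extracted from each sample and then averaged.

```python
def segment(signal, n_in, n_segments):
    """Cut n_segments evenly spaced segments of length n_in (assumed scheme)."""
    step = (len(signal) - n_in) // max(n_segments - 1, 1)
    return [signal[k * step:k * step + n_in] for k in range(n_segments)]

def local_features(W, seg):
    """Apply a trained weight matrix W (n_out x n_in) to one segment."""
    return [abs(sum(w * x for w, x in zip(w_row, seg))) for w_row in W]

def learned_feature(W, signal, n_in, n_segments):
    """Average the local features over all segments of one sample."""
    feats = [local_features(W, s) for s in segment(signal, n_in, n_segments)]
    return [sum(col) / len(feats) for col in zip(*feats)]

# Toy example: identity weights, one sample of six points, three segments.
W = [[1, 0], [0, 1]]
feature = learned_feature(W, [1, 2, 3, 4, 5, 6], n_in=2, n_segments=3)
# feature == [3.0, 4.0]
```

The resulting vector has one entry per sparse filter output, and it is this learned feature that the second step would pass to the ReLU regression classifier.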

Experimental Results
We evaluated the performance of the proposed two-step unsupervised learning approach on a real-time bearing dataset obtained from Case Western Reserve University [9]. The vibration signals were acquired at the motor drive point of a test rig under four conditions: normal, inner race fault (IF), outer race fault (OF), and rolling fault (RF). For each condition, the vibration signal dataset was prepared with a sampling frequency of 12 kHz. The results of the ReLU regression function are compared with those of the other regression functions (softmax and sigmoid). The proposed model was developed in MATLAB 2018a on a PC with an Intel i5 processor.
Figure 6 shows a segment of the fault signal from the motor bearing dataset, which is built from vibration signals. The dataset covers four different loads and ten bearing health conditions, and each health condition under the different loads is treated as one class. The dataset contains thousands of samples, which are categorized by health condition at the different loads, so the classified data points are organized class-wise.

Figure 6
Bearing dataset fault segment signal

Table 1
Testing accuracy of bearing dataset using three regression functions (%)

Figure 5
Diagnosis results based on different segment numbers

Figure 4 illustrates the performance of sparse filtering for different input and output dimensions. During testing, the accuracy of feature learning from machine faults increased considerably. As depicted in Figure 5, sparse filtering is capable of learning more sensitive features from the faults, which enhances its learning ability. The ReLU performance is analyzed and depicted in Figure 6. The confusion matrix for the regression functions is depicted in Figure 7. It shows that the accuracy of ReLU (93.8%) is higher than that of the other regression methods (softmax 81.3% and sigmoid 87.5%) under normal conditions. The test accuracy of the regression functions on the bearing dataset under normal conditions is given in Table 1. The results are consistent with ReLU not suffering from the vanishing gradient problem, and it trains faster than sigmoid because it requires less numerical computation. The experimental results show that ReLU regression effectively identifies the different fault types and health conditions. A summary of the recall, precision, and F1-score values on the test data for all three regression functions is given in Table 2.

Figure 7
Confusion matrix for Bearing dataset
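Accuracy, precision, recall, and F1-score of the kind reported here are all derived from a confusion matrix. The sketch below shows the computation on a hypothetical four-class matrix; the counts are invented for illustration and are not the values behind Figure 7, and the function name is an assumption.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return accuracy, results

# Hypothetical counts for four classes (e.g. normal, IF, OF, RF), 16 test samples.
cm = [[4, 0, 0, 0],
      [0, 4, 0, 0],
      [0, 0, 3, 1],
      [0, 0, 0, 4]]
accuracy, results = per_class_metrics(cm)
```

Per-class F1 is the harmonic mean of that class's precision and recall, so it always lies between the two. Averaging the per-class values, for example as a macro average, gives a single summary figure per regression function.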

Conclusion
In this paper, we proposed an intelligent, unsupervised fault diagnosis approach using big data. The traditional approach depends on expert knowledge for detecting machine faults and lacks accuracy due to hidden features. Existing intelligent fault identification systems have difficulty extracting sensitive information from machine faults. The two-step unsupervised feature learning algorithm overcomes these problems by combining sparse filtering and ReLU regression. The sparse filter extracts the sensitive features from bearing faults, and these features are then classified by ReLU regression. We evaluated the performance of sparse filtering and ReLU on bearing fault detection, with efficient results. The ReLU regression function achieves 93.8% accuracy in fault classification across the different conditions (inner race fault (IF), outer race fault (OF), rolling fault (RF), and normal). As given in Table 2, the ReLU-based regression method performs best, with an F1-score, recall, and precision of 95%, 93.75%, and 93.65%, respectively. The accuracy and learning progress of the proposed method are superior to those of the traditional approaches. Thus, our proposed unsupervised approach is a promising solution for effective machine fault detection. In future work, we plan to change the learning method in order to further improve the accuracy of unsupervised learning.
