An Efficient Technique for Disease Prediction by Using Enhanced Machine Learning Algorithms for Categorical Medical Dataset

We demonstrate novelty at every stage of the implementation: feature selection, feature extraction, and the final prediction stage. The proposed method is compared with existing techniques in terms of precision, Normalized Mutual Information, execution time, recall, and accuracy. We conclude that the proposed solution achieves higher accuracy for all kinds of categorical datasets, including both small and large scale datasets.


Introduction
Due to advancements in the standard of living, the occurrence of chronic disease has increased significantly. It is essential to predict these diseases at an early stage, which helps reduce the risk of their progression. Consequently, disease prediction has become a popular topic among researchers. Since medical data spans a wide range, the process of disease prediction involves vast amounts of data. In recent years, the clustering and classification of categorical data have played a pivotal role in data mining, especially in the medical field. Training, extrapolation, and prediction [1] are the types of function learning from data. Learning from categorical data [50] is a recent effort; at the same time, the number of possible categorical values can grow exponentially [33]. The learning models used so far for categorical medical datasets are the Deep Neural Network (DNN), Gaussian Process Latent Variable Model (GPLVM), Long Short-Term Memory (LSTM), and Categorical Latent Gaussian Process (CLGP). Neural networks have been used to achieve dimension reduction [27], converting high-dimensional input vectors into low-dimensional representations. Recent developments are sample-based derivations of the CLGP. Nowadays, Gaussian-process-based latent variable models have attracted wide interest in the machine learning community because of their capacity.
Such models provide evidence about the latent process by absorbing non-parametric constituents. However, these models may be restricted to a view-specific technique and acquire a different latent space for each distinct view. It is not easy to improve the available Gaussian process latent variable model [17]. This model is capable not only of handling multiple observational locations but also of exploiting exterior information, such as classification labels, to acquire an improved latent space prediction. However, there were issues such as overfitting and computationally expensive learning, which increase time consumption; standard classification methods require excessive time to process this vast amount of data. They also require a massive number of records to attain a good outcome. For datasets with a large number of categorical attributes [39] but few samples, some combinations of categorical values may not occur among the training samples [46]. Latent variable models address this by generating the observations from hidden variables that capture the commonalities across observations and are generally [40] of lower dimension than the authentic data. Their capability to explain the variability of data with fewer components, and to provide a stable generative process, makes latent variable models a practical choice for the analysis of high-dimensional real-world datasets.
On the other hand, integrating prior knowledge into standard latent variable models is difficult. Also, the clustering techniques used for disease prediction have issues such as inadequate cluster descriptors, substantial degradation of effectiveness on high-dimensional data, high sensitivity to initialization, outliers, and noise. Moreover, they cannot deal with non-convex clusters that vary in density and size. Hence, to overcome these issues, a novel methodology is proposed. In the proposed work, categorical data is gathered from the UCI (University of California, Irvine) Repository and pre-processed to remove irrelevant or missing data. Feature extraction is then performed based on the eigenvalues and eigenvectors of the data.
Moreover, from these extracted features, the best features are selected using a novel ant colony optimization technique. The selected best features are then classified using a novel kernel classifier. Traditional classification techniques [21] such as RF (Random Forests), GBDT (Gradient Boosting Decision Tree), CART (Classification and Regression Trees), NB-M (Multinomial Naive Bayes), NB-DCM (Naive Bayes - Dirichlet Compound Multinomial), and GC-LGM (Generative Classification model based on a Latent Gaussian Model) are used for validating the performance of the proposed approach on small datasets. Similarly, existing clustering techniques such as K-Means, K-Modes, and Manhattan techniques are used to confirm its performance on large datasets. Our goal is to obtain higher accuracy than the existing methods for all types of datasets. In our method, enhanced classification is used for small datasets, and improvised clustering is used for large datasets as well as small datasets.
The main intentions of this work are as follows:
_ To extract the features of the categorical data, a Latent Gaussian Process (LGP) based extraction technique is utilized.
_ To select the most relevant features from the set of extracted features, a Multi-Objective Ant Colony Optimization (MO-ACO) technique is employed, which improves classification performance.
_ To classify the categorical data with better accuracy, a Canonical-based SVM Kernel Function classifier is used.
_ To cluster large scale data, an Integration of Manhattan Frequency K-Means with Cluster Center Initialization clustering approach is used.
_ To prove that our classification mechanism is fit for small scale data.
_ To prove that our clustering mechanism is suitable for both small scale and large scale data.
The contributions of this work are as follows: Generally, in the medical field, a database may involve all types of categorical data, in small or extensive datasets, which complicates the analysis for disease prediction. We aim to improvise the machine learning approach to obtain better accuracy in disease prediction for both large and small datasets. Among the various machine learning procedures, classification methods play a significant role and achieve better accuracy; thus, a classification technique is used for small datasets. Even though classification techniques provide good results, they lack prediction accuracy on large datasets. For this reason, a clustering approach is also included in this work to obtain improved accuracy for large datasets.
The remaining portion of this paper is organized as follows: Section 2 provides a literature review of the various state-of-the-art techniques employed for the processing of categorical data. Section 3 explains the objective of and motivation for the work. Section 4 details the proposed mechanism. Section 5 presents the performance analysis of the proposed method. Then, the paper is concluded in the final section.

Related Works
This section provides a literature review of the techniques and processes used for the prediction of disease in the medical area, i.e., extraction, selection, classification, and clustering mechanisms for the analysis of diseases from categorical data.

Classification
A generative classification approach [22] was suggested for categorical data based on the latent Gaussian process. Categorical data appears in several applications such as gene sequence analysis, natural language processing, and computer-aided diagnosis. For modeling categorical data, the latent Gaussian process is handled efficiently using a Bayesian non-parametric approach, which can estimate the density. In this generative classification model for labeling categorical data, the class-conditional densities were estimated using the latent Gaussian process. For categorical data, this method is more suitable than the alternatives. The major drawback of this work was its time consumption: the computation takes time and reduces stability. A coupled attribute similarity learning on categorical data was described in [12]. Usually, attributes are associated with each other through specific coupling relationships in real-world data sources. However, attribute coupling was explored by introducing the co-occurrence of attribute values, which presents only a local picture for the analysis of categorical data. The approach was useful in capturing the global and intrinsic interactions among attributes, especially in large scale categorical datasets. However, it failed to capture some attribute connection degrees, which remains the major drawback of this method. A semi-supervised learning method was offered in [54] to maximize relevance and minimize redundancy using Pearson's correlation coefficient. This approach mainly concentrated on building a highly relevant feature subset. The significance of the features is estimated by an incremental search model that computes feature-to-feature and feature-to-label coefficients, which is very simple to implement and reduces complexity.
The approach was evaluated on binary and multi-category benchmark datasets, and a suitable feature subset was extracted for an efficient learning mechanism. However, this approach sometimes increases the noise in labeled data, and the SVM approach is tough to implement. An ant colony optimization was used for mixed-variable optimization problems [35]. A new procedure was implemented to generate a mixed-variable benchmark function artificially for tuning the parameters. This, in turn, increases the effectiveness and robustness on mixed-variable optimization issues.
However, there were some limitations in the work that suggested an unsupervised method to embed unlabeled categorical data into a continuous, low-dimensional space through a Gaussian process. The Radial Basis Function with Automatic Relevance Determination can also be used as the kernel function of the Gaussian processes. The Categorical Latent Gaussian Process was implemented to estimate the class-conditional densities, learning the hyper-parameters and posterior likelihoods of the latent continuous space. The method was evaluated on the splice-junction gene sequences dataset and the voting records dataset. It mainly overcomes the sparsity problems of categorical data, and it is straightforward to implement and very flexible. There are no particular features of Gaussian processes that are not reproducible in other methodologies, and predictive variances and uncertainties are taken into account. Angelis et al. proposed categorical sequence data mining using a hybrid clustering technique [18]. In sequential data, the identification of various dynamics has become an essential factor in life-science fields such as bioinformatics, marketing, social sciences, and finance. Here, the categorical data sequences were mapped into a probabilistic space using an extended Markov model. After that, the sequences were clustered using hierarchical clustering. However, this method also had some limitations.
A new design of the Hidden Parameter Markov Decision Process (HiP-MDP), a framework for modeling relations between interrelated tasks, was presented using low-dimensional latent embeddings [31]. The new framework appropriately models the joint uncertainty in the latent parameters and the state space. Also, the original Gaussian-process-based model was replaced by a Bayesian Neural Network, allowing more accessible interpretation. Regardless of batch size, every new instance still requires resolving the uncertainty around the instance-specific factors before performing well and quickly on the task. However, this technique may fail to address complex control problems, which could be overcome. An efficient data-driven similarity learning algorithm was proposed for processing categorical data, which includes frequency-based intra-coupled and inter-coupled similarity stages [53]. Here, the similarity between the categorical values was estimated based on the relationships between the attributes.
Also, the dissimilarity metrics were defined based on specific requirements. Moreover, the Coupled Attribute Similarity for Objects (CASO) and Coupled Attribute Similarity for Values (CASV) measures were utilized to estimate the frequency distribution and attribute dependency aggregation. The advantage of this work was that it improved accuracy and reduced complexity by using inter- and intra-similarity measures. Its limitation lies in the manipulation of the attribute learning approaches.

Clustering
A probabilistic distance function was developed with a kernel density estimation method for clustering categorical data [14]. In this system, the cluster scatter was defined to estimate the object-to-cluster distance in the categorical data. Based on the dispersion of categories, the categorical attributes were weighted, which improved the clustering performance.
In this system, each categorical attribute weight was automatically assigned based on the correlation between the smoothed dispersions of the categories. Then, the number of clusters was estimated by defining a cluster validity index. The efficiency of this mechanism was validated on both real-world and synthetic datasets. Extending the method to general kernel functions and explaining the technique over several kernels remains a shortcoming of this work. In [4], a k-modes clustering technique was developed with a simple matching dissimilarity measure for processing categorical objects. The steps involved were as follows: the cluster similarity term was computed by estimating the categorical value of each attribute; the objects were partitioned by determining the membership value of each object in the cluster, which reduces computational complexity; the cluster centers were updated by finding the modes of the objects in the same group; and the attribute weights were computed by analyzing the whole dataset. During the performance evaluation, various datasets such as lung cancer, soybean, dermatology, heart disease, letter recognition, and mushroom data were utilized. The results showed that the suggested mechanism offered better efficiency and scalability in clustering categorical data. A kernel discriminant analysis and clustering with parsimonious Gaussian process models was proposed [10]. In this method, mixed data could be sorted by combining various kernels. Extending this methodology to the semi-supervised setting remains its shortcoming. In this section, various traditional approaches used for categorical data are discussed, along with their working procedures, merits, and demerits. Alexandridis et al.
proposed a novel learning approach for categorical data based on Radial Basis Function (RBF) networks [2]. In this approach, the numerical RBF centers were replaced with categorical tuple centers. The initial step of RBF training comprised center selection, accomplished by introducing a fast non-iterative categorical clustering algorithm, after which the weights were assessed using linear regression. The results illustrated that the presented approach offered better generalization. A new distance metric for processing categorical data was utilized in [29] using an unsupervised learning technique. Various distance metrics were investigated, including the Hamming distance, the modified value difference metric, Ahmad's distance metric, an association-based distance metric, and a content-based distance metric.
In this system, the distance between two values was estimated by determining the costs from frequency probabilities. Here, all the similarity measures treated the categorical attributes individually. Also, a machine learning approach was utilized to obtain a proper distance metric for the given set of objects. Moreover, a weighting scheme was employed to assign larger weights to the attributes providing essential information. The significant benefits of this work were its better adjusting capability and its highlighting of infrequent items. Subspace learning methods were suggested to untie the latent structures of three prominent bilinear models, Probit, Logit, and Tobit [44], which were examined in detail.
A determinant probability model, normalized through a substitute, is used for the large scale categorical data. The Probit model treats categorical data as quantized values of a positive analog-amplitude vector that resides in a low-dimensional linear subspace. Tobit is the model of high-quality censoring. The probabilistic Logit model generalizes logistic regression to the unsupervised case. The rank regularization method is used for preprocessing the data. The disadvantage of this approach is the reduced set representation, which provides only an approximation to the exact solution, and finding the expansions inevitably increases the complexity of the algorithm. In [43], the author offered agnostic learning bounds for analyzing the risk of the Bayesian predictor. It utilized Regularized Cumulative-Loss Minimization (RCLM) for the posterior calculation against the best single predictor. The bounds were implemented for various classes of Bayesian models comprising sparse Gaussian processes (sGP), Generalized Linear Models (GLM), and Correlated Topic Models (CTM). In the case of CTM, the bound applied precisely to the variational algorithm with a distorted variational bound. For sGP and GLM, the bounds were implemented for bounded variants of the log loss. The discrepancy among the losses was exposed with an alternative technique using simple loss minimization. The empirical evaluation of CTM offered the ability for direct loss reduction. The collapsed variational approximation has the benefits of better practical performance as well as strong theoretical guarantees, but the major limitation was that the class was restricted to point estimates in the results. A novel algorithm that utilizes a machine learning approach for the identification of longitudinal patterns in disease diagnosis was presented in [52].
Two types of technical novelty were considered: one enabled high learning specificity through a novel learning paradigm, and the other identified risk-driving diagnoses. A series of investigations exhibiting the efficiency of the offered techniques was also provided, revealing some novel insights concerning the most promising future research directions.
The reliability of software can also be found using a machine learning technique [9]. Reliability is one of the major attributes of a software quality assurance system. In that paper, a recurrent neural network (RNN) method was implemented. In RNNs, the output of the current task depends on the previous state, and the current output is carried forward as the input of the next computation. The comparison of results shows that the proposed RNN method gives more accuracy than naive Bayes, decision trees, and support vector machines.
The prediction of disease using a machine learning algorithm [15] over big data from healthcare communities has been carried out. In most datasets, some incomplete data entries are found. The proposed method uses a latent factor model to reconstruct the incomplete data, together with a new convolutional neural network based multimodal disease risk prediction (CNN-MDRP) algorithm. The results are compared with a CNN-based unimodal disease risk prediction (CNN-UDRP) algorithm, using the F1-measure for performance evaluation.
Due to the increasing popularity of social networks, it is difficult to identify betweenness centrality in large scale networks [7]. Recently, a few algorithms were introduced for identifying the most influential entities in large-scale networks in real-time applications [26, 32, 34], followed by a MapReduce-based incremental parallel algorithm for exploring influential nodes based on betweenness centrality in a dynamic network where edges may be dynamically updated.
A Small World Model [6] was implemented for a large scale community to uncover hidden communities in social networks. The main objective of that work was to improve the efficiency of algorithms through parallel programming frameworks like MapReduce for uncovering communities in the network. Here the nodes are mapped into communities based on a random walk in the network. A small world network exhibits three important characteristics: short average path length, high clustering coefficient, and exhaustive search using local information. The Hadoop framework was also applied for solving the complex problem by distributing the computation over multiple nodes in the cluster. Automatic knowledge extraction from electronic health records [51] has been reported, with an extension to automatic detection of some chronic disease types. A novel method for the prediction of hospital admission type was introduced, based on representing the patient's medical history as a binary history vector [3]. The proposed technique was demonstrated on a real-time, massive scale dataset collected from local hospitals and provided a 91% accuracy rate on the prediction of the first future diagnosis.
We have to concentrate on the feature selection part to select the best features; this helps data mining techniques improve accuracy. For any data mining algorithm, such as classification or clustering, we have to select the best features first. We also made a survey of nature-based algorithms, namely bee colony, firefly, genetic, and ant colony algorithms, and chose the best among them. The Artificial Bee Colony (ABC) algorithm [30] is a population-based stochastic optimization method that realizes the intelligent foraging behavior of honey bee swarms. It can be used for classification, clustering, and optimization studies. The ABC algorithm has three different groups of bees: employed bees, onlooker bees, and scout bees [38]. The number of employed bees equals the number of onlooker bees, which in turn equals the number of solutions in the population. An onlooker bee waits in the dance area to make the food source selection decision and becomes an employed bee once it goes to a food source. An employed bee whose food source has been exhausted turns into a scout bee, whose duty is to perform a random search to discover new sources. This is the complete process of the ABC algorithm.
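The employed/onlooker/scout cycle described above can be sketched as follows. This is a minimal illustration, not the tuned configuration used in the surveyed work; the objective, colony size, and abandonment limit are our own assumptions, and the sketch assumes a non-negative objective so that the common 1/(1+f) fitness transform applies.

```python
import random

def abc_minimize(objective, dim, n_food=10, limit=20, iters=100, lo=-5.0, hi=5.0):
    """Minimal ABC sketch: employed, onlooker, and scout phases over food sources."""
    foods = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_food)]
    fits = [objective(f) for f in foods]
    trials = [0] * n_food

    def try_neighbour(i):
        # Perturb one dimension of source i toward/away from a random partner.
        k = random.randrange(n_food)
        d = random.randrange(dim)
        cand = foods[i][:]
        cand[d] += random.uniform(-1.0, 1.0) * (foods[i][d] - foods[k][d])
        cand[d] = min(max(cand[d], lo), hi)
        f = objective(cand)
        if f < fits[i]:                       # greedy selection
            foods[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):               # employed-bee phase
            try_neighbour(i)
        total = sum(1.0 / (1.0 + f) for f in fits)
        for _ in range(n_food):               # onlooker phase: roulette selection
            r, acc, chosen = random.random() * total, 0.0, 0
            for j, f in enumerate(fits):
                acc += 1.0 / (1.0 + f)
                if acc >= r:
                    chosen = j
                    break
            try_neighbour(chosen)
        for i in range(n_food):               # scout phase: abandon stale sources
            if trials[i] > limit:
                foods[i] = [random.uniform(lo, hi) for _ in range(dim)]
                fits[i], trials[i] = objective(foods[i]), 0
    best = min(range(n_food), key=lambda i: fits[i])
    return foods[best], fits[best]
```

For example, minimizing the sphere function `sum(x*x for x in v)` with these defaults typically converges close to the origin within a hundred iterations.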
The firefly algorithm is a meta-heuristic algorithm inspired by the flashing behavior of real fireflies [19]. Its performance is based on the real behavior of fireflies, which relies on the attraction between one firefly and another on the basis of their brightness. According to the algorithm, fireflies are unisex, attractiveness is proportional to brightness, and brightness is determined by the landscape of the fitness function.
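The attraction rule above can be sketched as a single update step, using the commonly cited attractiveness form beta(r) = beta0 * exp(-gamma * r^2); the parameter values beta0, gamma, and alpha here are illustrative defaults, not values from the surveyed paper.

```python
import math
import random

def firefly_step(pos, brightness, beta0=1.0, gamma=1.0, alpha=0.1):
    """One firefly-algorithm move: each firefly drifts toward every brighter one.
    Attractiveness decays with squared distance: beta(r) = beta0 * exp(-gamma * r^2)."""
    new_pos = [p[:] for p in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if brightness[j] > brightness[i]:      # j is brighter, so i moves toward j
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(len(pos[i])):
                    new_pos[i][d] += beta * (pos[j][d] - pos[i][d]) \
                                     + alpha * (random.random() - 0.5)
    return new_pos
```

In a full optimizer this step is repeated, re-evaluating brightness from the fitness function after each move.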
The genetic algorithm is one of the nature-based algorithms; it is a stochastic method for function optimization based on the mechanics of natural genetics and biological evolution. The complexity of the problem was reduced by applying this meta-heuristic approach [8], which identified related communities in large scale social networks.
The ant colony optimization algorithm is one of the best algorithms for text categorization. This algorithm is inspired by observations of real ants searching for the shortest paths to food sources. Aghdam et al. [37] showed the best result by using the ant colony for text categorization, also comparing their result against the genetic algorithm.
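Pheromone-guided feature selection of the kind surveyed here can be sketched as follows. This is a simplified single-objective illustration, not the MO-ACO of the proposed method: the subset size, evaporation rate, and scoring function are our own assumptions.

```python
import random

def aco_select(n_features, score, n_ants=10, n_select=5, iters=30, rho=0.1, q=1.0):
    """Pheromone-guided feature-subset search: ants sample subsets in proportion
    to pheromone, and the best subset found so far reinforces its trail."""
    pher = [1.0] * n_features
    best_subset, best_score = None, float("-inf")
    for _ in range(iters):
        for _ in range(n_ants):
            weights = pher[:]
            subset = []
            for _ in range(n_select):
                # Roulette-wheel draw of one feature, weighted by pheromone.
                total = sum(weights)
                r, acc = random.random() * total, 0.0
                for f, w in enumerate(weights):
                    acc += w
                    if acc >= r:
                        subset.append(f)
                        weights[f] = 0.0      # sample without replacement
                        break
            s = score(subset)
            if s > best_score:
                best_subset, best_score = sorted(subset), s
        pher = [(1 - rho) * p for p in pher]   # pheromone evaporation
        for f in best_subset:                  # reinforce the best trail
            pher[f] += q * rho
    return best_subset, best_score
```

The score function would typically be a classifier accuracy estimate on the candidate subset; here any subset-scoring callable works.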
In our proposed work, we concentrate on each and every stage by introducing novelty in all places. From the above surveys, the large and small datasets have both advantages and disadvantages. The major drawback is that generalization capabilities have to be improved. To overcome these issues and drawbacks, our proposed system is implemented with high resolution.

Motivation and Objective
According to the World Health Organization (WHO), 56.9 million deaths occurred worldwide in 2016; among these, 54%, i.e., more than 30 million deaths, were caused by diseases such as ischaemic heart disease, stroke, chronic obstructive pulmonary disease, lower respiratory infections, Alzheimer's disease and other dementias, trachea, bronchus, and lung cancers, diabetes mellitus, diarrhoeal diseases, and tuberculosis [28]. The ultimate motivation behind this work is to predict disease accurately at an early stage. At present, there are many ways to diagnose disease, but the accuracy is not up to the required level. To eradicate this problem, this paper considers categorical datasets alone for disease prediction.

Proposed Work
This section provides a detailed description of the proposed mechanism. The overall flow of the proposed system is shown in Figure 1. At first, the categorical data is collected from the UCI repository. The dataset is pre-processed to remove missing or irrelevant data, after which feature extraction is performed to extract the features from the data using eigenvectors and eigenvalues.
Furthermore, a novel multi-linear principal component analysis based feature extraction algorithm is executed to extract the features from the data. After that, the best features are selected from the extracted features by applying a novel multi-objective ant colony optimization algorithm. From these selected best features, a Canonical-based kernel classifier algorithm is designed for classification. Likewise, the clustering mechanism is performed with the help of the Integration of Manhattan Frequency with K-Means Cluster Center Initialization.
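The final classification stage plugs a kernel function into a classifier. As a stand-in illustration (the Canonical-based kernel of the proposed method is not specified here), the following sketch shows a kernel perceptron with an RBF kernel: any kernel with the same signature could be substituted.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel; gamma is an illustrative default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_perceptron_fit(X, y, kernel, epochs=20):
    """Train a kernelised perceptron: alpha[i] counts mistakes on sample i,
    so the decision function is sum_i alpha[i] * y[i] * kernel(X[i], x)."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, xi in enumerate(X):
            s = sum(a * yj * kernel(xj, xi)
                    for a, yj, xj in zip(alpha, y, X) if a)
            if y[i] * s <= 0:          # misclassified: add this sample's vote
                alpha[i] += 1
    return alpha

def kernel_perceptron_predict(X, y, alpha, kernel, x):
    s = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X) if a)
    return 1 if s >= 0 else -1
```

With the RBF kernel this simple learner already separates data that is not linearly separable, such as the XOR pattern, which illustrates why the choice of kernel drives classification accuracy on categorical features.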
In the meantime, the trained features are stored in a knowledge database that can be used in the testing phase. Here we can find the solution to predict the disease at both small and large scale. Datasets of more than 1000 kB are considered large datasets, and those smaller than 1000 kB are small datasets. For this, we implement two different algorithms, i.e., the Canonical-based kernel algorithm and Manhattan frequency with K-means cluster center initialization.
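The clustering branch combines K-means with the Manhattan (L1) distance. The full Manhattan Frequency K-Means with Cluster Center Initialization is described later; as a minimal sketch of the L1 variant alone, note that with Manhattan distance the cost-minimizing center of a cluster is the per-dimension median, not the mean. The deterministic first-k initialization below is our simplification.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans_l1(points, k, iters=50):
    """K-means with Manhattan distance; centers are per-dimension medians,
    which minimise the L1 cost within each cluster."""
    centers = [list(p) for p in points[:k]]       # simple deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: manhattan(p, centers[c]))
            clusters[i].append(p)
        new_centers = []
        for c, cl in enumerate(clusters):
            if not cl:
                new_centers.append(centers[c])    # keep an empty cluster's center
                continue
            dims = list(zip(*cl))
            new_centers.append([sorted(d)[len(d) // 2] for d in dims])
        if new_centers == centers:                # converged
            break
        centers = new_centers
    return centers

def assign(points, centers):
    return [min(range(len(centers)), key=lambda c: manhattan(p, centers[c]))
            for p in points]
```

On two well-separated groups of points, the algorithm converges in a few iterations to the group medians.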

Pre-processing
The input is the categorical data, i.e., the large and small scale data. The presence of noise or irrelevant data will harm the outcome. Hence, it is necessary to reduce or remove the noise, i.e., the missing or unrelated data present in the input dataset. Initially, the dataset is pre-processed to minimize or eliminate the missing or irrelevant data [23]. Data normalization is also carried out in the pre-processing stage. Data pre-processing is carried out in both the training and testing phases.
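The pre-processing stage described above can be sketched as follows: records with missing fields are dropped, categorical columns are integer-encoded, and every column is min-max normalized. This is a minimal illustration of the stage; the proposed system's exact cleaning rules are not reproduced here.

```python
def preprocess(rows):
    """Drop records with missing fields, integer-encode categorical columns,
    then min-max normalise every column to [0, 1]."""
    rows = [r for r in rows if all(v is not None and v != "" for v in r)]
    cols = list(zip(*rows))
    encoded = []
    for col in cols:
        if any(isinstance(v, str) for v in col):
            # Stable categorical encoding: first-seen value -> 0, 1, 2, ...
            codes = {}
            col = [codes.setdefault(v, len(codes)) for v in col]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1                     # guard against constant columns
        encoded.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*encoded)]
```

In practice, dropping rows may be replaced by imputation when data is scarce; the normalization step is what makes the later distance computations comparable across attributes.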

Feature Extraction
In general, PCA [47] extracts features based on linearly uncorrelated variables; it is not suitable for data in non-linear spaces and does not analyze the relationships among data points. In contrast, the Latent Gaussian Process based multi-linear PCA (Principal Component Analysis) is a latent variable model in which the maximum likelihood solution for the hyper-parameters is found by solving a non-linear kernel-based eigenvalue problem on the data's covariance matrix.
In the proposed algorithm, the input data is first pre-processed, replacing missing and null values. The pre-processed data is then divided into dependent and independent variables, and the categorical data is converted into numerical data.
Next, the parameter values α and β are initialized for estimating the feature matrix; here α is set to 13, which is used to consider the minimal dimension of attributes for extracting correlated features. Then the mean, standard deviation, softmax function, and covariance matrix are computed for the processed data. The log-likelihood parameter is calculated using the estimated softmax function and the covariance matrix. The categorical Gaussian features are updated based on the kernel function, which is calculated by determining the maximum log-likelihood parameter and the hyper-parameter.
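The basic statistics used in these steps, per-column mean and standard deviation, the softmax function, the covariance matrix, and a Gaussian log-likelihood, can be sketched as follows. These are textbook formulas, not the proposed method's specific kernel or hyper-parameter updates.

```python
import math

def column_stats(data):
    """Per-column mean and (population) standard deviation of a numeric matrix."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds

def softmax(v):
    m = max(v)                         # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def covariance_matrix(data):
    """Population covariance matrix of the rows of `data`."""
    n = len(data)
    means, _ = column_stats(data)
    d = len(means)
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
             for j in range(d)] for i in range(d)]

def gaussian_log_likelihood(x, mu, sigma):
    """Log-likelihood of a scalar x under N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) \
           - (x - mu) ** 2 / (2 * sigma ** 2)
```

The covariance matrix produced here is the input to the eigen decomposition performed in the procedure's final steps.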


Procedure:
Step 1: Let Dt be the input dataset.
Step 2: By applying pre-processing techniques, carry out missing-value and null-value replacement.
Step 3: Let the pre-processed dataset be Dt_P.
Step 4: Now divide Dt_P into dependent (Dt_Y) and independent (Dt_X) variables.
Step 5: After the separation of dependent and independent variables, the categorical-to-numerical data conversion is done.
Step 6: Initialize α = 13 and β = 15.
Step 7: For i = 1 to size(Dt_X):
Step 8: Compute the standard deviation (σ) of Dt_X.
Step 9: Compute the mean (μ) of Dt_X.
Step 10: Generate an identity matrix of σ.
Step 11: Compute the softmax function.
Step 12: Generate the covariance matrix Cv_x.
Step 13: Then the log-likelihood parameter is computed; let x and y represent the size of Cv_x, and let z be the size of the Cf_x vector.
Step 15: Then compute the hyperparameter.
Step 16: Now compute the features using the generated parameters.
Step 17: Compute the categorical Gaussian features.
Step 18: Calculate the central components of the categorical features using Cent = Covariance matrix (avg(Cg_f) − μ).
Step 19: Now extract the eigenvalues and eigenvectors of the central components to generate the multi-linear features.
Step 20: The extracted features are obtained as output.
This technique is an effective method to obtain the best features present in the categorical input data. The Gaussian Process (GP) has the following benefits: it can directly capture the model uncertainty, and, when using a GP, prior information and assumptions about the shape of the model can be incorporated by choosing different kernel functions.
The PCA approach has the following benefits: reduced complexity in images when combined with the usage of PCA, and a smaller database representation, since only the training images are kept in the system in the form of their projections on a condensed basis.
Noise is also reduced, since the maximum variation source is selected and thus small differences in the background are disregarded automatically.
The list of features is extracted to check and filter out the best features in the next stage with the use of the feature selection mechanism.
An algorithm for multi-linear PCA with categorical Latent Gaussian Process-based feature extraction is shown. The input is the dataset. By applying the pre-processing technique, missing or irrelevant data are removed, and the data is then divided into dependent and independent variables. Following this, categorical data are converted into numerical data. The standard deviation is estimated after initializing the alpha and beta values. After that, the mean, identity matrix, softmax function, and covariance matrix are generated. The features are generated from the hyperparameters. From these produced features, the categorical Gaussian features are computed, for which the eigenvalues and eigenvectors of the central components are generated to extract the multi-linear features.
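The statistical core of this pipeline (mean, standard deviation, softmax, covariance, and an eigen decomposition of the centered features) can be sketched as follows; this is an illustrative reconstruction, not the paper's exact equations:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def extract_features(X, n_components=2):
    """PCA-style multi-linear feature extraction (sketch)."""
    mu = X.mean(axis=0)                  # Step 9: mean
    sigma = X.std(axis=0)                # Step 8: standard deviation
    Cv = np.cov(X, rowvar=False)         # Step 12: covariance matrix
    Xc = X - mu                          # Step 18: centered features
    # Steps 19-20: eigen decomposition -> multi-linear features
    vals, vecs = np.linalg.eigh(Cv)
    order = np.argsort(vals)[::-1]       # largest eigenvalues first
    W = vecs[:, order[:n_components]]
    return Xc @ W, mu, sigma
```

The Gaussian-process likelihood and kernel-update steps are omitted here, since their equations are not recoverable from the text.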

Feature Selection
The Ant Colony Optimization (ACO) algorithm [36] is generally used for the selection of optimal features based on an objective function; here, ACO is applied to solve a multi-objective problem.
Initialize the parameters of the ACO by setting the weight and the minimum and maximum velocity. Initialize the ant population as the extracted features, from which the pheromone and speed of the communities are estimated. Update the local pheromone and velocity by comparing them against the feature matrix. Estimate the fitness value based on the feature matrix and the computed pheromones. This fitness value is used for predicting the relevant features and for updating the global pheromone index by deriving the global objective function. The obtained global pheromone is processed to suppress redundant features, and finally the best feature matrix is generated. From the extracted features, the best optimal features are selected with the use of a novel multi-objective Ant Colony Optimization (ACO) mechanism.
Step 1: Initially set the weight and the minimum and maximum velocity.
Step 2: Next, initialize the input features into a set of ant populations, and also initialize the pheromone and velocity.
Step 3: Start the iteration by selecting an ant/feature; for each iteration, calculate the pheromone and velocity for the cost computation, where i and j represent the size of the features/the number of ants.
Step 5: Now the fitness value is computed using the objective function.
Step 6: From the best fitness values, the best path value is selected.
Step 7: Now update the global pheromone and the path, setting P_0 = ω.
Step 8: Update the global best pheromone and also update the fitness values.
Step 9: Compute the best solution by choosing the best path as Feat_best = G_best(G_best < mean(G_best)).
Step 10: Get the index of the best-selected features and generate the best feature matrix.
This ACO approach has the following advantages: it can search among the population in parallel, it allows rapid discovery of good solutions, it can quickly adapt to changes such as new distances, and it has guaranteed convergence. This technique is effective for feature selection, where the selected best features are stored in a knowledge base (KB). In the testing phase, the best test features are selected depending on the best indexes in the KB; the best-selected features are also kept in the trained machine. After the selection of the best features, the clustering mechanism is performed for large-scale data, and the classification technique is carried out for smaller-scale data.
A multi-objective ACO algorithm for the selection of the best features is shown, in which the input is the extracted features. At first, the weight and the maximum and minimum velocities are set, after which the input features are initialized into total features as a set of the ant population, thereby initializing the pheromone and velocity. Iterations start with the selection of an ant/feature through the cost computation of velocity and pheromone. The values of the local pheromone and velocity are updated. The fitness values are computed from the objective function, from which the path value is selected. The global pheromone and the path are updated on computing the best solution, choosing the best path. The indexes of the best-selected features are attained, and the best feature matrix is generated. Finally, the best-selected features are obtained as output.
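A toy version of pheromone-guided feature selection can be sketched as follows. The fitness used here (absolute feature/label correlation) and the update constants are assumptions standing in for the paper's unstated objective function:

```python
import numpy as np

def aco_select(X, y, n_ants=10, n_iter=20, n_keep=2, seed=0):
    """Toy ACO-style feature selection (sketch, not the paper's rules).
    Each ant samples a feature subset with probability proportional to
    pheromone; fitness reinforces the pheromone of the chosen subset."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pheromone = np.ones(n_feat)
    for _ in range(n_iter):
        for _ in range(n_ants):
            p = pheromone / pheromone.sum()
            subset = rng.choice(n_feat, size=n_keep, replace=False, p=p)
            # fitness: mean |correlation| of chosen features with the label
            fit = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
            pheromone[subset] += fit      # local reinforcement
        pheromone *= 0.9                  # global evaporation
    # best feature matrix: indexes with the highest pheromone
    best = np.argsort(pheromone)[::-1][:n_keep]
    return np.sort(best)
```

The positive feedback (reinforcement plus evaporation) concentrates pheromone on informative features, which is the mechanism the steps above rely on.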

Classification Mechanism
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane [25]. An SVM is usually used with a kernel function, which can be applied to complex problems and provides strength during the training process, improving the accuracy of the classifier. The selection of the kernel function and its hyperparameters is an essential task. In general, the Radial Basis Function (RBF) is used as the kernel function; it uses the same weight parameters for every attribute and provides lower accuracy. Here, a canonical-form-based kernel function is used for updating the distributed weights. Initially, training and testing sets are separated from the optimized features. The custom kernel function for updating the hyperparameters is designed using the Jordan canonical form, and the latent dimension and mesh matrix are initialized. The variational and hyperparameter settings are updated in every iteration. Finally, the predicted results are generated and compared with the ground truth. Here the small dataset can be applied, and the results are compared with the existing methods. In the case of small-scale data, the best-selected features are classified by utilizing a canonical-form SVM kernel-function-based classification algorithm. SVM algorithms use a set of mathematical functions defined as kernels. SVM has the following advantages: it works well even with unstructured and semi-structured data such as text, images, and trees; the kernel trick is the real strength of SVM, since with a suitable kernel function any complex problem can be addressed; unlike neural networks, SVM does not get stuck in local optima; it handles high-dimensional data quite well; SVM models generalize well in practice; and the risk of overfitting is smaller in SVM. The function of the kernel is to take data as input and transform it into the required form.
Different SVM algorithms use different types of kernel functions such as linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. The kernel functions return the inner product between two points in a suitable feature space.

Algorithm 3: Canonical based SVM Kernel Function-based Classification Algorithm
Input: best-selected features (Feat_best), classifier labels (C_lbls). Output: classified output (C_ops). Procedure:
Step 1: Let the testing size be 30% and the training size be 70%.
Step 2: From the selected features, separate the features into a training set (Tγ_feat) and a testing set (Ts_feat).
Step 3: Instead of the radial basis function, the custom kernel classifier is designed in such a way that the hyperparameters for the classifier are pre-set in the feature extraction module, the latent dimension is set, and the size of the mesh matrix is initialized to 0.02.
Step 4: The custom kernel is designed based on the canonical form (Jordan canonical expression).

Step 5: Update the variational parameters and hyperparameters for each iteration.
Step 6: The predicted results (C_ops) are generated and compared with the ground truth (C_lbls).
Also, the clustered data are classified and then trained in the trained machine to finally yield a classification output. Thus, the extracted features are classified effectively, which is then used to predict the disease in the medical domain.
An enhanced kernel classifier algorithm is presented in which the input is the best-selected features and the classifier labels. A testing size of 30 percent and a training size of 70 percent are considered, from which the selected features are separated into training and testing sets. In place of the radial basis function, a custom kernel classifier is designed in such a manner that the hyperparameters of the classifier are pre-set in the feature extraction module. The latent dimensions are set, and the size of the mesh matrix is initialized to 0.02. The variational parameters and hyperparameters are updated in every iteration.
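The paper's Jordan-canonical kernel expression is not reproduced in the text, so it cannot be implemented here. As a minimal stand-in, the following sketch shows kernel-based classification with a Gram-matrix RBF kernel and a simple kernel perceptron; all names and parameter values are illustrative:

```python
import numpy as np

def gram_rbf(X, Y, gamma=0.5):
    # Pairwise RBF Gram matrix between the rows of X and Y
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

class KernelPerceptron:
    """Minimal kernel classifier (a stand-in, not the paper's
    canonical-kernel SVM). Labels are expected in {-1, +1}."""
    def __init__(self, kernel=gram_rbf, epochs=20):
        self.kernel, self.epochs = kernel, epochs

    def fit(self, X, y):
        self.X, self.y = X, np.asarray(y, dtype=float)
        self.alpha = np.zeros(len(X))
        K = self.kernel(X, X)
        for _ in range(self.epochs):
            for i in range(len(X)):
                # mistake-driven update on the kernel expansion
                if np.sign((self.alpha * self.y) @ K[:, i]) != self.y[i]:
                    self.alpha[i] += 1.0
        return self

    def predict(self, X_test):
        K = self.kernel(self.X, X_test)
        return np.sign((self.alpha * self.y) @ K)
```

Swapping `gram_rbf` for a different Gram-matrix function is all that is needed to change the kernel, which mirrors the custom-kernel design described above.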
Finally, the predicted results are generated and compared with the ground truth. Here the small dataset can be applied, and the results are then compared with the existing methods. From this we found that the classifier is well suited for small datasets but not for large ones, so we introduced the enhanced Manhattan frequency K-means with cluster center initialization to handle large datasets. The classification method provides the best results for small datasets when compared with the existing processes, but it is unfit for a large dataset.

Clustering Technique
In general, cluster centroids are chosen randomly based on the ranges of the given sample values. This leads to clustering the data points randomly, and it may produce misclassification and larger deviation in similarity among classes.
Here, more efficient and scalable operations are proposed for the categorical clustering of the classified outcomes from the classifier. In addition, we incorporate cluster center initialization in our method. For large-scale data, the integration of Manhattan Frequency K-Means [42] with Cluster Centre Initialization is carried out to cluster the vast amount of data and to classify the clusters with respect to the best features. The K-means approach has the following advantages: if the number of variables is massive, K-means is computationally quicker than hierarchical clustering provided k is kept small, and K-means creates tighter clusters than hierarchical clustering, mainly if the clusters are globular.
The Manhattan Frequency k-Means (MFk-M) is a partitional categorical clustering method based on transforming the categorical data into numeric measures using the relative frequencies of the modalities in the attributes. This permits directly using traditional numeric clustering methods.
Step 1: Initially, the number of clusters is set as K.
Step 2: The next step is the selection of the initial cluster centers, which is done as follows:
Step 2a: Let D_xd be the set of data elements with the best-selected attributes, where n represents the number of best-selected attributes.
Step 2b: For each best-selected attribute, compute the mean.
Step 4: Apply clustering on the selected best attributes and update the cluster labels and the initial cluster centers.
Step 5: Update the distance by using the Hamming distance.
Step 6: Update the points of the cluster based on the data points.
Step 7: For i in K, C_1 = cluster(data points in i).
Step 8: Compute the mean scores in the group and update the cluster centers.
Step 9: Based on the cluster members, the cluster data points are extracted, and the clustered results are obtained.
At first, the number of clusters is set as K; the selection of initial cluster centers follows in the next step. The set of data elements contains the selected best attributes, and n is the number of attributes. The mean and standard deviation of the best features are computed for the best-selected attributes, along with the evaluation of centile values for each attribute. The initial partition among the data elements and attribute values is then created. On applying the clustering technique, the cluster labels are updated together with the initial cluster centers. The distance is updated using the Hamming distance. After the cluster points are updated, the mean scores in each cluster are computed. The cluster data points are extracted depending on the cluster members, and the clustered output is finally obtained.
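The frequency transformation and the L1-based clustering loop can be sketched as follows; seeding with the first k points stands in for the paper's centile-based center initialization, which is not fully specified:

```python
import numpy as np

def freq_encode(col):
    """Replace each modality by its relative frequency in the column
    (the MFk-M transformation of categorical data to numeric)."""
    vals, counts = np.unique(col, return_counts=True)
    freq = dict(zip(vals, counts / len(col)))
    return np.array([freq[v] for v in col])

def manhattan_kmeans(X, k, n_iter=50):
    """k-means with L1 (Manhattan) distance; each center is the
    per-cluster median, the L1-optimal centroid (sketch)."""
    centers = X[:k].astype(float).copy()   # simple deterministic seeding
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # L1 distance from every point to every center
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(X[labels == j], axis=0)
    return labels, centers
```

Using the median rather than the mean as the cluster center is what makes the update consistent with the Manhattan distance.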

Performance Analysis
This section provides the performance analysis of the proposed mechanism on the medical dataset with categorical attributes. For both small and large datasets, 30% testing and 70% training are taken.
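The 70/30 partition used throughout the experiments can be sketched as follows (the shuffle-based scheme and seed are assumptions; the paper does not state its splitting procedure):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.3, seed=0):
    """Shuffle indices and hold out `test_ratio` of samples (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]
```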

SPECT Heart
The dataset describes and analyzes cardiac Single Photon Emission Computed Tomography (SPECT) [49] images. Each patient is categorized into one of two groups: abnormal or normal. This dataset has two classes, wherein class 1 contains 15 samples and class 2 contains 172 samples. The database of 267 SPECT image sets (patients) was used to extract features that condense the original SPECT images. Consequently, 44 continuous feature patterns were generated for each patient. The patterns were processed further to attain 22 binary feature patterns, and the size of this dataset is 9 kB.

Breast Cancer
This is one of three datasets provided by the Institute of Oncology [11] that have frequently appeared in the machine learning literature (the others being the lymphography and primary-tumor datasets). This dataset comprises 201 instances of one class and 85 instances of another class. The instances are described by 9 attributes, some of which are linear and some nominal. The size of the breast cancer dataset is 19 kB.

Gene Sequences
Samples (instances) are stored row-wise. The variables (attributes) of each sample are RNA-Seq gene expression [24] levels measured by the Illumina HiSeq platform. Splice junctions, which give this Gene Sequences dataset its name, are points on a DNA sequence at which `superfluous' DNA is removed during protein formation in higher organisms. The problem posed by this dataset is to recognize, in a given DNA sequence, the boundaries between introns (the parts of the DNA sequence that are spliced out) and exons (the parts of the DNA sequence retained after splicing). The total size of this dataset is 385 kB. There are three classes: 766 samples in class 1, 768 samples in class 2, and 1655 samples in class 3. It comprises 3190 data records and 60 feature patterns.
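Each of the 60 positions in a splice-junction record is a categorical attribute (a nucleotide), so a common preparation step is one-hot encoding each base before feeding the records to a classifier. The encoding below is our illustration, not a step prescribed by the paper:

```python
# One-hot encode a DNA window so each position becomes four binary
# features (A, C, G, T); ambiguity codes (e.g. N) map to all zeros here.
BASES = "ACGT"

def one_hot(seq):
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec

features = one_hot("ACGTG")
# 5 positions x 4 bases = 20 binary features
```

Applied to the full dataset, each 60-position record would expand to 240 binary features.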

EEG Eye State
All data is from one continuous EEG measurement [20]; the eye state (open or closed) was annotated during the measurement, giving a binary categorical target.

Epileptic Seizure Dataset
The original dataset [5] from the source contains five different folders, each with 100 files, every folder representing a distinct subject/person. Each file is a recording of brain activity for 23.6 seconds. There are five classes, each with 2300 samples. The corresponding time series is sampled into 4097 data points, each data point being the value of the EEG recording at a different point in time. There are 178 features in this dataset, and its size is 7329 kB (7.3 MB). In total, therefore, there are 500 individuals, each with 4097 data points over 23.6 seconds.
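The 178 features per sample are consistent with slicing each 4097-point recording into non-overlapping 178-point windows (23 per subject, and 500 × 23 = 11500 samples across the five classes of 2300). A sketch of that segmentation, under the assumption that the trailing remainder is discarded:

```python
def segment(signal, window=178):
    """Split one 4097-point EEG recording into non-overlapping
    178-point windows; the trailing remainder (< 178 points) is dropped."""
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, window)]

recording = list(range(4097))      # stand-in for one subject's EEG trace
chunks = segment(recording)
# 4097 // 178 = 23 windows of 178 samples each
```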

Precision
Precision is defined as a quantity used to estimate the performance of the classification method. It is the fraction of predicted positives that are truly positive, Precision = TP / (TP + FP). (1)

Recall
Recall measures the ability of the estimated model to pick out the instances of a given class from a dataset. It is also termed sensitivity and is calculated as Recall = TP / (TP + FN). (2)

F1-score
In the numerical analysis of binary classification, the F1 score is used to measure test accuracy. It is constructed as the weighted (harmonic) mean of precision and recall, combining both values into a single measure. The F1 score reaches its best value at (or near) one and its worst value at (or near) zero.

Accuracy
Accuracy denotes the closeness of a measured value to a standard or known value. Accuracy can also be stated as the weighted arithmetic mean of precision and inverse precision, as well as the weighted arithmetic mean of recall and inverse recall.
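All four measures above follow directly from the confusion-matrix counts. A small sketch with illustrative counts of our own choosing:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (sensitivity), F1 and accuracy from
    confusion-matrix counts, per equations (1) and (2) above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Illustrative counts only (not taken from the paper's experiments).
p, r, f1, acc = classification_metrics(tp=80, fp=20, fn=20, tn=80)
# p = 0.8, r = 0.8, f1 = 0.8, acc = 0.8
```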

Performance and Comparative Analysis of Small Dataset (Classification)
The performance analysis and comparative analysis of the small dataset classification [13] process are shown in the following subsections.

Breast Cancer Dataset
The Table below shows the breast cancer dataset accuracy rate percentage for proposed and existing methods.

SPECT Heart
The Table 3 below shows the SPECT heart dataset accuracy rate percentage for proposed and existing methods.
In Table 3, the SPECT heart dataset is taken, and the classification process is carried out on this small dataset. The various techniques are compared by their accuracy rates. Compared with the other traditional techniques, the proposed method shows a better test accuracy rate. Here, the mean value is taken over ten runs. These are small datasets with relatively few records, for which our classification method alone is sufficient.
Based on the results provided in Table 4, the precision and recall values are higher for the proposed method than for the existing ones.

Gene Sequences
In Table 5, the gene sequence dataset was used for disease prediction, and the accuracy rate is reported for the proposed and other existing methods. It is evident from the table that the precision and recall values are higher than those of existing methods such as RF, GBDT, CART, NB-M, NB-DCM, GC-LGM, and Fengmao et al.
From this analysis, the proposed method shows better results than the traditional techniques. In addition to performance metrics such as accuracy percentage, precision, and recall, the F1 score values were also computed for the Breast Cancer, SPECT, and Gene Sequence datasets to validate the effectiveness of our proposed method, as detailed in Table 7.

Performance and Comparative Analysis of Large Dataset (Clustering)
The performance analysis and comparative analysis of the large dataset clustering [41] process are shown in the following tables. As reported in Table 8, the clustering technique on a large scale dataset provides better results for the proposed method than for the existing methods. The epileptic seizure [16] and EEG eye state [45] datasets are considered here as large datasets, providing accuracy rates of about 0.828 and 0.9506. Similarly, the NMI rates for epileptic seizure and EEG eye state are 0.9739 and 0.9777, and the execution times are 0.015 and 0.235 for the epileptic seizure and EEG eye state datasets. Table 9 presents the performance analysis of the large scale dataset for epileptic seizure prediction. The conventional estimates, in terms of Rand score, V-measure score, silhouette score, and CH score, were compared against our proposed clustering technique. Based on this comparative analysis, it is found that our proposed approach provides better results for all the estimates and outperforms the existing methods.
In Table 10, the proposed clustering technique for handling a large scale dataset is reported with better results than other similar techniques. The epileptic seizure and EEG eye state datasets provide accuracy rates of about 0.828 and 0.9506; similarly, their NMI rates are 0.9739 and 0.9777, and their execution times are 0.015 and 0.235. The proposed clustering technique was also tested on the small datasets; as Table 11 shows, our proposed method yields better outcomes for small datasets as well. Although the classification techniques provide better results, they fall short in prediction accuracy when applied to large datasets.
Finally, it is observed that our proposed clustering method is well suited for both small and large datasets for the correct prediction of diseases in the medical field.
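The NMI values reported above can be computed from the contingency table of cluster labels against true labels. A pure-Python sketch, assuming normalization by the arithmetic mean of the two entropies (one common convention; the paper does not state which it uses):

```python
from collections import Counter
from math import log

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information between two labelings,
    normalized here by the arithmetic mean of the entropies."""
    n = len(labels_true)
    ct = Counter(zip(labels_true, labels_pred))   # contingency counts
    pt = Counter(labels_true)                     # true-label counts
    pp = Counter(labels_pred)                     # predicted-label counts
    mi = sum(c / n * log(c * n / (pt[t] * pp[p]))
             for (t, p), c in ct.items())
    h_t = -sum(c / n * log(c / n) for c in pt.values())
    h_p = -sum(c / n * log(c / n) for c in pp.values())
    return mi / ((h_t + h_p) / 2) if (h_t + h_p) else 1.0

score = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# identical partitions up to relabeling -> NMI = 1.0
```

Note that NMI is invariant to relabeling of the clusters, which is why it is preferred over raw accuracy for comparing clusterings.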
The sensitivity and specificity values of the small datasets are listed in Table 12. Here, sensitivity is the percentage of sick people correctly identified as having the condition; likewise, specificity is the percentage of healthy people correctly identified as not having the condition. We also extended the analysis by performing cross-validation on all the given datasets, varying the percentage of training and testing data in steps of 10%, from 10% to 90%. The results of this analysis are given in Table 13, which reports the best value for every dataset.
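Sensitivity and specificity as defined above follow from the same confusion-matrix counts, with specificity looking at the negative class. A sketch with illustrative counts of our own choosing:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity: fraction of sick people correctly flagged as having
    the condition; specificity: fraction of healthy people correctly
    identified as not having it."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Illustrative counts only (not taken from Table 12).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=85, fp=15)
# sens = 0.9, spec = 0.85
```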
Based on the comparative analysis of the proposed method against existing state-of-the-art approaches, it has been shown that our proposed method obtains higher classification accuracy than the other methods.

Table 13. Cross-validation of accuracy rate in small datasets

Conclusion
At present, the clustering of categorical data is a challenging aspect of the medical field, which must handle a massive amount of data. The prediction of disease at an early stage is of utmost importance for quick treatment and cure. In this proposed mechanism, the clustering technique was performed effectively on large scale categorical data. Initially, small and large scale categorical data are given as input and pre-processed; features are extracted using a novel multi-linear PCA based feature extraction technique. The selection of features is then made using a novel multi-objective ant colony optimization approach. In the case of small scale data, the selected features are classified using a canonical SVM kernel classifier; in the case of large scale data, the selected features are grouped using the proposed clustering technique. From the study, it was evident that our classification method is suited to small scale datasets, and that the clustering method is suitable for large scale datasets as well as small scale datasets. From this, it is evident that the proposed mechanism provides more accurate results, so that diseases can be predicted easily.
The results of the proposed method are then compared using performance metrics such as precision, NMI, execution time, recall, and accuracy, and the method proves to be an outperforming one.
Still, this new method requires further improvement: in the future, a unique iterative-relocation-based partitioning will be implemented for clustering big datasets. The research will also be extended by testing on numerical datasets, and by substituting other nature-inspired algorithms for the ACO algorithm in the analysis. The main intention of the future work is to enhance this methodology to make it adaptable to all kinds of datasets.