Small Sample Time Series Classification Based on Data Augmentation and Semi-supervised Learning

Realistic scenarios produce both labeled and unlabeled data, yet labeling time series data is costly and difficult. It is therefore imperative to effectively exploit the relationship between labeled and unlabeled data within a semi-supervised classification model. This paper presents a novel semi-supervised classification method, Data Augmentation-Fast Shapelet Semi-Supervised Classification, which employs a data augmentation module to increase data diversity and improve the generalization ability of the model, together with a feature fusion module to strengthen the semi-supervised network. A conditional generative adversarial network synthesizes high-quality labeled time series samples to enrich the homogeneous data in the sample space; the fast shapelets method quickly extracts the important shape feature vectors of the time series; and self-supervised and supervised learning are combined to fully learn both the unlabeled and labeled portions of the time series dataset. A joint loss function combines the loss functions of the two networks to optimize multiple objectives. Reinforcement learning determines the weight coefficients of the joint loss function, and the reward function is modified to bias the supervised loss, which improves classification performance under limited labeled data and allows the model to better accomplish the semi-supervised classification task. The proposed method is validated on the UCR benchmark datasets, an electrocardiogram dataset, and an electroencephalogram dataset; the results show that the method performs more accurate semi-supervised classification of time series, with accuracy better than the comparison methods.
Meanwhile, we test on a plant electrical signal dataset obtained from actual measurements; the visualization analysis clearly shows the role of the model in the semi-supervised classification task, and the experimental results fully demonstrate the effectiveness and applicability of the proposed method.


Introduction
Time series data exist across many industries: in medicine, electrocardiogram (ECG) data [30] and electroencephalogram (EEG) data [3]; in agriculture and biology, plant electrical signal data [20] and Internet of Things data [32]; in industry, bearing signal data [26] and aerospace signal data [27]; and in communications, unauthorized broadcast identification [48] and radio signal classification [46,47]. However, these time series data share common drawbacks: industry specificity limits the availability of large amounts of data, and collecting, organizing, and manually labeling large amounts of time series training data is time-consuming and costly. Time series datasets therefore typically consist of a large amount of unlabeled data and a small amount of labeled data. Classifiers constructed from a small amount of labeled data together with a large amount of unlabeled data are called semi-supervised classification models, and performing semi-supervised classification with little labeled time series data is a great challenge in machine learning research [24].
In recent years, deep networks have achieved satisfactory results in the semi-supervised classification of time series data [36,44,45]. However, semi-supervised classification models for time series have two obvious shortcomings. First, deep neural networks require a large amount of training data; with small time series datasets and scarce labels, deep networks are prone to overfitting and poor robustness [21]. Spline interpolation, Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation, and Empirical Mode Decomposition are common time series data augmentation techniques [1,2]. An inadequately trained semi-supervised classification model cannot correctly represent the distribution of the time series data, which leads to low classification accuracy [13]. Second, it is unclear how a semi-supervised classification model can effectively integrate labeled and unlabeled data to improve performance [5].
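As a concrete illustration of the interpolation-based augmentation mentioned above, the sketch below warps a series by resampling it at slightly jittered time points with linear interpolation. This is a minimal stand-in for the spline and PCHIP methods cited in [1,2]; the function name and the `jitter_scale` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def interpolation_augment(series, jitter_scale=0.05, seed=0):
    """Augment a 1-D series by resampling it at slightly perturbed
    time indices using linear interpolation (np.interp)."""
    rng = np.random.default_rng(seed)
    m = len(series)
    t = np.arange(m, dtype=float)
    # Perturb interior sample positions; keep the endpoints fixed so
    # the augmented series spans the same time range as the original.
    t_new = t + rng.normal(0.0, jitter_scale * m / 10.0, size=m)
    t_new[0], t_new[-1] = t[0], t[-1]
    t_new = np.clip(np.sort(t_new), t[0], t[-1])
    return np.interp(t_new, t, series)

x = np.sin(np.linspace(0, 2 * np.pi, 100))
x_aug = interpolation_augment(x)  # same length, slightly warped copy
```

Each call with a different seed yields a distinct warped copy of the original sample, which is the basic mechanism behind interpolation-style augmentation.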
The semi-supervised classification task for small-sample time series data mainly stems from the high cost of data acquisition in practical applications, the lack of data labels, and the pursuit of model generalization ability. Currently, researchers have proposed several semi-supervised classification models for time series data. The semi-supervised time series feature learning model proposed by Wang et al. incorporates labeled and unlabeled time series data into an integrated model that learns efficiently through least-squares minimization, spectral analysis, scaled pseudo-labels, and feature-similarity regularization terms [35]. Jawed et al. provide a powerful alternative to supervised signals for feature learning by utilizing unlabeled training data through a forecasting task, optimizing a multi-task learning approach in which forecasting is a secondary task alongside the primary classification task, yielding a model with better performance [14]. Xi et al. proposed that a past-anchor-future strategy can extract higher-quality semantic context from unlabeled time series data, and that self-supervised temporal relation learning can effectively assist supervised models [38]. Rezaei et al. pre-trained a model on a large unlabeled dataset by inputting the time series features of sampled packets and then transferred the learned weights to a small labeled dataset, achieving the same accuracy as a fully supervised method trained on a large labeled dataset [31]. Goschenhofer et al. show significant performance gains in deep semi-supervised learning models by examining the transferability of state-of-the-art deep semi-supervised models from image to time series classification, combined with appropriate model backbone architectures and customized data augmentation strategies [11]. Xi et al. used LB_Keogh, the lower bound of DTW, to construct pairwise distance matrices for a graph neural network, a new graph-construction module; the experimental results showed that it accelerated network training without decreasing classification accuracy [39]. Fan et al. proposed a simple and effective semi-supervised time series classification architecture: for labeled time series, SemiTime performs supervised classification directly under the supervision of the annotated class labels; for unlabeled time series, segments of past-future pairs are sampled from the series and SemiTime predicts the temporal relationships between these segments in a self-supervised manner. Experimental results showed that SemiTime outperforms state-of-the-art techniques [9]. Wei et al. proposed a multi-task learning scheme for semi-supervised time series classification (MTFC) with time-frequency mining: unsupervised tasks capture the time-frequency information of the series, and the multi-task learning framework learns features common to the labeled and unlabeled data, effectively improving semi-supervised classification performance [36]. Eldele et al. proposed a novel framework for learning time series representations that learns from unlabeled data through contrastive learning, using weak and strong augmentations specific to time series to learn robust temporal relations in a temporal contrasting module, and learning discriminative representations through a contextual contrasting module; it has proven highly efficient with few labeled data and in transfer learning scenarios [8].
Meanwhile, semi-supervised classification of time series has been applied in practical classification scenarios. Wu et al. proposed a semi-supervised fault diagnosis model with an unsupervised autoencoder modified using the mean squared error, employing the data labels and a softmax classifier to diagnose the health condition directly from the autoencoder's coded features; the method was validated on an electric motor bearing dataset and an industrial hydraulic turbine dataset, and the results showed high diagnostic accuracy [37]. Liu et al. present a framework dedicated to automatic modulation classification (AMC) of radio signals, which achieves higher performance with less labeled data by carefully utilizing unlabeled signal data and a self-supervised contrastive-learning pre-training step [22]. Han et al. proposed an end-to-end semi-supervised learning framework with two deep neural networks with different backbones, and achieved high classification accuracy on a motor imagery EEG dataset using contrastive learning and adversarial training strategies [12]. Semi-supervised learning allows labeled and unlabeled data to be used efficiently and improves data utilization, but the algorithms tend to be more complex and require more tuning and optimization.
An even more extreme situation occurs in real scenarios: time series datasets may have a very limited number of samples, so researchers have proposed semi-supervised classification methods for small-sample time series data. Little data is available in bearing fault diagnosis; to address this challenge, Yongtao et al. extracted fault features through variational modal decomposition (VMD) and sample entropy, pre-trained the model using the feature matrices of unlabeled samples, fine-tuned it with the feature matrices and labels of labeled samples, and finally achieved fast and accurate bearing fault diagnosis [42]. Ma et al. proposed a Consistent Regularized Auto-Encoder (CRAE) framework based on encoder-decoder networks, which first uses data augmentation strategies to process individual process samples into sample matrices and extracts local and global spatio-temporal features from them using local and global encoders; the introduced Consistent Regularization (CR) method pushes the decision boundaries toward low-density regions, making the distinction between categories more obvious and improving classification accuracy [25]. Lao et al. proposed a semi-supervised weighted prototype network (SSWPN) with a dual-scale neural network (DSNN) that enhances the ability to extract and express data features at different scales, together with a new semi-supervised weighted prototype updating strategy; the experimental results showed significant advantages in real-world scenarios with scarce data [18]. Zhou et al. proposed the use of Deep Convolutional Generative Adversarial Networks (DCGAN), which overcome the limitation of scarce training data and achieve highly accurate gear diagnosis with few labeled samples [49].
Semi-supervised classification models for small-sample time series data offer significant advantages in improving data utilization and saving annotation costs, but they also face challenges in model design. In practical applications, careful consideration must be given to balancing the advantages and disadvantages of such models, and to designing and adapting them to the semi-supervised classification task and the characteristics of the data. We therefore choose to generate additional labeled data, and further explore a semi-supervised classification model better suited to small-sample time series data by exploiting the intrinsic connection between labeled and unlabeled data.

Methods
In this section, we elaborate on the proposed semi-supervised time series classification method. It consists of three modules: a data augmentation module, a feature extraction and fusion module, and a semi-supervised classification module; the semi-supervised classification module in turn consists of a self-supervised network for unlabeled data and a supervised network for labeled data. The data augmentation module generates additional synthetic samples to augment the original time series data; discriminative subsequences are then extracted from the time series; the self-prediction regression task uses the historical observations of a time series to predict its future values, with an encoder that processes the input sequences through a series of LSTM layers and outputs a compact representation of the future values; and the classification task uses multiple LSTM layers and convolutional layers to construct the feature vectors that ultimately classify the labeled data. The semi-supervised network is capable of extracting global and local features of the time series at multiple scales, so the model can fully learn from both sets of data and improve its performance. The time series dataset S is divided into two groups: labeled data D_L and unlabeled data D_U. The unlabeled portion is larger and the labeled portion is smaller. The two parts are trained with different networks, whose loss functions are combined; minimizing the joint loss function trains the semi-supervised network to classify the time series data. The overall architecture of the proposed method is displayed in Figure 1.

Data Augmentation Module
Conditional generative adversarial networks (CGAN) add auxiliary conditioning information to generative adversarial networks [28]. To better train the network, the CGAN is used to generate additional labeled time series data D_L' for each class, based on the labeled data D_L in the time series dataset S. The category label y of the data is used as the additional information and is combined with a random noise vector z; both are fed to the generator and the discriminator.

The generator draws random vectors z from a prior distribution p(z) and concatenates them with the extra information y to produce a joint hidden representation, from which the generator produces diverse pseudo time series. The discriminator receives either real time series samples x or generated samples G(z|y), together with the auxiliary information y, and judges whether an input sample is real or generated given its label condition. The objective function of the conditional generative adversarial network is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)|y))]  (1)

where G denotes the generative model, D denotes the discriminative model, z \sim p_z(z) denotes the prior input noise, and y denotes the class label. Conditional generative adversarial networks make full use of the existing label information to learn and generate high-quality, diverse time series data efficiently. Figure 2 illustrates the basic structure of data augmentation using CGAN.

Figure 2. Basic structure of CGAN
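To make the conditioning mechanism concrete, the following sketch builds the joint generator input of a CGAN: a noise vector z concatenated with a one-hot encoding of the class label y. The dimensions and function name are illustrative assumptions; the actual network architecture is described in the text above.

```python
import numpy as np

def cgan_generator_input(batch_size, noise_dim, num_classes, labels, seed=0):
    """Concatenate random noise z ~ N(0, 1) with one-hot class labels,
    forming the joint input [z | y] fed to the CGAN generator."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((batch_size, noise_dim))
    y_onehot = np.eye(num_classes)[labels]           # (batch, num_classes)
    return np.concatenate([z, y_onehot], axis=1)     # (batch, noise_dim + num_classes)

labels = np.array([0, 2, 1, 2])
g_in = cgan_generator_input(batch_size=4, noise_dim=16, num_classes=3, labels=labels)
# g_in has shape (4, 19); the last 3 columns carry the label condition
```

The discriminator input is built the same way, with the real or generated sample taking the place of z, so both networks see the class condition y.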

Feature Extraction and Fusion Module
Fast shapelets use a certain method to find the important points in a time series, take the subsegments of the time series that contain one or more important points as a candidate set of shapelets, and then find the best shapelet from these candidates, which improves the efficiency of the shapelet search and saves running time [29]. Data of the same category in a time series dataset S form a subcategory of the dataset, so different datasets contain different numbers of subcategories. The distance between two time series is expressed using the distance function dist(T, R):

dist(T, R) = \sqrt{\sum_{i=1}^{m} (t_i - r_i)^2}  (2)

where m denotes the length of the time series. The distance between a time series and a subsequence is expressed using the distance function sdist(S, T):

sdist(S, T) = \min_{1 \le q \le m-k+1} dist(S, T_{q:q+k-1})  (3)

where m denotes the length of the time series T, k denotes the length of the subsequence S, and 1 \le q \le m - k + 1. A shapelet is essentially a segment of a time series that, as a subsequence, maximally represents a class. Subclass segmentation enables strategic sampling from the training dataset, ensuring that the sampled sequences capture the core of the entire time series. Local Farthest Deviation Points (LFDPs) are identified in the sampled time series, and by taking the subsequences between two non-adjacent LFDPs we obtain shapelet candidates with high discriminative power [15]. The transformed feature vector S_transformed of a time series T is therefore:

S_{transformed} = [sdist(s_1, T), sdist(s_2, T), \ldots, sdist(s_n, T)]  (4)

where n denotes the number of shapelets. The transformed, fused feature vectors are then learned by the semi-supervised classification model, which classifies the feature vectors.
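A minimal sketch of the subsequence distance sdist and the resulting shapelet transform is given below. The function names are illustrative; the fast-shapelet candidate search via LFDPs described in the text is not reproduced here, only the distance and transform steps.

```python
import numpy as np

def sdist(shapelet, series):
    """Minimum Euclidean distance between a shapelet of length k and
    all k-length subsequences of the series (q = 1 .. m - k + 1)."""
    k, m = len(shapelet), len(series)
    return min(
        float(np.linalg.norm(series[q:q + k] - shapelet))
        for q in range(m - k + 1)
    )

def shapelet_transform(shapelets, series):
    """Map a series to an n-dimensional feature vector of sdist values,
    one entry per shapelet, as in the transform step of the module."""
    return np.array([sdist(s, series) for s in shapelets])

series = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 1.0])
shapelets = [np.array([1.0, 2.0, 1.0]), np.array([0.0, 0.0])]
features = shapelet_transform(shapelets, series)
# features[0] == 0.0 because [1, 2, 1] occurs exactly in the series
```

Each series is thus reduced to a fixed-length vector of distances to the chosen shapelets, which is what the downstream classifier consumes.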

Semi-supervised Classification Module
In this section, we describe the proposed semi-supervised classification module in detail. As shown in Figure 3, the module consists of two networks: a self-supervised network and a supervised network. For unlabeled time series x_i \in D_U, a self-supervised learning method is used for training, performing a self-prediction regression task on the unlabeled data. Self-prediction as an auxiliary task can help models learn rich hidden-state representations from unlabeled but structured time series data [14]. The self-supervised network is an encoder architecture based on Long Short-Term Memory (LSTM) networks, capable of learning effective feature representations from the input time series. The encoder consists of multiple LSTM modules, each containing two LSTM layers to enhance the learning capability of the model. Hidden-layer and output-layer activation functions introduce nonlinear transformations between the hidden layers and at the final output, respectively, increasing the expressive power of the model. The loss function for self-supervised network training is:

L_{self} = \frac{1}{|D_U|} \sum_{i=1}^{|D_U|} (y_i - \hat{y}_i)^2  (5)

where |D_U| denotes the number of unlabeled time series samples, y_i denotes the true value, and \hat{y}_i denotes the value predicted by the network.

For labeled time series x_i \in D_L, a supervised learning approach is used for training to perform the classification task. The supervised network is a classification network that combines deep convolutional layers and a self-attention mechanism to capture both local features and long-range dependencies of the time series data. Stacking multiple convolutional layers enables the network to learn complex feature representations, while the self-attention mechanism captures long-range dependencies. A global average pooling layer and a dropout layer reduce the number of parameters, lowering the risk of overfitting and improving the computational efficiency of the network. The computational efficiency of the model is further optimized using dynamic quantization, which accelerates computation and reduces energy consumption. A fully connected layer classifies the data features. The loss function for supervised network training is:

L_{sup} = -\frac{1}{|D_L|} \sum_{i=1}^{|D_L|} y_i \log p_i  (6)

where |D_L| denotes the number of labeled time series samples, y_i denotes the true label, and p_i denotes the probability predicted by the classification model.

The semi-supervised classification model includes both the self-supervised and supervised networks; the self-supervised task is the auxiliary task and the supervised task is the main task, so the objective function is the weighted sum of the two loss functions. By optimizing the joint loss function, the method can effectively train all the task modules and improve the generalization ability of the semi-supervised classification model. The loss function of the semi-supervised classification module is:

L = L_{sup} + \lambda L_{self}  (7)

where \lambda is a hyper-parameter. A correct \lambda neither biases the supervised network weights nor ignores the learning task of the self-supervised network [16]. Reinforcement learning algorithms in which a single agent performs parameter updating can dynamically update the weights and significantly improve convergence efficiency and performance [43]. We use reinforcement learning to dynamically adjust the weight coefficient \lambda in the loss function [23]. The steps of reinforcement-learning-based weight updating are as follows:

Step 1: Agent initialization. When the agent is initialized, a Q-table and a state matrix are created.

Step 2: Construct the reward function R, the loss function L, and the evaluation function Q. The reward function gives rewards based on improvements in accuracy, encouraging actions that improve the performance of the model.
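The weighted combination of the supervised (main) and self-supervised (auxiliary) losses described above can be sketched as follows. This is a minimal numpy version; mean squared error for the self-prediction task and cross-entropy for classification are assumed from the descriptions in the text, and all function names are illustrative.

```python
import numpy as np

def self_supervised_loss(y_true, y_pred):
    """Mean squared error over the unlabeled self-prediction task."""
    return float(np.mean((y_true - y_pred) ** 2))

def supervised_loss(labels_onehot, probs, eps=1e-12):
    """Cross-entropy over the labeled classification task."""
    return float(-np.mean(np.sum(labels_onehot * np.log(probs + eps), axis=1)))

def joint_loss(l_sup, l_self, lam):
    """Joint objective: supervised loss plus lambda-weighted
    self-supervised auxiliary loss."""
    return l_sup + lam * l_self

l_self = self_supervised_loss(np.array([1.0, 2.0]), np.array([1.5, 2.0]))
l_sup = supervised_loss(np.array([[1.0, 0.0]]), np.array([[0.9, 0.1]]))
total = joint_loss(l_sup, l_self, lam=0.5)
```

A larger lambda emphasizes the auxiliary self-prediction task; the reinforcement-learning procedure in the steps above searches for the lambda that balances the two.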
Step 3: Q-table state update.Using the learning rate and discount factor, the Q-value corresponding to the current state and action is adjusted according to the update rules of the Qlearning algorithm.The agent updates the Q-table based on the reward received and the maximum expected Q-value of the next state.The agent all updates its state based on the rewards it receives  , with the aim of finding the optimal balance between supervised and unsupervised loss and achieving the highest accuracy.
where   , Q s a denotes the Q value of the action a taken in the current state s , l denotes the learning rate, r denotes the reward R obtained after the execution of the current action a ,  denotes the discount factor, which is the degree of (6) where L D denotes the number of time series labeled samples, y denotes the true label of the dataset, i p denotes the result of the probability value predicted by the classification model.
Semi-supervised classification models include both self-supervised and supervised networks, the self-supervised task is the auxiliary task and the supervised task is the main task, so the objective function is the weighted sum of the two loss functions.The self-supervised task is the auxiliary task and the supervised task is divided into the main task.By optimizing the joint loss function, the method can effectively train all the task modules and improve the generalization ability of the semi-supervised classification model.The loss function expression for the semi-supervised classification module is: ries data.The stacking of multiple convolutional yers enables the network to learn complex feature presentations of time series, the self-attention echanism to capture long-range dependencies of time ries data.The global average pooling layer and the opout layer reduce the number of parameters, thus ducing the risk of over-network fitting and improving e computational efficiency of the network.The mputational efficiency of the model is optimized ing the Dynamic Quantization method, which celerates computation and reduces energy nsumption.The fully connected layer categorizes the ta features.The loss function expression for pervised network training is: here L D denotes the number of time series labeled mples, y denotes the true label of the dataset, i p notes the result of the probability value predicted by e classification model.emi-supervised classification models include both lf-supervised and supervised networks, the selfpervised task is the auxiliary task and the supervised sk is the main task, so the objective function is the eighted sum of the two loss functions.The selfpervised task is the auxiliary task and the supervised sk is divided into the main task.By optimizing the int loss function, the method can effectively train all e task modules and improve the generalization ability the semi-supervised classification model.The loss nction expression for the 
semi-supervised assification module is: algorithms in which individual intelligentsia are used for parameter updating, dynamically updating the weights and significantly improving the algorithm convergence efficiency and performance [43].We use reinforcement learning to dynamically adjust the weight coefficients  in the loss function [23].The steps of weight updating based on reinforcement learning are as follows: Step 1: Agent Initialization.When an agent is initialized, a Q-table is created and a state matrix   Step 2: Construct the reward function R , the loss function L and the evaluation function Q .The reward function indicates that rewards are given based on improvements in accuracy, thus encouraging actions that will improve the performance of the model.
Step 3: Q-table state update.Using the learning rate and discount factor, the Q-value corresponding to the current state and action is adjusted according to the update rules of the Qlearning algorithm.The agent updates the Q-table based on the reward received and the maximum expected Q-value of the next state.The agent all updates its state based on the rewards it receives  , with the aim of finding the optimal balance between supervised and unsupervised loss and achieving the highest accuracy.
where   , Q s a denotes the Q value of the action a taken in the current state s , l denotes the learning rate, r denotes the reward R obtained after the execution of the current action a ,  denotes the discount factor, which is the degree of importance attached to the future reward, and all the possible actions in the next state s , which denotes the maximum reward expected from the next state.(7) where λ is a hyper-parameter.The correct λ neither biases the supervised network weights nor ignores the learning task of the self-supervised network [16].Reinforcement learning algorithms in which individual intelligentsia are used for parameter updating, dynamically updating the weights and significantly improving the algorithm convergence efficiency and performance [43].We use reinforcement learning to dynamically adjust the weight coefficients λ in the loss function [23].The steps of weight updating based on reinforcement learning are as follows: Step 1: Agent Initialization.When an agent is initialized, a Q-table is created and a state matrix Step 3: Q-table state update.Using the learning rate and discount factor, the Q-value corresponding to the current state and action is adjusted according to the update rules of the Q-learning algorithm.The agent updates the Q-table based on the reward received and the maximum expected Q-value of the next state.The agent all updates its state based on the rewards it receives α , with the aim of finding the optimal balance between supervised and unsupervised loss and achieving the highest accuracy.
representations of time series, the self-attention mechanism to capture long-range dependencies of time series data.The global average pooling layer and the dropout layer reduce the number of parameters, thus reducing the risk of over-network fitting and improving the computational efficiency of the network.The computational efficiency of the model is optimized using the Dynamic Quantization method, which accelerates computation and reduces energy consumption.The fully connected layer categorizes the data features.The loss function expression for supervised network training is: where L D denotes the number of time series labeled samples, y denotes the true label of the dataset, i p denotes the result of the probability value predicted by the classification model.Semi-supervised classification models include both self-supervised and supervised networks, the selfsupervised task is the auxiliary task and the supervised task is the main task, so the objective function is the weighted sum of the two loss functions.The selfsupervised task is the auxiliary task and the supervised task is divided into the main task.By optimizing the joint loss function, the method can effectively train all the task modules and improve the generalization ability of the semi-supervised classification model.The loss function expression for the semi-supervised classification module is: the algorithm convergence efficiency and performance [43].We use reinforcement learning to dynamically adjust the weight coefficients  in the loss function [23].The steps of weight updating based on reinforcement learning are as follows: Step 1: Agent Initialization.When an agent is initialized, a Q-table is created and a state matrix   the change in weight.The state matrix is the current weights of the self-supervised network and the supervised network, and the action matrix is the adjustment of the weights, which represents an increase of 0.1 or a decrease of 0.1 in the value.
Step 2: Construct the reward function R , the loss function L and the evaluation function Q .The reward function indicates that rewards are given based on improvements in accuracy, thus encouraging actions that will improve the performance of the model.
Step 3: Q-table state update.Using the learning rate and discount factor, the Q-value corresponding to the current state and action is adjusted according to the update rules of the Qlearning algorithm.The agent updates the Q-table based on the reward received and the maximum expected Q-value of the next state.The agent all updates its state based on the rewards it receives  , with the aim of finding the optimal balance between supervised and unsupervised loss and achieving the highest accuracy.
where   , Q s a denotes the Q value of the action a taken in the current state s , l denotes the learning rate, r denotes the reward R obtained after the execution of the current action a ,  denotes the discount factor, which is the degree of importance attached to the future reward, and   max , a Q s a    denotes the maximum Q value of all the possible actions in the next state s , which denotes the maximum reward expected from the next state.Step 6: When both the model and the Agent reach a certain level of optimization, the performance of the model on the test set is used to evaluate the effectiveness of the whole training process.The optimal α value is obtained, and ultimately the optimal classification accuracy is obtained. After

UCR Time Series Benchmark Dataset
We validated the method on small datasets selected from the UCR benchmark, a well-established collection for time series analysis. Training and testing used the datasets specified in the UCR time series archive [6]. The primary selection criteria were a sample size below 1000, a variety of series lengths, and a mixture of binary and multi-class datasets. Additionally, to ensure the reliability of the experimental results and the validity of the comparisons, we employed three different ratios of labeled to unlabeled data, namely 1:9, 3:7, and 5:5. The basic information of the UCR time series is presented in Table 1, where instance represents the number of samples in the dataset (all series in a dataset are of equal length), length represents the series length, and class represents the number of categories.

Table 1
Basic information of the UCR benchmark time series dataset
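The selection criteria and split ratios above amount to a simple filter over dataset metadata. The sketch below uses made-up placeholder entries (`SetA` etc.), not the real UCR statistics:

```python
# Hypothetical metadata table illustrating the selection criteria
# (sample size below 1000); entries are placeholders, not real UCR data.
datasets = [
    {"name": "SetA", "instances": 200, "length": 150, "classes": 2},
    {"name": "SetB", "instances": 5000, "length": 96, "classes": 3},
    {"name": "SetC", "instances": 780, "length": 500, "classes": 5},
]

small = [d for d in datasets if d["instances"] < 1000]

def split_counts(n_instances, labeled_ratio):
    """Labeled/unlabeled sample counts for a given ratio (0.1, 0.3, 0.5)."""
    n_labeled = int(n_instances * labeled_ratio)
    return n_labeled, n_instances - n_labeled
```

For a 200-sample dataset, the three ratios 1:9, 3:7, and 5:5 give 20/180, 60/140, and 100/100 labeled/unlabeled samples.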

Methods of Comparison
This paper compares several semi-supervised classification methods for time series. Label Propagation propagates the labels of labeled data to unlabeled data, thus realizing the classification of unlabeled data [33].
The self-training method uses the model's own predictions on unlabeled data to augment its training; for comparison, KNN is used as the base classifier in this paper, and good classification accuracy is obtained through multiple iterations [4]. Pseudo Label helps the model learn from unlabeled information; our Pseudo Label baseline is implemented with an underlying LSTM network [19]. The semi-supervised time series classification (MTL) model utilizes features learned from self-supervised tasks on unlabeled data, drawing on established multi-task learning methods, with model predictions as auxiliary tasks optimized alongside the primary classification task [14]. SSTSC is a semi-supervised time series classification model based on self-supervised learning, which uses the self-supervised task as a secondary task co-optimized with the primary time series classification (TSC) task [38]. These five methods are fully compared with the DA-FSSSC method to verify its validity.
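The self-training scheme can be sketched as follows with a 1-NN base classifier: each round, the unlabeled series closest to a labeled one receives that neighbour's label and joins the labeled pool. This illustrates the general scheme only; the paper's actual KNN configuration and stopping rule are not reproduced here.

```python
import numpy as np

# Minimal self-training sketch with a 1-NN base classifier.
# `self_train_1nn` is a name introduced for this illustration.
def self_train_1nn(X_lab, y_lab, X_unlab, n_rounds=10):
    X_lab = [np.asarray(x, dtype=float) for x in X_lab]
    y_lab = list(y_lab)
    X_unlab = [np.asarray(x, dtype=float) for x in X_unlab]
    for _ in range(n_rounds):
        if not X_unlab:
            break
        dists, preds = [], []
        for x in X_unlab:                     # nearest labeled neighbour
            d = [float(np.linalg.norm(x - z)) for z in X_lab]
            j = int(np.argmin(d))
            dists.append(d[j])
            preds.append(y_lab[j])
        i = int(np.argmin(dists))             # most confident pseudo-label
        X_lab.append(X_unlab.pop(i))          # fold it into the labeled pool
        y_lab.append(preds[i])
    return X_lab, y_lab
```

On two well-separated clusters the pseudo-labels adopt the nearby cluster's class; in practice, error propagation from early wrong pseudo-labels is the main weakness of this scheme, as noted below.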

Performance Evaluation
For the semi-supervised classification problem, this paper uses the classical classification accuracy as the evaluation metric. Accuracy is the percentage of all samples that the model predicts correctly.


$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP denotes the number of positive class samples that the model correctly predicts as positive, TN denotes the number of negative class samples that the model correctly predicts as negative, FP denotes the number of negative class samples that the model incorrectly predicts as positive, and FN denotes the number of positive class samples that the model incorrectly predicts as negative.
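The metric follows directly from these counts:

```python
# Accuracy computed from the confusion-matrix counts defined above.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)
```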

Analysis of Results
The DA-FSSSC method is implemented in Python using the PyTorch library. Before training, we normalize all datasets. Table 2 presents the classification results of the DA-FSSSC method and the comparison methods. Classification accuracy is the average over five runs of the model. The best results are shown in bold.
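The paper states that all datasets are normalized before training but does not name the scheme; the per-series z-normalization standard in time series classification is assumed in this sketch:

```python
import numpy as np

# Per-series z-normalization (assumed scheme, not confirmed by the paper):
# zero mean, unit standard deviation, with eps guarding constant series.
def z_normalize(series, eps=1e-8):
    x = np.asarray(series, dtype=float)
    return (x - x.mean()) / (x.std() + eps)
```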
Based on the selected small-sample datasets, the methods are compared; the results are shown in Table 2. The proposed method obtains the highest accuracy on most of the datasets, indicating that it performs better than the comparison methods.

The label propagation method for semi-supervised classification of small-sample time series data can effectively utilize the available labels and is highly scalable, but it struggles to capture long-term dependencies in time series data. Self-training algorithms can exploit unlabeled data to improve model performance, but their effectiveness is compromised by error propagation and the inherent complexity of temporal data. Pseudo-labeling is a simple and versatile semi-supervised technique that can be adapted to time series features, but its effectiveness depends heavily on the quality of the initial model, so it requires robust initial training and strategies that limit the impact of erroneous pseudo-labels. The MTL algorithm significantly outperforms state-of-the-art baselines in a semi-supervised setting by means of a ConvNet model that jointly performs classification and auxiliary prediction, but it requires careful hyper-parameter tuning, and methods incorporating consistency regularization remain to be explored to improve performance. SSTSC improves classification performance by exploiting the semantic context in unlabeled data. DA-FSSSC effectively expands the dataset by generating additional training samples, improving data diversity and alleviating the overfitting caused by small-sample data. Meaningful shape features extracted from the time series improve the accuracy and efficiency of classification, so the model remains flexible and accurate when dealing with complex time series data. The comparison also shows the advantage of the integrated model over a single semi-supervised model: it effectively combines the strengths of each component to improve the
classification performance.


Statistical Analysis
The Friedman test and the post-hoc Nemenyi test benchmark algorithms based on their rankings across datasets, comparing the performance of multiple algorithms on multiple datasets [7]. The algorithms are first ranked and their average ranks calculated; the Friedman test then determines whether there is a significant difference among the rankings. If it shows a significant difference, pairwise differences between the algorithms are further investigated with the post-hoc test, which yields a critical difference (CD) value. Figure 4 illustrates the results of the statistical tests for different proportions of unlabeled data, where a position further to the right on the axis indicates higher classification accuracy.
In Figure 4, we rejected the null hypothesis because the p-value was much smaller than the significance level, indicating that at least two of the six models differ statistically significantly. The results in Figure 4 are all calculated at a significance level of 0.05. It can be clearly seen that the semi-supervised classification method proposed in this paper performs best among the six methods when the unlabeled proportion of the dataset is 90%, 70%, and 50%. The integrated methods MTL, SSTSC, and DA-FSSSC are located on the right side of the axis, demonstrating that integrated methods outperform single methods in semi-supervised classification of small-sample time series datasets.
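The test procedure can be sketched as follows; the accuracy matrix is synthetic (the paper's per-dataset results are in Table 2), and $q_\alpha = 2.850$ is the critical value for $k = 6$ methods at the 0.05 level.

```python
import math
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Sketch of the statistical comparison: rank methods per dataset, run the
# Friedman test, then compute the Nemenyi critical difference.
acc = np.array([            # rows: datasets, columns: 6 methods (synthetic)
    [0.61, 0.64, 0.66, 0.70, 0.72, 0.78],
    [0.55, 0.58, 0.60, 0.65, 0.69, 0.74],
    [0.70, 0.72, 0.71, 0.76, 0.79, 0.83],
    [0.62, 0.60, 0.65, 0.68, 0.71, 0.77],
])
n_datasets, k = acc.shape

ranks = np.vstack([rankdata(-row) for row in acc])   # rank 1 = best
avg_rank = ranks.mean(axis=0)

stat, p = friedmanchisquare(*acc.T)                  # one array per method

q_alpha = 2.850                                      # k = 6, alpha = 0.05
cd = q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))
```

Two methods whose average ranks differ by more than `cd` are significantly different under the Nemenyi test; this is what the CD diagrams in Figure 4 visualize.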

ECG Dataset
Semi-supervised classification of ECG data plays a very important role in medicine, as classification models can identify previously undetected arrhythmia patterns or subtle changes [17]. We test the ECG dataset using the five comparative methods and the method proposed in this paper, simulating three different unlabeled proportions of 90%, 70%, and 50%. The accuracy results are shown in Table 3.
According to the comparison results in Table 3, when the percentage of unlabeled data is 90%, the accuracy of the proposed semi-supervised classification method is higher than 0.800 and exceeds all compared methods. When the percentage of unlabeled data is 70%, the classification accuracy of the proposed method is 0.855, and the accuracies of label propagation, self-training, and SSTSC are higher than 0.800. When the percentage of unlabeled data is 50%, the classification accuracy of the proposed method is higher than 0.900. Semi-supervised learning reduces the need for large amounts of labeled data, cutting the cost and time of data preparation, and achieving higher accuracy in semi-supervised classification of ECG data is important for the early diagnosis and treatment of cardiac diseases.

EEG Dataset
In reality, it is relatively easy to obtain unlabeled EEG data, but expert manually labeled data is very difficult and expensive to obtain. Unlabeled EEG data may be overlooked or underutilized in many studies, and semi-supervised learning turns these data into a valuable resource and improves their utilization [40]. The dataset used in this paper consists of segmented EEG time series recordings of ten epilepsy patients collected from the Neurology and Sleep Center, Hauz Khas, New Delhi [34]. We randomly assigned 80% of the data in each class to the training set and the remaining 20% to the test set. The goal of the semi-supervised classification method is to recognize the seizure phases of epilepsy: preictal, interictal, and ictal. The EEG dataset is shown in Figure 6.
We test the EEG dataset using the five comparative methods and the method proposed in this paper, simulating three different unlabeled proportions of 90%, 70%, and 50%. The accuracy results of the methods are shown in Table 4.
According to the comparison results in Table 4, when the percentage of unlabeled data is 90%, the accuracy of the proposed semi-supervised classification method is higher than 0.700, at 0.715. When the percentage is 70%, SSTSC attains the highest classification accuracy and the proposed method ranks second. When the percentage is 50%, the classification accuracy of the proposed method is higher than 0.900; the accuracies of MTL and SSTSC are also higher than 0.900, indicating that integrated models classify more accurately than single models. Semi-supervised classification of EEG epilepsy data improves the efficiency and accuracy of epilepsy monitoring and diagnosis, reduces healthcare costs, and is important for the early identification of epilepsy patients and the prediction of seizures.

Plant Electrical Signal Datasets
When plants encounter stimuli, they rapidly send electrical signals to different parts of the plant and even the whole plant; plant electrical signals are a record of the electrical activity of plant cells and tissues [20], and they reflect external stimuli more quickly than other responses. In the absence of labels, or with only a small amount of labeled data, semi-supervised learning helps to identify and interpret patterns in plant electrical signals, enabling faster understanding and prediction of plant responses to environmental changes. In this paper, we utilize plant electrical signal data obtained from actual laboratory measurements, selected from stimulation with 300 ml of NaCl solution, to identify the salt tolerance of wheat [41]. The goal of the semi-supervised classification method is to correctly identify the signal dynamics of the wheat leaves of DeKang961 (salt-tolerant) and Langdon (salt-sensitive) under continuous NaCl stress. The plant electrical signal dataset is shown in Figure 7.
We test the plant electrical signal dataset using the five comparative methods and the method proposed in this paper, simulating three different unlabeled proportions of 90%, 70%, and 50%. The accuracy results are shown in Table 5. According to Table 5, the self-training model has the highest classification accuracy when the proportion of unlabeled data is 90%, which indicates that self-training maximizes the use of the limited labels to bootstrap learning from unlabeled data; it adapts to the new plant electrical signal distribution in each iteration and improves its adaptability by continuously updating itself. The accuracy of the DA-FSSSC method ranks second among all methods. When the proportion of unlabeled data is 70% or 50%, the DA-FSSSC method exploits both labeled data and a large amount of unlabeled data; combined with data augmentation, even a small amount of labeled data can generate additional training samples to be used alongside unlabeled data, improving the efficiency and effectiveness of learning, and the model achieves satisfactory performance with limited plant electrical signal data. The classification accuracy of the proposed method is higher than the other compared models at both 70% and 50% unlabeled data. Successful semi-supervised classification of plant electrical signals allows in-depth study of the adaptive and response mechanisms of plants under salt stress, helps reveal their physiological, genetic, and molecular mechanisms, contributes to the selection of varieties with strong salt tolerance, and provides an important basis for plant genetic improvement and breeding.

Visual Analysis of Results
t-SNE is a nonlinear dimensionality reduction technique that preserves the local structure of the data so that similar points remain close in the lower-dimensional space [10]. To better explain and visualize the semi-supervised classification strategy of our model on the plant electrical dataset, we map the plant electrical data from the high-dimensional space to a two-dimensional space and draw decision boundaries to illustrate the classification process of the semi-supervised classification module. Figure 8 illustrates the t-SNE decision-boundary plots of the plant electrical signal data for different proportions of unlabeled data.
The analysis presented in Figure 8 demonstrates that our model effectively captures significant features in unlabeled data, and both self-supervised learning and supervised learning techniques contribute to improved classification of plant electrical signal time series data.In general, the more labeled data there is, the better the model performs because the labeled data provides the model with more information about the true class labels, which helps the model learn decision boundaries and representations more accurately.When the proportion of unlabeled data is 50%, DA-FSSSC has more labeled examples to learn from and more information to train on, with higher classification accuracy.Although some plant electrical signal data could not be classified successfully for the time being, our model achieved good classification performance in most cases.
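The projection behind these plots can be sketched as follows; synthetic Gaussian features stand in for the learned plant-signal representations, and the perplexity must stay below the sample count.

```python
import numpy as np
from sklearn.manifold import TSNE

# Sketch of the t-SNE projection used for the decision-boundary plots.
# The two synthetic Gaussian clusters are stand-ins for class features.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, (30, 16)),    # class A
                   rng.normal(4.0, 1.0, (30, 16))])   # class B

emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(feats)
```

A scatter plot of `emb` colored by class (and overlaid with the classifier's decisions) reproduces the style of Figure 8.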

Ablation Study
To demonstrate that each module of the semi-supervised classification model contributes its strengths, we conducted an ablation study on the plant electrical signal dataset. The basic model is the semi-supervised classification method SSC; the second is DA-SSC, which adds the data augmentation module; the third is FSSSC, which adds the feature extraction and fusion module; and the fourth is DA-FSSSC, which adds both modules. Figure 9 illustrates the results of the ablation study. As can be seen from Figure 9, for all proportions of unlabeled data the classification accuracy of DA-SSC is higher than that of SSC, which indicates that the semi-supervised classification network is trained more thoroughly after the generated time series data are added, improving classification accuracy even though network training sacrifices some time. CGAN-generated data enhance model robustness and prevent overfitting, an effective strategy for improving model quality when data are insufficient. The classification accuracy of FSSSC is higher than that of SSC, which indicates that the feature extraction module effectively learns the latent information of the time series data and improves classification accuracy. The ablation results demonstrate that each module of DA-FSSSC contributes to the semi-supervised classification task under small-sample conditions.

Results and Future Work
This study introduces a novel semi-supervised classification approach that augments the labeled samples through data generation, extracts the shape features of the time series, and combines self-supervised and supervised learning strategies for semi-supervised classification. The performance of the proposed model is first assessed on the UCR dataset, where its classification accuracy is better than the comparison methods at 90%, 70%, and 50% unlabeled samples. To verify the applicability of the method, we further test on ECG, EEG, and plant electrical signal datasets; visualization analysis and ablation studies on the measured plant electrical signals show that the proposed method also achieves good classification in real scenarios. CGAN effectively augments the dataset by generating additional training samples to improve data diversity; the fast shapelet algorithm efficiently extracts meaningful shapes from the time series to improve the accuracy and efficiency of classification; and the semi-supervised classification method keeps the model flexible and accurate when dealing with time series with complex patterns and dynamic changes. The method therefore provides a powerful and flexible solution to the problem of classifying complex time series with small samples. Future work will combine improved feature learning and label learning to further raise classification accuracy with very few labeled samples, given innovative methods and sufficient computational resources, and will consider how to improve the model when dealing with large datasets.

Figure 1 Flowchart of semi-supervised classification method

Figure 2 Basic structure of CGAN

Figure 3 Architecture of semi-supervised classification module

1  2 
is the weight of the supervisory network, is the weight of the self-supervisory network, i   denotes the change in weight.The state matrix is the current weights of the self-supervised network and the supervised network, and the action matrix is the adjustment of the weights, which represents an increase of 0.1 or a decrease of 0.1 in the value.

igure 3 rchitecture
of Semi-supervised classification module

1  2 
is the weight of the supervisory network, is the weight of the self-supervisory network, i   denotes the change in weight.The state matrix is the current weights of the self-supervised network and the supervised network, and the action matrix is the adjustment of the weights, which represents an increase of 0.1 or a decrease of 0.1 in the value.

α1 is the weight of the supervisory network, α2 is the weight of the self-supervisory network, and Δαi denotes the change in weight. The state matrix holds the current weights of the self-supervised and supervised networks, and the action matrix holds the weight adjustments, each representing an increase or decrease of 0.1 in the value. Step 2: Construct the reward function R, the loss function L, and the evaluation function Q. The reward function grants rewards based on improvements in accuracy, thereby encouraging actions that improve the performance of the model.

Step 3: Update the Q value:

Q(s, a) ← Q(s, a) + l[r + γ max_a′ Q(s′, a′) − Q(s, a)]

where Q(s, a) denotes the Q value of the action a taken in the current state s, l denotes the learning rate, r denotes the reward R obtained after executing the current action a, γ denotes the discount factor (the degree of importance attached to the future reward), and max_a′ Q(s′, a′) is the maximum Q value over all possible actions in the next state s′, i.e., the maximum reward expected from the next state. Step 4: Loop training. Repeat Steps 2 and 3, computing the accuracy from the model's performance on the validation set and updating the α values via the Agent accordingly. Step 5: Depending on the action chosen by the Agent and the reward received, the α values are updated to guide the training of the model in the next loop.
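The weight-tuning procedure in these steps can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the exact loss and reward forms, the function names, and the epsilon-greedy policy are assumptions; only the weighted joint loss, the accuracy-improvement reward, and the tabular Q-learning update follow from the text.

```python
import random

# Actions adjust a network weight up or down by 0.1, as described above.
ACTIONS = (+0.1, -0.1)

def joint_loss(loss_sup, loss_self, alpha1, alpha2):
    """Joint loss: weighted sum of supervised and self-supervised losses."""
    return alpha1 * loss_sup + alpha2 * loss_self

def reward(acc_new, acc_old):
    """Reward R: improvement in validation accuracy after the action."""
    return acc_new - acc_old

def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    """Tabular update: Q(s,a) <- Q(s,a) + lr*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in ACTIONS)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lr * (r + gamma * best_next - old)

def choose_action(Q, s, eps=0.2):
    """Epsilon-greedy choice between the two weight adjustments."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
```

In each training loop the agent observes the current weights (the state), picks an adjustment, retrains the model, measures validation accuracy, and feeds the resulting reward back through `q_update` before the next iteration.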

Figure 4 Results of statistical analysis, (a) Rate of unlabeled dataset is 90%, (b) Rate of unlabeled dataset is 70%, (c) Rate of unlabeled dataset is 50%

Figure 5 Display of ECG data samples

Figure 7 Sample display of plant electrical dataset

Figure 8 Visualization of t-SNE and decision boundaries for the plant electrical dataset, (a) Rate of unlabeled dataset is 90%, (b) Rate of unlabeled dataset is 70%, (c) Rate of unlabeled dataset is 50%

Figure 9 Results of ablation studies