Forecasting Secondhand Tanker Price Through Wavelet Neural Networks Based on Adaptive Genetic Algorithm

Seaborne crude oil remains the main source of energy in the modern world in terms of volume, accounting for nearly half of all internationally traded crude oil. The shipping market is already characterized by high volatility, compounded by the impact of COVID-19 lockdowns and geopolitical events, so price forecasting has become a necessary and challenging task for shipowners and other stakeholders. In the shipping market forecasting literature, the usual focus is on newbuilding ship prices or freight rates; only a limited body of work addresses secondhand tanker prices. Moreover, little research uses wavelet neural networks based on an adaptive genetic algorithm (AGA-WNN) to predict the shipping market. This paper studies the application of this hybrid model to secondhand price prediction for five tanker sizes. The performance of AGA-WNN on time series of 10 and 15 years is compared with six benchmark models, using three error metrics and two statistical tests. AGA-WNN provides encouraging and promising results, outperforming the baseline models in both accuracy and robustness, and it gives the best overall predictive performance.


Introduction
The high volatility of the shipping market can be attributed to its intrinsic characteristics and external environmental uncertainty [41], for example, economic cycles, trade wars, national policies, black swan events and so on. Recently, the Russia-Ukraine conflict has had a significant impact on the oil and tanker markets, with geopolitical events affecting a range of factors including energy prices, tanker markets and seaborne oil trading patterns. Tanker freight rates on ex-Russia routes have spiked since late February 2022. Aframax earnings on the Baltic-UKC route surged to over $230,000/day in early March, compared with an average of $10,000/day in January-February [11]. The price of a secondhand ship can fluctuate by millions of dollars in a month.
Against the backdrop of rising oil prices and COVID-19 lockdowns, the trade situation is volatile and policy may change further, making forecasting difficult. Newbuilding prices, secondhand prices, charter rates and scrap values are four important markets that influence and determine investment decisions, costs and profits in the shipping industry [18]. An accurate judgment of the inflection point in a time series trend allows shipowners to buy low and sell high, achieving better profits. Hence, shipowners need to make more prudent decisions.
Many researchers have done a great deal of work on shipping market forecasting. [36] applies the support vector regression (SVR) model, a novel forecasting framework, to the empirical study of newbuilding ship (dry bulk and tanker) price forecasts for the first time. [33] considers two alternative neural network specifications, NN-MLP and NN-RBF, to predict the period charter rates of VLCCs. The results show that neural networks can provide better performance than the traditional method (ARIMA) in dealing with tanker period or spot charter rates, thus confirming the previous empirical evidence for the spot market from [28] and [30]. The usual focus in the literature is on newbuilding ship prices or freight rates, while only a limited body of work addresses secondhand ship prices. Sale-and-purchase transactions of secondhand ships are a main source of profit for shipowners, whose profitability depends on the decision-making occasion [9]. Consequently, the secondhand ship market plays an important economic role in the shipping industry. In this paper, efforts are predominantly aimed at the secondhand tanker market, and the results will have practical significance for relevant stakeholders, such as investors who need to determine the appropriate time of investment and withdrawal.
The purpose of our forecasting work is to learn linear or nonlinear functions of historical prices and to give reliable predictions for unseen data. All the models we propose are based on the premise that the temporal patterns of secondhand tanker prices contain useful information for predicting their future movements. It is well known that the shipping market is highly dynamic, so it is difficult to accurately model such highly volatile behaviour with traditional models. To overcome this limitation, a hybrid model consisting of two algorithms (wavelet neural networks and an adaptive genetic algorithm) is proposed. This machine learning model has shown significant success in a variety of applications, but not yet in the shipping market.
Several contributions of this paper are summarized as follows: 1) To the authors' best knowledge, wavelet neural networks based on an adaptive genetic algorithm (AGA-WNN) are applied to secondhand tanker market prediction for the first time. 2) In terms of accuracy, the result of AGA-WNN is much better than the traditional linear models and slightly better than the existing machine learning models; in terms of robustness, AGA-WNN is superior to all the baseline models considered. 3) All model errors are evaluated on out-of-sample accuracy. 4) In traditional shipping markets, shipowners often make decisions based on intuition and past experience. However, the volatility of the shipping market and the uncertainty of economic development make such empirical decisions highly unreliable. Data-driven decision-making can objectively and accurately predict shipping asset values and maritime business risk, which will enable shipping companies to determine better investment options and enter or exit the market at the right time.
Study [24] compares the price risks between different ship sizes in the tanker industry, using autoregressive conditional heteroscedasticity (ARCH) models and presenting the fluctuation of the shipping market as a time-varying process. Research [1] tests the performance of vector equilibrium correction models (VECM) in predicting spot and forward prices on major shipping routes; a strong convergence between forward rates and spot rates is shown, i.e., the forward rates do help predict the spot rates. Study [20] focuses on spot price prediction from two aspects: (1) multivariate models (VAR and VECM) and (2) univariate models (ARIMA, GARCH and E-GARCH), so as to obtain the best prediction model for each ship type (tankers and bulk carriers); in addition, the prediction results are modified by combinatorial method theory. In [8,37], a multivariate vector autoregressive model with exogenous variables (VARX) is established to improve the prediction accuracy of the BDI. There are also other novel forecasting methods, for example, judgmental forecasting [14], copula-based multivariate models [42], a fuzzy time series modelling approach [17] and popular machine learning algorithms [19,39]. Although the linear or nonlinear methods mentioned above are convenient to use, they place certain restrictions on the number of dependent and independent variables. Artificial neural networks (ANN) allow parameter estimation based on a large number of independent and dependent variables and have good generalization performance. ANN has recently become one of the most widely used machine learning techniques; its most attractive features are strong nonlinear processing, self-learning and self-adaptation. The generalization ability of ANN provides the conditions for its potential in prediction. ANN can not only be applied to the shipping economy market [13,15,21,31,32,40], but can also help with ship technological design. Study [5] uses ANN to estimate engine power and fuel consumption, and then carbon dioxide emissions, for recent tankers, bulk carriers and container ships built from 2015 to the present. Wavelet neural networks (WNN) are an improved version of BP neural networks, combining the strengths of wavelet analysis and neural networks; WNN therefore has strong approximation and fault-tolerance ability. Based on the wavelet multiscale decomposition of time series, [27] reveals different time-frequency variation patterns of dry bulk shipping indices and uses the wavelet as the activation function of the neural network; the initial time series is decomposed, predicted and recombined to obtain the final predicted value. The application of neural networks in empirical research has consistently produced inspiring results.

Methodologies

Research Framework
Firstly, the baseline models are introduced and the wavelet neural network based on adaptive genetic algorithm is described in detail. Then SARIMA in the baseline model is applied to obtain the autocorrelation order of all time series. The initial data set is transformed into a standard form with input independent variables and output dependent variables. After data pre-processing, all prediction models are simulated, three prediction error measures are calculated and two statistical tests are performed to compare the advantages and disadvantages of each model. Finally, the results are analyzed and the limitations are proposed.
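To make the pre-processing step concrete, the following is a minimal sketch (in Python, not the paper's own code) of turning a monthly price series into the supervised input/output form described above; the price values and the lag of 3 are hypothetical, whereas in the paper the lag would come from the SARIMA autocorrelation order.

```python
import numpy as np

def make_supervised(series, lag):
    """Turn a univariate price series into (input, output) pairs.

    Each input row holds the `lag` most recent prices and the output is
    the next month's price, mirroring the standard form described above.
    """
    X, y = [], []
    for t in range(lag, len(series)):
        X.append(series[t - lag:t])
        y.append(series[t])
    return np.asarray(X), np.asarray(y)

# Hypothetical monthly secondhand-price series and an assumed lag of 3.
prices = np.array([41.0, 42.5, 40.8, 43.1, 45.0, 44.2, 46.7, 48.0])
X, y = make_supervised(prices, lag=3)
print(X.shape, y.shape)   # (5, 3) (5,)
```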
To reiterate, the main purpose of this paper is to prove the effectiveness and superiority of the target algorithm for prediction. Time series prediction can be roughly divided into two types. If only the previous values of time series are used to predict its future values, it is called univariate time series prediction. If we use variables other than time series (i.e. exogenous variables) for prediction, it is called multivariable time series prediction. Considering our purpose, data acquisition and final results, we choose the former and determine that the information of time series is sufficient for prediction. The latter can be further discussed in future studies.

Brief Introduction of Baseline Models
Firstly, we suppose that the target time series is φ_t, where t = 1, 2, ..., N. Our prediction task is to calculate the future values φ_{N+1}, φ_{N+2}, .... Then, a brief overview of each baseline model is presented.

Seasonal naive forecast: The naive forecast uses the most recent value as its prediction; likewise, the seasonal naive forecast takes as its prediction the observation at the corresponding position of the previous seasonal period. The model is defined as follows:

$$\hat{\varphi}_t = \varphi_{t-12}, \qquad t > 12.$$

Although it is the simplest method, the seasonal naive forecast is quite robust for highly fluctuating time series, because the temporal patterns in history may make no contribution to predicting the future. Therefore, the seasonal naive forecast serves as a standard benchmark for prediction tasks.
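As an illustration, a minimal sketch of the seasonal naive forecast with a 12-month season follows; the price values and the 3-month horizon are hypothetical.

```python
import numpy as np

def seasonal_naive(series, season=12, horizon=1):
    """Seasonal naive forecast: each forecast repeats the value observed
    one full season (here 12 months) earlier."""
    series = np.asarray(series)
    return np.array([series[len(series) + h - season] for h in range(horizon)])

# Hypothetical 24 months of prices; forecast the next 3 months.
prices = np.arange(24, dtype=float)
print(seasonal_naive(prices, season=12, horizon=3))  # [12. 13. 14.]
```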

SARIMA:
The main components of a time series generally include the long-term trend (T), seasonal fluctuation (S), cyclical fluctuation (C) and irregular fluctuation (I); in other words, the series can be regarded as a combination of these components (e.g., φ_t = T_t · S_t · C_t · I_t in the multiplicative case). Differencing can help remove the impact of periodical trends, seasonal variation and multiplicative interaction factors so as to stabilize the time series. As a result, the SARIMA model, integrating seasonal autocorrelation, the autoregressive model (AR), differencing and the moving average model (MA), is another popular forecasting method and has succeeded in many forecasting tasks. In this paper, the model SARIMA(p, d, q)(P, D, Q)_π is defined as follows:

$$\psi_p(B)\,\Psi_P(B^{\pi})\,\nabla^{d}\,\nabla_{\pi}^{D}\,\varphi_t = \theta_q(B)\,\Theta_Q(B^{\pi})\,\varepsilon_t,$$

where ∇ is the differencing operator, B is the delay (backshift) operator, ∇_π^D φ_t represents the time series after D orders of differencing with π steps, ε_t represents the residual sequence, the polynomial parameters ψ and θ (with their seasonal counterparts Ψ and Θ) need to be estimated, and the orders p, q, P and Q need to be optimized. The value of the seasonal cycle π generally equals one year.
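For illustration, the following sketch fits a SARIMA model with statsmodels; the synthetic series and the (1, 1, 1)(1, 1, 1, 12) orders are assumptions, not the orders selected in the paper.

```python
# A minimal SARIMA sketch using statsmodels; the orders are illustrative.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
prices = 40 + np.cumsum(rng.normal(0, 1, 120))       # hypothetical monthly series

model = SARIMAX(prices, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=6))                       # six-month-ahead forecast
```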

Holt-Winters:
Building on the two-parameter exponential smoothing presented by Holt, Winters constructed the three-parameter Holt-Winters exponential smoothing in order to smooth a linear trend and seasonal variation. The multiplicative seasonality model is defined as follows:

$$\ell_t = \alpha\,\frac{\varphi_t}{s_{t-\pi}} + (1-\alpha)\,(\ell_{t-1} + r_{t-1}),$$
$$r_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,r_{t-1},$$
$$s_t = \gamma\,\frac{\varphi_t}{\ell_t} + (1-\gamma)\,s_{t-\pi},$$
$$\hat{\varphi}_{t+h} = (\ell_t + h\,r_t)\,s_{t-\pi+h},$$

where ℓ_t denotes the smoothed level, and r_t and s_t represent the trend factor and season factor, respectively, whose initial values can be determined in several ways. The smoothing parameters α, β and γ all range from 0 to 1 and are constantly updated, so the model adapts to non-stationary time series and can predict the short-term future.
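A minimal Holt-Winters sketch (additive trend, multiplicative seasonality) using statsmodels is shown below; the synthetic, strictly positive series is an assumption, and α, β and γ are optimized by the library rather than fixed by hand.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical positive monthly series with trend and yearly seasonality.
t = np.arange(120)
prices = (40 + 0.05 * t + 5 * np.sin(2 * np.pi * t / 12)
          + np.random.default_rng(1).normal(0, 0.5, 120))

hw = ExponentialSmoothing(prices, trend="add", seasonal="mul",
                          seasonal_periods=12).fit()
print(hw.forecast(6))   # six-month-ahead forecast
```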


Support Vector Regression (SVR)
SVR belongs to the family of machine learning algorithms. Given sample data D = {(X_1, y_1), (X_2, y_2), ..., (X_n, y_n)}, SVR should learn a function f(X) = ω^T X + b which keeps the deviation between the sample points and the regression hyperplane within an upper bound ε. The primal problem can be modeled as follows:

$$\min_{\omega,\, b,\, \xi^{U},\, \xi^{L}} \ \frac{1}{2}\,\|\omega\|^{2} + C\sum_{i=1}^{n}\left(\xi_i^{U} + \xi_i^{L}\right),$$

with the constraints defined as

$$y_i - \omega^{T}X_i - b \le \varepsilon + \xi_i^{U}, \qquad \omega^{T}X_i + b - y_i \le \varepsilon + \xi_i^{L}, \qquad \xi_i^{U},\, \xi_i^{L} \ge 0,$$

where ξ_i^U and ξ_i^L denote slack variables and C denotes the regularization constant. With regard to the optimization parameters ω and b, by introducing the Lagrange multipliers λ^U, λ^L, μ^U and μ^L, the primal problem with constraints is transformed into a primal problem without constraints. Moreover, this convex optimization problem meets the KKT conditions, so the unconstrained primal problem can be further transformed into the dual problem, which can be modeled as follows:

$$\max_{\lambda^{U},\,\lambda^{L}} \ \sum_{i=1}^{n} y_i\left(\lambda_i^{U}-\lambda_i^{L}\right) - \varepsilon\sum_{i=1}^{n}\left(\lambda_i^{U}+\lambda_i^{L}\right) - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\lambda_i^{U}-\lambda_i^{L}\right)\left(\lambda_j^{U}-\lambda_j^{L}\right)X_i^{T}X_j,$$
$$\text{s.t.}\quad \sum_{i=1}^{n}\left(\lambda_i^{U}-\lambda_i^{L}\right)=0, \qquad 0 \le \lambda_i^{U},\, \lambda_i^{L} \le C.$$

By introducing the kernel trick, SVR transforms the input space X into a higher-dimensional space Z through a nonlinear function ψ(X). The kernel SVR model can then be defined as follows:

$$f(X) = \sum_{i=1}^{n}\left(\lambda_i^{U}-\lambda_i^{L}\right)\kappa\!\left(X_i, X\right) + b,$$

where the kernel function κ(X_i, X_j) = ψ(X_i)^T ψ(X_j).
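For illustration, a minimal ε-SVR sketch with an RBF kernel using scikit-learn follows; the synthetic lagged inputs and the values of C, ε and γ are assumptions rather than the tuned settings of the paper.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                 # hypothetical lagged-price inputs
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, 100)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma="scale").fit(X, y)
print(svr.predict(X[:3]))
```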

Random forest: Random forest is a machine learning algorithm with high flexibility, which has been widely used in various classification and regression fields. Random forest adopts the bagging idea to aggregate a series of decision trees into a forest. To be more specific, random forest constructs multiple independent decision trees, each tree being considered an estimator. After obtaining its samples, each decision tree divides the input space into disjoint areas in the way that makes the information gain (or Gini coefficient) decrease fastest. Finally, the number of forecasts equals the number of decision trees, and their average is taken as the final output. The random forest algorithm can not only effectively reduce over-fitting, but can also be run in parallel when there is a large amount of training data.
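A minimal random-forest regression sketch follows; the synthetic data and the choice of 200 trees are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))                 # hypothetical lagged-price inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 150)

# 200 trees; the prediction is the average over all trees, as described above.
rf = RandomForestRegressor(n_estimators=200, random_state=0, n_jobs=-1).fit(X, y)
print(rf.predict(X[:3]))
```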

BP neural networks: BP neural networks have not only a relatively simple network structure and high computational accuracy, but also an efficient nonlinear mapping ability. Through the activation function, the hidden-layer nodes transmit the nonlinearly transformed value of the input layer to the output layer, and the output is then obtained through the calculation of another activation function.
The loss function is defined as the difference between the output value and the actual value. Once the information has propagated from the input layer to the output layer, the loss can be computed. A BP neural network learns the weights w and biases b of all layers in a supervised way, which means the back-propagation algorithm is used to optimize them. The whole process is iterated so that the loss descends along its gradient.
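To make the forward and backward passes concrete, the following is a small one-hidden-layer BP network written from scratch in NumPy; the layer sizes, sigmoid activation and learning rate are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))                       # hypothetical lagged inputs
y = (np.sin(X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)

n_in, n_hid, n_out, lr = 3, 8, 1, 0.05
W1, b1 = rng.normal(0, 0.5, (n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(0, 0.5, (n_hid, n_out)), np.zeros(n_out)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):
    # Forward pass: input -> sigmoid hidden layer -> linear output.
    H = sigmoid(X @ W1 + b1)
    out = H @ W2 + b2
    err = out - y                                   # loss = mean squared error

    # Backward pass: propagate the error and apply gradient descent.
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = (err @ W2.T) * H * (1 - H)
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)
    W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

pred = sigmoid(X @ W1 + b1) @ W2 + b2
print(float(np.mean((pred - y) ** 2)))              # final training MSE
```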

Wavelet Neural Networks Based on Adaptive Genetic Algorithm
Wavelet neural networks (WNN) are an improved version of BP neural networks, in which the hidden layer's sigmoid activation function is replaced with a basic wavelet function. WNN combines the excellent characteristics of the wavelet transform (time precision in the high-frequency domain and frequency precision in the low-frequency domain) with the self-learning and self-adaptation advantages of BP neural networks. Therefore, WNN has strong approximation and fault-tolerance ability.
As WNN is applied more widely, its defects appear: for example, the number of hidden-layer nodes is difficult to determine, and the initial values of the parameters have a great impact on network performance. The genetic algorithm (GA) is a global random search algorithm inspired by the natural evolutionary mechanism; it is suitable for complex nonlinear optimization problems that are difficult to solve with traditional search algorithms. Therefore, a hybrid prediction model combining WNN and GA is presented to search for a globally near-optimal combination of network parameters, which can improve the accuracy of WNN. [22,29,34,35] have attained preferable results in training neural networks with GA, but there are few studies on WNN with an adaptive GA in the shipping market.
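As a rough illustration of the hybrid idea (not the paper's exact adaptive scheme), the sketch below uses a simple genetic algorithm to search for WNN parameters, with fitness defined as the negative training error; the population size, truncation selection and the fitness-dependent mutation rate are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_hid = 3, 6
dim = n_in * n_hid + n_hid + 2 * n_hid              # W1, W2, scales a, shifts b

X = rng.normal(size=(80, n_in))                      # hypothetical lagged inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

def unpack(c):
    W1 = c[:n_in * n_hid].reshape(n_in, n_hid)
    W2 = c[n_in * n_hid:n_in * n_hid + n_hid].reshape(n_hid, 1)
    a = np.abs(c[-2 * n_hid:-n_hid]) + 0.1          # keep wavelet scales positive
    b = c[-n_hid:]
    return W1, W2, a, b

def fitness(c):
    W1, W2, a, b = unpack(c)
    z = (X @ W1 - b) / a
    hidden = np.cos(1.75 * z) * np.exp(-z ** 2 / 2)  # Morlet-type wavelet activation
    pred = (hidden @ W2).ravel()
    return -np.mean((pred - y) ** 2)                 # higher is better

pop = rng.normal(0, 0.5, (30, dim))
for gen in range(50):
    fit = np.array([fitness(c) for c in pop])
    order = np.argsort(fit)[::-1]
    parents = pop[order[:15]]                        # truncation selection
    children = []
    while len(children) < 15:
        i, j = rng.integers(0, 15, 2)
        cut = rng.integers(1, dim)
        child = np.concatenate([parents[i][:cut], parents[j][cut:]])
        pm = 0.05 + 0.1 * (i + j) / 30               # worse parents mutate more
        mask = rng.random(dim) < pm
        child[mask] += rng.normal(0, 0.3, mask.sum())
        children.append(child)
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(c) for c in pop])]
print(-fitness(best))                                # best training MSE found
```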

Wavelet Transform
The essence of the continuous wavelet transform is an integral transform between different parameter spaces.
It takes the inner product of the sequence f(t) to be analyzed with the conjugate of a wavelet function ψ(t) under different translations b and scales a [16]:

$$W_f(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \qquad (7)$$
where ψ(t) is a fundamental (mother) wavelet function, and Equation (7) requires ψ(t) to satisfy the following admissibility restriction:

$$C_{\psi} = \int_{-\infty}^{+\infty} \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty.$$

The inverse wavelet transform is calculated as follows:

$$f(t) = \frac{1}{C_{\psi}} \int_{0}^{+\infty}\!\!\int_{-\infty}^{+\infty} \frac{1}{a^{2}}\, W_f(a, b)\, \psi\!\left(\frac{t - b}{a}\right) db\, da. \qquad (8)$$

In practice, the discrete wavelet transform usually gives better results. Equation (8) is discretized as follows:

$$f(t) \approx \sum_{k=1}^{K} w_k\, \psi\!\left(\frac{t - b_k}{a_k}\right), \qquad (9)$$

where K represents the number of basis wavelet functions and w_k are their weights. The original time series can thus be fitted by a linear superposition of weighted basis wavelet functions.
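The discretized reconstruction in Equation (9) can be illustrated with a few lines of NumPy; the Morlet-type mother wavelet and the grid of scales a_k and shifts b_k are assumptions, and the weights w_k are obtained here by least squares.

```python
import numpy as np

def morlet(t):
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)   # common Morlet-type wavelet

t = np.linspace(0, 10, 200)
f = np.sin(t) + 0.5 * np.sin(3 * t)                 # hypothetical signal

# Basis of K wavelets on a simple grid of scales a_k and shifts b_k.
scales = [0.5, 1.0, 2.0]
shifts = np.linspace(0, 10, 11)
basis = np.column_stack([morlet((t - b) / a) for a in scales for b in shifts])

w, *_ = np.linalg.lstsq(basis, f, rcond=None)       # weights w_k by least squares
approx = basis @ w
print(np.max(np.abs(approx - f)))                   # reconstruction error
```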

Wavelet Neural Networks
Wavelet transform and neural networks can be combined in two ways.

Loose type:
The original data is first processed by the wavelet transform to extract feature vectors, which are then used as the input of the neural network, after which the usual neural network procedure is carried out.
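A sketch of this loose combination follows, assuming PyWavelets for the decomposition and a small scikit-learn MLP as the network; the 'db4' wavelet, decomposition level, window length and network size are illustrative choices, not settings from the paper.

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
series = 40 + np.cumsum(rng.normal(0, 1, 128))       # hypothetical price series

window, horizon = 16, 1
X, y = [], []
for t in range(window, len(series) - horizon + 1):
    # Wavelet-decompose each window and use the coefficients as features.
    coeffs = pywt.wavedec(series[t - window:t], "db4", level=2)
    X.append(np.concatenate(coeffs))
    y.append(series[t + horizon - 1])
X, y = np.asarray(X), np.asarray(y)

mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
print(mlp.predict(X[:3]))
```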

The original data is processed by wavelet transform to extract the feature vectors which are targeted at the input of neural network, and then we carry out the whole process of neural network. ′ . The weight matrix between input layer and hidden layer is denoted as 1 W while the weight matrix between hidden layer and output layer is denoted as 2 W . (8) In practice, discrete wavelet transform can usually get better results. Equation (8)  NN being applied wider, its defects apor example, the number of hidden layer is difficult to determine and initial values of eters have a great impact on network perce. Genetic algorithm (GA) is a globally search algorithm referring to natural evory mechanism. It is suitable for dealing with x nonlinear optimization problems that are lt to be solved by traditional search algo-. Therefore, the hybrid prediction model ing WNN and GA is presented to search a y near-optimal combination of network pars, which can improves the accuracy of [22,29,34,35] have attained preferable ren training neural networks with GA, but re few studies on WNN with adaptive GA ping market.
is a fundamental wavelet function quation (7) should meet the following rens: In practice, discrete wavelet transform can usually get better results. Equation (8) is discretized as follows: where K represents the number of basis wavelet functions. The original time series can be fitted by linear superposition of weighted basis wavelet functions.

Wavelet Neural Networks
Wavelet transform and neural networks can be combined in two ways.

Loose type:
The original data is processed by wavelet transform to extract the feature vectors which are targeted at the input of neural network, and then we carry out the whole process of neural network.  Figure 1.
We assume that sample data includes input [ ] , (9) where K represents the number of basis wavelet functions. The original time series can be fitted by linear superposition of weighted basis wavelet functions.

Wavelet Neural Networks
Wavelet transform and neural networks can be combined in two ways.

Loose type:
The original data is processed by wavelet transform to extract the feature vectors which are targeted at the input of neural network, and then we carry out the whole process of neural network.
Similarly, output layer's input and output are expressed, respectively, as the following: ( ) Similarly, output layer's input and output are expressed, respectively, as the following: The error between the network output Z and the expected output Y is defined as Equation (13): The partial derivatives of the error with respect to the parameters ( ) The iterative formulae of weights, translations and scales are calculated as follows: (12) The error between the network output Z and the expected output Y is defined as Equation (13) imilarly, output layer's input and output are exressed, respectively, as the following: he error between the network output Z and the xpected output Y is defined as Equation (13) ( ) The iterative formulae of weights, translations and scales are calculated as follows: (13) The partial derivatives of the error with respect to the parameters The iterative formulae of weights, translations and scales are calculated as follows: where 1 η and 2 η represent the learning rate of weight parameters and wavelet parameters, respectively.

Adaptive Genetic Algorithm
Genetic algorithm cannot directly deal with the parameters of the problem space. First, the feasible solutions of the problem space need to be ex-  where 1 η and 2 η represent the learning rate of weight parameters and wavelet parameters, respectively.

Adaptive Genetic Algorithm
Genetic algorithm cannot directly deal with the parameters of the problem space. First, the feasible solutions of the problem space need to be expressed ( ) Similarly, output layer's input and output are expressed, respectively, as the following: The error between the network output Z and the expected output Y is defined as Equation (13): The partial derivatives of the error with respect to the parameters ( ) The iterative formulae of weights, translations and scales are calculated as follows:

343
Information Technology and Control 2023/2/52 as chromosomes of the genetic space. Then the initial population is randomly generated. Each individual in the population represents a solution to the problem. The quality of the individual is measured by the fitness function. Through a series of genetic operations, such as selection, recombination, mutation and evolutionary reversal, the father generation produces offspring in which the excellent individuals with high fitness are more likely to be selected to form new population. In the process of iteration, the excellent individual's information is preserved and constantly exchanged to ensure the diversity of the population. Finally, the remaining individuals converge around the optimal solution and the optimal individual is selected as the solution of the problem. The process of WNN based on AGA is shown in Figure 2.
Encode parameters: Before conducting searches, the parameters that need to be encoded include (1) weights  ( 1) where 1 η and 2 η represent the learning rate of weight parameters and wavelet parameters, respectively.

Adaptive Genetic Algorithm
Genetic algorithms cannot directly deal with the parameters of the problem space. First, the feasible solutions of the problem space need to be expressed as chromosomes of the genetic space. Then the initial population is randomly generated; each individual in the population represents a solution to the problem, and its quality is measured by the fitness function. Through a series of genetic operations, such as selection, recombination, mutation and evolutionary reversal, the parent generation produces offspring in which the excellent individuals with high fitness are more likely to be selected to form the new population. In the process of iteration, the excellent individuals' information is preserved and constantly exchanged to ensure the diversity of the population. Finally, the remaining individuals converge around the optimal solution, and the best individual is selected as the solution of the problem. The process of WNN based on AGA is shown in Figure 2.

Figure 2
The process of WNN based on AGA

Encode parameters: Before conducting the search, the parameters that need to be encoded include (1) the connection weights, (2) the translation factors and (3) the scale factors of the hidden-layer wavelet functions. Given that the numbers of input layer neurons and output layer neurons are m and n, respectively, the number of hidden layer neurons is determined accordingly.

Genetic operations: There are four genetic operations, namely selection, recombination, mutation and reversal. The purpose of selection is to filter elite individuals from the current population. Individuals with higher fitness scores (1 / MSE_WNN) in the parent generation have a higher probability of reproducing offspring; this process reflects "survival of the fittest" in the biological world. In this paper, roulette-wheel selection is applied to establish the offspring. The probability of an individual i being selected is defined as

$$P_{i}=\frac{F_{i}}{\sum_{j=1}^{N} F_{j}}, \qquad (16)$$

where Fi represents the fitness score of individual i and N is the capacity of the parent population.
Both single-point crossover and adaptive mutation are applied to diversify the population so as to find a globally near-optimal solution. In single-point crossover, two randomly selected parent members exchange or combine chromosomes at a random position to pass excellent characteristics on to their offspring. For each individual in the offspring, if a random number generated between 0 and 1 is less than the mutation probability Pm, binary conversion is performed at a random position of its chromosome. In this study, Pm adjusts adaptively according to the heterogeneity of the population (1 / σChrom), where σChrom represents the standard deviation of the parents' fitness scores. A small σChrom means high homogeneity in the population, so a high mutation probability is applied to diversify the chromosomes. To improve the local search ability of AGA, continuous evolutionary reversal operations, i.e., swapping genes at two random positions, are introduced after the selection, recombination and adaptive mutation operations.
Only individuals with improved fitness are accepted. Finally, the inferior members of the parent population are replaced by the elite members of the offspring generation to form a new population. The whole AGA process is iterated until the termination condition is satisfied.
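The following standalone Python sketch illustrates these genetic operations (roulette-wheel selection per Equation (16), single-point crossover, adaptive mutation driven by the inverse fitness dispersion, and fitness-checked reversal) on a toy fitness function. In the actual model the fitness would be 1 / MSE of the trained WNN on validation data; the bit length, parameter range and rates below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
BITS, LOW, HIGH = 16, -2.0, 2.0          # bits per parameter and assumed search range

def decode(chrom, n_params):
    # Map a binary chromosome to real-valued parameters in [LOW, HIGH]
    vals = []
    for k in range(n_params):
        bits = chrom[k * BITS:(k + 1) * BITS]
        x = int("".join(str(b) for b in bits), 2) / (2 ** BITS - 1)
        vals.append(LOW + x * (HIGH - LOW))
    return np.array(vals)

def toy_mse(params):
    # Stand-in for the WNN's validation MSE; fitness = 1 / MSE as in the paper
    return float(np.sum((params - 0.5) ** 2)) + 1e-6

def fitness(chrom, n_params):
    return 1.0 / toy_mse(decode(chrom, n_params))

n_params, pop_size, pc = 4, 20, 0.8
length = n_params * BITS
pop = [rng.integers(0, 2, size=length) for _ in range(pop_size)]

for generation in range(50):
    fits = np.array([fitness(c, n_params) for c in pop])
    # Roulette-wheel selection, Equation (16): P_i = F_i / sum_j F_j
    idx = rng.choice(pop_size, size=pop_size, p=fits / fits.sum())
    offspring = [pop[i].copy() for i in idx]
    # Single-point crossover
    for i in range(0, pop_size - 1, 2):
        if rng.random() < pc:
            cut = rng.integers(1, length)
            tail = offspring[i][cut:].copy()
            offspring[i][cut:] = offspring[i + 1][cut:]
            offspring[i + 1][cut:] = tail
    # Adaptive mutation: small fitness std (homogeneous population) -> larger P_m
    p_m = min(0.5, 0.02 / (fits.std() + 1e-9))
    for c in offspring:
        mask = rng.random(length) < p_m
        c[mask] ^= 1                      # binary conversion at random positions
    # Evolutionary reversal: swap two random genes, keep the change only if fitness improves
    for c in offspring:
        i, j = rng.integers(0, length, size=2)
        trial = c.copy()
        trial[i], trial[j] = trial[j], trial[i]
        if fitness(trial, n_params) > fitness(c, n_params):
            c[:] = trial
    # Elitist replacement: the worst parents are replaced by the best offspring
    off_fits = np.array([fitness(c, n_params) for c in offspring])
    order_p, order_o = np.argsort(fits), np.argsort(off_fits)[::-1]
    for r in range(pop_size // 2):
        pop[order_p[r]] = offspring[order_o[r]].copy()

best = max(pop, key=lambda c: fitness(c, n_params))
print(decode(best, n_params))
```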
Generally speaking, AGA-WNN has four advantages compared with the baseline models:
1 Strong nonlinear approximation, self-learning and self-adaptation ability.
2 Support for parameter estimation based on a large number of variables.
3 Excellent generalization ability.
4 Fast convergence, without easily falling into the local-optimum trap.

Data Description
All secondhand tanker price data are collected from the Clarkson Shipping Intelligence Network [10]. This paper studies the application of the hybrid prediction model, wavelet neural networks based on adaptive genetic algorithm, to the tanker market. Owing to the wide range of tanker sizes, only the most representative tanker types are selected for the empirical study. The selected tankers carry the major share of global crude oil trade and include Handysize, Panamax, Aframax, Suezmax and VLCC. Therefore, the method and its prediction results can effectively interpret the market situation of secondhand tankers. In this section, we give a concise description of each tanker type, followed by descriptive statistics of its datasets.
Handysize tankers refer to small crude oil tankers with a deadweight tonnage ranging between 10,000 and 50,000 tons. Thanks to their strong flexibility and shallow draft, Handysize tankers play an important role in many fields, for example offshore waters and offshore drilling platforms, forming a complement to large tankers, and their sales have increased gradually in recent years. In this paper, Handysize tankers of 37K DWT are selected for empirical study.
Panamax tankers are subject to the navigation conditions of the Panama Canal as their upper limits; in other words, they need to meet restrictions on the ship's width, draft and so on. Generally, the main indicators are limited as follows: the total length should not exceed 274.32 meters, the width should not exceed 32.30 meters, and the deadweight tonnage should range between 60,000 and 80,000 tons. The main routes include the Far East to Japan, the Far East to India, and Singapore to Japan. In this paper, Panamax tankers of 73K DWT are selected for empirical study.
Aframax tankers refer to medium-sized crude oil tankers with a deadweight tonnage ranging between 80,000 and 120,000 tons. Considering operating costs, however, their actual deadweight tonnage generally varies from 70,000 to 110,000 tons, giving an average cargo-carrying capacity of approximately 750,000 barrels [12]. Many non-OPEC oil exporting countries have port facilities that cannot easily accommodate VLCC or ULCC, so demand for Aframax tankers is high owing to their advantageous size. With the highest Average Freight Rate Assessment (AFRA) and the best economy, Aframax tankers are the ideal choice for short- to medium-haul oil trades and are also known as the "workhorse" of the world's tanker fleet.
Suezmax tankers have a deadweight tonnage ranging between 120,000 and 200,000 tons. In 2009, the Suez Canal was deepened from 18 meters to 20.1 meters to allow Suezmax tankers of up to 200,000 deadweight tonnage to pass. The main routes include the Persian Gulf to the Far East, the Caribbean to North America, West Africa to North America, and West Africa to the Mediterranean region. In this paper, Suezmax tankers of 150K DWT are selected for empirical study.
Very Large Crude Carriers (VLCC) and Ultra Large Crude Carriers (ULCC) are the main vessel types for long-distance crude oil transportation, offering higher performance and economies of scale than other tanker types. VLCC have a deadweight tonnage ranging between 200,000 and 320,000 tons, and ULCC have a deadweight tonnage of more than 320,000 tons. The main routes are deployed around the Persian Gulf, the Far East, North America, Northern Europe, the Indian subcontinent and West Africa. In this paper, VLCC are selected for empirical study.
The monthly data analyzed are 10-year-old and 15-year-old secondhand tanker prices for the five tanker types mentioned above. The longest sequence starts in July 1989 and the shortest starts in October 2012; all series end in April 2022, and the numbers of data points vary from 115 to 394. Table 1 summarizes the detailed statistics. In terms of average price level, large tankers are higher than small ones and 10YO vessels are higher than 15YO ones; the standard deviation follows a similar pattern. Most distributions are right-skewed and leptokurtic.
The Jarque-Bera statistic tests whether a set of samples follows a normal distribution. Under the null hypothesis of normality, the JB statistic is calculated according to Equation (17):

$$JB=\frac{n}{6}\left(S^{2}+\frac{(K-3)^{2}}{4}\right), \qquad (17)$$

where n is the sample size, S the skewness and K the kurtosis. The result demonstrates that the price sequences for all tanker types and ages do not follow a normal distribution at significance level α = 0.05.
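A brief sketch of this normality check on an illustrative price series, using scipy's Jarque-Bera implementation (the data below are synthetic stand-ins, not the Clarksons series):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical monthly secondhand price series (million USD), stand-in for the real data.
prices = rng.lognormal(mean=3.6, sigma=0.3, size=200)

jb_stat, p_value = stats.jarque_bera(prices)
print(f"JB = {jb_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would lead us to reject normality at the alpha = 0.05 level.
```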

Order Determination
According to Figure 3, the overall pattern of the secondhand tanker price series is quite obvious; especially from 2005 to 2009, the data reflect an unprecedented fluctuation. The factor decomposition of the price series of the 10YO secondhand Aframax is taken as an example, as demonstrated in Figure 4, which is divided into four parts from top to bottom. The first part is the original observations; the second part is the estimated long-term trend, which increases initially and then decreases cyclically and also shows anomalies in the period 2005-2009; the third part is the estimated seasonal fluctuation; and the last part is the estimated random variation, which fluctuates around zero. On the whole, Figure 4 is consistent with the information in the original sequence diagram. Given that the temporal patterns of all sequences are highly similar, they are non-stationary time series with seasonal variation and periodic trend, suitable for order determination by the SARIMA model.

Figure 3
Time series chart of secondhand tanker prices for various ship types

Figure 4
Decomposition of 10YO's Aframax

Table 2 shows the autocorrelation orders and underlying network structures of the 10 time series, obtained using the forecast package in the R library. Machine learning methods can then be applied to the modified datasets, which take the standard form of input independent variables and output dependent variables.

Table 2
Order determination and underlying structures of all time series
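As a Python analogue of this step (the paper itself uses the forecast package in R, so the tooling below is an assumption rather than the authors' workflow), a seasonal order search could be run with pmdarima's auto_arima on an illustrative monthly series; the selected autoregressive order would then fix the number of lagged inputs fed to the networks.

```python
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(3)
# Stand-in monthly series with trend and yearly seasonality.
t = np.arange(180)
series = 30 + 0.05 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=180)

# Seasonal ARIMA order search, analogous to forecast::auto.arima in R.
model = pm.auto_arima(series, seasonal=True, m=12, suppress_warnings=True)
print(model.order, model.seasonal_order)   # (p, d, q) and (P, D, Q, m)
```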

Normalization
In the field of machine learning, normalization is a common basic operation in data pre-processing. Proper normalization can help a neural network model produce more accurate results, and its advantages can be seen in two aspects. On the one hand, in some of the network structures set above, the independent variables have multiple dimensions, and different features may range over different scales. In this case, features with a higher absolute level would play a more important role and could even dominate the final output. To solve this problem when the relative importance of each feature is unclear, normalization brings every input feature to the same magnitude or a similar distribution, which is suitable for comparative evaluation.
During network training, we can then ensure that all features are treated equally (using the same learning rate, initial weights and activation function). On the other hand, in this paper the training speed of the BP neural networks and the wavelet neural networks is determined by the speed of the gradient descent algorithm. After data normalization, the contours of the loss function are approximately circular, so the gradient descent direction points toward the center of the contours, resulting in faster convergence and fewer iterations.
Hereby, max-min normalization is put into practice as Equation (18):

$$x_{normalized}=\frac{x-x_{min}}{x_{max}-x_{min}}, \qquad (18)$$

where xmax and xmin represent the maximum and minimum of the original sequence x, and xnormalized denotes the normalized sequence, which ranges in [0, 1].
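A minimal sketch of Equation (18) together with the construction of lagged input-output pairs (the lag p is assumed to come from the order-determination step; the series below is illustrative):

```python
import numpy as np

def min_max_normalize(x):
    # Equation (18): map the series into [0, 1]
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def make_supervised(x, p):
    # Turn a normalized series into (lagged inputs, next value) pairs for the networks
    X = np.array([x[i:i + p] for i in range(len(x) - p)])
    y = x[p:]
    return X, y

series = np.array([42.0, 45.0, 47.5, 46.0, 50.0, 55.0, 53.0, 58.0])
norm = min_max_normalize(series)
X, y = make_supervised(norm, p=3)   # p would come from the order determination step
print(X.shape, y.shape)
```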

Empirical Results

Hyper-parameter Optimization
The sample data of the 10 time series (19 structures in total) are randomly divided into a training set and a test set. The optimal hyper-parameters are those that give the best performance, i.e., the minimum error, on a validation set separated from the training set. The model is then retrained with the optimal hyper-parameters and evaluated on the test set. Hyper-parameters of the traditional machine learning models are optimized through cross-validation. BP neural networks adopt the Levenberg-Marquardt algorithm, while wavelet neural networks use the adaptive genetic algorithm to optimize hyper-parameters, iteratively searching for an approximately optimal parameter combination until the termination condition is satisfied. The fitting effects of all models are measured by MSE, MAE and MAPE.
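For the traditional baselines, this tuning might look like the following scikit-learn sketch (an SVR example with an assumed, illustrative parameter grid and synthetic data, not the grid or data used in the paper):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.random((150, 3))    # lagged, normalized inputs (illustrative)
y = rng.random(150)         # next-month normalized price (illustrative)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```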

Comparison with Baseline Models
The performance of the 7 models on the 10 time series (19 structures) is compared from two perspectives.

1 Error measures. Three kinds of prediction error measures are calculated to judge the superiority and inferiority of the prediction models. Common prediction error indicators include the mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), which are defined as follows:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}, \qquad (19)$$

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|, \qquad (20)$$

$$MAPE=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_{i}-\hat{y}_{i}}{y_{i}}\right|, \qquad (21)$$
where n is the number of samples in the test set, and yi and ŷi represent the raw data and the prediction, respectively.

The MSE, MAE and MAPE of all test sets are demonstrated in Tables 3-5, respectively. The numbers in bold indicate that the corresponding model performs best on the corresponding time series. For sequences with multiple structures, we select the better one and mark it with a checkmark ("√"); as a result, 10 structures corresponding to the 10 sequences remain. AGA-WNN's accuracy is up to about 93 percent higher than that of SARIMA and about 13 percent higher than that of SVR in terms of MSE. For the time series where AGA-WNN does not rank first, its performance is still competitive and remains at the forefront, so it is reasonable to adopt AGA-WNN as the primary prediction model for the secondhand tanker market.
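The three metrics of Equations (19)-(21) can be computed directly; the values below are illustrative, not those of Tables 3-5.

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)                     # Equation (19)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))                    # Equation (20)

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100.0      # Equation (21), in percent

y = np.array([40.0, 42.0, 45.0, 43.5])        # test-set prices (million USD, illustrative)
y_hat = np.array([41.0, 41.5, 44.0, 44.0])    # model predictions (illustrative)
print(mse(y, y_hat), mae(y, y_hat), mape(y, y_hat))
```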

Figure 6
15YO's Suezmax (structure 3): Evolutionary process
2 Statistical tests. The Friedman test compares the algorithms across all datasets according to their average ranks:

$$\tau_{\chi^{2}}=\frac{12N}{k(k+1)}\left(\sum_{i=1}^{k}\gamma_{i}^{2}-\frac{k(k+1)^{2}}{4}\right), \qquad (22)$$

$$\tau_{F}=\frac{(N-1)\,\tau_{\chi^{2}}}{N(k-1)-\tau_{\chi^{2}}}, \qquad (23)$$

where N and k denote the number of datasets and algorithms, respectively, and γi is the average rank of algorithm i. When both N and k are large, τχ2 obeys the χ2 distribution with (k - 1) degrees of freedom; however, τF, which follows the F distribution with (k - 1) and (k - 1)(N - 1) degrees of freedom, is more frequently used. The average ranks based on the three error metrics are shown in Table 6 and visualized in Figures 9-11; from top to bottom, the models are ranked from worst to best in terms of performance.

These diagrams also show that machine learning methods generally prevail over traditional models. However, we cannot directly determine whether the slight differences among AGA-WNN, SVR and random forest are significant based on the three error metrics or the Friedman test alone. Therefore, the Nemenyi test is introduced to further distinguish any two algorithms, providing a critical value (CD) for the difference between their average ranks:

$$CD=q_{\alpha}\sqrt{\frac{k(k+1)}{6N}}. \qquad (24)$$

If the difference between the average ranks of two algorithms exceeds CD, the null hypothesis that the two algorithms have the same performance is rejected at the corresponding significance level α = 0.05.
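A hedged sketch of how these two tests could be computed (illustrative errors only; q0.05 = 2.949 is the commonly tabulated Nemenyi critical value for k = 7 algorithms):

```python
import numpy as np
from scipy import stats

# Rows: 10 datasets (test series); columns: 7 models. Illustrative MSE values only,
# not the values of Tables 3-5.
rng = np.random.default_rng(5)
errors = rng.random((10, 7))

# Friedman test across the 7 models.
stat, p = stats.friedmanchisquare(*[errors[:, j] for j in range(errors.shape[1])])
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}")

# Nemenyi critical difference, Equation (24); q_0.05 = 2.949 for k = 7 (tabulated value).
N, k = errors.shape
q_alpha = 2.949
cd = q_alpha * np.sqrt(k * (k + 1) / (6 * N))
avg_ranks = stats.rankdata(errors, axis=1).mean(axis=0)   # lower rank = lower error
print("CD =", round(cd, 3), "average ranks =", np.round(avg_ranks, 2))
```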
The results of the Nemenyi test show that although AGA-WNN and random forest are the top two models, they are not significantly different from SVR, SARIMA and Holt-Winter. The No Free Lunch (NFL) Theorem can explain this: averaged over all possible problems, the expected performance of all algorithms is the same; in other words, no single algorithm performs best on every problem. Therefore, for several algorithms with similar accuracy on certain datasets, the error variance becomes one of the most important factors. The error variance of AGA-WNN is smaller than that of the other models. As a result, AGA-WNN is recommended for predicting secondhand tanker prices because of its robustness.

Figure 10
Nemenyi test diagram based on MAE

Figure 11
Nemenyi test diagram based on MAPE

Discussion and Conclusion
The secondhand ship market offers shipowners and investors the chance to buy and sell ships directly, making it easier for them to enter or exit the freight market. Therefore, the key lies in the timing of investment: asset transactions that buy low and sell high can generate considerable profits. High freight rates tend to be accompanied by high ship values. While this raises costs for new investors, it provides an opportunity for existing shipowners to make money, whether by operating or selling their ships. It is therefore essential to make accurate, data-driven predictions.
In this paper, a hybrid prediction model composed of wavelet neural networks and an adaptive genetic algorithm is proposed to forecast secondhand tanker prices for different ship types and ship ages. The time series datasets cover 10-year-old and 15-year-old Handysize, Panamax, Aframax, Suezmax and VLCC/ULCC vessels. The proposed hybrid model is simulated and compared with common machine learning algorithms and traditional prediction models. Three kinds of prediction error metrics are calculated, and two kinds of post-hoc statistical tests based on the errors are performed, in order to judge the overall performance of AGA-WNN. The results demonstrate that AGA-WNN is the best model compared with the baseline models, with comparative superiority in accuracy and robustness. Therefore, AGA-WNN can be considered an applicable data-driven tool to help relevant stakeholders in the shipping market monitor market trends in time, make reasonable management decisions, and avoid unnecessary losses caused by subjective judgment.
Although the hybrid prediction model has achieved remarkable success on multiple time series, it still has some limitations. Firstly, this paper mainly studies nonlinear prediction models, which have higher time complexity and slower running speed than linear models; it is necessary and challenging to develop new, parameter-efficient iterative algorithms for neural networks. Secondly, in addition to the five types of tankers investigated in the empirical study, we will also study the performance of AGA-WNN on other ship types in the future, for example the dry bulk and container markets.
Furthermore, in terms of data frequency, this paper researches low-frequency monthly data. In regard to variable dimension, only the information of the time series itself is extracted for prediction. In the future, we can aim at high-frequency data and incorporate more exogenous variables that affect decision-making, for example newbuilding ship prices, time charter rates and scrap values, to develop a predictive model with strong robustness. Last but not least, this article only adopts AGA to address WNN's shortcomings of slow convergence and susceptibility to local optima; we can also combine WNN with other intelligent search algorithms, such as the particle swarm optimization algorithm [7].