A Prioritization Approach for Regression Test Cases Based on a Revised Genetic Algorithm

Regression testing is a software testing approach executed to verify that changes made to the software do not affect the existing functionality of the product. Because of time and cost constraints, it is impractical to re-execute all the test cases whenever a change occurs. In order to overcome such a problem in the


Introduction
Software maintenance is considered a crucial process in the software development cycle. Often, two-thirds of software development cost is allocated to software maintenance (Pressman, 2005). Software maintenance is frequently carried out to correct errors, to add a new function or improve an existing one, or to adapt the software to new software or hardware (Biswas et al., 2011). Whenever a maintenance activity is executed, regression testing is carried out to verify that the modified parts work correctly and meet the software specifications. Thus, software testing, including regression testing, makes software robust, effective and trustworthy.
However, regression testing is still a hard task because of time and cost constraints. Retesting software with the complete range of test cases is expensive and inefficient (Sur et al., 2019, Harikarthik et al., 2019, Yoo and Harman, 2012, Engström and Runeson, 2010). Thus, a prioritization technique should be employed to facilitate the regression testing process, and many methods have been proposed in the regression testing literature to this end. Although regression testing is carried out repeatedly throughout the software development cycle (Konsaard and Ramingwong, 2015, Ekelund and Engström 2015, Kavitha and Sureshkumar, 2010), most such techniques are code-based: they prove useful in unit testing but not in functional testing, and they face scalability issues with large and complicated software systems (Panda et al., 2019, Sapna and Balakrishnan, 2015). Thus, the generation and prioritization of regression test cases from software specifications can be treated as an optimization problem to which meta-heuristic methods can be applied; one such method is the Genetic Algorithm (GA).
Information Technology and Control 2021/3/50
The genetic algorithm is a robust algorithm employed to solve optimization problems; it is based on the theory of natural selection and the precepts of evolutionary biology [15,38]. It is used in many computing areas, especially software testing, because of its efficiency in finding good solutions to the complicated, discrete and nonlinear problems produced by complex software [11,36]. It can be applied to reduce effort and cost by creating test cases automatically, thereby significantly enhancing software testing efficiency. Despite that, the most challenging obstacle in applying the genetic algorithm is that it can become trapped in a local optimum, which leads to population ageing [11,16,34,38]. To overcome this problem, various methods have been proposed to improve or adjust factors such as parameter settings, the fitness function, genetic operations, and the chromosome population [38]. However, these methods have drawbacks that make them unattractive for software testing, notably that they are difficult to implement without extra effort because of the highly complex nature of the enhancements [38]. Therefore, this research work proposes a Revised Genetic Algorithm that solves the local-optimum problem simply and effectively, and uses it to prioritize the test cases generated from the software specifications.
On the other hand, the Unified Modeling Language (UML) is the most popular standard for modeling software specifications and software design. It includes various models that support software systems development with an object-oriented approach. These models include the use case diagram, use case description, sequence diagram, class diagram, activity diagram and state diagram, which model both the dynamic and static behavior of software systems at different levels of abstraction [10,37,6]. As this study works at the functionality level, the activity diagram is employed. Activity diagrams are used to elaborate the scenarios related to each use case (functionality) in a software system, including the main, alternate and exception scenarios of that functionality. Thus, the set of test cases generated from use case description models by an automated approach is exhaustive [26]. However, relatively few techniques have been proposed to generate test cases from models in the analysis and design phases, especially from the activity diagram [1,3,4]. Moreover, extracting information from the activity diagram is a complicated task because the activity diagram provides concepts at a higher abstraction level of a system [35].
Regression test selection is carried out to ensure that the developed functionality, both existing and modified, works appropriately, by selecting only a subset of the test cases that were initially developed to test the software. The regression test selection problem can be stated as follows: let $P$ be a software program and $\tilde{P}$ a revised version of $P$. Similarly, let $T$ be the test suite that was initially developed to test $P$ during the software development phase. Regression test selection techniques aim to select a subset of valid test cases $\tilde{T} \subseteq T$ to test that the existing and modified parts of $P$ continue to work properly in $\tilde{P}$, such that the errors detected when $\tilde{P}$ is executed with $T$ are also detected when it is executed with $\tilde{T}$.
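To make the selection problem concrete, the following Python sketch implements one simple and widely used strategy, modification-traversing selection: a test case from $T$ is kept in $\tilde{T}$ when its path traverses at least one transition that changed between $P$ and $\tilde{P}$. This is an illustrative assumption, not the selection procedure of this paper; the function name `select_modification_traversing` and the encoding of a test case as a node path are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # a transition (source activity, target activity)


def select_modification_traversing(tests: Dict[str, List[str]],
                                   modified: Set[Edge]) -> List[str]:
    """Return the ids of test cases whose path traverses at least one modified transition.

    tests maps a test-case id to its node path in the CFG of P; modified holds the
    transitions that differ between P and the revised version P~.
    """
    selected: List[str] = []
    for test_id, path in tests.items():
        traversed = set(zip(path, path[1:]))  # consecutive nodes form transitions
        if traversed & modified:
            selected.append(test_id)
    return selected
```

For two test cases that share a prefix and differ only after a decision node, a change on one branch selects only the test case that takes that branch, while a change on the shared prefix selects both.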
Regression test selection aims to select a subset of test cases for testing the functionality of the software without affecting software quality [33,23]. In this research work, the selected test cases are considered the regression test. This regression test uses a subset of the test cases to verify the functionality of the software, including the parts that have been changed. The premise is to select a minimal set of test cases that verifies the software functionality with respect to the changes before that functionality is tested elaborately, so that all selected test cases are effective. Regression test selection ensures that functionality that passed the initial test cases is tested further, to make sure that the changes have not introduced new defects into previously validated functionality; otherwise, the changes made to that functionality are considered void.
Although many research works have proposed different techniques to generate regression test cases, a great generated test cases. The experimental results of that research work showed that the proposed approach was efficient and effective in offering near-optimal test cases and high test coverage in the early stage of software development. Despite that, it did not solve the main problem of the genetic algorithm, which can significantly undermine the benefits of the genetic algorithm in software testing and increase the effort and cost of test case generation. Additionally, the technique proposed in that work did not prioritize the test cases and only provided the optimal path.
Another research work on test suite optimization was done by Sahoo et al. [30]. In this work, the authors proposed an approach to generate test cases from state chart and sequence diagrams. The approach converted the sequence diagram and the state chart diagram into a sequence graph and a state chart graph, respectively. The two graphs were combined into a system graph, from which different test cases were derived and then optimized by applying an evolutionary algorithm called the hybrid bee colony algorithm. The results showed that the proposed technique minimized the time required to select the best path and was more effective in software testing. However, how to use the technique in regression testing remained unclear, and the generated test cases were not prioritized. Another limitation of the study was that the proposed technique was not evaluated.
Konsaard and Ramingwong [21] proposed a genetic-algorithm-based approach to prioritize generated test cases. The approach generated the test cases from the source code, and the fitness function of the genetic algorithm was modified to use Average Percentage Code Coverage (APCC) to yield maximum code coverage. The results showed that the approach was efficient in terms of coverage percentage and execution time. Despite these promising results, the approach was not fully automated. Additionally, it did not consider the local-optimum problem of the genetic algorithm.
After reviewing more than fifty studies, it became evident that no research work has so far proposed a fully automated approach to generate regression test cases from software specifications and to prioritize them. Most of these studies proposed approaches that generate test cases from software code or software specifications to find the optimal test case achieving maximum coverage, without prioritizing the other test cases. Additionally, only a few studies addressed the local-optimum problem that arises when the genetic algorithm is applied, so more studies are required to support their results. Furthermore, most of the proposed prioritization techniques select test cases on the basis of their ability to detect more faults without considering test adequacy. To this end, this research work intends to bridge the gap in the literature by proposing a fully automated and complete approach that generates regression test cases from software specifications and prioritizes them using a genetic algorithm that solves the population ageing problem and obtains maximum test suite coverage. Furthermore, this is the first study that has adapted the Average Percentage Transition Coverage (APTC) criterion to evaluate the proposed approach in terms of coverage percentage. Therefore, the goal of this research work is to answer the following research question: how does the proposed approach compare, in terms of time, effort and coverage, with other search-based approaches?
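This section does not state the APTC formula. A plausible reading, assumed here, is that APTC mirrors the well-known APFD metric with transitions in place of faults: $APTC = 1 - \frac{TT_1 + \cdots + TT_m}{nm} + \frac{1}{2n}$, where $n$ is the number of test cases, $m$ the number of transitions and $TT_i$ the position of the first test case in the ordering that covers transition $i$. The Python sketch below computes that assumed form; the function name `aptc` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # a CFG transition (source node, target node)


def aptc(order: List[Set[Edge]], transitions: Set[Edge]) -> float:
    """APFD-style Average Percentage Transition Coverage of a prioritized suite.

    order[k] is the set of transitions covered by the (k+1)-th test in the ordering.
    Assumed formula: 1 - (TT_1 + ... + TT_m) / (n * m) + 1 / (2n), where TT_i is the
    1-based position of the first test that covers transition i.
    """
    n, m = len(order), len(transitions)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one transition")
    first: Dict[Edge, int] = {}
    for pos, covered in enumerate(order, start=1):
        for tr in covered & transitions:
            first.setdefault(tr, pos)  # remember only the first covering position
    missing = transitions - set(first)
    if missing:
        raise ValueError(f"transitions never covered: {sorted(missing)}")
    return 1 - sum(first.values()) / (n * m) + 1 / (2 * n)
```

Higher values mean the ordering reaches full transition coverage earlier: for two test cases covering three transitions, running the test that covers two of them first gives about 0.58, whereas the reverse order gives about 0.42.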
The rest of this paper is organized as follows: Section two introduces the proposed technique. Section three addresses the experimentation, results and discussion. Finally, Section four presents the conclusion and future work.

The Proposed Approach
This research work proposes an automated approach to generate and prioritize regression test cases from software specifications (activity diagrams) on the basis of a Revised Genetic Algorithm. To construct the approach, the genetic algorithm has been revised to solve the local-optimum problem and then used to prioritize the regression test cases. The proposed approach consists of three steps: 1) automatically converting the activity diagram into a Control Flow Graph (CFG); 2) automatically generating the test cases from the CFG; and 3) prioritizing the regression test cases using the Revised Genetic Algorithm. Figure 1 shows the proposed approach; each step is discussed in detail below.
Step one: converting the activity diagram into a CFG. The activity diagram is employed to model the dynamic behavior of a set of objects in a software system. It represents a set of object activities, so it can be used to describe operations in the design stage, the sequence of activities among the involved objects in the control flow, and the relations between activities and objects in the message flow. Furthermore, it details the main, alternative and exception scenarios related to each use case [31]. Thus, the activity diagram makes it possible to define coverage criteria that assure a particular degree of completeness of the regression test scenarios. The activity diagram involves two types of activities: action activities (events) and control activities, which include the initial activity, final activity, decision, merge and fork. Thus, the following definition can be derived from the construction of an activity diagram.
Definition 1: An activity diagram is defined as the tuple
$$AD = \{A, T, F, J\} \quad (1)$$
where $A = \{a_1, a_2, \ldots, a_n\}$ denotes a finite set of activities; $T = \{t_1, t_2, \ldots, t_n\}$ represents a finite set of control flows from one activity to another; $F = \{f_1, f_2, \ldots, f_n\}$ represents a finite set of forks; and $J = \{J_1, J_2, \ldots, J_n\}$ denotes a finite set of joins.
In this phase, the scenarios are extracted from the activity diagram for use in generating the regression test cases. These scenarios are converted into a control flow graph, in which each node corresponds to an action or control activity in the activity diagram and each edge represents the control flow between activities. Figure 2 presents the algorithm for converting the activity diagram into a control flow graph.
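The conversion described above amounts to a one-to-one mapping: activities become CFG nodes and control flows become directed CFG edges. The Python sketch below assumes a hypothetical in-memory `ActivityDiagram` (activity names with a kind label, plus a list of control flows) rather than any UML tool's export format, and it is not a transcription of the algorithm in Figure 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical labels for the action and control activity kinds named in the text.
KINDS = {"initial", "action", "decision", "merge", "fork", "join", "final"}


@dataclass
class ActivityDiagram:
    nodes: Dict[str, str]          # activity name -> kind
    flows: List[Tuple[str, str]]   # control flows as (source, target)


def to_cfg(ad: ActivityDiagram) -> Dict[str, List[str]]:
    """Map each activity to a CFG node and each control flow to a directed CFG edge."""
    for name, kind in ad.nodes.items():
        if kind not in KINDS:
            raise ValueError(f"unknown activity kind {kind!r} for {name!r}")
    cfg: Dict[str, List[str]] = {name: [] for name in ad.nodes}
    for src, dst in ad.flows:
        if src not in cfg or dst not in cfg:
            raise ValueError(f"flow {src}->{dst} references an unknown activity")
        cfg[src].append(dst)  # adjacency list keeps the flows' original order
    return cfg
```

A decision node simply ends up with two successors in the adjacency list, which is what later produces the true and false paths.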

Step two: generating the test cases from the CFG. In this phase, the control flow graph of the activity diagram from step one is used as input. The proposed technique generates all possible paths (regression test cases) in the control flow graph, where a path is a finite sequence of nodes and edges (transitions) from the initial node to the final node. Each independent path contains at least one new edge of the control flow graph, and each decision splits a path into two separate paths in the CFG: a true path and a false path. In software testing, the test cases must traverse each path through the CFG at least once. Thus, a new concept is defined in Definition 2.

Definition 2:
A regression test case is defined as an execution path from the initial activity to the final activity, as follows:

$$tc \in TC, \quad tc = \{a_1, t_1, \ldots, a_n, t_n\} \quad (2)$$
where $tc$ represents a regression test case; $TC$ denotes the set of regression test cases; $a$ represents a node in the control flow graph; and $t$ represents an edge in the control flow graph. Figure 3 shows the algorithm for generating test paths from the CFG.

Figure 3
Algorithm of Generating Test Paths from CFG
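Generating every path from the initial node to the final node, in the sense of Equation 2, can be sketched as a depth-first search over the CFG. The sketch assumes the CFG is a dictionary mapping each node to its list of successors and, because the text does not say how loops are handled, it enumerates only simple paths (no node repeated), so a back edge is never followed. The names `all_test_paths` and `transitions` are hypothetical, and this is not a transcription of the algorithm in Figure 3.

```python
from typing import Dict, List, Tuple


def all_test_paths(cfg: Dict[str, List[str]], initial: str, final: str) -> List[List[str]]:
    """Enumerate every simple path (no repeated node) from the initial to the final node."""
    paths: List[List[str]] = []
    stack: List[Tuple[str, List[str]]] = [(initial, [initial])]
    while stack:
        node, path = stack.pop()
        if node == final:
            paths.append(path)
            continue
        # Push successors in reverse so they are explored in their listed order.
        for succ in reversed(cfg.get(node, [])):
            if succ not in path:  # a back edge (loop) is not followed again
                stack.append((succ, path + [succ]))
    return paths


def transitions(path: List[str]) -> List[Tuple[str, str]]:
    """The edges t_1 ... t_(n-1) of a path, i.e. its consecutive node pairs."""
    return list(zip(path, path[1:]))
```

For a CFG with one decision node, the search returns exactly two test paths, one through the true branch and one through the false branch.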

Step three: prioritizing the generated test cases using the Revised Genetic Algorithm (RGA). The genetic algorithm is one of the most popular meta-heuristic techniques used to optimize software test data [36]. Although there is no single definition of the GA, its implementations share the same components: a population, and selection, crossover and mutation operations. The population of a GA contains a large number of individuals that are subject to reproduction and mutation. The standard genetic algorithm consists of five steps. First step: an initial population of n chromosomes (individuals) is randomly produced. Second step: a fitness function assigns a fitness value to each chromosome. Third step: three sub-steps are repeated until m new individuals are produced: 1) the selection operation selects two individuals to reproduce, 2) the crossover operation combines genetic material from both individuals with probability pc, and 3) the mutation operation changes the offspring with probability pm. Fourth step: the replacement operation selects the individuals that will survive to the next generation of the population. Fifth step: if the termination conditions are not met, go to the second step [38].
The application of this technique can reduce effort and improve the test coverage criterion. However, the big challenge in implementing the GA is that the population is often trapped in a local optimum without continued enhancement of the test coverage (population ageing) [22,38]. The following subsection addresses the population ageing problem that appears in genetic-algorithm-based software testing.

The Ageing Problem in Software Testing
There is a big difference between the maturation of the genetic algorithm and the problem of population ageing. Regarding improvement in software testing coverage, a large difference between populations sometimes yields only a slight improvement in coverage, whereas a small difference between populations can lead to more extensive coverage. Consequently, to further clarify the problem of population ageing, a new concept called the ageing factor is defined, which gives the ageing degree of the populations produced during genetic-algorithm-based testing.
Assume that $tc_{t,j}$ is the number of regression test cases in the $t$-th individual of the $j$-th generation, and that $TC_{pop_j}$ is the number of individuals in the $j$-th generation, that is, the number of regression test cases in the $j$-th generation. The total number of regression test cases over generations $1$ to $i$ can then be written as $\sum_{j=1}^{i} \sum_{t=1}^{TC_{pop_j}} tc_{t,j}$. In the same way, the total numbers of regression test cases in generation $j$, generation $j+1$, generation $j+2$, …, generation $j+n$ are presented in Equation 3.
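The double sum above simply adds up the test cases held by every individual across generations $1$ to $i$, with the inner sum giving each generation's own total. A minimal Python sketch, assuming a hypothetical nested-list layout in which `generations[j][t]` is the list of test cases of individual $t$ in generation $j+1$ (both function names are illustrative):

```python
from typing import List

# Hypothetical layout: generations[j][t] = test cases held by individual t of generation j+1.
Generations = List[List[List[str]]]


def per_generation_totals(generations: Generations) -> List[int]:
    """Number of regression test cases in each generation (inner sum over individuals)."""
    return [sum(len(individual) for individual in gen) for gen in generations]


def total_test_cases(generations: Generations, i: int) -> int:
    """Double sum over generations 1..i and over the individuals of each generation."""
    return sum(per_generation_totals(generations[:i]))
```

For two generations whose individuals hold 2 and 1 test cases, then 1 and 3 test cases, the per-generation totals are 3 and 4, and the double sum up to the second generation is 7.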
If the population coverage stays the same over m generations, the population is ageing and trapped in a local optimum [22,38]. Equation 4 represents this situation, in which q (the ageing factor) denotes the ratio of the number of newly produced regression test cases that do not improve coverage over m generations, compared with the prior number of useful test cases.
$$q = \frac{\text{increased number of } tc(i)}{\text{total number of } tc} \times 100\% \quad (4)$$
As Equation 4 shows, the ageing factor is influenced by the number of additional generations (i). The more regression test cases that do not increase coverage, the higher the degree of population ageing. Consequently, a mature population is not immune to ageing; likewise, a population that is not yet mature may still age because of the interaction between the genetic algorithm and the search space.
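Equation 4 leaves the exact counting open. Under one reading, assumed here, a newly produced test case counts against the population when it adds no transition that earlier test cases had not already covered, and $q$ is the percentage of such test cases among all those produced. The Python sketch below computes that interpretation; `ageing_factor` and the encoding of a test case as its set of covered transitions are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


def ageing_factor(generations: List[List[FrozenSet[Edge]]]) -> float:
    """Percentage of newly produced test cases that add no new covered transition.

    generations[j] lists the transition sets of the test cases produced in generation j.
    """
    covered: Set[Edge] = set()
    useless = total = 0
    for generation in generations:
        for test_case in generation:
            total += 1
            gained = test_case - covered
            if gained:
                covered |= gained
            else:
                useless += 1  # produced, but coverage did not improve
    return 100.0 * useless / total if total else 0.0
```

If five test cases are produced over three generations and three of them add nothing new, the sketch reports $q = 60\%$; a rising value across generations signals the stagnation described above.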
Because of ageing, the regression test cases (individuals) cannot improve the transition coverage in the software specification space (activity diagram), and the operations of the genetic algorithm lose their ability to optimize transition coverage. Therefore, one objective of this research work is that, once a second population has been generated, the population evolution process should still improve coverage even when the current coverage of the population has stagnated, by means of strategies such as population regeneration.

Transition Coverage Based Software Testing
The target of software testing with respect to the transition coverage criterion can be presented as follows:
$$TS = \{G, I, A, C, T, Cov, N_p(A)\} \quad (5)$$
In Equation 5, $G$ denotes the CFG of the activity diagram under test; $I$ is the input space; $A$ represents the adopted optimization algorithm; $C$ denotes a suite of test cases; $T$ is a set of termination conditions; $Cov$ represents the test coverage; and $N_p(A)$ represents the number of genetic iterations. Furthermore:
$$Cov = \{TrCov_G(C)\} \quad (6)$$
In Equation 6, $TrCov_G(C)$ represents the transition coverage. Further:
$$A = \{E, P_0, M, Sel, Cor, Mut, F\} \quad (7)$$
In Equation 7, $E$ represents the encoding mode of the genetic code; $P_0$ denotes the initial population; $M$ denotes the population size; $Sel$ indicates the selection factor; $Cor$ indicates the crossover factor; $Mut$ indicates the mutation factor; and $F$ represents the adopted fitness function.
In black-box software testing, coverage is calculated using Equation 8, in which $TrExec_G(C)$ represents the set of transitions in the control flow graph (CFG) that are covered by test suite $C$, and $Tr_G$ indicates the set of all transitions in the CFG. The transition coverage of test suite $C$ is the ratio of the transitions executed by $C$ to all transitions in the CFG:
$$TrCov_G(C) = \frac{|TrExec_G(C)|}{|Tr_G|} \quad (8)$$
In this research work, to exploit the genetic algorithm for prioritizing the test cases, a fitness criterion has been added: the minimal number of test cases that achieves the maximum transition coverage $TrCov_G(C)$ should be applied. The following subsection therefore presents the Revised Genetic Algorithm, which solves the population ageing problem by prioritizing the test cases on the basis of the transition coverage criterion.
The Revised Genetic Algorithm
This section describes the Revised Genetic Algorithm used to prioritize the test cases. If population ageing occurs, that is, a significant number of populations are produced without improving the transition coverage, the population regeneration operation is triggered: a new population is produced and the genetic algorithm's processes are executed on it in succession.
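The regeneration idea can be illustrated with a compact genetic loop over orderings of test cases that follows the five standard steps (random initial population, fitness evaluation, selection with crossover and mutation, replacement, repeat). This Python sketch is an assumption-laden illustration rather than the authors' RGA: individuals are permutations of test-case ids, the fitness rewards covering transitions early, selection is a binary tournament, crossover is order crossover (OX), mutation swaps two positions, and ageing is detected when the best fitness has not improved for `stall_limit` generations, at which point a fresh random population is generated while the best ordering found so far is kept. All names and default parameter values are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Set

Order = List[str]  # a prioritized ordering of test-case ids


def make_fitness(covers: Dict[str, Set[str]]) -> Callable[[Order], float]:
    """Hypothetical fitness: mean fraction of transitions covered after each test (max 1.0)."""
    universe: Set[str] = set().union(*covers.values())
    if not universe:
        raise ValueError("test cases cover no transitions")

    def fitness(order: Order) -> float:
        seen: Set[str] = set()
        total = 0.0
        for test in order:
            seen |= covers[test]
            total += len(seen) / len(universe)
        return total / len(order)

    return fitness


def order_crossover(p1: Order, p2: Order, rng: random.Random) -> Order:
    """OX: keep a slice of p1, fill the remaining positions with p2's genes in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    kept = set(p1[a:b])
    fill = (g for g in p2 if g not in kept)
    return [p1[i] if a <= i < b else next(fill) for i in range(n)]


def swap_mutation(ind: Order, rng: random.Random) -> None:
    """Swap two randomly chosen positions in place."""
    i, j = rng.sample(range(len(ind)), 2)
    ind[i], ind[j] = ind[j], ind[i]


def revised_ga(tests: Sequence[str], fitness: Callable[[Order], float],
               pop_size: int = 20, generations: int = 100, pc: float = 0.9,
               pm: float = 0.1, stall_limit: int = 10, seed: int = 0) -> Order:
    rng = random.Random(seed)
    tests = list(tests)
    if len(tests) < 2:
        return tests

    def fresh_population() -> List[Order]:
        return [rng.sample(tests, len(tests)) for _ in range(pop_size)]

    def tournament(pop: List[Order]) -> Order:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = fresh_population()
    best = max(pop, key=fitness)
    stall = 0
    for _ in range(generations):
        children: List[Order] = [list(best)]  # elitism: keep the best ordering
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = order_crossover(p1, p2, rng) if rng.random() < pc else list(p1)
            if rng.random() < pm:
                swap_mutation(child, rng)
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stall = list(candidate), 0
        else:
            stall += 1
        if stall >= stall_limit:  # ageing detected: coverage-based fitness stagnated
            pop = fresh_population()  # regeneration step
            pop[0] = list(best)
            stall = 0
    return best
```

With two test cases, one covering a single transition and one covering all three, the loop returns the ordering that runs the broader test case first. Regenerating the population, rather than only raising the mutation rate, is what lets the search leave a stagnated region while elitism guarantees that the best ordering is never lost.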

Basic Population in Genetic Algorithm-Based Software Testing
In GA-based software testing, the population consists of individuals that each contain a set of test cases, as shown in Equation 9, with the transversal vector $X = \{x_{1,1}, x_{1,2}, \ldots, x_{1,m}\}$ indicating the corresponding test cases. Equation 10 gives the total population, with $M_{pop_i}$ denoting the population in the $i$-th iteration [27].
As shown in Equation 4, the ageing factor grows with the number of generations (i): the more regression test cases that fail to increase the coverage, the higher the degree of population ageing. A mature population is therefore not guaranteed to be free of ageing. Likewise, a population that is not yet mature may still age because of the interaction between the genetic algorithm and the search space.
Because of ageing, the regression test cases (individuals) can no longer improve the transition coverage in the software specification space (activity diagram), and the operations of the genetic algorithm lose their ability to optimize the transition coverage. One objective of this research work is therefore that, after a second population has been generated, the population evolution process should still be able to improve the coverage by means of appropriate strategies, even when the coverage of the current population has stagnated.
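Equation 4 is not legible in this section, so the sketch below only illustrates the behaviour described above, under the assumption that the degree of ageing can be approximated by counting consecutive generations whose best transition coverage does not improve. The class `AgeingMonitor` and its `max_stagnant` threshold are hypothetical names, not part of the authors' tool.

```python
class AgeingMonitor:
    """Hypothetical stagnation counter used as a stand-in for the ageing factor."""

    def __init__(self, max_stagnant: int):
        self.max_stagnant = max_stagnant  # assumed threshold, not taken from the paper
        self.best_coverage = 0.0
        self.stagnant_generations = 0

    def update(self, coverage: float) -> bool:
        """Record one generation's best coverage; return True when ageing is detected."""
        if coverage > self.best_coverage:
            self.best_coverage = coverage
            self.stagnant_generations = 0  # progress resets the ageing count
        else:
            self.stagnant_generations += 1
        return self.stagnant_generations >= self.max_stagnant


monitor = AgeingMonitor(max_stagnant=3)
flags = [monitor.update(c) for c in [0.5, 0.7, 0.7, 0.7, 0.7]]
print(flags)  # [False, False, False, False, True]
```

A counter keeps the check cheap; a real implementation would substitute the paper's Equation 4 for the `>=` test.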

Transition Coverage Based Software Testing
The target of software testing with respect to the transition coverage criterion can be presented as follows:
$$TS = \{G, I, A, C, T, Cov, N_p(A)\} \quad (5)$$
In Equation 5, G denotes the CFG of the activity diagram under test; I is the input space; A represents the adopted optimization algorithm; C denotes a suite of test cases; T is a set of termination conditions; Cov represents the test coverage; and $N_p(A)$ represents the number of genetic iterations. Furthermore:
$$Cov = \{TrCov_G(C)\} \quad (6)$$
In Equation 6, $TrCov_G(C)$ represents the transition coverage of test suite C. In Equation 7, E represents the genetic encoding mode; $P_0$ denotes the initial population; M denotes the population size; Sel indicates the selection factor; Cor indicates the crossover factor; Mut indicates the mutation factor; and F represents the adopted fitness function.
In black-box software testing, the coverage is calculated using Equation 8, where $TrExec_G(C)$ is the set of transitions of the control flow graph (CFG) that are executed by test suite C, and $Tr_G$ is the set of all transitions in the CFG. The transition coverage of test suite C is thus the ratio of the number of transitions executed by C to the total number of transitions in the CFG of the software activity diagram:
$$TrCov_G(C) = \frac{|TrExec_G(C)|}{|Tr_G|} \quad (8)$$
To exploit the genetic algorithm for prioritizing test cases, this research work adds a fitness criterion: the minimal number of test cases that achieves the maximum transition coverage $TrCov_G(C)$ should be applied. The following subsection therefore presents the Revised Genetic Algorithm, which solves the problem of population ageing while prioritizing the test cases on the basis of the transition coverage criterion.
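As a minimal sketch of Equation 8 and of the added fitness criterion, assuming a test case is represented as the sequence of CFG nodes it traverses (so its transitions are consecutive node pairs), the following code computes the transition coverage and finds, by brute force, the smallest subset of a suite that reaches the suite's maximum coverage. The function names are illustrative. The exhaustive search is exponential and only makes the criterion concrete, which is one reason a search-based method such as a GA is attractive for larger suites.

```python
from itertools import combinations

Transition = tuple[str, str]  # (source node, target node) of one CFG edge


def transitions_of(path: list[str]) -> set[Transition]:
    """Transitions executed by a test path given as a sequence of CFG nodes."""
    return {(path[k], path[k + 1]) for k in range(len(path) - 1)}


def transition_coverage(suite: list[list[str]], cfg_transitions: set[Transition]) -> float:
    """Equation 8: |TrExec_G(C)| / |Tr_G|."""
    executed: set[Transition] = set()
    for path in suite:
        executed |= transitions_of(path)
    return len(executed & cfg_transitions) / len(cfg_transitions)


def minimal_max_coverage_subset(suite, cfg_transitions):
    """Smallest subset of the suite that reaches the suite's own maximum coverage."""
    target = transition_coverage(suite, cfg_transitions)
    for size in range(1, len(suite) + 1):
        for subset in combinations(suite, size):
            if transition_coverage(list(subset), cfg_transitions) == target:
                return list(subset)
    return []


cfg = {("S", "A"), ("A", "B"), ("A", "C"), ("B", "E"), ("C", "E")}
suite = [["S", "A", "B", "E"], ["S", "A", "C", "E"], ["S", "A", "B", "E"]]
print(transition_coverage(suite, cfg))          # 1.0
print(minimal_max_coverage_subset(suite, cfg))  # [['S', 'A', 'B', 'E'], ['S', 'A', 'C', 'E']]
```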

Revised Genetic Algorithm
This section presents a Revised Genetic Algorithm that is used both to solve the problem of population ageing and to prioritize the test cases. Population ageing occurs when a significant number of generations are produced without any improvement in the transition coverage. In this case, the population regeneration operation is triggered: a new population is produced and the operations of the genetic algorithm are executed on it successively.

Basic Population in Genetic Algorithm-Based Software Testing
In GA-based software testing, the population consists of individuals, each comprising a set of test cases, as shown in Equation 9, where the vector $X = \{x_{1,1}, x_{1,2}, \ldots, x_{1,m}\}$ indicates the corresponding test cases. Equation 10 gives the total population, with $Mpop_i$ denoting the population in the ith iteration [27].
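This passage does not fix a concrete encoding, so the following is a minimal sketch assuming that each individual is one ordering (permutation) of the candidate regression test cases, with position encoding priority. `init_population` and the test identifiers are hypothetical.

```python
import random


def init_population(test_ids: list[str], size: int, rng: random.Random) -> list[list[str]]:
    """Create `size` individuals (population size M), each a random ordering of all test cases."""
    population = []
    for _ in range(size):
        individual = list(test_ids)  # copy so the caller's list is not shuffled
        rng.shuffle(individual)      # a random order stands for a random priority
        population.append(individual)
    return population


population = init_population(["TC1", "TC2", "TC3", "TC4"], size=5, rng=random.Random(42))
for individual in population:
    print(individual)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing GA variants.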

After the population has been initialized with the given number of individuals, the generated test cases are selected for transmission to the next generation according to the value of the selection factor. For instance, if the selection factor is set to 0.7, individuals with a fitness value equal to or greater than 70% are selected for the next generation, whereas individuals with lower fitness values are not transmitted [25]; thus, only the best individuals survive. Equation 11 expresses this selection step. Besides selection, the genetic algorithm applies crossover to search for the optimal solution, and this operation is governed by the value of the crossover factor. For example, if the crossover factor is set to 0.9, two randomly selected individuals undergo the crossover operation with a probability of 90% [25]. Finally, to diversify the search into new areas of the search space, the elements of the selected individuals are randomly mutated.
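The three operators can be sketched as follows, under the same permutation-encoding assumption. Threshold selection follows the 0.7 example above. A naive cut-and-splice crossover would duplicate test cases inside a permutation, so the sketch swaps in the well-known order crossover (OX) and a swap mutation. These operator choices and all function names are assumptions for illustration, not necessarily the operators used in [25] or by the authors.

```python
import random


def select(population, fitness, sel=0.7):
    """Keep individuals whose fitness is at least the selection factor."""
    return [ind for ind in population if fitness(ind) >= sel]


def order_crossover(p1, p2, rng):
    """Order crossover (OX): copy a random slice of p1, fill the rest in p2's order.
    Assumes at least two test cases per individual."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(p1[i:j + 1])
    fill = (gene for gene in p2 if gene not in kept)
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child


def crossover(parents, cor, rng):
    """Pair parents at random; each pair is recombined with probability cor."""
    shuffled = rng.sample(parents, len(parents))
    children = []
    for a, b in zip(shuffled[::2], shuffled[1::2]):
        if rng.random() < cor:
            children += [order_crossover(a, b, rng), order_crossover(b, a, rng)]
        else:
            children += [a[:], b[:]]
    if len(shuffled) % 2:
        children.append(shuffled[-1][:])  # an unpaired parent passes through unchanged
    return children


def swap_mutation(individual, mut, rng):
    """With probability mut, swap two randomly chosen test cases."""
    mutated = individual[:]
    if len(mutated) > 1 and rng.random() < mut:
        i, j = rng.sample(range(len(mutated)), 2)
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

OX always yields a valid permutation, so no repair step is needed after recombination.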

Coverage Oriented Fitness Function
In this paper, the transition coverage criterion is used to measure the efficiency of the proposed approach and is employed as the objective fitness value. As shown in Equation 12, to evaluate individuals, the sum of fitness values ($\sum f_i$) is converted to the standard fitness value $f_{i,norm}$ [7].
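Equation 12 is not legible in the extracted text. A common reading of "converting to a standard fitness value" is proportional normalization, $f_{i,norm} = f_i / \sum_j f_j$, and the sketch below assumes exactly that. The uniform fallback for an all-zero population is also an assumption.

```python
def normalize_fitness(raw: list[float]) -> list[float]:
    """Assumed normalization: f_i,norm = f_i / sum(f_j), so values sum to 1."""
    if not raw:
        return []
    total = sum(raw)
    if total == 0:
        # Assumption: with no coverage at all, treat every individual equally.
        return [1.0 / len(raw)] * len(raw)
    return [f / total for f in raw]


# Transition coverages of three individuals, normalized for comparison.
print(normalize_fitness([0.6, 0.9, 0.3]))
```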

The Proposed Algorithm
The revised genetic algorithm is defined by the following equation:
$$RGA = \{M_p, Sel, Cor, Mut, R, Cov, N_p(A), T\} \quad (13)$$
where $M_p$ denotes the individuals of a population; Sel represents the selection operation; Cor denotes the crossover operation; Mut denotes the mutation operation; R represents the regeneration process; Cov represents the testing coverage; $N_p(A)$ is the number of iterations; and T is the termination condition.

Figure 4
Algorithm of Revised GA

As previously mentioned, when population ageing is detected, the regeneration process is triggered to randomly generate a new population, which replaces the previous one. Selection, crossover and mutation are then applied to the new population only if it enhances the transition coverage. If the new population does not improve the transition coverage, population ageing is triggered again: the population is eliminated and a new one is generated, until the ageing condition no longer holds. Figure 4 shows the revised genetic algorithm, including the sub-processes of ageing factor calculation and population regeneration. To prioritize the individuals (regression test cases), once the optimal regression test case is obtained, it is removed from the input list and saved in the list of prioritized regression test cases. The revised algorithm is then re-executed on the remaining regression test cases.
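The outer loop described above (evolve, detect ageing, regenerate, extract the best test case, repeat on the rest) can be sketched as follows. This is a simplified illustration, not the algorithm of Figure 4. Assumptions: an individual is a single test case id; fitness is the number of transitions the test case adds to those already covered by prioritized tests; because crossover has no natural meaning for a single id, crossover and mutation are replaced by random re-sampling from the remaining test cases; and a regenerated population is simply accepted rather than kept only on improvement. All names and parameter defaults are hypothetical.

```python
import random


def revised_ga_prioritize(tests, pop_size=4, max_gen=30, max_stagnant=3, seed=0):
    """Simplified sketch of the iterative RGA prioritization loop.

    tests: dict mapping a test-case id to the set of CFG transitions it covers.
    Returns the test-case ids in prioritized order.
    """
    rng = random.Random(seed)
    remaining = dict(tests)
    covered = set()  # transitions already covered by prioritized tests
    prioritized = []

    def fitness(tid):
        # Assumption: fitness = transitions this test adds to those already covered.
        return len(remaining[tid] - covered)

    while remaining:
        ids = list(remaining)
        population = [rng.choice(ids) for _ in range(pop_size)]
        best = max(population, key=fitness)
        stagnant = 0
        for _ in range(max_gen):  # termination condition T: a generation budget
            population.sort(key=fitness, reverse=True)
            survivors = population[: max(1, pop_size // 2)]  # selection
            # Stand-in for crossover/mutation: refill with random remaining test cases.
            population = survivors + [rng.choice(ids) for _ in range(pop_size - len(survivors))]
            candidate = max(population, key=fitness)
            if fitness(candidate) > fitness(best):
                best, stagnant = candidate, 0
            else:
                stagnant += 1
            if stagnant >= max_stagnant:  # ageing detected
                population = [rng.choice(ids) for _ in range(pop_size)]  # regeneration
                stagnant = 0
        prioritized.append(best)  # optimal test case of this run
        covered |= remaining.pop(best)  # remove it and re-run on the rest
    return prioritized


example = {
    "TC1": {("a", "b"), ("b", "c")},
    "TC2": {("a", "b"), ("b", "d"), ("d", "e"), ("e", "c")},
    "TC3": {("a", "f")},
}
print(revised_ga_prioritize(example))
```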

Results Analysis
To present and validate the results of the proposed approach, an experimental tool has been constructed. The tool involves a hardware layer, an operating system layer and an application layer (the tool itself). The hardware and operating system layers consist of a PC with an i7 2.20 GHz CPU and 4 GB RAM running Windows 8.1 Pro, and the application layer contains the proposed tool, which implements the proposed approach in the Java programming language.

Figure 5a
The activity diagram of VMS

Figure 5b
The activity diagram of ATM

Experimental Results
The activity diagrams of the vending machine and the ATM have been used in many previous software engineering research works [9,35,32]. Therefore, they are also used in this research to illustrate how the proposed approach generates and prioritizes regression test cases. As shown in Figure 5a, in the Vending Machine System (VMS) a user (customer) selects a type of drink; the machine then validates the selection and checks whether the product is available. If the product is not available, the machine displays a message and returns to the selection menu. Otherwise, the machine displays the product price and asks the user to insert coins. The machine then calculates the deposited amount. If the deposited amount is insufficient, the machine displays an error message and returns the deposited coins; otherwise, the vending machine dispenses the product and returns to the main menu. Figure 5a shows the activity diagram of the vending machine software, whereas Figure 5b shows the activity diagram of the withdrawal function in the ATM system.
As explained in phase one of the proposed approach, the control flow graph has been generated automatically from the activity diagram. Each activity in the activity diagram is represented as a node in the control flow graph, and each interaction is represented as an edge. In addition, each decision in the activity diagram is traversed along two paths in the control flow graph, representing the true and false answers. Figures 6a-6b present the control flow graphs generated from the activity diagrams of the vending machine system and the ATM system, respectively.

Figure 6a
Control flow diagram of VMS

Figure 6b
Control flow diagram of ATM

The proposed approach then provides an algorithm (see the algorithm in Figure 3) that recursively produces all paths in the control flow graph. Applying this algorithm to the control flow graphs generated from the activity diagrams of the vending machine and the ATM produced four paths (regression test cases) from the CFG of the VMS and five paths from the CFG of the ATM (see Figures 7a-7b).
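The CFG construction and recursive path generation can be sketched as follows. The graph encoding (activities as nodes, flows as edges, each decision contributing a true edge and a false edge) mirrors the description above. The function names, the node labels of the simplified vending-machine example, and the rule that a path never revisits a node (so loops such as "return to the selection menu" cannot recurse forever) are assumptions for illustration. This is not the algorithm of Figure 3, which is not reproduced here, and the toy graph is not the paper's VMS graph.

```python
def build_cfg(activities, flows, decisions):
    """Activities become nodes, flows become edges, and each decision gets
    two outgoing edges given as (true_target, false_target)."""
    cfg = {activity: [] for activity in activities}
    for src, dst in flows:
        cfg[src].append(dst)
    for node, (true_target, false_target) in decisions.items():
        cfg[node].extend([true_target, false_target])
    return cfg


def all_paths(cfg, node, end, path=()):
    """Recursively enumerate every path from node to end (one test case each).
    A node is never revisited within a path, which cuts loops."""
    path = path + (node,)
    if node == end:
        return [list(path)]
    paths = []
    for nxt in cfg.get(node, []):
        if nxt not in path:
            paths.extend(all_paths(cfg, nxt, end, path))
    return paths


activities = ["Start", "Select", "Avail", "Price", "Pay", "Enough", "Dispense", "Refund", "End"]
flows = [("Start", "Select"), ("Select", "Avail"), ("Price", "Pay"), ("Pay", "Enough"),
         ("Dispense", "End"), ("Refund", "End")]
decisions = {"Avail": ("Price", "End"), "Enough": ("Dispense", "Refund")}
for test_path in all_paths(build_cfg(activities, flows, decisions), "Start", "End"):
    print(" -> ".join(test_path))
```

Each printed path is one candidate regression test case that the prioritization step can then reorder.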

Figure 7a
Regression test cases of VMS

Figure 7b
Regression test cases of ATM

Finally, the Revised Genetic Algorithm has been applied to prioritize the generated test paths (regression test cases). Figure 8 presents the prioritized regression test cases for both systems produced by the revised genetic algorithm.

Figure 8a
Prioritized regression test cases of VMS

Figure 8b
Prioritized regression test cases of ATM

Validation of Results
The prioritized test cases produced by the proposed approach are evaluated using the average percentage transition coverage (APTC) metric, which quantifies the rate at which the prioritized test cases cover the transitions. Equation 14 is used to calculate the APTC.

$$APTC = 1 - \frac{TC_1 + TC_2 + \cdots + TC_m}{n \times m} + \frac{1}{2n} \quad (14)$$
where T denotes the test suite under evaluation; n is the number of test cases in T; m is the number of transitions in the control flow graph; and $TC_i$ is the position of the first test case in T that covers the ith transition.
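Equation 14 can be computed directly. The sketch below assumes that each test case is described by the set of transitions it covers and that every transition is covered by at least one test case in T; otherwise $TC_i$ is undefined and the function raises an error. The names are illustrative.

```python
def aptc(order, covers, transitions):
    """APTC = 1 - (TC_1 + ... + TC_m) / (n * m) + 1 / (2n).

    order: prioritized test-case ids (the suite T); covers: id -> set of transitions;
    TC_i is the 1-based position of the first test case in T covering transition i.
    """
    n, m = len(order), len(transitions)
    first_positions = []
    for t in transitions:
        pos = next((p for p, tid in enumerate(order, start=1) if t in covers[tid]), None)
        if pos is None:
            raise ValueError(f"transition {t!r} is not covered by any test case")
        first_positions.append(pos)
    return 1 - sum(first_positions) / (n * m) + 1 / (2 * n)


covers = {"A": {"t1", "t2", "t3"}, "B": {"t4"}}
transitions = {"t1", "t2", "t3", "t4"}
print(aptc(["A", "B"], covers, transitions))  # 0.625
print(aptc(["B", "A"], covers, transitions))  # 0.375
```

Running the broader test case A first covers three of the four transitions at position 1, which is why that order scores higher.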

GA: in the original genetic algorithm, mutation operators are employed to prioritize the test cases.
Bee Colony Algorithm (BCA): in this technique, the test cases are prioritized to improve execution time and coverage [21].
The statistical results in the table show that the proposed approach provides higher transition coverage with significantly less execution time than the other techniques. Notably, the proposed approach performs better because it covers the modified transitions in the control flow graph, and hence the modified transitions in the activity diagram of a use case. The proposed approach therefore offers significant help in prioritizing specific test cases and detecting faults earlier.
From the previous experimental results and the analysis of different aspects, the Revised Genetic Algorithm clearly provides a more useful and efficient prioritization approach in terms of average percentage transition coverage (APTC) and execution time than the other prioritization techniques, as it achieves better test coverage with a minimal number of regression test cases. As shown in Table 1, the original genetic algorithm offers promising results in terms of performance and coverage (53 milliseconds and 96.4%, respectively); compared with the technique reported in [38], the proposed technique provides a higher test coverage rate and less execution time. Specifically, the coverage rate and execution time for RGA are 95% and 24 milliseconds, whereas for the proposed technique they are 100% and 10 milliseconds.
A possible reason for this improvement is that the other techniques apply mutation operators and their functions to explore the whole search space, which sometimes makes it hard for them to find the local optimal solutions that exist in a local search space. In contrast, the proposed technique employs crossover operators to obtain the local optimal solutions, which may explain its better performance compared with the other techniques.

Conclusion
Test case prioritization is an essential task for reducing the time and effort required in regression testing. To obtain maximum transition coverage, this research work has proposed an approach based on a revised GA to generate and prioritize test cases from software specifications. Five test case prioritization techniques have been empirically studied and their performances compared. The proposed approach provides promising outcomes on both the coverage and time criteria.
The proposed approach takes advantage of the ability to generate various test cases from software specifications (activity diagrams) and to prioritize them on the basis of the revised genetic algorithm. The approach has been automated using the Java language. This work also shows the necessity and benefit of applying the new APTC metric as a fitness function in the revised genetic algorithm. Finally, the results of the empirical study have been analyzed and compared with the original genetic algorithm and with other techniques based on APTC [25,38]. The proposed approach was found to be better and more efficient in maximizing coverage with less execution time, and it avoids the problem of population ageing that results from applying the genetic algorithm by triggering the population regeneration method when population ageing is detected. Thus, these results provide a good answer to the research question, formulated as: how does the proposed approach compare, in terms of time, effort and coverage, to the other search-based approaches? Moreover, the experimental results of this research work also confirm prior results in the software testing literature regarding the good performance of the genetic algorithm [38]. In addition, the results indicate some interesting characteristics of the proposed approach, including minimized execution time and maximized transition coverage.