Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows
The vehicle routing problem with time windows (VRPTW) as one of the most known combinatorial operations (CO) problem is considered to be a tough issue in practice and the main challenge of that is to find the approximate solutions within a reasonable time. In recent years, reinforcement learning (RL) based methods have gained increasing attention in many CO problems, such as the vehicle routing problems (VRP), due to their enormous potential to efficiently generate high-quality solutions. However, neglecting the information between the constraints and the solutions makes previous approaches performance unideal in some strongly constrained problems, like VRPTW. We present the constraint-aware policy optimization (CPO) for VRPTW
that can let the agent learn the constraints as a representation of the whole environment to improve the generalization of RL methods. Extensive experiments on both the Solomon benchmark and the generated datasets demonstrate that our approach significantly outperforms other competition methods.
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.