Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows

Renchi Zhang; Runsheng Yu; Wei Xia

doi:10.5755/j01.itc.51.1.29924

Authors

Renchi Zhang Hefei University of Technology
Runsheng Yu
Wei Xia

DOI:

https://doi.org/10.5755/j01.itc.51.1.29924

Abstract

The vehicle routing problem with time windows (VRPTW) as one of the most known combinatorial operations (CO) problem is considered to be a tough issue in practice and the main challenge of that is to find the approximate solutions within a reasonable time. In recent years, reinforcement learning (RL) based methods have gained increasing attention in many CO problems, such as the vehicle routing problems (VRP), due to their enormous potential to efficiently generate high-quality solutions. However, neglecting the information between the constraints and the solutions makes previous approaches performance unideal in some strongly constrained problems, like VRPTW. We present the constraint-aware policy optimization (CPO) for VRPTW
that can let the agent learn the constraints as a representation of the whole environment to improve the generalization of RL methods. Extensive experiments on both the Solomon benchmark and the generated datasets demonstrate that our approach significantly outperforms other competition methods.

Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows

Authors

DOI:

Abstract

Downloads

Published

Issue

Section

License

crossref2

crossref

Information