Efficient Training Framework for Multi-USV System Based on Off-policy Deep Reinforcement Learning

Authors

DOI:

https://doi.org/10.5755/j01.itc.54.4.40496

Keywords:

Prioritized Experience Replay, Off-Policy Deep Reinforcement Learning, Attention, Multi-USV

Abstract

Prioritized Experience Replay (PER) is a technique in off-policy deep reinforcement learning that samples important experiences more frequently to improve training efficiency. However, the non-uniform sampling applied in PER inevitably shifts the state-action distribution and introduces estimation errors in the Q-value function. In this paper, an efficient off-policy reinforcement learning training framework called Attention Loss Adjusted Prioritized (ALAP) Experience Replay is proposed. ALAP exploits the similarity of the transitions in the buffer to quantify training progress and corrects the bias accordingly, based on the positive correlation between the error-compensation strength and the training progress. To verify the effectiveness of the algorithm, ALAP is tested on 15 games from the Atari 2600 benchmark. Additionally, we developed a multi-USV competition scenario in Unreal Engine to further illustrate the superiority and practical value of ALAP.
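For context, the bias the abstract refers to is conventionally handled in PER with importance-sampling weights whose correction exponent grows over training. The sketch below is a minimal, generic proportional-PER buffer illustrating that mechanism; it is not the paper's ALAP method, and the class name `PrioritizedReplayBuffer`, the parameter `beta`, and the helper methods are illustrative assumptions. ALAP's contribution, per the abstract, is deriving the correction strength from transition similarity rather than from a fixed anneal schedule; here `beta` is simply a caller-supplied parameter.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer with importance-sampling correction.

    The correction exponent ``beta`` is passed in at sample time, so the
    caller can increase it as training progresses (ALAP, by contrast,
    ties this strength to a similarity-based progress estimate).
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities shape sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights undo the non-uniform sampling bias;
        # beta = 1 corrects it fully, beta < 1 only partially.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()    # normalize for a stable gradient scale
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Priorities follow the magnitude of the TD errors, as in
        # standard proportional PER.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In this scheme the sampled `weights` multiply the per-transition loss before backpropagation; raising `beta` toward 1 as training matures is what the abstract means by coupling error-compensation strength to training progress.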

Published

2025-12-19

Section

Articles