Learning Sliding Policy of Flat Multi-target Objects in Clutter Scenes

Authors

  • Liangdong Wu, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Jiaxi Wu, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  • Zhengwei Li, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Yurou Chen, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  • Zhiyong Liu, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; Cloud Computing Center, Chinese Academy of Sciences, Dongguan, Guangdong, China

DOI:

https://doi.org/10.5755/j01.itc.53.1.34708

Keywords:

Deep Learning in Manipulation, Reinforcement Learning, Robot Control, Intelligent System, Sliding Policy

Abstract

In cluttered scenes, a robot often has to retrieve one or several target objects, which is a hard manipulation task. It is especially challenging when the targets are flat objects such as books or plates, because of the limitations of common robot end-effectors. With a pre-grasp operation such as sliding, it becomes feasible to rearrange the objects and shift a target toward the table edge, so that the robot can grasp it from the side. In this paper, the proposed method formulates the task as a Parameterized Action Markov Decision Process and solves it with deep reinforcement learning. Mask images are used as one of the observations fed to the network, which avoids the impact of noise in the original image. To improve data utilization, a weight-sharing policy network predicts the parameters of the sliding primitive for each object, and the Q-network then selects the optimal object on which to execute the slide. An extra reward mechanism is adopted to improve the efficiency of task actions when there are multiple targets. In addition, an adaptive policy scaling algorithm is proposed to improve the speed and adaptability of policy training. In both simulation and the real system, our method achieves a higher task success rate and requires fewer actions to accomplish the flat multi-target sliding manipulation task in cluttered scenes, which verifies its effectiveness.
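
The two-stage action selection described in the abstract (a shared policy proposes sliding parameters for every object, then a Q-function picks which object to act on) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `select_action`, `toy_policy`, and `toy_q`, the `(x, y)` object features, the `(angle, distance)` parameterization, and the table edge at `x = 0.5` are all hypothetical stand-ins for the learned networks and observations.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]      # hypothetical per-object observation, here (x, y)
Params = Tuple[float, float]    # hypothetical slide parameters: (angle, distance)


def select_action(
    objects: List[Features],
    policy: Callable[[Features], Params],
    q_value: Callable[[Features, Params], float],
) -> Tuple[int, Params]:
    """Return the index of the object to slide and its slide parameters.

    The same `policy` is applied to every object (weight sharing); the
    object whose (features, params) pair has the highest Q-value is chosen.
    """
    if not objects:
        raise ValueError("at least one object is required")
    params = [policy(obj) for obj in objects]
    scores = [q_value(obj, p) for obj, p in zip(objects, params)]
    best = max(range(len(objects)), key=scores.__getitem__)
    return best, params[best]


# Toy stand-ins for the learned networks (assumptions, not the paper's models).
def toy_policy(obj: Features) -> Params:
    # Slide along +x toward a hypothetical table edge at x = 0.5.
    x, _y = obj
    return (0.0, max(0.0, 0.5 - x))


def toy_q(obj: Features, params: Params) -> float:
    # Prefer the object that needs the shortest slide.
    return -params[1]


if __name__ == "__main__":
    scene = [(0.1, 0.0), (0.4, 0.2), (0.3, -0.1)]
    index, slide = select_action(scene, toy_policy, toy_q)
    print(index, slide)
```

In a Parameterized Action MDP the discrete choice (which object) and the continuous parameters (how to slide it) are decided together; applying one shared policy to every object means each transition contributes to training the same set of weights, which is one plausible reading of the data-utilization claim above.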

Published

2024-03-22

Section

Articles