A Study on 3D Human Pose Estimation with a Hybrid Algorithm of Spatio-temporal Semantic Graph Attention and Deep Learning

Authors

  • Shengqing Lin, School of Computing and Information Sciences, Fuzhou Institute of Technology, Fuzhou, Fujian 350506, China

DOI:

https://doi.org/10.5755/j01.itc.53.4.37243

Keywords:

3D human pose estimation, Graph Convolutional Neural Network, Self-attention, Transformer

Abstract

This paper introduces a method that improves the accuracy of 3D human pose estimation by exploiting the topological structure of the human body together with temporal information, targeting the errors caused by occlusion and complex poses. It first proposes a spatiotemporal Transformer network that aggregates local temporal information to predict the 3D pose for video frames and shortens the input sequence through strided convolution. To further handle occlusion and information loss, it then proposes a spatiotemporal graph attention network that incorporates spatial constraints and applies graph convolution with an improved adjacency matrix, so that pose inference emphasizes local information. A temporal convolutional network models temporal dependencies, and the network alternates between temporal and spatial attention modules to prevent loss of spatiotemporal information. Experiments on the Human3.6M and HumanEva datasets show that the proposed method achieves higher prediction accuracy than the approaches it is compared against.
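
As a rough illustration of two ideas named in the abstract, the sketch below shows how a convolution applied with a stride larger than one shortens a frame sequence, and how a normalized adjacency matrix lets each joint aggregate features from its skeletal neighbours. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the averaging kernel, and the three-joint chain are hypothetical, and the standard symmetric normalization used here stands in for the paper's improved adjacency matrix, whose definition the abstract does not give.

```python
from math import sqrt


def strided_temporal_conv(frames, kernel, stride):
    """Valid 1D convolution along the time axis.

    frames: list of per-frame feature lists; kernel: one weight per frame
    in the window. With stride > 1 the output has fewer frames than the input.
    """
    k = len(kernel)
    dims = len(frames[0])
    out = []
    for start in range(0, len(frames) - k + 1, stride):
        window = frames[start:start + k]
        # Weighted sum of the k frames in the window, feature by feature.
        out.append([sum(w * f[d] for w, f in zip(kernel, window))
                    for d in range(dims)])
    return out


def normalized_adjacency(num_joints, bones):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a skeleton graph.

    Assumption: this is the common textbook form, not the paper's
    improved adjacency matrix.
    """
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def graph_conv(a_hat, joint_feats):
    """Aggregate each joint's features from itself and its neighbours.

    Learned weight matrices are omitted to keep the sketch short.
    """
    n = len(a_hat)
    dims = len(joint_feats[0])
    return [[sum(a_hat[i][j] * joint_feats[j][d] for j in range(n))
             for d in range(dims)]
            for i in range(n)]


# Nine frames with two features each, averaged in windows of three frames.
frames = [[float(t), 1.0] for t in range(9)]
reduced = strided_temporal_conv(frames, [1 / 3] * 3, stride=3)

# Hypothetical three-joint chain: joint 0 - joint 1 - joint 2.
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
mixed = graph_conv(a_hat, [[1.0], [1.0], [1.0]])
```

With a window of three frames and a stride of three, the nine input frames are reduced to three output frames. In `a_hat`, the entry linking joints 0 and 2 is zero because they share no bone, so the single graph convolution step only mixes information between directly connected joints.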

Published

2024-12-21

Section

Articles