A Study on 3D Human Pose Estimation with a Hybrid Algorithm of Spatio-temporal Semantic Graph Attention and Deep Learning
DOI:
https://doi.org/10.5755/j01.itc.53.4.37243
Keywords:
3D human pose estimation, Graph Convolutional Neural Network, Self attention, Transformer
Abstract
This paper presents a method for improving the accuracy of 3D human pose estimation by exploiting the topological structure of the human body together with temporal information, targeting the errors caused by occlusion and complex poses. It first proposes a spatiotemporal Transformer network that aggregates local temporal information to predict the 3D pose for video frames and shortens the input sequence with cross-step (strided) convolution. To further handle occlusion and information loss, it then proposes a spatiotemporal graph attention network that adds spatial constraints and applies graph convolution with an improved adjacency matrix, so that pose inference emphasizes local information. A temporal convolutional network models the time dimension, and the network alternates between temporal and spatial attention modules to limit the loss of spatiotemporal information. Experiments on the Human3.6M and HumanEva datasets show that the proposed method achieves higher prediction accuracy than the compared approaches.
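As a rough illustration of graph convolution over a skeleton adjacency matrix, the sketch below applies one aggregation step to the 2D keypoints of a single frame. It is not the paper's network: the 5-joint skeleton, the function names, and the standard symmetric normalization D^{-1/2}(A + I)D^{-1/2} are all assumptions made for the example, and the abstract does not specify how the "improved" adjacency matrix is built.

```python
import math

# Hypothetical 5-joint toy skeleton (an assumption, not the Human3.6M layout):
# 0 = pelvis, 1 = spine, 2 = head, 3 = left hand, 4 = right hand
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a list of rows."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0  # bones are undirected
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def graph_conv(features, adj, weight):
    """One graph convolution step: adj @ features @ weight (no bias, no activation)."""
    num_joints = len(features)
    in_dim = len(weight)
    out_dim = len(weight[0])
    # Aggregate each joint's features with those of its skeletal neighbours.
    agg = [
        [sum(adj[i][k] * features[k][c] for k in range(num_joints))
         for c in range(in_dim)]
        for i in range(num_joints)
    ]
    # Project the aggregated features with the (normally learned) weight matrix.
    return [
        [sum(agg[i][c] * weight[c][o] for c in range(in_dim))
         for o in range(out_dim)]
        for i in range(num_joints)
    ]


# Toy 2D keypoints (x, y) for one frame; the identity weight stands in for learned parameters.
pose_2d = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [-1.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

adj = normalized_adjacency(NUM_JOINTS, EDGES)
smoothed = graph_conv(pose_2d, adj, identity)
```

Because the adjacency matrix is zero for joints that are not connected by a bone, each output row mixes only a joint and its direct neighbours, which is one way a model can emphasize local information when a joint is occluded.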
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.