MSPF-LMFF: Category-Level 6D Object Pose Estimation via Multi-Scale Prior Point Cloud Fusion and Lightweight Multi-Feature Fusion

Peng Cao; Tengfei Weng; Qi Han; Peng Ye; Long Cao; Cong Han; Yuan Tian

doi:10.5755/j01.itc.54.4.40202

Authors

Peng Cao https://orcid.org/0009-0009-3308-8137
Tengfei Weng https://orcid.org/0000-0003-4549-713X
Qi Han https://orcid.org/0000-0002-3264-9571
Peng Ye https://orcid.org/0009-0000-7413-6988
Long Cao https://orcid.org/0009-0000-1672-3690
Cong Han https://orcid.org/0009-0000-1052-464X
Yuan Tian https://orcid.org/0000-0003-1554-4791

DOI:

https://doi.org/10.5755/j01.itc.54.4.40202

Keywords:

Feature fusion, fused point cloud, lightweight model, multi-scale feature fusion, lightweight multi-feature fusion.

Abstract

Object pose estimation is a critical task in machine vision. Existing pose estimation methods often suffer from large parameter counts, complex architectures, and high computational costs, which limit their applicability in real-world scenarios. To address these issues, we propose a novel category-level object pose estimation model, named MSPF-LMFF. The model does not rely on attention mechanisms or precise 3D models, significantly reduces computational complexity, and improves pose estimation accuracy, demonstrating superior performance on both real and synthetic datasets. Specifically, the MSPF module enriches the point cloud features by integrating multi-scale image texture features with prior point cloud features, bringing them closer to the target object point cloud. The LMFF module then combines the geometric features of the fused point cloud, the depth image features, and the geometric features of the target object point cloud to make the model more robust. The same module also fuses adaptive point cloud features with the target object's geometric features to improve the reliability of the shape information, which strengthens the model's generalization across different instances of the same category. Next, a multi-layer perceptron (MLP) generates deformation and mapping matrices that are used to reconstruct the target object's normalized object coordinate space (NOCS) model. Finally, based on the NOCS model, the point cloud registration module computes the target object's 6D pose and 3D dimensions. Experimental results demonstrate that MSPF-LMFF outperforms existing methods on the NOCS-REAL and NOCS-CAMERA datasets while significantly reducing parameter size and training time. Moreover, the proposed model generalizes well to the Wild 6D dataset, further validating its effectiveness.
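The last two steps of the pipeline (reconstructing the NOCS model from the deformation and mapping matrices, then registering it to the observed points) can be illustrated with a minimal NumPy sketch. The abstract does not say how the matrices are applied or which registration algorithm is used, so everything here is an assumption: the function names `reconstruct_nocs` and `umeyama_similarity` are hypothetical, the reconstruction follows the common shape-prior formulation (mapping matrix times deformed prior), and Umeyama's closed-form similarity alignment stands in for the paper's point cloud registration module.

```python
import numpy as np


def reconstruct_nocs(prior, deformation, mapping):
    """Assumed reconstruction: deform the prior, then map it onto observed points.

    prior:       (M, 3) categorical prior point cloud in NOCS
    deformation: (M, 3) per-point offsets predicted by the network
    mapping:     (N, M) soft assignment from N observed points to M prior points
    returns:     (N, 3) NOCS coordinates, one per observed point
    """
    return mapping @ (prior + deformation)


def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points.
    """
    n = src.shape[0]
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centered target and source points.
    cov = dst_c.T @ src_c / n
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.ones(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0

    R = U @ np.diag(d) @ Vt
    var_src = (src_c ** 2).sum() / n
    s = (S * d).sum() / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def pose_and_size(nocs, observed):
    """Assumed post-processing: pose from alignment, size from the scaled NOCS extent."""
    s, R, t = umeyama_similarity(nocs, observed)
    size = s * (nocs.max(axis=0) - nocs.min(axis=0))
    return R, t, s, size
```

In this sketch the rotation and translation give the 6D pose, and the recovered scale converts the extent of the normalized model into metric dimensions. Practical systems often wrap such an alignment in a RANSAC loop to tolerate outlier correspondences; that step is omitted here for brevity.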
