Bi-Encoder Polyp Net: A Novel Architecture for Enhanced Polyp Segmentation in Endoscopic Images

Authors

  • Qiqiang Duan, School of Mathematics and Information Science, Zhongyuan University of Technology, Zhengzhou, 451191, Henan, China
  • Cong Gu, School of Mathematics and Information Science, Zhongyuan University of Technology, Zhengzhou, 451191, Henan, China

DOI:

https://doi.org/10.5755/j01.itc.54.2.41107

Keywords:

Transformer, CNN, Polyp Segmentation, Image Segmentation

Abstract

Automatic polyp segmentation in endoscopic images holds critical clinical value for early colorectal cancer diagnosis. While existing segmentation models have achieved notable progress, two key challenges continue to limit their performance. First, dynamic adjustments of the colonoscope tip orientation during examinations induce viewpoint variations, which amplify the diversity of polyp appearance and hinder robust feature learning. Second, the inherent similarity between polyps and surrounding tissues leads to blurred boundaries. Although convolutional neural networks (CNNs) have driven significant advances, their limited ability to model global dependencies and their reliance on aggressive downsampling operations often cause redundant network structures and loss of local detail. To address these bottlenecks, we propose Bi-Encoder Polyp Net, a novel parallel architecture that integrates a Pyramid Vision Transformer and ResNet. This dual-branch design captures global contextual dependencies while preserving low-level spatial details. A feature alignment module bridges the semantic gap between the feature maps of the two branches, and an iterative semantic embedding unit further injects high-level semantic information into the aligned low-level features. Extensive experiments across five public polyp segmentation benchmarks validate the network’s effectiveness, demonstrating superior capability in processing real-world colonoscopy images.
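
The data flow described in the abstract (two parallel branches, an alignment step, then repeated injection of high-level semantics into low-level features) can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: the function names, the nearest-neighbour resizing used as "alignment", and the sigmoid gating used as "semantic embedding" are hypothetical stand-ins, not the modules defined in the paper.

```python
import math

# Toy sketch only: names, shapes, and fusion rules are illustrative
# assumptions, not the feature alignment module or embedding unit of the paper.


def sigmoid(x):
    """Logistic function used here as a simple gate."""
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def align(global_feat, local_feat):
    """Hypothetical alignment: resize the coarse map to the fine map's size."""
    factor = len(local_feat) // len(global_feat)
    return upsample_nearest(global_feat, factor)


def embed_semantics(low, high, steps=2):
    """Hypothetical iterative embedding: at each step, scale the low-level
    features by (1 + sigmoid(high)), i.e. a gated residual update."""
    out = [row[:] for row in low]
    for _ in range(steps):
        out = [
            [o * (1.0 + sigmoid(h)) for o, h in zip(o_row, h_row)]
            for o_row, h_row in zip(out, high)
        ]
    return out


# Coarse 2x2 map standing in for the Transformer branch (global context).
global_feat = [[0.0, 0.0], [0.0, 0.0]]
# Finer 4x4 map standing in for the CNN branch (spatial detail).
local_feat = [[1.0] * 4 for _ in range(4)]

aligned = align(global_feat, local_feat)               # now 4x4
fused = embed_semantics(local_feat, aligned, steps=2)  # same size as local_feat
```

In a real network the two maps would be multi-channel tensors produced by learned encoders, and both the alignment and the embedding would have trainable parameters; the sketch only shows that the coarse map is first brought to the fine map's resolution and then used repeatedly to modulate it.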

Published

2025-07-14

Section

Articles