Information Technology and Control

Towards Real-World Power Grid Scenarios: Video Action Detection with Cross-scale Selective Context Aggregation

2025-07-14T12:38:03+03:00

In this study, we propose a single-stage model for video action detection and a real-world action detection dataset POWER collected from real power operation scenarios. While previous studies have made significant progress in overall classification and localization performance, they often struggle with the actions that have short duration, hindering the application of these approaches. To address this, we introduce the Cross-scale Selective Context Aggregation Network (CSCAN), which focuses on improving the detection of short actions. This network integrates three key components: 1) a cross-scale feature conduction structure combined with a tailored alignment mechanism; 2) a selective context aggregation module based on gating mechanism; and 3) an effective scale-invariant consistency training strategy to enable the model to learn scale-invariant action representation. We evaluated our method on the self-collected dataset POWER and on the most widely used action detection benchmarks THUMOS14 and ActivityNet v1.3. The extensive results show that our model outperforms other approaches, especially in detecting real-world short actions, demonstrating the effectiveness of our approach.

Bi-Encoder Polyp Net: A Novel Architecture for Enhanced Polyp Segmentation in Endoscopic Images

2025-07-14T12:38:02+03:00

Automatic polyp segmentation in endoscopic images holds critical clinical value for early colorectal cancer diagnosis. While existing segmentation models have achieved notable progress, two key challenges persist in algorithmic performance improvement. First, dynamic adjustments of colonoscope tip orientation during examinations induce viewpoint variations, which amplify polyp appearance diversity and hinder robust feature learning. Second, the inherent similarity between polyps and surrounding tissues leads to blurred boundaries. Although convolutional neural networks (CNNs) have demonstrated significant advancements, their limitations in modeling global dependencies and reliance on aggressive downsampling operations often cause redundant network structures and local detail loss. To address these bottlenecks, we propose Bi-Encoder Polyp Net – a novel parallel architecture integrating Pyramid Vision Transformer and ResNet. This dual-branch design effectively captures global contextual dependencies while preserving low-level spatial details. A feature alignment module bridges the semantic gap between dual-branch feature maps, and an iterative semantic embedding unit further injects high-level semantic information into aligned low-level features. Extensive experiments across five public polyp segmentation benchmarks validate the network’s effectiveness, demonstrating superior capability in processing real-world colonoscopy images.

Multi-strategy Hybrid Improved Intelligent Algorithm for Solving UAV-MTSP

2025-07-14T12:38:03+03:00

Unmanned aerial vehicles (UAVs) have been increasingly used in fire monitoring and rescue operations, offering flexibility and efficiency. However, determining the shortest path for all UAVs to visit all regions is a crucial issue, known as the Multiple Traveling Salesman Problem (MTSP), which aims to save time and energy. This paper proposes a novel hybrid heuristic algorithm, MCPWOA, to solve MTSP with a focus on UAV path planning applications. The algorithm integrates the Whale Optimization Algorithm (WOA), Crested Porcupine Optimizer (CPO), Chaotic Mapping Strategy (CMS), Arcsine Control Strategy (ACS) and Reverse Learning Strategy (RLS) to diversify the initial population and achieve rapid exploration. The algorithm's performance is evaluated using the CEC2022 benchmark function set and TSPLIB dataset for function minimization and UAV-MTSP experimental solution finding. Results indicate that MCPWOA outperforms existing WOA, CPO, and other advanced algorithms on most tests, showing higher convergence accuracy. Moreover, MCPWOA's effectiveness is demonstrated in actual UAV fire monitoring and rescue path planning, enhancing fire response efficiency through optimized UAV configuration and task allocation.
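The population-diversification idea named above (chaotic mapping plus reverse learning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MCPWOA implementation: the abstract does not say which chaotic map or opposition rule is used, so the logistic map, the `lower + upper - x` opposition rule, the bounds, and the sphere objective are all hypothetical choices.

```python
def logistic_sequence(n, x0=0.7, r=4.0):
    """Return n values in [0, 1] from the logistic map x <- r * x * (1 - x)."""
    values, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(pop_size, dim, lower, upper, x0=0.7):
    """Map a chaotic sequence onto the search box [lower, upper]^dim."""
    chaos = logistic_sequence(pop_size * dim, x0)
    return [
        [lower + chaos[i * dim + j] * (upper - lower) for j in range(dim)]
        for i in range(pop_size)
    ]


def opposite(individual, lower, upper):
    """Reverse (opposition-based) point: mirror each coordinate in the box."""
    return [lower + upper - x for x in individual]


def sphere(individual):
    """Hypothetical objective to minimize (sum of squares)."""
    return sum(x * x for x in individual)


def init_population(pop_size, dim, lower, upper, fitness=sphere):
    """Keep the best pop_size points among chaotic points and their opposites."""
    population = chaotic_population(pop_size, dim, lower, upper)
    candidates = population + [opposite(ind, lower, upper) for ind in population]
    candidates.sort(key=fitness)
    return candidates[:pop_size]
```

Because the opposite points are only added as extra candidates, the selected population is never worse than the purely chaotic one under the chosen objective.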

Enhancing Open-Set Few-Shot Object Detection with Limited Visual Prompts

2025-07-14T12:38:02+03:00

The text-prompt-based open-vocabulary object detection model effectively encapsulates the abstract concepts of common objects, thereby overcoming the limitations of pre-trained models, which are restricted to detecting a fixed, predefined set of categories. However, due to data scarcity and the constraints of textual descriptions, representing rare or complex objects solely through text remains challenging. In this study, we propose an open-set detection model that supports both visual and textual prompt queries (VTP-OD) to enhance few-shot object detection. A small number of visual prompts not only provide rich class-wise visual features, which enhance class textual representations, but also enable flexible extension to new classes for different downstream tasks. Specifically, we incorporate two adaptation modules based on cross-attention to adapt the pre-trained vision-language model, allowing it to support both text and visual queries. These modules facilitate (i) visual fusion between a limited number of visual prompts and query images and (ii) visual-language fusion between class-aware visual features and textual representations of the classes. Subsequently, the model undergoes prompt tuning using the available few-shot downstream data to adapt to target detection tasks. Experimental results demonstrate that our model outperforms the pre-trained model on the LVIS and COCO benchmarks. Furthermore, we validate its effectiveness on the real-world CoalMine dataset.

WNASNet: Wavelet-Guided Neural Architecture Search for Efficient Single-Image De-raining

2025-07-14T12:38:03+03:00

On rainy days, the uncertainty of the shape and distribution of rain streaks can cause the images captured by RGB image-based measurement equipment to be blurred and distorted. The wavelet transform is extensively utilized in conventional image-enhancing techniques because of its capacity to deliver spatial and frequency domain information and its multidirectional and multiscale characteristics. In image de-raining, the distribution of rain streaks is intricately linked to both spatial domain characteristics and frequency domain spatial attributes. Nonetheless, deep learning-based rain removal models predominantly depend on the spatial characteristics of the image, and RGB data is sometimes insufficient to differentiate rain marks from image details, resulting in the loss of essential image information during the rain removal process. To overcome this limitation, we have created a lightweight single-image rain removal model named the wavelet-enhanced neural architecture search network (WNASNet). This technique isolates image features from rain-affected images and can more efficiently eliminate rain artifacts. The proposed WNASNet presents three notable contributions. Initially, it utilizes wavelet transform to extract multi-frequency feature components. It allocates a distinct feature search block (FSB) to each component, facilitating the identification of task-specific feature extraction networks to enhance deraining efficacy. Secondly, we present a straightforward yet efficient wavelet feature fusion technique (SFF) that selectively employs high- and low-frequency features during the inverse wavelet transformation. This method maintains deraining efficacy while substantially decreasing computational complexity relative to conventional frequency blending techniques. Comprehensive studies on four synthetic and two real-world datasets illustrate the better performance of WNASNet across many evaluation measures, including PSNR, SSIM, LPIPS, NIQE, and BRISQUE, thereby verifying its efficacy and robustness for single-image deraining tasks.

Yolov5-based Intelligent Detection Method for Retail Goods

2025-07-14T12:38:03+03:00

In the current context, intelligent unmanned retail checkout systems offer the prospect of efficient and innovative development. This study proposes an enhanced lightweight YOLOv5 merchandise detection and recognition method. The method introduces SELayer and a multi-headed self-attentive module of Transformer in YOLOv5 to enable the network to focus more on essential factors such as commodities when performing retail merchandise detection, and improve the recognition performance of the model. Also, the Ghost module is introduced to reduce network parameters and computation, increase computation speed and reduce latency. We validated the performance of the approach on a public dataset. Compared with the existing YOLOv5 model, the model achieves a 0.9% improvement in detection accuracy and a 27.7% reduction in GFLOPs. With this study, we optimise the problem of small batch identification of retail goods, providing a basis for automated processing of intelligent retail supply and marketing systems with practical implications.

ORPTQ: An Improved Large Model Quantization Method Based on Optimal Quantization Range

2025-07-14T12:38:03+03:00

Quantization reduces model storage by representing the model's weights with fewer bits. It can improve the applicability of transformer-based large models and make it possible to deploy them on resource-limited systems such as PCs and mobile devices. The best weight-only quantization method currently uses second-order information to fine-tune the weights step by step during the quantization process, compensating for the quantization errors that have already occurred. The method minimizes the functional loss caused by quantization by adjusting the remaining elements through algebraic transformations in each step. However, the performance of this quantization method deteriorates rapidly when the weight adjustment deviates too far from the starting point, especially in low-bit quantization (e.g., 4 bits or fewer). To meet the mathematical prerequisite of this method during quantization, this paper introduces two parameters, $α$ and $β$, to adjust the quantization range based on the second-order method, and presents three approaches to seek their optimal values. The experimental results show that the proposed method significantly outperforms the original second-order method in low-bit quantization. The code of this paper is available at github.com/t-scen/ORPTQ.
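To make the idea of a tunable quantization range concrete, here is a minimal sketch. It is not the ORPTQ algorithm: it uses plain round-to-nearest uniform quantization rather than the second-order method, and the way the two parameters scale the range (alpha on the maximum, beta on the minimum), the grid search, and all function names are assumptions for illustration only.

```python
def quantize_dequantize(weights, bits=4, alpha=1.0, beta=1.0):
    """Uniformly quantize weights to 2**bits levels and map them back to floats.

    Assumption: the range is [beta * min(weights), alpha * max(weights)];
    values outside the shrunken range are clipped.
    """
    w_max = alpha * max(weights)
    w_min = beta * min(weights)
    if w_max <= w_min:
        raise ValueError("scaled range is empty; choose other alpha/beta")
    levels = 2 ** bits - 1
    scale = (w_max - w_min) / levels
    restored = []
    for w in weights:
        clipped = min(max(w, w_min), w_max)
        q = round((clipped - w_min) / scale)  # integer code in [0, levels]
        restored.append(w_min + q * scale)
    return restored


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_range(weights, bits, grid):
    """Hypothetical grid search for the (alpha, beta) pair with the lowest MSE."""
    best_pair, best_err = None, float("inf")
    for alpha in grid:
        for beta in grid:
            err = mse(weights, quantize_dequantize(weights, bits, alpha, beta))
            if err < best_err:
                best_pair, best_err = (alpha, beta), err
    return best_pair
```

As long as the grid contains 1.0, the selected pair is never worse than the unscaled range under this error measure.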

Integration of Explainable AI with Deep Learning for Breast Cancer Prediction and Interpretability

2025-07-14T12:38:04+03:00

This paper proposes an integrated breast cancer diagnosis approach that combines ML, DL, and Explainable AI methods using the Breast Cancer Wisconsin (Diagnostic) Data Set. We compare standard machine learning approaches, namely Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), with more intricate techniques based on deep learning. Although ML models help in understanding the problem, a DL model may be more appropriate when the data’s dimensionality and complexity are high. To address these limitations, we present a new Hybrid Explainable Attention Mechanism (HEAM) for DL models that use attention. The method is applied in CNNs together with saliency maps and Grad-CAM to show clinical users the parts of the input on which the model bases its predictions, such as characteristics of cell nuclei in images. Using the Breast Cancer Wisconsin dataset, the novel deep learning model with HEAM enhancement is tested against traditional ML models on breast cancer classification. The findings provide evidence that HEAM not only boosts the prediction accuracy to 99.5% but also enhances the model by providing sound and visual attention that explains the prediction made, thereby improving the clinical relevance of the model.

An Early Warning Model for Industrial Network Security Issues: A Crafted Strategy for High Accuracy Based on Machine Learning Approach

2025-07-14T12:38:04+03:00

Industrial networks have become important infrastructure. As they develop, their cybersecurity problems become more and more prominent. Attacks on networks are advancing faster than ever, and their destructive force continues to grow. Thus, early warning technology for industrial network security issues needs to be more accurate and timely, since serious delays occur in real cases. This article proposes a high-accuracy strategy based on a machine-learning algorithm. Two significant parts of the problem are the nonlinear, high-dimensional data with differing feature characteristics in cyber-attacks and the low training efficiency of conventional early warning models. The manuscript therefore suggests a feature selection method based on the Tuna Swarm Optimization (TSO) algorithm to filter out redundant features and reduce the data’s dimensionality. Then, the Extreme Learning Machine (ELM) and Auto-Encoder (AE) are combined to construct a model called Extreme Learning Machine-Auto Encoder (ELM-AE), which serves as the basis of the early warning model for industrial network security. Afterward, the improved Whale Optimization Algorithm (I-WOA) is used to optimize the parameters of the ELM, yielding the optimized model. This optimized model is then applied to detect attacks on industrial cyber security systems as an early warning method. Finally, the proposed model is tested by constructing an evaluation index system that measures how effectively the early warning system functions. The experimental results show that the proposed warning model for industrial network security issues achieves both high warning accuracy and efficiency, providing an advanced early warning model for network attacks. With 92.64% precision and an average execution time of 51.84 s, the proposed model outperforms other methods.

Single-Pulse Detection Method of Radar Weak Target Based on a Two-Stage Deep Neural Network

2025-07-14T12:38:04+03:00

With the increasing prevalence of drones in low-altitude airspace, the radar detection of weak targets with a low signal-to-noise ratio (SNR) still poses a crucial challenge. Traditional constant false alarm rate (CFAR) methods suffer from high false alarm rates and low accuracy when the SNR is below -15 dB. This paper puts forward a two-stage deep neural network to improve weak target detection by emulating human visual perception. In the first stage (coarse detection), potential targets are rapidly localized through grid-based regression. In the second stage (fine detection), depth-wise separable convolution (DSC) and residual connections are utilized for accurate classification. Experimental results show that, at an SNR of -20 dB, the detection rate of the proposed method is 20% higher than that of CFAR methods, and the inference speed is 3.66 times faster than that of single-stage networks. Ablation studies confirm the efficiency improvements brought by the coarse detection network. This approach offers a robust solution for real-time drone surveillance in complex and cluttered environments.
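For readers unfamiliar with the CFAR baseline mentioned above, the following is a minimal sketch of one common variant, cell-averaging CFAR, on a one-dimensional list of power samples. The abstract does not state which CFAR variant was used for comparison, so the variant, the window sizes, and the false-alarm probability here are assumptions; the threshold factor is the textbook expression for exponentially distributed noise power.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Return indices of cells whose power exceeds the adaptive CA-CFAR threshold.

    num_train: total training cells (split evenly on both sides, must be even).
    num_guard: total guard cells around the cell under test (must be even).
    """
    half_train = num_train // 2
    half_guard = num_guard // 2
    margin = half_train + half_guard
    # Threshold factor for cell-averaging CFAR with exponential noise power.
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    detections = []
    for i in range(margin, len(power) - margin):
        leading = power[i - margin: i - half_guard]
        trailing = power[i + half_guard + 1: i + margin + 1]
        noise = (sum(leading) + sum(trailing)) / num_train
        if power[i] > alpha * noise:
            detections.append(i)
    return detections
```

The threshold adapts to the local noise estimate, which is why a strong neighbor inside the training window raises the threshold for nearby cells.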

SAEDF: A Synthetic Anomaly-Enhanced Detection Framework for Detection of Unknown Network Attacks

2025-07-14T12:42:11+03:00

Detecting unknown cyber-attacks (i.e., zero-day) is difficult because network environments change frequently and there are few labeled examples of anomalies. Traditional methods for detecting anomalies often struggle to handle unknown attack types and to work effectively with complex, high-dimensional data. To overcome these problems, we propose a new approach called the synthetic anomaly-enhanced detection framework (SAEDF). SAEDF combines synthetic anomaly generation, flexible feature extraction, and unsupervised anomaly detection. The framework employs a model known as the adaptive and dynamic generative variational autoencoder (ADGVAE). This model generates realistic synthetic attacks and adapts its structure to work effectively with datasets of varying complexity. This helps the model work well with a wide range of attack patterns while still being efficient. Tests on benchmark datasets show that SAEDF performs better than other methods. It achieves higher F1 and recall scores and has a much lower rate of false positives. These results show that SAEDF is effective in finding unknown attacks, improving detection accuracy, and handling complex and changing network traffic.

Data-Fusion Based On Transfer Learning For Plant Disease Recognition

2025-04-01T07:21:43+03:00

This research focuses on wild and introduced cultivated flowers affected by multiple diseases such as Stephanitis, Sooty Mould, Xanthosis, and Leaf Blight, using transfer learning and data fusion technology to construct a plant disease detection model based on Faster R-CNN. The model is trained and evaluated on a self-built data set collected during the flower growth cycle. To solve the problem of disease category imbalance in the collected data samples, the data for minority categories are augmented from the perspectives of category balance and label balance, and Focal Loss is used to improve the original classification loss function. On this self-built data set, the constructed IFRCNN disease detection model was compared with the SSD (Single Shot MultiBox Detector), ResNet18, and YOLOv3 models. The results showed that, for several common plant diseases in the dataset, the mAP of the IFRCNN disease detection model was significantly higher than that of the other three models. The model can effectively locate diseased areas of plant leaves, detect multiple diseases, and provide a reference for accurate disease prevention and control.
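The focal loss mentioned above down-weights well-classified examples so that rare or hard classes contribute more to training. A minimal single-sample sketch of the standard formulation follows; the default `gamma` and `alpha` values are commonly used defaults, not values reported in this abstract.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one sample given class probabilities."""
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -math.log(p_t)


def focal_loss(probs, target, gamma=2.0, alpha=0.25):
    """Focal loss: -alpha * (1 - p_t)**gamma * log(p_t) for the true class."""
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy; larger `gamma` shrinks the loss of confident, correct predictions more strongly.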

MEA-IFE: An Improved Multi-modal Fusion Framework Based on DCNN-BERT-BiLSTM and Its Application in Sentiment Analysis

2025-07-14T12:38:04+03:00

In the real world, emotional data often comes from multiple heterogeneous sources, making it difficult for unimodal approaches to capture emotional information fully. Existing sentiment analysis models struggle with accuracy when handling complex emotional expressions. Accordingly, this paper proposes a multi-modal sentiment analysis framework, MEA-IFE, which is characterized by effective feature extraction and high predictive accuracy. To mitigate potential information loss and expression limitations in BERT-BiLSTM during text feature extraction, MEA-IFE introduces a parallel structure of SK-Net and BiLSTM, enhancing the ability to extract multi-dimensional text features. Additionally, it integrates the ECA mechanism to improve the capture of essential information in text. For image-related challenges, MEA-IFE incorporates a Vision Transformer to better capture both global and detailed features of images, combining CNN and Transformer architectures. During the feature fusion phase, MEA-IFE employs a multi-head attention mechanism to dynamically integrate text and image features, exploring the interactive potential between different modalities. Experiments performed using the Kaggle text dataset and the FER2013 image dataset demonstrate an accuracy of up to 98.00%, validating its effectiveness. When compared with models such as AM-MF, AMSAER, HAN-CA-SA, and TBGAV, MEA-IFE shows outstanding performance across accuracy, precision, recall, and F1 score, with respective improvements of 0.40%, 0.20%, 0.75%, and 0.52%. The model also excels in the AUC metric, further confirming its advantages. The proposed MEA-IFE model possesses high predictive accuracy and strong feature integration capabilities, meeting the precision demands of complex multi-modal sentiment tasks.

A Lightweight Multi-Party Key Authentication Management Protocol Based on Cyber-Physical Systems

2025-04-01T07:21:42+03:00

In the era of digital healthcare, secure information interaction among users, gateways, and multiple devices in a cyber-physical system (CPS) is very important but also very challenging. Existing authentication schemes can only accomplish authentication between gateways and smart devices, and do not consider the authentication needs of gateways, users, and multiple devices together. In addition, users need to initiate multiple key authentication requests to complete multi-device authentication, which greatly increases the communication overhead and security risks. In response, this paper proposes a lightweight multi-party key authentication protocol based on a cyber-physical system. While meeting the user, gateway, and multi-device authentication requirements, the key authentication process is effectively simplified by the CPS architecture: the user only needs to initiate one request to complete the three-party multi-device authentication, which greatly reduces the communication overhead, lowers the security risks, and improves the scheme's adaptability and generalization ability in large-scale device scenarios. Finally, the mathematical analysis confirms the reliability of the proposed scheme and shows that the scheme reduces the computational and communication requirements compared with similar methods, which is crucial for CPSs with limited resources.

Driver Fatigue Detection Based on Multiple Physiological Signals and an Improved Deep Belief Network

2025-04-01T07:21:43+03:00

In order to accurately discriminate the driver fatigue, multiple physiological signals of 10 drivers were collected by a wireless body area network in actual driving, including neck electromyography (EMG) and electroencephalography (EEG). Then, the noises of signals were removed by several denoising methods, and 22 features were extracted, including energy entropy, multiscale entropy, and other relevant features. Subsequently, a deep belief network (DBN) was used to further extract multi-domain features. And then, a grey wolf optimization algorithm was used to optimize the performance of the DBN. The results showed that the accuracy of the model built in the present work was up to 96% in discriminating the fatigue states.

Synthetic Data Enhances Mathematical Reasoning of Language Models Based on Artificial Intelligence

2025-04-01T07:21:43+03:00

Training current large language models (LLMs) requires extensive training data and computing resources to handle multiple natural language processing (NLP) tasks. This paper aims to help individuals build feasible mathematical question-answering (QA) language models in specific fields. We leveraged Gretel.ai, a data generation platform, to generate high-quality mathematical QA data covering several areas, including definitions, theorems, and calculations related to linear algebra and abstract algebra. After fine-tuning through OpenAI infrastructure, GPT-3 showed significant improvements in accuracy, achieving a roughly 18.2% increase on the abstract algebra benchmark, an approximately 1.6x improvement on the linear algebra theorems benchmark, and an approximately 24.0% increase on the linear algebra calculations benchmark. Small language models (SLMs) such as Llama-2-7B/13B and Mistral-7B achieved around 2x accuracy improvements in linear algebra calculations. This study demonstrates the potential for individuals to develop customized SLMs for specialized mathematical domains using synthetic data generation and fine-tuning techniques.

ADFN: Adaptive Dynamic Fusion Network for Real-time Multispectral Object Detection

2025-07-14T12:38:04+03:00

Multispectral object detection leverages the complementary strengths of infrared (IR) and visible (VIS) modalities to improve detection accuracy. However, existing approaches often lack adaptability to dynamic lighting conditions, or fail to achieve real-time performance due to complexity. We propose the Adaptive Dynamic Fusion Network (ADFN), a novel architecture that integrates adaptive multi-path computation and attention-guided feature fusion to address these challenges. ADFN incorporates the Collaborative and Alternating Attention (CAA) modules for efficient feature alignment and the Adaptive Dynamic Pathway (ADP) strategy to dynamically adjust computational pathways based on lighting conditions, optimizing the balance between accuracy and efficiency. Experiments on the FLIR2 and LLVIP datasets demonstrate that ADFN achieves superior mAP@50-95 and real-time performance, showcasing its robustness and efficiency across diverse environments. ADFN offers a practical solution for dynamic lighting conditions and resource-constrained multispectral object detection tasks.

Estimation and Recognition Methods of Human Gait Pose based on Computer Vision and Transformer

2025-04-01T07:21:44+03:00

Human gait pose estimation and recognition, as an emerging biometric technology, has advantages such as requiring no cooperation from the target, being difficult to forge, and working at long distances. However, compared with traditional biometric recognition, it is more susceptible to the influence of the target's arbitrary motion. In response to these issues, the study introduces heterogeneous transfer learning to construct a human gait pose estimation and recognition method based on computer vision and Transformer, and then improves it using a gradual perspective-shift training method. The results indicated that the improved human gait pose estimation and recognition model had good recognition performance in 11 perspectives with intervals of 16° from 0° to 180°, and the corresponding change curve remained stable, with an average recognition rate of over 97%. The average initial validation rate of the improved model was 65.32% higher than before, and the maximum validation rate of the improved model improved significantly at different angles. In comparison with other mainstream algorithms, the improved model proposed in the study had the highest average validation rate and average accuracy, which were 98.56% and 97.51%, respectively, and the corresponding average improvement index was greater than 20%. These results confirm the performance and reliability of the research method, providing new solutions for the problem of human gait pose estimation and recognition in complex scenes.

Research on Real-time Detection of Pipeline Weld Defects Based on Lightweight Neural Networks

2025-04-01T07:21:43+03:00

In the field of pipeline weld defect detection, common object detection algorithms have high complexity and a heavy computational load, making it difficult to meet the real-time monitoring requirements for weld defects on pipeline production lines. To address this issue, this paper proposes a lightweight pipeline weld defect detection model, YOLOv8-BVS, based on the YOLOv8 object detection framework. The model introduces the BRA module to improve the recognition of small defects. To further improve recognition accuracy, the lightweight upsampling algorithm CARAFE is used in the feature fusion network to improve the quality and richness of fused features. The experimental results showed that the model had 1.56M parameters, only 51.6% of the baseline, while the average accuracy reached 87.9%, an improvement of 3.4% compared to the baseline. This verified that the YOLOv8-BVS model met the requirements of online detection of pipeline weld defects while ensuring detection quality.

Improved YOLOv8n based lotus seedpod detection algorithm

2025-04-01T07:21:43+03:00

To address the low efficiency, low precision, missed detections, and false detections in lotus seedpod detection caused by the shape, appearance, color, and growth environment of lotus seedpods, an improved lotus seedpod detection algorithm, FSM-YOLOv8, is proposed based on the YOLOv8n model. First, the C2f-Faster module reduces the number of model parameters while preserving the structural feature extraction capability of the YOLOv8n network. Then, the SimAM attention mechanism is applied to the model's feature extraction module, which enhances the multi-scale and spatial feature extraction capability of the model. Finally, MPDIoU is used as the bounding-box loss function to effectively solve the problem of low detection rate caused by the spatial overlap and occlusion of the lotus seedpods and lotus leaves. The results show that the improved FSM-YOLOv8 achieves 84.8%, 84.1%, and 87.9% on its detection accuracy and recall metrics and reduces the parameter count by 13.4% compared with the YOLOv8n model. This is a significant improvement in detection accuracy and model lightweighting; the model can rapidly identify lotus seedpods in complex environments and meets the real-time identification demands of lotus seedpod picking robots.
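The MPDIoU loss mentioned above penalizes the distances between matching box corners in addition to the overlap. The sketch below assumes the commonly cited definition (IoU minus the squared top-left and bottom-right corner distances, each normalized by the squared image diagonal); the abstract itself does not spell out the formula, and the box format `(x1, y1, x2, y2)` is an assumption.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred, gt, img_w, img_h):
    """IoU minus normalized squared distances between matching corners."""
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2
    return iou(pred, gt) - d1 / norm - d2 / norm


def mpdiou_loss(pred, gt, img_w, img_h):
    """Loss is zero only when the two boxes coincide."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)
```

Unlike plain IoU, the corner terms still provide a signal when two boxes do not overlap at all, which is relevant for occluded or overlapping targets.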

A Prediction Method for Highway Traffic Flow Based on the IHPO-VMD-LSTM-Informer Model

2025-07-14T12:38:04+03:00

Accurate and timely predictions of highway traffic flow are crucial for implementing intelligent highway management. This paper introduces a novel prediction approach for highway traffic flow that employs the IHPO-VMD-LSTM-Informer model, aiming to enhance prediction accuracy. Initially, key indicators measuring highway traffic are identified, and Nonlinear Principal Component Analysis (NPCA) is applied to minimize the dimensionality and interdependence among these indicators. This reduction process replaces the original complex indicators with a smaller number of principal components, thereby simplifying the feature matrix's structure. Subsequently, Variational Modal Decomposition (VMD) processes historical highway traffic flow data, enhanced by the strategically improved Hunter-Prey Optimization (HPO) algorithm. This optimization facilitates adaptive parameter adjustments for the VMD, enabling effective decomposition of highway traffic flow time series data. The Sample Entropy (SE) of the Intrinsic Modal Functions (IMFs) from this decomposition is used with the substantial indicators to form a comprehensive feature matrix. Then, the predictive module combines a Long Short-Term Memory (LSTM) network with the Informer architecture to accurately predict highway traffic flow from the feature matrix. The effectiveness of the proposed model is verified using the public motorway traffic dataset KDD CUP 2017. The results indicate that the proposed model outperforms available ones in terms of prediction accuracy, with MAPE and RMSE of 8.09 and 2.84, respectively, thus significantly advancing intelligent highway management.
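The two error measures reported above can be computed as in the following minimal sketch. Expressing MAPE as a percentage is an assumption; the abstract does not state the units of the reported values.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

MAPE is scale-free, while RMSE is expressed in the units of the traffic flow series and weights large errors more heavily.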

YOLOv8-SS: A Method of Localizing Soldiers in Intricate Battlefield Environments

2025-04-01T07:21:44+03:00

As combat becomes more autonomous and intelligent, effective military target localization techniques are essential to understanding operational military deployment and to target tracking. In this paper, we offer an instance segmentation technique for precise soldier localization in intricate battlefield environments, called YOLOv8-SS. First, in the YOLOv8 backbone network, the C2f module is replaced by the DualC2f module, which we created based on DualConv in order to minimize the parameter count and computation while maintaining accuracy. Second, the feature extraction network is enhanced by importing the global attention mechanism (GAM), which increases the cross-dimensional interaction between channel and spatial information and boosts the model's feature extraction performance. Lastly, the reparameterization module DBB is used to redesign the segmentation head of YOLOv8. During the training phase, convolutional branches of various sizes and shapes are added to increase the network's feature representation capacity. In the inference phase, the convolutional branches are equivalently replaced with a regular convolution, which increases accuracy while maintaining inference efficiency. Additionally, a dataset for segmenting soldier instances that includes various battlefield situations is provided in this paper, and experimental validation is carried out using this dataset. The experimental results demonstrate that YOLOv8-SS improves the Box P, Box mAP50, and Box mAP50-95 measures by 2.7%, 2.9%, and 5.1%, respectively, in comparison to the baseline model YOLOv8n. As a result, the YOLOv8-SS model performs more accurately when segmenting soldiers in intricate battlefield environments.
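The reparameterization step described above relies on the linearity of convolution: parallel branches used during training can be folded into one kernel for inference. The sketch below shows only that core idea in one dimension with a 3-tap and a 1-tap branch; it is not the DBB module itself, which also involves steps such as batch-normalization folding that are omitted here. All function names are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def two_branch_forward(signal, k3, k1):
    """Training-time form: sum of a 3-tap branch and a 1-tap branch."""
    out3 = conv1d_same(signal, k3)
    out1 = conv1d_same(signal, k1)
    return [a + b for a, b in zip(out3, out1)]


def merge_branches(k3, k1):
    """Inference-time form: zero-pad the small kernel and add it to the large one."""
    pad = (len(k3) - len(k1)) // 2
    k1_padded = [0.0] * pad + list(k1) + [0.0] * pad
    return [a + b for a, b in zip(k3, k1_padded)]
```

Running `conv1d_same(signal, merge_branches(k3, k1))` gives the same output as `two_branch_forward(signal, k3, k1)` while using a single convolution.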

Embedding Numerical Features and Meta-Features in Tabular Deep Learning

2025-07-14T12:38:04+03:00

Tabular data is ubiquitous in real-world applications, and an increasing number of deep learning approaches have been developed for tabular data prediction. Among these approaches, embedding techniques serve as both a common and essential component. However, the design of tabular embedding paradigms remains relatively limited, and there is a lack of systematic evaluation regarding the performance of many existing methods in specific scenarios. In this paper, we focus on embedding numerical features and meta-features. To enrich the embedding methods for numerical features, we propose an ordering-oriented regularization technique applicable to piecewise linear embeddings, along with an unsupervised feature grouping method to facilitate partial embedding sharing. We demonstrate that these methods contribute to building more efficient and lightweight embedding modules. Importantly, we highlight ordering and sharing as two promising directions in the design of embeddings for numerical features. Additionally, we address several evaluation gaps: we assess the robustness of existing embeddings for numerical features and evaluate a set of general designs separately for data type embeddings and positional embeddings, providing insights into their practical applications and further developments.
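
For readers unfamiliar with piecewise linear embeddings, the sketch below shows a simplified form of the underlying encoding: a scalar feature is mapped to one value per bin, reflecting how far the value has progressed through that bin. The function name and bin edges are assumptions for illustration. This variant clips every component to [0, 1], and it does not implement the ordering-oriented regularization or the feature grouping proposed in the paper.

```python
def piecewise_linear_encode(x, edges):
    """Encode scalar x against sorted bin edges; returns one value in [0, 1] per bin."""
    encoded = []
    for left, right in zip(edges[:-1], edges[1:]):
        # Fraction of this bin that lies below x, clipped to [0, 1].
        frac = (x - left) / (right - left)
        encoded.append(min(1.0, max(0.0, frac)))
    return encoded

# Hypothetical bin edges, e.g. chosen from feature quantiles.
edges = [0.0, 1.0, 2.0, 4.0]
print(piecewise_linear_encode(1.5, edges))
print(piecewise_linear_encode(3.0, edges))
```

Bins entirely below the value encode as 1, bins entirely above as 0, so the encoding preserves the ordering of the feature values.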

Optimization of RED-PID controller using the chaotic-subpopulation strategy-based Aquila and Math algorithms

2025-04-01T07:21:44+03:00

While the Transmission Control Protocol (TCP) is essential for congestion control by adjusting packet sending rates, it falls short of resolving the buffer bloat problem in critical routers. In response, Active Queue Management (AQM) mechanisms, notably Random Early Detection (RED), have been proposed to construct a feedback system, TCP/RED, for congestion control. However, existing AQM controllers like RED lack comprehensive optimization of their control parameters and therefore adapt poorly to dynamic network conditions. In this study, we propose a novel heuristic algorithm (AOMOA), which combines the global exploration of Aquila Optimizer (AO) with the local exploitation of Math Optimizer (MO), to optimize AQM controller parameters within the TCP/RED feedback system. AOMOA leverages chaotic-subpopulation and dynamic k-worst shift strategies to balance exploration and exploitation, thereby mitigating premature convergence. Additionally, we analyze RED's intrinsic flaw and, on the basis of that theoretical analysis, introduce a Proportional-Integral-Derivative (PID) adjuster into RED, yielding RED-PID, to overcome this limitation. To optimize the RED-PID parameters, we present an optimization model that ensures stability and sensitivity in congestion control. Comprehensive simulations demonstrate that RED-PID, optimized by AOMOA, outperforms the standard RED controller in congestion control performance.
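
As background, the sketch below shows the two textbook building blocks named in this abstract: the classic RED drop-probability ramp and a discrete PID update. How the paper couples the PID adjuster to RED is not stated in the abstract, so the two pieces are shown separately. All names, thresholds, and gains are illustrative assumptions.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED: no drops below min_th, certain drop at or above max_th, linear ramp between."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

class PID:
    """Discrete PID controller with a fixed sampling interval dt."""

    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        """Return the control output for the current error sample."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical thresholds (packets) and gains, for illustration only.
print(red_drop_probability(10.0, min_th=5.0, max_th=15.0, max_p=0.1))
controller = PID(kp=1.0, ki=0.5, kd=0.25)
print(controller.update(2.0), controller.update(2.0))
```

In a feedback setting the error would typically be the gap between the measured queue length and a target queue length, which is the kind of quantity a parameter optimizer would tune the gains against.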

Image Enhancement Model for Open-Pit Mine Monitoring Based on Parallel Multi-Scale Feature Fusion

2025-04-01T07:21:44+03:00

The workspace in open-pit mining systems often suffers from insufficient or uneven illumination due to spatial constraints and obstructions caused by large equipment or geotechnical structures, leading to degraded surveillance imagery and consequently impacting safety monitoring efforts. This study designed an open-pit mine surveillance image enhancement model based on a parallel multi-scale feature fusion Transformer to address the degradation of surveillance video images and leverage the superior expressive power of Transformer networks in visual image processing compared to other networks. The network architecture mainly processes and integrates full-size feature maps and various levels of downsampled feature maps in parallel, preserving both the semantic relationships of image elements and their overall structure. The downsampling process of the network aims to maximize the extraction and restoration of the luminance features of small-sized objects from low-resolution images. By integrating features from downsampling, full-size image processing effectively restores illumination, thereby enhancing the accuracy of the images. To reduce the computational demands of the Transformer structure and facilitate its application in monitoring imagery, we employed an orthogonal self-attention mechanism along both the rows and columns of the image to be processed. This mechanism shifts the network's computational demand from exponential to linear growth. During the training phase, the network model was trained using a dataset collected on-site to enhance the model's adaptability to field conditions. SSIM and PSNR test results confirm that this model performs exceptionally well in open-pit mining production systems.

Occluded Lane Line Detection with Deep Polynomial Regression in Global View

2025-04-01T07:21:44+03:00

An occluded lane line detection method based on deep polynomial regression in a global field of view is proposed for the problem of lane lines being obscured on the road. To obtain a better lane line feature representation, a dual attention module that connects spatial attention and channel attention in series is introduced to improve the network's ability to extract lane line features. These features are then used for lane line detection by row-wise position classification: a row-by-row detection branch is added after the VGG backbone network to search for lane line pixel points through row-wise scanning. To distinguish which lane line each pixel point belongs to, a loss function is designed according to the idea of metric learning, and a vector block is introduced on the semantic segmentation network to record the vector distance of the lane line pixels. Finally, the pixels on the current lane line are extracted by the OPTICS clustering model, and a deep polynomial approach is used to fit the lane line. Experiments are conducted on the Tusimple dataset, and the results show that, compared with the LaneNet network, the proposed method improves accuracy and precision by 4.79% and 6.34%, respectively, and detects obscured lane lines more effectively.
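
The final fitting step can be pictured with the sketch below, which fits a polynomial to the pixel coordinates of one clustered lane. Note the swap: it uses ordinary least-squares fitting with NumPy as a stand-in, not the deep polynomial regression of the paper. The function names, the polynomial degree, and the sample points are assumptions for illustration.

```python
import numpy as np

def fit_lane(points, degree=2):
    """Fit x = f(y) to (x, y) pixel coordinates belonging to one lane cluster."""
    xs = np.array([p[0] for p in points], dtype=float)
    ys = np.array([p[1] for p in points], dtype=float)
    # Lanes are close to vertical in the image, so x is modeled as a function of y.
    return np.polyfit(ys, xs, degree)

def lane_x_at(coeffs, y):
    """Evaluate the fitted lane at image row y, including rows hidden by occlusion."""
    return float(np.polyval(coeffs, y))

# Hypothetical lane pixels sampled from x = 0.5*y^2 - 2*y + 10 at rows 0..5.
points = [(0.5 * y * y - 2.0 * y + 10.0, float(y)) for y in range(6)]
coeffs = fit_lane(points)
print(coeffs, lane_x_at(coeffs, 6.0))
```

Once the curve is fitted, it can be evaluated at rows where the lane is occluded, which is the practical benefit of a parametric lane model.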

Gas Hydrate Pipeline Is Optimized: Levy Flight, Cauchy Mechanism, and Perception Probability

2025-07-14T12:38:05+03:00

Hydraulic lifting of gas hydrate particles through pipelines in deep-sea gas hydrate extraction consumes a large quantity of energy, so resource exploitation is inefficient and an efficient gas supply is difficult to sustain. This article therefore optimizes and analyzes the rigid-pipe hydraulic lifting process, an essential part of a deep-sea gas hydrate extraction system. First, the objective function is constructed from the relationships among the extraction system's parameters, and the specific energy consumption of deep-sea gas hydrate extraction is defined. Second, the range of each parameter is determined according to the extraction system's actual situation. Third, an improved crow search algorithm (I-CSA) with a hybrid strategy covering dynamic perception probability, Levy flight, and a Cauchy variation mechanism is employed to solve the optimization model. Finally, I-CSA is applied to the experimental settings and compared with other optimization algorithms. The experimental results show that I-CSA has good computational efficiency, effectively optimizes the parameters of the deep-sea natural gas hydrate system, and is robust to numerical fluctuations of the parameters. As a result, the performance of the pipeline is improved and the energy consumption of the system is effectively reduced. I-CSA also handles larger sample data effectively, maintaining high computational efficiency and a lower MAPE as the sample size increases. The work thus provides a theoretical reference for the development and deep exploitation of deep-sea gas hydrate.
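
Two of the ingredients named in this abstract can be sketched as follows: a Levy flight step drawn with Mantegna's method, and a Cauchy perturbation of a candidate value. How the paper combines them with the dynamic perception probability inside the crow search update is not described in the abstract, so this is only an illustration of the random steps. The function names, the exponent beta = 1.5, and the scale are assumptions.

```python
import math
import random

def levy_sigma(beta=1.5):
    """Standard deviation of the numerator Gaussian in Mantegna's method."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step: mostly small moves, occasionally a long jump."""
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def cauchy_mutation(x, rng, scale=1.0):
    """Perturb x with a Cauchy-distributed offset via inverse transform sampling."""
    return x + scale * math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
position = 10.0         # hypothetical value of one decision variable
print(position + 0.01 * levy_step(rng), cauchy_mutation(position, rng))
```

Both distributions have heavy tails, which is why they are commonly used to help a population-based search escape local optima.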

Application of Intelligent Obstacle Avoidance Algorithm Combined with Internet of Things Technology in Navigation

2025-04-01T07:21:44+03:00

With the prosperity and development of the Maritime Silk Road, China's maritime industry has reached a new height. While the maritime transport industry has developed vigorously, this growth has also brought great challenges to safe navigation. To realize intelligent navigation, effectively prevent maritime collision accidents, and improve navigation safety, a structural model of an intelligent navigation obstacle avoidance platform based on Internet of Things technology is first proposed. The research then combines the analytic hierarchy process, an artificial neural network, and the BP neural network algorithm, and introduces environmental factors to design an optimized intelligent navigation obstacle avoidance algorithm (I-INOA), so that the algorithm can adjust its strategy intelligently in real time as the actual environment changes. Finally, the collision risk at the location of the research ship is judged, and a priority list for obstacle avoidance is constructed from the risk values between the other ships and the research ship, providing a reference for the pilot. The results show that the prediction accuracy of the I-INOA algorithm is 97.83%. In the two obstacle avoidance experiments, the decision-making efficiency of the four ships based on the I-INOA algorithm is the highest, at 1. In practical application, the priority list of obstacle avoidance is P, O, and S2. In conclusion, the I-INOA algorithm offers better performance and practicability, enabling the research ship to respond more intelligently and quickly.

Multi-Dimensional Temporal Feature Fusion and Density Perception for Time Series Clustering

2025-04-01T07:21:44+03:00

In the field of data mining and knowledge discovery, clustering algorithms have emerged as a powerful tool for unsupervised learning. The adaptability and efficiency of these algorithms make them indispensable in a multitude of applications, including customer segmentation in marketing and anomaly detection in cybersecurity. However, when these clustering algorithms are applied to time series data, a number of distinctive challenges emerge. The representation of time series data, which is often vast and high-dimensional, requires the application of efficient techniques that reduce the dimensionality of the data while ensuring the preservation of vital information. Furthermore, existing clustering methods encounter difficulties when dealing with variable density distributions. In response to these challenges, we present the Density-based Clustering Model for Time Series (DCMD). This model seamlessly integrates temporal representation and clustering, ensuring efficiency and accuracy. Our Multi-dimensional Representation Fusion (MDR) method for time series retains critical features while reducing data dimensions. Furthermore, the K-Nearest Neighbor Weighted (NNW) clustering method enhances local density calculation. Rigorous benchmark evaluations validate the efficacy of our approach. Our contributions advance the field of time series clustering research and show promise for diverse applications.

Hybrid Attention Approach for Source Code Comment Generation

2025-07-14T12:38:05+03:00

Developers are often expected to improve code quality. High-quality code is usually accompanied by comprehensive summaries, including code documentation and function explanations, which are invaluable for maintenance and further development. Regrettably, few software projects provide sufficient code comments because of the high cost of manual annotation. Software engineering researchers have therefore concentrated on methods for automated comment generation. Early algorithms depended on handwritten templates or information retrieval methods. With the advancement of machine learning, researchers instead construct automated models based on machine translation. Nonetheless, the generated code comments remain inadequate because of the significant disparity between code structure and natural language. This study introduces a deep learning model, At-ComGen, which uses hybrid attention for the automated generation of source code comments. Using two separate LSTM encoders, our approach integrates essential tokens from source code functions with the code structure, represented by the corresponding Abstract Syntax Tree. In contrast to earlier data-driven models, our method uses both code syntax and semantics when generating comments. The hybrid attention method, used for comment generation for the first time to our knowledge, enhances the quality of code comments. The experiments demonstrate that At-ComGen is effective and surpasses other prevalent methods. Comments generated by Seq2Seq and CODE-NN disregard the code structure that DeepCom and At-ComGen exploit. For a 5-line function, At-ComGen achieves BLEU values 59.3%, 36.4%, 43.3%, and 13.1% higher than those of the baseline models. Although model performance decreases with comment length, At-ComGen's comments often outperform the others. Generated comments of 5 to 10 words perform best. For a reference length of 10, At-ComGen achieves BLEU values 38.2%, 23.7%, 9.3%, and 4.4% higher than those of the baseline models.