Hybrid Attention Approach for Source Code Comment Generation
DOI: https://doi.org/10.5755/j01.itc.54.2.36699
Keywords: Code Comment Generation, Attention, Abstract Syntax Tree
Abstract
Developers are increasingly expected to deliver high-quality code. High-quality code is typically accompanied by comprehensive summaries, including code documentation and function explanations, which are invaluable for maintenance and further development. Unfortunately, few software projects provide sufficient code comments because of the high cost of manual labeling. Software engineering researchers have therefore concentrated on methods for automated comment generation. Early approaches relied on handcrafted templates or information retrieval; with the advance of machine learning, researchers instead began treating comment generation as a machine translation task. Nevertheless, the generated comments remain inadequate owing to the significant gap between code structure and natural language. This study introduces a novel deep learning model, At-ComGen, which uses hybrid attention for the automated generation of source code comments. Using two separate LSTM encoders, our approach combines key tokens from source code functions with the code structure, represented by the corresponding Abstract Syntax Tree (AST). In contrast to earlier purely data-driven models, our method exploits both code syntax and semantics when generating comments. The hybrid attention mechanism, applied to comment generation for the first time to our knowledge, improves the quality of the generated comments. Experiments demonstrate that At-ComGen is effective and outperforms other prevalent approaches. Unlike DeepCom and At-ComGen, Seq2Seq and CODE-NN disregard code structure when generating comments. For a 5-line function, At-ComGen achieves BLEU scores 59.3%, 36.4%, 43.3%, and 13.1% higher than those of the baseline models. Although performance degrades as comment length grows, At-ComGen's comments still generally outperform the others; machine-generated comments of 5-10 words achieve the best quality. For a reference length of 10, At-ComGen achieves BLEU scores 38.2%, 23.7%, 9.3%, and 4.4% higher than those of the other baseline models.
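Since the abstract describes the architecture only at a high level, the following PyTorch sketch shows one plausible realization of a dual-LSTM-encoder comment generator with attention over both the token sequence and a flattened AST. The class name AtComGenSketch, the embedding dimension, the multi-head attention layers, and the concatenation-based fusion are illustrative assumptions rather than details taken from the paper.

# A minimal PyTorch sketch of a dual-encoder model with hybrid attention.
import torch
import torch.nn as nn

class AtComGenSketch(nn.Module):
    """Two LSTM encoders (code tokens and flattened AST nodes) feed an
    LSTM decoder whose states attend to both encoders ("hybrid" attention)."""

    def __init__(self, tok_vocab, ast_vocab, cmt_vocab, dim=256):
        super().__init__()
        # dim=256 and 4 attention heads are assumed hyperparameters.
        self.tok_emb = nn.Embedding(tok_vocab, dim)
        self.ast_emb = nn.Embedding(ast_vocab, dim)
        self.cmt_emb = nn.Embedding(cmt_vocab, dim)
        self.tok_enc = nn.LSTM(dim, dim, batch_first=True)
        self.ast_enc = nn.LSTM(dim, dim, batch_first=True)
        self.dec = nn.LSTM(dim, dim, batch_first=True)
        self.tok_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.ast_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.out = nn.Linear(3 * dim, cmt_vocab)

    def forward(self, tok_ids, ast_ids, cmt_ids):
        tok_h, _ = self.tok_enc(self.tok_emb(tok_ids))   # (B, Lt, D)
        ast_h, _ = self.ast_enc(self.ast_emb(ast_ids))   # (B, La, D)
        dec_h, _ = self.dec(self.cmt_emb(cmt_ids))       # (B, Lc, D)
        # Attend separately to the token and AST encodings, then fuse.
        tok_ctx, _ = self.tok_attn(dec_h, tok_h, tok_h)
        ast_ctx, _ = self.ast_attn(dec_h, ast_h, ast_h)
        fused = torch.cat([dec_h, tok_ctx, ast_ctx], dim=-1)
        return self.out(fused)                           # (B, Lc, cmt_vocab)

Concatenation is only one way to fuse the two context vectors; gated or weighted combinations are common alternatives, and the abstract does not specify which fusion At-ComGen uses.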
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.