Paper: Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting
BASTS: split the AST along CFG blocks, embed each split tree with a Tree-LSTM, and feed the embeddings to a Transformer
Abstract
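This paper presents a block-wise AST splitting method. The code is split into tree-shaped blocks that follow the CFG, each split tree is pre-trained with a Tree-LSTM to obtain a local, non-linear syntax encoding, and these encodings are then fed into a Transformer to generate high-quality code summaries.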
Introduction
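ASTs are usually very deep, which leads to a heavy encoder that is hard to train and struggles to capture complex semantics. GNN-based methods also suffer from a heavy encoder, and their results do not improve much either.
BASTS splits the code along the blocks of its CFG and produces split ASTs, each modeled with a Tree-LSTM. The pre-training task is to predict the next split AST in the dominator tree. Finally, the split-AST representations are combined with the token representations and fed into the Transformer.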
Overview
Code Splitting
A method is split according to the blocks in the dominator tree of its CFG.
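To make the splitting step more concrete, here is a minimal sketch of how a dominator tree can be derived from a CFG. It is not the authors' implementation: the toy CFG, the block names (`entry`, `cond`, `then`, ...), and the helper names are all hypothetical, and the sketch assumes every block is reachable from the entry block.

```python
from collections import defaultdict


def dominators(cfg, entry):
    """Iteratively compute, for every block, the set of blocks that dominate it."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    preds = defaultdict(set)
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = (set.intersection(*incoming) if incoming else set()) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom, entry):
    """Pick the closest strict dominator of each block (assumes all blocks are reachable)."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue
        strict = ds - {n}
        # Strict dominators form a chain; the closest one has the largest dominator set.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


def dominator_tree(idom):
    """Turn the immediate-dominator map into a parent -> sorted children map."""
    children = defaultdict(list)
    for node, parent in idom.items():
        children[parent].append(node)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Hypothetical CFG of a method with a single if/else.
cfg = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}
dom = dominators(cfg, "entry")
idom = immediate_dominators(dom, "entry")
tree = dominator_tree(idom)
```

In this toy example `join` hangs directly under `cond` rather than under `then` or `else`, because neither branch alone dominates it. Each node of the resulting tree would correspond to one block whose statements form a split AST.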
Tree Encoding
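As an illustration of encoding one split AST bottom-up, the sketch below implements a Child-Sum Tree-LSTM cell in plain Python. This is an assumption on my part: these notes do not say which Tree-LSTM variant the paper uses, the weights here are random rather than trained, and the `(features, children)` node format is hypothetical.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    return [sum(xs) for xs in zip(*vs)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


class ChildSumTreeLSTM:
    """Untrained Child-Sum Tree-LSTM cell with gates i, f, o and candidate u."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        self.W = {g: mat(hid_dim, in_dim) for g in "ifou"}
        self.U = {g: mat(hid_dim, hid_dim) for g in "ifou"}
        self.b = {g: [0.0] * hid_dim for g in "ifou"}

    def gate(self, g, x, h, act):
        return act(add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g]))

    def encode(self, node):
        """node is (features, children); returns (hidden, cell) of the subtree root."""
        x, children = node
        states = [self.encode(child) for child in children]
        zero = [0.0] * self.hid_dim
        h_sum = add(zero, *[h for h, _ in states])  # sum of child hidden states
        i = self.gate("i", x, h_sum, sigmoid)
        o = self.gate("o", x, h_sum, sigmoid)
        u = self.gate("u", x, h_sum, tanh)
        c = mul(i, u)
        for h_k, c_k in states:
            f_k = self.gate("f", x, h_k, sigmoid)  # one forget gate per child
            c = add(c, mul(f_k, c_k))
        h = mul(o, tanh(c))
        return h, c


# Hypothetical split AST: a root with two leaves, each node carrying a 2-d feature vector.
leaf_a = ([1.0, 0.0], [])
leaf_b = ([0.0, 1.0], [])
model = ChildSumTreeLSTM(in_dim=2, hid_dim=4)
h_root, c_root = model.encode(([0.5, 0.5], [leaf_a, leaf_b]))
h_swapped, _ = model.encode(([0.5, 0.5], [leaf_b, leaf_a]))
```

The root hidden state `h_root` would serve as the representation of the whole split AST. Because child states are summed, this variant ignores child order, which is why `h_swapped` matches `h_root` up to floating-point rounding.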
Summarizing Code with Transformer
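These notes only say that the split-AST representations and the token representations are combined before entering the Transformer. One plausible way to do that, shown purely as an assumption, is to concatenate each token embedding with the embedding of the split AST (block) the token belongs to. The function name, the `token_block_ids` mapping, and the toy vectors below are all hypothetical.

```python
def combine_representations(token_embeddings, token_block_ids, block_embeddings):
    """Concatenate each token vector with the vector of the block that contains it."""
    if len(token_embeddings) != len(token_block_ids):
        raise ValueError("every token needs exactly one block id")
    return [
        list(tok) + list(block_embeddings[block_id])
        for tok, block_id in zip(token_embeddings, token_block_ids)
    ]


# Three tokens with 2-d embeddings; the first two belong to block 0, the last to block 1.
token_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
token_block_ids = [0, 0, 1]
# One 3-d vector per split AST, e.g. the root hidden state of a Tree-LSTM.
block_embeddings = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]

encoder_inputs = combine_representations(token_embeddings, token_block_ids, block_embeddings)
```

Each row of `encoder_inputs` would then be one position of the Transformer encoder input, so the model sees both the lexical token and the local syntax of its block.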
Experiment