Paper: Learning Sequential and Structural Information for Source Code Summarization
mAST + GCN encode the structural information; a Transformer encodes the sequential information.
Abstract
The model encodes structural information with mAST + GCN, then passes the sequence of AST nodes through a Transformer encoder.
Introduction
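To represent structural information better, the authors propose a modified AST (mAST): sibling edges are added between nodes at the same level to represent adjacent blocks.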
Model
Representing Code as mAST
data:image/s3,"s3://crabby-images/0a7c3/0a7c3fdef53cae0a8700ff71ad83a5f21cf00e71" alt="avatar"
For Java code, edges are added between adjacent nodes at the same level.
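The sibling-edge idea can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `Node` class, the `build_mast_edges` helper, and the toy AST are all hypothetical, and it assumes that "adjacent nodes at the same level" means consecutive children of the same parent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical minimal AST node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def build_mast_edges(root: Node):
    """Index nodes in pre-order and collect two edge lists.

    tree_edges:    ordinary parent -> child AST edges
    sibling_edges: extra edges between consecutive children of one parent
                   (assumed reading of "adjacent nodes at the same level")
    """
    labels: List[str] = []
    tree_edges: List[Tuple[int, int]] = []
    sibling_edges: List[Tuple[int, int]] = []

    def visit(node: Node) -> int:
        idx = len(labels)
        labels.append(node.label)
        child_ids = []
        for child in node.children:
            cid = visit(child)
            tree_edges.append((idx, cid))
            child_ids.append(cid)
        # Link each child to its immediate right-hand sibling.
        sibling_edges.extend(zip(child_ids, child_ids[1:]))
        return idx

    visit(root)
    return labels, tree_edges, sibling_edges


# Toy AST, made up for illustration only.
toy = Node("MethodDeclaration", [
    Node("Modifier"),
    Node("Name"),
    Node("Block", [Node("IfStatement"), Node("ReturnStatement")]),
])
labels, tree_edges, sibling_edges = build_mast_edges(toy)
```

The union of `tree_edges` and `sibling_edges` gives the graph that a GCN could then operate on.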
data:image/s3,"s3://crabby-images/f919d/f919d234e535bdd12b44e8b7ba5bb463dc7994e5" alt="avatar"
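For Python code, the authors found the function name to be especially important, so they add a dedicated "function name" node and then add edges for it.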
Proposed Model
data:image/s3,"s3://crabby-images/40877/40877b2ac201fa7eab0b95f168e5f4d93382f9ee" alt="avatar"
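The authors point out that mAST captures information about adjacent blocks, that the GCN makes the representations of structurally adjacent nodes close to each other in the semantic space, and that the Transformer captures long-range dependencies within the same block.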
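To make the GCN claim concrete, here is a minimal NumPy sketch of graph convolution over an mAST-style edge list. It uses the common symmetric-normalized form H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); these notes do not say which GCN variant the paper uses, so that choice is an assumption. The function names, the toy edge list, and the dimensions are likewise hypothetical.

```python
import numpy as np


def normalized_adjacency(num_nodes, edges):
    """Build D^(-1/2) (A + I) D^(-1/2) from an undirected edge list."""
    a = np.eye(num_nodes)            # self-loops (A + I)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w):
    """One layer: average over neighbours, project with w, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)


def gcn_encode(a_hat, h, weights):
    """Stack one GCN layer per weight matrix; len(weights) is the depth."""
    for w in weights:
        h = gcn_layer(a_hat, h, w)
    return h


# Toy graph: 6 nodes, parent-child edges plus sibling edges (made up).
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5), (1, 2), (2, 3), (4, 5)]
rng = np.random.default_rng(0)
a_hat = normalized_adjacency(6, edges)
h0 = rng.normal(size=(6, 8))                       # initial node embeddings
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16))]
node_repr = gcn_encode(a_hat, h0, weights)         # shape (6, 16)
```

Because each layer mixes a node's features with those of its neighbours, connected nodes end up with related representations. In the setup described above, the resulting node vectors would then be fed to the Transformer encoder as a sequence.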
Experiment
data:image/s3,"s3://crabby-images/d0607/d0607ae4fcb6bd7523be3c79e1375781ca68d2aa" alt="avatar"
data:image/s3,"s3://crabby-images/c1bcd/c1bcd9869fd54b223d11a85b9789c926a6a9f634" alt="avatar"
The ablation experiments and the data are shown above.
data:image/s3,"s3://crabby-images/07011/07011a4c824e7b2c7aea761dfcc58ee85487dd5a" alt="avatar"
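The authors also tested how placing the GCN before or after the Transformer encoder affects the results. Whether it goes before, after, or both, the difference is small; placing it before seems slightly better.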
data:image/s3,"s3://crabby-images/57689/57689022459fbdcb75c44beb08bce189d007f749" alt="avatar"
The authors also ran experiments on the number of GCN layers.