ACL-2016 Summarizing Source Code using a Neural Attention Model

2021-12-17 PaperNote CL, SE

Paper: Summarizing Source Code using a Neural Attention Model

CODE-NN, an important early baseline

1. Introduction

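The model is built mainly on an LSTM with an attention mechanism.
To reduce the burden of writing summaries by hand, the paper proposes CODE-NN, a fully data-driven model for generating code summaries. Its basic architecture is an LSTM with attention. The contributions are:
It proposes the data-driven model CODE-NN, which achieves state-of-the-art results on two tasks: code summarization and code retrieval.
It contributes a code summarization dataset collected from StackOverflow and cleaned through a series of processing steps. The final dataset size is shown in the figure below.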
<img src="https://img-blog.csdnimg.cn/2021011920042011.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3RpbmcwOTIy,size_16,color_FFFFFF,t_70#pic_center" alt="avatar" style="zoom:50%;" />

2. Model

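The model figure in the paper illustrates how a natural-language summary is generated from code. The model uses an LSTM whose first input is NULL. At each step, attention is computed from the code c and the hidden state vector the LSTM currently outputs. The attention result is used to form a new representation vector, and a softmax over it gives the natural-language token to output at that step. That token then becomes the input for the next step, and the process repeats until END is output.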
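The decoding loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the paper's implementation: `greedy_decode`, `step`, `toy_step`, and `max_len` are hypothetical names, the trained LSTM + attention + softmax is hidden behind the `step` callable, and picking a single token per step (greedy selection) is an assumption made for simplicity.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical signature: (code tokens, previous summary token, state)
# -> (next summary token, new state). In a real model this would wrap
# the LSTM, the attention over the code, and the softmax.
StepFn = Callable[[Sequence[str], str, Any], Tuple[str, Any]]


def greedy_decode(
    code_tokens: Sequence[str],
    step: StepFn,
    init_state: Any,
    start_token: str = "NULL",
    end_token: str = "END",
    max_len: int = 20,
) -> List[str]:
    """Generate summary tokens one at a time until END (or max_len)."""
    summary: List[str] = []
    prev_token, state = start_token, init_state
    for _ in range(max_len):
        token, state = step(code_tokens, prev_token, state)
        if token == end_token:
            break
        summary.append(token)
        prev_token = token  # feed the emitted token back in as the next input
    return summary


def toy_step(code_tokens, prev_token, state):
    """Stand-in for a trained model: replays a canned summary."""
    canned = ["count", "rows", "END"]
    return canned[state], state + 1


print(greedy_decode(["SELECT", "COUNT", "(", "*", ")"], toy_step, 0))
```

The `max_len` cap is there only so the loop terminates even if the stand-in model never emits END.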

The model has two main parts: an LSTM that learns context-dependent representations of natural-language tokens, and an attention component.
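As a rough illustration of the attention part, the sketch below scores each code token embedding against the current hidden state, normalizes the scores with a softmax, and returns the weighted sum as a context vector. The dot-product scoring, the function names `softmax` and `attend`, and the toy two-dimensional vectors are assumptions for illustration; this note does not give the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    hidden: Sequence[float],
    code_embeddings: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step.

    Assumes hidden and every code embedding have the same dimension.
    """
    # Score each code token by its dot product with the hidden state.
    scores = [sum(h * c for h, c in zip(hidden, emb)) for emb in code_embeddings]
    weights = softmax(scores)
    dim = len(code_embeddings[0])
    # Context vector: attention-weighted average of the code embeddings.
    context = [
        sum(w * emb[i] for w, emb in zip(weights, code_embeddings))
        for i in range(dim)
    ]
    return weights, context


# Toy example: the hidden state points along the first code token's embedding,
# so that token receives most of the attention weight.
weights, context = attend([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]])
print(weights, context)
```

In a full model the context vector would be combined with the hidden state before the output softmax over the summary vocabulary.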

3. Experiments

For the code summarization task, the experiments compare against the following baselines:

IR: An information retrieval baseline that outputs the title associated with the code c_j in the training set that is closest to the input code c in terms of token Levenshtein distance.
MOSES (2007): A popular phrase-based machine translation system.
SUM-NN (2015): A neural attention-based abstractive summarization model.
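The IR baseline is simple enough to sketch directly: compute the Levenshtein (edit) distance over token sequences rather than characters, then return the title of the nearest training snippet. The function names `token_levenshtein` and `ir_baseline` and the tiny training set are hypothetical; how the code is tokenized is left out and assumed to have happened already.

```python
from typing import List, Sequence, Tuple


def token_levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ir_baseline(
    query_code: Sequence[str],
    training_pairs: Sequence[Tuple[Sequence[str], str]],
) -> str:
    """Return the title of the training snippet closest to query_code."""
    _, title = min(training_pairs,
                   key=lambda pair: token_levenshtein(query_code, pair[0]))
    return title


# Made-up (code tokens, title) pairs for illustration only.
train: List[Tuple[List[str], str]] = [
    (["SELECT", "*", "FROM", "t"], "select all rows"),
    (["x", "=", "1"], "assign a variable"),
]
print(ir_baseline(["SELECT", "id", "FROM", "t"], train))
```

The query differs from the first snippet by one substituted token and shares no tokens with the second, so the first title is returned.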

Evaluation metrics include:

METEOR
BLEU-4
Human Evaluation
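To make BLEU-4 more concrete, here is a minimal sentence-level sketch: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. The add-one smoothing, the single-reference setup, and the names `ngrams` and `bleu4` are assumptions for illustration; the exact BLEU variant used in the paper's evaluation may differ.

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Sentence-level BLEU-4 against one reference, with add-one smoothing."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        # Add-one smoothing (an assumption) avoids log(0) for short outputs.
        log_precision += math.log((overlap + 1) / (total + 1)) / 4
    # Brevity penalty discourages candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_precision)


reference = "how to count rows in a table".split()
print(bleu4("how to count rows in a table".split(), reference))
print(bleu4("count rows in table".split(), reference))
```

A candidate identical to the reference scores 1.0, and the score drops as n-gram overlap shrinks or the candidate gets shorter than the reference.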

For the code retrieval task, the baseline is:

RET-IR: An information retrieval baseline

Evaluation metrics include:

Link to this post: https://tyang816.github.io/2021/12/17/Summarizing Source Code using a Neural Attention Model/

Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!

Yang Tan

Master Student @ECUST
