Paper: A Survey of Deep Learning Models for Structural Code Understanding
Survey: Deep Learning for Structural Code Understanding
Abstract
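The number of methods and applications for code understanding keeps growing. The survey divides recent work on code understanding into two groups: sequence-based models and graph-based models. It also covers metrics, datasets, and downstream tasks, and offers suggestions for future work on structural code understanding.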
Introduction
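Code structure modeling: how to model the structural information in code effectively, and how to choose the structural information that works for a specific downstream task.
General code representation learning: how to learn code representations that are not tied to a single programming language.
Task-specific adaptation: how to choose an architecture for a downstream task, how to process task-specific data, and how to adapt models in few-shot, transfer learning, and cross-language scenarios.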
Preliminary
Structures in code
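Lexical analysis yields the natural code sequence (NCS), syntax analysis yields the abstract syntax tree (AST), and semantic analysis followed by intermediate code generation yields the control flow graph (CFG) and the data flow graph (DFG).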
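As a rough illustration of the first two stages, the sketch below uses Python's standard `tokenize` and `ast` modules to obtain a token sequence and an AST from a small snippet. The snippet and the helper names `token_sequence` and `ast_node_types` are made up for this example; the survey does not prescribe any particular tool.

```python
import ast
import io
import tokenize

# Hypothetical input snippet used only for illustration.
SOURCE = "def add(a, b):\n    return a + b\n"

def token_sequence(source):
    """Lexical analysis: return the source as a flat list of token strings."""
    # Layout-only tokens are dropped so that only code tokens remain.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in skip]

def ast_node_types(source):
    """Syntax analysis: parse into an AST and list node type names.

    ast.walk visits nodes breadth-first, starting at the root.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

print(token_sequence(SOURCE))
# ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
print(ast_node_types(SOURCE)[:2])
# ['Module', 'FunctionDef']
```

The token list is what a sequence model would consume directly, while the AST keeps the nesting that the token list discards.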
Other structures
Intermediate Representation
These are obtained from the compiler, for example Static Single Assignment (SSA) form or the Program Dependency Graph (PDG).
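To give a feel for SSA, here is a minimal sketch that renames variables in straight-line Python assignments so that every name is assigned exactly once. It is a toy under stated assumptions: it only accepts simple `name = expr` statements, has no branches and therefore no phi functions, and the function name `to_ssa` is hypothetical. A real compiler builds SSA on its own intermediate representation, not on source text. `ast.unparse` needs Python 3.9 or later.

```python
import ast

def to_ssa(source):
    """Rename variables in straight-line assignments so each name is defined once."""
    version = {}  # variable name -> latest version number
    lines = []
    for stmt in ast.parse(source).body:
        simple = (isinstance(stmt, ast.Assign)
                  and len(stmt.targets) == 1
                  and isinstance(stmt.targets[0], ast.Name))
        if not simple:
            raise ValueError("this sketch only handles `name = expr` statements")
        # Uses on the right-hand side refer to the latest version defined so far.
        # Names that were never assigned (inputs) are left unchanged.
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name) and node.id in version:
                node.id = f"{node.id}_{version[node.id]}"
        # The definition on the left-hand side gets a fresh version.
        target = stmt.targets[0].id
        version[target] = version.get(target, 0) + 1
        lines.append(f"{target}_{version[target]} = {ast.unparse(stmt.value)}")
    return lines

print(to_ssa("x = 1\ny = x + 2\nx = x * y\n"))
# ['x_1 = 1', 'y_1 = x_1 + 2', 'x_2 = x_1 * y_1']
```

After renaming, each use points at exactly one definition, which is what makes data dependencies easy to read off.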
The Unified Modeling Language
UML diagrams of software systems.
Sequence-based models
type-1: depth-first traversal of the AST
type-2: AST paths
type-3: adding structural information to the sequence
type-4: partially preserving the AST
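The first two types can be sketched with Python's standard `ast` module. The code below is only an illustration under assumptions: `dfs_sequence` flattens an AST by pre-order depth-first traversal (type-1), and `leaf_to_leaf_paths` collects paths between identifier leaves through their lowest common ancestor, in the spirit of path-based models (type-2). Both function names, the choice of `ast.Name` nodes as leaves, and the example expression are hypothetical and not taken from the survey.

```python
import ast
import itertools

def dfs_sequence(node):
    """Type-1 style: pre-order depth-first traversal of node type names."""
    seq = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        seq.extend(dfs_sequence(child))
    return seq

def root_paths(node, prefix=()):
    """Yield (identifier, root-to-leaf tuple of nodes) for every Name leaf."""
    prefix = prefix + (node,)
    if isinstance(node, ast.Name):
        yield node.id, prefix
        return
    for child in ast.iter_child_nodes(node):
        yield from root_paths(child, prefix)

def leaf_to_leaf_paths(tree):
    """Type-2 style: paths between pairs of identifier leaves."""
    result = []
    for (a, pa), (b, pb) in itertools.combinations(list(root_paths(tree)), 2):
        # k is the length of the shared prefix; pa[k - 1] is the
        # lowest common ancestor of the two leaves.
        k = 0
        while k < min(len(pa), len(pb)) and pa[k] is pb[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(pa[k:])]
        top = [type(pa[k - 1]).__name__]
        down = [type(n).__name__ for n in pb[k:]]
        result.append((a, up + top + down, b))
    return result

tree = ast.parse("a + b", mode="eval")
print(dfs_sequence(tree))
# ['Expression', 'BinOp', 'Name', 'Load', 'Add', 'Name', 'Load']
print(leaf_to_leaf_paths(tree))
# [('a', ['Name', 'BinOp', 'Name'], 'b')]
```

The flattened sequence keeps every node but loses explicit nesting, while the path form keeps local structure between pairs of leaves at the cost of producing many paths per snippet.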
Tasks