Automatically Generating Natural Language Descriptions for Object-related Statement Sequences

Authors: X. Wang, L. Pollock, and K. Vijay-Shanker
Booktitle : IEEE 24th International Conference on Software Analysis, Evolution and Reengineering
Date : 2017
Publisher : SANER
Project : 
Keywords: abstraction, mining code patterns, documentation generation

Abstract :

Current source code analyses driving software maintenance tools treat methods as either a single unit or a set of individual statements or words. They often leverage method names and any existing internal comments. However, internal comments are rare, and method names do not typically capture the method’s multiple high-level algorithmic steps that are too small to be a single method, but require more than one statement to implement.
Previous work demonstrated the feasibility of automatically identifying high-level actions for loops; however, many high-level actions remain unaddressed and undocumented, particularly sequences of consecutive statements that are associated with each other primarily by object references. We call these object-related action units.
In this paper, we present an approach to automatically generate natural language descriptions of object-related action units within methods. We leverage the large body of available high-quality open source projects to learn the templates of object-related actions, identify the statement that can represent the main action, and generate natural language descriptions for these actions. Our evaluation study on a set of 100 object-related statement sequences showed that the approach holds promise for automatically identifying the action and arguments and generating natural language descriptions.
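
To make the notion of an object-related action unit more concrete, here is a minimal Python sketch. It is an illustration only and not the technique from the paper: it assumes statements arrive as plain strings of Java-like code, and it uses a hypothetical regex heuristic (`primary_object`) to pick out the variable a statement declares or calls a method on. Consecutive statements that share that variable are grouped into one candidate unit. The sample statements and all function names are invented for this example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical sample input: Java-like statements given as plain strings.
STATEMENTS = [
    'StringBuilder sb = new StringBuilder();',
    'sb.append(name);',
    'sb.append(": ");',
    'sb.append(value);',
    'log(total);',
    'File f = new File(path);',
    'f.createNewFile();',
]

# Assumed heuristics: "Type name = ..." declares name; "name.method(" uses name.
DECLARATION = re.compile(r'^\s*[\w<>\[\]]+\s+(\w+)\s*=')
RECEIVER = re.compile(r'^\s*(\w+)\.\w+\(')


def primary_object(stmt: str) -> Optional[str]:
    """Return the variable a statement declares or invokes a method on, if any."""
    for pattern in (DECLARATION, RECEIVER):
        match = pattern.match(stmt)
        if match:
            return match.group(1)
    return None


def group_units(statements: List[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive statements that share the same primary object.

    Only runs of two or more statements are kept as candidate units.
    """
    units: List[Tuple[str, List[str]]] = []
    current: List[str] = []
    current_obj: Optional[str] = None
    for stmt in statements:
        obj = primary_object(stmt)
        if obj is not None and obj == current_obj:
            current.append(stmt)
            continue
        # The run ended: keep it if it spans more than one statement.
        if current_obj is not None and len(current) >= 2:
            units.append((current_obj, current))
        current, current_obj = ([stmt], obj) if obj is not None else ([], None)
    if current_obj is not None and len(current) >= 2:
        units.append((current_obj, current))
    return units


if __name__ == "__main__":
    for obj, stmts in group_units(STATEMENTS):
        print(obj, len(stmts))
```

On the sample input, the sketch yields one candidate unit around `sb` and one around `f`, while the standalone `log(total);` call is left out. A real analysis would work on a parsed program representation rather than on strings, and would still need to select the main action and generate the description, which this sketch does not attempt.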
