Automatic Generation of Natural Language Summaries for Java Classes

Author : Moreno, Laura; Aponte, Jairo; Sridhara, Giriprasad; Marcus, Andrian; Pollock, Lori; Vijay-Shanker, K.
Date : May 2013
Publisher : IEEE
Journal : International Conference on Program Comprehension (ICPC)
Keyword(s) : code summarization; natural language analysis; documentation generation
Document Type : Article

Most software engineering tasks require developers to understand parts of the source code. When faced with unfamiliar code, developers often rely on (internal or external) documentation to gain an overall understanding of the code and determine whether it is relevant for the current task. Unfortunately, the documentation is often absent or outdated. This paper presents a technique to automatically generate human-readable summaries for Java classes, assuming no documentation exists. The summaries allow developers to understand the main goal and structure of the class. The focus of the summaries is on the content and responsibilities of the classes, rather than their relationships with other classes. The summarization tool determines the class and method stereotypes and uses them, in conjunction with heuristics, to select the information to be included in the summaries. Then it generates the summaries using existing lexicalization tools. A group of programmers judged a set of generated summaries for Java classes and determined that they are readable and understandable, they do not include extraneous information, and, in most cases, they are not missing essential information.
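The pipeline outlined in the abstract (determine stereotypes, select content with heuristics, then generate text) can be illustrated with a small sketch. This is a hypothetical, heavily simplified example, not the authors' tool. It assumes method stereotypes can be guessed from name prefixes, whereas a real stereotype detector would inspect method bodies. It also uses an invented four-value stereotype set and a made-up class-level rule, and it replaces the lexicalization tools with a fixed string template. The names `SummarySketch`, `classify`, `summarize`, and `MethodStereotype` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: name-based method stereotypes feeding a template summary.
 * Not the published technique; prefixes stand in for real static analysis.
 */
public class SummarySketch {

    // An assumed, minimal subset of method stereotypes.
    enum MethodStereotype { GET, PREDICATE, SET, COMMAND }

    // True if name starts with prefix followed by an uppercase letter (e.g. "getX", not "issue").
    static boolean hasPrefix(String name, String prefix) {
        return name.length() > prefix.length()
                && name.startsWith(prefix)
                && Character.isUpperCase(name.charAt(prefix.length()));
    }

    // Hypothetical heuristic: infer a stereotype from the method name alone.
    static MethodStereotype classify(String methodName) {
        if (hasPrefix(methodName, "get")) return MethodStereotype.GET;
        if (hasPrefix(methodName, "is") || hasPrefix(methodName, "has")) return MethodStereotype.PREDICATE;
        if (hasPrefix(methodName, "set")) return MethodStereotype.SET;
        return MethodStereotype.COMMAND;
    }

    // Counts stereotypes, keeps command methods as "responsibilities", and fills a fixed template.
    static String summarize(String className, List<String> methods) {
        Map<MethodStereotype, Integer> counts = new EnumMap<>(MethodStereotype.class);
        List<String> commands = new ArrayList<>();
        for (String m : methods) {
            MethodStereotype s = classify(m);
            counts.merge(s, 1, Integer::sum);
            if (s == MethodStereotype.COMMAND) commands.add(m);
        }
        int accessors = counts.getOrDefault(MethodStereotype.GET, 0)
                + counts.getOrDefault(MethodStereotype.PREDICATE, 0);
        int mutators = counts.getOrDefault(MethodStereotype.SET, 0);
        // Hypothetical class-level rule: a class with no command methods is treated as a data holder.
        String role = commands.isEmpty() ? "a data holder" : "an entity";
        StringBuilder sb = new StringBuilder();
        sb.append(className).append(" is ").append(role)
          .append(" with ").append(accessors).append(" accessor(s) and ")
          .append(mutators).append(" mutator(s)");
        if (!commands.isEmpty()) {
            sb.append("; its main responsibilities are: ").append(String.join(", ", commands));
        }
        return sb.append('.').toString();
    }

    public static void main(String[] args) {
        // prints "Account is an entity with 2 accessor(s) and 1 mutator(s); its main responsibilities are: deposit, withdraw."
        System.out.println(summarize("Account",
                List.of("getBalance", "isOverdrawn", "setOwner", "deposit", "withdraw")));
    }
}
```

The sketch mirrors the abstract's emphasis on a class's content and responsibilities rather than its relationships with other classes: only the class's own methods are inspected, and non-accessor, non-mutator methods are surfaced as its main responsibilities.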

