Integrating Natural Language and Program Structure Information to Improve Software Search and Exploration

Author : Hill, Emily
Date : 30 Aug 2010
Advisor : Pollock, Lori
Institution : University of Delaware
Department : Computer and Information Sciences
Keyword(s) : Natural language program analysis, SWUM, code search, code navigation
Document Type : Ph.D. Thesis

Abstract :

Today’s software is large and complex, with systems consisting of millions of lines of code. Developers new to a software project face significant challenges in locating code related to their maintenance tasks of fixing bugs or adding new features. Developers can simply be assigned a bug and told to fix it, even when they have no idea where to begin. In fact, research has shown that a developer typically spends more time locating and understanding code during maintenance than modifying it. We can significantly reduce the cost of software maintenance by reducing the time and effort to find and understand the code relevant to a software maintenance task. In this dissertation, we demonstrate how textual and structural information in source code can be used to improve software search and exploration tools. To facilitate integration of this information into additional software tools, we present a novel model of word usage in software. This model provides software engineering tool designers access to both structural and linguistic information about the source code, where previously only structural information was available. We utilize textual and structural information to improve software search and program exploration tools, and evaluate against competing state-of-the-art approaches. Our evaluations show that combining textual and structural information can outperform competing state-of-the-art techniques. Finally, we outline uses of the model to improve software engineering tools beyond program search and exploration.
