An Empirical Study of Identifier Splitting Techniques

Author : Hill, Emily; Binkley, David; Lawrie, Dawn; Pollock, Lori; Vijay-Shanker, K.
Date : Aug 2013
Publisher : Springer Link
Journal : Empirical Software Engineering
Keyword(s) : software engineering tools, program comprehension, identifier names, source code text analysis
Document Type : Article

Abstract :

Researchers have shown that program analyses that drive software development and maintenance tools supporting search, traceability and other tasks can benefit from leveraging the natural language information found in identifiers and comments. Accurate natural language information depends on correctly splitting the identifiers into their component words and abbreviations. While conventions such as camel-casing can ease this task, conventions are not well-defined in certain situations and may be modified to improve readability, thus making automatic splitting more challenging. This paper describes an empirical study of state-of-the-art identifier splitting techniques and the construction of a publicly available oracle to evaluate identifier splitting algorithms. In addition to comparing current approaches, the results help to guide future development and evaluation of improved identifier splitting approaches.
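
As a rough illustration of the splitting task the abstract describes, the sketch below splits identifiers using only naming conventions (underscores and camel-case transitions). It is not one of the techniques evaluated in the paper; the function name `split_identifier` and the sample identifiers are hypothetical, chosen only to show where convention-based splitting works and where it stops helping.

```python
import re
from typing import List


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier on underscores and camel-case boundaries.

    Hypothetical, convention-only baseline: it does not attempt to split
    same-case runs such as "filename".
    """
    words: List[str] = []
    for part in re.split(r"_+", identifier):
        # Break between a lowercase letter or digit and an uppercase letter:
        # "getFileName" -> "get File Name".
        part = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", part)
        # Break before the last capital of an uppercase run that is followed
        # by a lowercase letter: "HTMLParser" -> "HTML Parser".
        part = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", part)
        words.extend(part.split())
    return words


if __name__ == "__main__":
    for name in ["getFileName", "max_buffer_size", "HTMLParser", "filename"]:
        print(name, "->", split_identifier(name))
```

The last sample, `filename`, comes back as a single token because it contains no case change or separator. Identifiers like this are where conventions alone are insufficient and where an oracle of correct splits is needed to compare more sophisticated algorithms.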
