Automatically Mining Software-Based, Semantically-Similar Words from Comment-Code Mappings

Author : Howard, Matthew J.; Gupta, Samir; Pollock, Lori; Vijay-Shanker, K.
Booktitle : The 10th Working Conference on Mining Software Repositories (this paper received the conference's Best Research Paper Award)
Date : May 2013
Publisher : IEEE
Keyword(s) : semantically-similar words, mining, program understanding, code search
Document Type : In Conference Proceedings

Abstract :

Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and the various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms are particularly useful for overcoming this vocabulary mismatch, as are other word relations that indicate semantic similarity. However, experience shows that many words are semantically similar in computer science contexts but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and the conventions programmers follow in writing them. Our evaluation shows that a high proportion of the mined comment-code word mappings that do not already occur in WordNet are indeed viewed as semantically similar word pairs in computer science.
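The abstract does not spell out the mining algorithm, so the sketch below is only a rough illustration of the general idea, not the authors' technique. Under stated assumptions, it pairs words from a method's leading comment with words split from the method's identifier, counts how often each cross pair recurs across methods, drops pairs found in a known-synonym set (a hypothetical stand-in for WordNet), and then uses the surviving pairs to expand a search query. The function names `split_identifier`, `mine_pairs`, and `expand_query`, the stopword list, the `min_count` threshold, and the sample methods are all hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical stopword list; a real miner would need a far richer one.
STOPWORDS = {"the", "a", "an", "of", "to", "from", "this", "and", "in", "for", "given"}


def split_identifier(name: str) -> list[str]:
    """Split camelCase / snake_case identifiers into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", name):
        # Break at camelCase boundaries, keeping acronyms such as "URL" together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words if w]


def mine_pairs(methods, known_related, min_count=2):
    """Naive co-occurrence miner (an assumption, not the paper's method).

    methods: iterable of (leading_comment, method_name) tuples.
    known_related: set of frozenset word pairs already known to be related
                   (hypothetical stand-in for WordNet synonyms).
    Returns comment-code word pairs seen in at least `min_count` methods.
    """
    counts = Counter()
    for comment, method_name in methods:
        comment_words = {w for w in re.findall(r"[a-z]+", comment.lower())
                         if w not in STOPWORDS}
        code_words = set(split_identifier(method_name))
        for c, k in product(comment_words, code_words):
            pair = frozenset((c, k))
            # Skip identical words and pairs the general-purpose lexicon already knows.
            if c != k and pair not in known_related:
                counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def expand_query(query, related_pairs):
    """Append words related to any query term, preserving the original order."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        for pair in sorted(related_pairs, key=sorted):
            if t in pair:
                for other in sorted(pair - {t}):
                    if other not in expanded:
                        expanded.append(other)
    return expanded


# Hypothetical sample data: leading comments paired with method names.
METHODS = [
    ("Appends the element to the list.", "addElement"),
    ("Appends a listener to the queue.", "addListener"),
    ("Deletes the item.", "removeItem"),
    ("Deletes the entry.", "removeEntry"),
]
# Hypothetical stand-in for pairs a lexicon like WordNet already relates.
KNOWN = {frozenset(("deletes", "remove"))}

mined = mine_pairs(METHODS, KNOWN, min_count=2)

if __name__ == "__main__":
    print(mined)                                # {frozenset({'appends', 'add'})}
    print(expand_query("appends item", mined))  # ['appends', 'item', 'add']
```

Only "appends"/"add" recurs across two methods and is absent from the known set, so it is the single pair kept; "deletes"/"remove" also recurs but is filtered out as already known. A plain cross product of comment and identifier words is very noisy on real code, which is why a practical miner would rely on much more structure than raw co-occurrence.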
