Dissertation Defense Schedule
Sharing original dissertation research is a principle to which the University of Delaware is deeply committed. The dissertation is the single most important assignment our graduate students undertake, and its completion is a source of great pride.
We invite you to celebrate this milestone by attending their dissertation defense. Please review the upcoming dissertation defense schedule below and join us!
PROGRAM | Financial Services Analytics
Abstractive Text Summarization via Contextual Semantics Understanding
Automatic text summarization is the task of generating a precise text snippet that captures the most relevant and critical information from an input document. It is a central problem in Natural Language Processing (NLP), commonly serving as a critical component in many downstream tasks. Text summarization is challenging because it often involves both text understanding and text generation. Researchers have devoted tremendous effort to developing various models, which primarily fall into two categories: extractive summarization and abstractive summarization. The former selects a subset of sentences from the input document as the summary, while the latter relies on natural language generation techniques to produce new text as the summary. This dissertation focuses on abstractive summarization. We combine insights from the literature with the key advantages of the Transformer architecture to address two key challenges in abstractive summarization.
The first part of this dissertation is devoted to better capturing the input document’s global semantics for summarization. Transformer-based architectures have proven effective at modeling relationships among local tokens, but semantic understanding at a higher level (e.g., sentences, topics) is usually under-explored. To address this challenge, this dissertation presents a novel framework that uses topics to guide language generation. Using latent topics, our model can preserve the document’s global semantics and guide the generation of summaries, thereby improving summarization performance. This dissertation presents a joint learning framework that incorporates neural topic modeling into the seq2seq model. Our approach outperforms previous state-of-the-art models in both quantitative and human evaluation.
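The idea of letting latent topics guide generation can be sketched as blending the decoder's next-token distribution with a topic-induced prior. The sketch below is a minimal illustration under assumed interfaces (the function names, the `alpha` mixing weight, and the toy topic-to-token affinity matrix are all hypothetical, not the dissertation's actual model):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def topic_guided_probs(decoder_logits, topic_token_affinity, topic_mixture, alpha=0.3):
    """Blend the decoder's next-token distribution with a topic-induced prior.

    decoder_logits      -- raw decoder scores over the vocabulary
    topic_token_affinity -- per-topic scores over the vocabulary (topics x vocab)
    topic_mixture       -- the document's latent topic distribution
    alpha               -- weight of the topic prior (hypothetical hyperparameter)
    """
    p_decoder = softmax(decoder_logits)
    vocab = len(decoder_logits)
    # Topic prior: expected token affinity under the document's topic mixture.
    topic_scores = [
        sum(topic_mixture[k] * topic_token_affinity[k][v]
            for k in range(len(topic_mixture)))
        for v in range(vocab)
    ]
    p_topic = softmax(topic_scores)
    # A convex combination of two distributions is still a valid distribution.
    return [(1 - alpha) * pd + alpha * pt for pd, pt in zip(p_decoder, p_topic)]
```

With a flat decoder distribution and a single topic that favors one token, the blended distribution shifts probability mass toward the on-topic token, which is the intuition behind topic-guided decoding.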
The second part of this dissertation focuses on improving the denoising ability of a seq2seq model through fine-tuning. Deep neural networks are often brittle, especially when deployed in real-world systems, because they are not robust to the noise that inevitably appears in data. This dissertation presents a framework that enhances the seq2seq model’s denoising ability. We incorporate self-supervised contrastive learning along with various sentence-level document augmentations. Experimental results show that our proposed model achieves state-of-the-art performance and is more robust to noise.
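The contrastive setup can be illustrated with two ingredients: sentence-level augmentations that create alternative "views" of a document, and a contrastive (InfoNCE-style) loss that pulls an anchor toward its augmented positive and away from negatives. The sketch below is a toy illustration on pre-computed embedding vectors, with hypothetical function names and parameters, not the dissertation's actual training code:

```python
import math
import random

def sentence_delete(sentences, p=0.2, rng=None):
    """Sentence-level augmentation: randomly drop sentences with probability p,
    always keeping at least one so the document stays non-empty."""
    rng = rng or random.Random(0)
    kept = [s for s in sentences if rng.random() > p]
    return kept or [sentences[0]]

def sentence_shuffle(sentences, rng=None):
    """Sentence-level augmentation: permute sentence order."""
    rng = rng or random.Random(0)
    out = sentences[:]
    rng.shuffle(out)
    return out

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss over cosine similarities: low when the anchor is closer
    to its positive (augmented) view than to the negatives."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    sims = [cos(anchor, positive)] + [cos(anchor, n) for n in negatives]
    exps = [math.exp(s / tau) for s in sims]
    return -math.log(exps[0] / sum(exps))
```

Training with this objective encourages the encoder to map a clean document and its perturbed (noisy) counterpart to nearby representations, which is one way to read the framework's improved robustness to noise.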
The last part of this dissertation introduces our work on podcast summarization. As a new field of research, podcast summarization is challenging because podcasts are usually conversational and colloquial. This dissertation proposes a framework for podcast summarization: we first present a baseline analysis to understand the unique challenges of podcast transcripts, and then introduce a two-stage generation pipeline. We first locate the most relevant content in the noisy transcript and then generate the summary from the selected sentences. As evaluated by professional assessors, the summaries generated by our proposed approach successfully capture the key information and the main topics of each episode.
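The two-stage shape of the pipeline, select relevant content first, then generate from it, can be sketched as follows. This is a deliberately simplified skeleton: the overlap-based relevance score and all function names are hypothetical stand-ins, and the second stage, which in the actual system would be an abstractive seq2seq model, is reduced to joining the selected sentences:

```python
def select_relevant(sentences, query_terms, top_k=3):
    """Stage 1: score each transcript sentence by word overlap with a set of
    query terms (a toy relevance model) and keep the top-k in original order."""
    scored = []
    for i, s in enumerate(sentences):
        tokens = set(s.lower().split())
        scored.append((len(tokens & query_terms), i))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:top_k]
    keep = sorted(i for _, i in top)
    return [sentences[i] for i in keep]

def two_stage_summary(transcript, query_terms, top_k=3):
    """Stage 2 placeholder: a real system would feed the selected content to
    an abstractive seq2seq model; here we simply join the selected sentences."""
    selected = select_relevant(transcript, query_terms, top_k)
    return " ".join(selected)
```

The design point the skeleton illustrates is that filtering the noisy transcript before generation keeps conversational filler and ads out of the summarizer's input.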
To summarize, this dissertation formulates a suite of Transformer-based seq2seq solutions to improve abstractive summarization. It presents new frameworks that overcome limitations of existing abstractive summarization models. The effectiveness of the proposed methods has been demonstrated on multiple datasets against highly competitive benchmarks. The dissertation also provides impactful research findings in abstractive summarization and language generation.