I previously worked as a research intern on the TikTok/ByteDance Research SAMI (Speech Audio Music and Language Intelligence) team and at Microsoft in ROAR (Research and OpenAI team).
abstract = "Human evaluation remains the gold standard for assessing abstractive summarization. However, current practices often prioritize constructing evaluation guidelines for fluency, coherence, and factual accuracy, overlooking other critical dimensions. In this paper, we investigate argument coverage in abstractive summarization by focusing on long legal opinions, where summaries must effectively encapsulate the document{'}s argumentative nature. We introduce a set of human-evaluation guidelines to evaluate generated summaries based on argumentative coverage. These guidelines enable us to assess three distinct summarization models, studying the influence of including argument roles in summarization. Furthermore, we utilize these evaluation scores to benchmark automatic summarization metrics against argument coverage, providing insights into the effectiveness of automated evaluation methods.",
title = "ReflectSumm: A Benchmark for Course Reflection Summarization",
author = "Zhong, Yang and Elaraby, Mohamed and Litman, Diane and Butt, Ahmed
Ashraf and Menekse, Muhsin",
editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and
Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on
Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
abstract = "This paper introduces ReflectSumm, a novel summarization dataset
specifically designed for summarizing students' reflective writing. The goal of ReflectSumm is
to facilitate developing and evaluating novel summarization techniques tailored to real-world
scenarios with little training data, with potential implications in the opinion summarization
domain in general and the educational domain in particular. The dataset encompasses a diverse
range of summarization tasks and includes comprehensive metadata, enabling the exploration of
various research questions and supporting different applications. To showcase its utility, we
conducted extensive evaluations using multiple state-of-the-art baselines. The results provide
benchmarks for facilitating further research in this area.",
abstract = "This paper presents an overview of the ImageArg shared task, the
first multimodal Argument Mining shared task co-located with the 10th Workshop on Argument
Mining at EMNLP 2023. The shared task comprises two classification subtasks - (1) Subtask-A:
Argument Stance Classification; (2) Subtask-B: Image Persuasiveness Classification. The former
determines the stance of a tweet containing an image and a piece of text toward a
controversial topic (e.g., gun control and abortion). The latter determines whether the image
makes the tweet text more persuasive. The shared task received 31 submissions for Subtask-A
and 21 submissions for Subtask-B from 9 different teams across 6 countries. The top submission
in Subtask-A achieved an F1-score of 0.8647 while the best submission in Subtask-B achieved an
F1-score of 0.5561.",
abstract = "We propose a simple approach for the abstractive summarization of long legal
opinions that takes into account the argument structure of the document. Legal
opinions often contain complex and nuanced argumentation, making it challenging to generate a
concise summary that accurately captures the main points of the legal opinion. Our approach
involves using argument role information to generate multiple candidate summaries, then
reranking these candidates based on alignment with the document{'}s argument structure. We
demonstrate the effectiveness of our approach on a dataset of long legal opinions and show
that it outperforms several strong baselines.",
abstract = "A challenging task when generating summaries of legal documents is the ability to address their argumentative nature. We introduce a simple technique to capture the argumentative structure of legal documents by integrating argument role labeling into the summarization process. Experiments with pretrained language models show that our proposed approach improves performance over strong baselines.",
Mentorship: If you are a struggling graduate student in the Middle East, especially one who needs help with a thesis, contact me with the subject line [research discussion] and include a concise description of your research and the type of help you are requesting.