🎓Yue Yu
🏛️Georgia Institute of Technology
📅Wednesday, December 4th, 2024 | 12:00 PM - 1:00 PM
📍1447 Classroom in Klaus
📚Talk Title: Data-centric Approaches for Building Retrieval-augmented Language Models
📝Abstract
Although large language models (LLMs) have demonstrated impressive performance, they often struggle with truthfulness and hallucination. Retrieval augmentation addresses these issues by grounding responses in relevant evidence, reducing hallucination, and enabling efficient handling of long-tail knowledge without costly model retraining. This talk delves into data-centric approaches for building retrieval-augmented language models. It highlights two key contributions: COCO-DR, a BERT-scale retrieval model leveraging contrastive pretraining and robust optimization for enhanced generalization, and RankRAG, a framework for instruction-tuning LLMs to unify context ranking and answer generation in knowledge-intensive tasks. These works emphasize robust retrieval model training and efficient post-training pipelines, aiming to advance trustworthy and scalable RAG systems.
👤Bio
Yue Yu recently earned his Ph.D. from the Georgia Institute of Technology, where he worked with Prof. Chao Zhang. His research focuses on language modeling and natural language processing, with a particular interest in data-centric approaches to building robust and trustworthy models. Yue has served as an area chair for top NLP conferences, including ACL, EMNLP, and NAACL, and has received notable honors such as the NeurIPS 2023 Scholar Award and the ML4H 2022 Best Paper Award. More information about Yue’s research can be found at his Google Scholar profile.
🎓Jifan Zhang
🏛️University of Wisconsin-Madison
📅Tuesday, October 29th, 2024 | 12:30 PM - 1:30 PM (EDT)
📍C341 Classroom in Van Leer
📚Talk Title: Learning from Black-Box General Intelligences
📎Slides
📝Abstract
General intelligences, both human and artificial (e.g., LLMs), offer remarkable flexibility in handling diverse tasks. However, directly leveraging these general intelligences at scale is prohibitively expensive. This raises the key question of how we can efficiently train lightweight, specialized models for specific applications by learning from and distilling the knowledge of black-box general intelligences. In this talk, I will discuss the label-efficient learning paradigms that have been developed over the past two decades, covering techniques in active learning, semi-supervised learning, and transfer learning. I will highlight scenarios and approaches that have proven empirically effective for label-efficient learning, including fine-tuning large pretrained models, uncertainty sampling, and handling class imbalance. I will conclude by discussing the challenges and growing importance of label-efficient learning in open-world scenarios. This talk will provide an overview of the key ideas, results, and open problems in learning efficiently from black-box general intelligences.
👤Bio
Jifan Zhang is a Ph.D. candidate in computer science at the University of Wisconsin-Madison, working with Robert Nowak. He obtained his M.S. and B.S. degrees in computer science from the University of Washington, where he was advised by Kevin Jamieson, Lalit Jain, Tanner Schmidt, Dieter Fox, and Zachary Tatlock. His research spans both applied and theoretical aspects of machine learning, focusing primarily on the alignment of LLMs, humor generation with LLMs, and efficient distillation of black-box intelligence.