Driven by Curiosity, Grounded in Impact: His Journey Across the Frontiers of Multimodality

Date: 27/04/2026 | Article: Wang Chuxi | Photo: Courtesy of the interviewee

Chai Wenhao is a graduate of the Class of 2023 in Civil Engineering at the Zhejiang University–University of Illinois Urbana-Champaign Institute (ZJUI). From his early undergraduate work in pedestrian pose estimation and computer vision, through firsthand exposure to cutting-edge industrial research at Microsoft Research Asia, to advanced studies in multimodal learning and video understanding in Seattle, he has continuously expanded the boundaries of his research. Driven by a mission to translate advanced technology into real-world impact, he is now pursuing his PhD at Princeton University.

His journey has been marked by both challenges and breakthroughs, but one conviction has remained unshakable: to harness frontier technology for meaningful, lasting change. What began as curiosity has evolved into a lifelong commitment to solving fundamental problems in the field.

His four years at ZJUI were transformative. More than just the starting point of his academic career, ZJUI is where he first embraced his identity as a researcher. The Computer Vision course, taught by ZJUI Assistant Professor Wang Gao’ang, revolutionized his understanding of machine learning. It demystified the field for him and instilled a core principle: impactful research demands probing underlying mechanisms, not just building surface-level applications.

Thanks to ZJUI’s practice-driven educational philosophy, which encourages undergraduates to join labs early, Chai entered Assistant Professor Wang Gao’ang’s research group in his sophomore year. Working on foundational projects like person re-identification and pedestrian pose estimation, he transitioned from simply implementing systems to formulating original research questions. This experience cultivated the rigor and critical thinking skills essential for academic excellence, laying the groundwork for his future success.

His decision to pursue graduate study abroad evolved gradually. Initially, he considered entering industry directly, but as he delved deeper into multimodal research, he realized he needed a broader platform to advance his work. A junior-year internship at Microsoft Research Asia proved to be a pivotal turning point. There, he gained a humbling awareness of gaps in his research vision and problem-framing skills compared to leading global researchers. Most importantly, he redefined what high-quality research means: it is not about chasing incremental benchmark improvements, but about tackling problems with long-term significance and real-world relevance.

He went on to earn his master’s degree at the University of Washington, where he deepened his expertise in video understanding. During this period, Chai spearheaded landmark projects including MovieChat and AuroraCap, with findings published in top-tier computer science venues: CVPR, TPAMI, and ICLR. He quickly emerged as a rising star in long-form video understanding and co-chaired the CVPR Long-form Video Understanding Challenge, demonstrating exceptional leadership on the global academic stage.

In fall 2025, Chai began his PhD in computer science at Princeton University, where he centers his work on long-context multimodal modeling and reasoning, a core frontier in intelligent technology. He has observed that multimodal intelligence is shifting from a focus on feasibility to reliability: the field now grapples with how to robustly process ultra-long, multimodal real-world scenarios that demand deep reasoning. Encoding-perception and decoding-generation remain the two fundamental bottlenecks limiting progress.

To tackle these challenges, he has led a series of groundbreaking studies, most notably the VideoNSA project accepted at ICLR 2026. This work is the first to systematically integrate Native Sparse Attention into video-language models, reliably extending the context window to 128K tokens, equivalent to over 10 hours of video, using just 3.6% of the attention budget. It delivers state-of-the-art results across multiple long-video understanding benchmarks, offering a transformative solution to the persistent computational explosion problem in long-context modeling.

Beyond model architecture innovations, Chai has made significant contributions to large language model (LLM) evaluation. As a core contributor, he helped develop the LiveCodeBench Pro benchmark suite, co-created by researchers from leading universities worldwide, which has emerged as one of the most authoritative platforms for assessing LLMs’ deep algorithmic reasoning abilities. Related findings have been published in top venues including NeurIPS 2025 and ICLR 2026.

Reflecting on his six-year cross-cultural journey from ZJUI to Princeton, Chai has redefined what it means to be a globally competitive engineer. "It is not just about English fluency or publishing top papers," he explained. "It is about mastering strong research and engineering fundamentals, cultivating independent critical thinking, and maintaining sharp insight and unwavering focus on core frontier problems."

Nearly three years after graduating, Chai remains deeply connected to ZJUI. Whenever he comes across ZJUI students’ names on Google Scholar or arXiv, he feels a profound sense of pride and connection. What impresses him most is the Institute’s accelerating talent development: more and more undergraduates now publish papers at top conferences and earn offers from leading global universities. Watching younger students flourish on the same academic path he once traveled, he takes immense pride in ZJUI’s extraordinary growth and its role in nurturing the next generation of global researchers. Going forward, may Chai Wenhao stay true to the spirit of Seeking Truth and Pursuing Innovation, break new ground in his academic exploration and life’s journey, and let the spirit of a ZJUIer shine on a wider world stage.