|
Upcoming Seminars and Events
|
| May 11, 2026 |
-
Title: Generative AI for Drug Discovery: From High-Resolution Proteomics to Autonomous Scientific Workflows
Time: 10:00am
Venue: CB308, 3/F, Chow Yei Ching Building, HKU (Zoom broadcasting)
Speaker(s): Dr. Siqi Sun
Remark(s): Abstract
The integration of generative AI into drug discovery is moving beyond simple structure prediction toward a more comprehensive and autonomous pipeline. In this talk, I will focus on our recent efforts to accelerate AI-driven drug discovery (AIDD) through a multi-layered approach. I will first present our work on de novo protein and peptide sequencing, which enables the high-resolution data acquisition necessary for identifying novel targets. I will then delve into our core research on biomolecular structure prediction, discussing how we optimize these models for the specific challenges of therapeutic design. Finally, I will briefly explore how these generative tools are setting the stage for agentic science, where autonomous systems begin to orchestrate complex discovery workflows.
About the speaker
Siqi Sun is an associate professor at Fudan University and a researcher at the Shanghai AI Lab. He previously served as a researcher at Microsoft Research, Redmond. He holds a PhD from the Toyota Technological Institute at Chicago (TTIC) and a bachelor's degree in Mathematics from Fudan University. His research focuses on AI for science, specifically developing generative models and standardized benchmarks for proteomics and structural biology.

-
Title: From cross-modal alignment to hierarchical sharing: statistical foundations of contrastive learning for multimodal data
Time: 02:30pm
Venue: Room 301, Run Run Shaw Building
Speaker(s): Prof. Doudou Zhou
Remark(s): Abstract
"Multimodal data are increasingly common in modern biomedical and machine learning applications yet learning useful representations from heterogeneous modalities remains challenging. A central issue is that different modalities may contain complementary information, but the extent and pattern of information sharing can vary substantially across modalities. In this talk, I will present two recent works that develop statistical foundations for contrastive learning in multimodal settings. The first focuses on electronic health records and studies how structured clinical codes and unstructured clinical notes can be jointly embedded through a multimodal contrastive framework. This approach connects the contrastive objective to a pointwise mutual information matrix, yielding an interpretable and privacy-preserving algorithm based on summary level co-occurrence information. The second work moves beyond the conventional sharedversus-private decomposition and introduces a hierarchical framework that learns globally shared, partially shared, and modality-specific representations within a unified model. I will discuss the key modeling ideas, identifiability results, recovery guarantees, and implications for downstream prediction. Together, these works highlight how principled statistical modeling can improve both the interpretability and effectiveness of multimodal representation learning." plex discovery workflows.
About the speaker
Doudou Zhou is an Assistant Professor of Statistics & Data Science at the National University of Singapore. His research lies at the intersection of statistics, machine learning, and artificial intelligence, with a focus on statistical learning theory, multimodal data integration, electronic health records, and the evaluation of large language models. He develops principled methods for learning from noisy, heterogeneous, and partially observed data, with applications in biomedicine and modern AI systems.

|
| May 12, 2026 |
-
Title: Anti-concentration inequalities for the difference of maxima of Gaussian random vectors
Time: 10:30am
Venue: Room 301, Run Run Shaw Building
Speaker(s): Prof. Shuting Shen
Remark(s): Abstract
We derive novel anti-concentration bounds for the difference between the maximal values of two Gaussian random vectors across various settings. Our bounds are dimension-free, scaling with the dimension of the Gaussian vectors only through the smaller expected maximum of the Gaussian subvectors. In addition, our bounds hold under degenerate covariance structures, which previous results do not cover. Moreover, we show that our conditions are sharp under the homogeneous component-wise variance setting, while we impose only mild assumptions on the covariance structures under the heterogeneous variance setting. We apply the new anti-concentration bounds to derive a central limit theorem for the maximizers of discrete empirical processes. Finally, we back up our theoretical findings with comprehensive numerical studies.
About the speaker
Shuting Shen is an Assistant Professor of Statistics & Data Science at the National University of Singapore. Before joining NUS, she was a postdoctoral fellow at the Fuqua School of Business and the Department of Biostatistics & Bioinformatics at Duke University, jointly supervised by Dr. Alexandre Belloni and Dr. Ethan X. Fang. Prior to her postdoctoral position, she obtained her PhD in Biostatistics from Harvard University in 2023, where she was jointly supervised by Dr. Xihong Lin and Dr. Junwei Lu. She earned a B.A. and a B.S. in Mathematics (dual) from Peking University in 2018. Her research interests primarily include large-scale inference, combinatorial inference, choice model asymptotics, operations research theories, applied probability, and distributed computing.

|
| May 18, 2026 |
-
Title: Beyond LLMs: Architecting the systems backbone for semantic engines and agents
Time: 03:00pm
Venue: HW312, Haking Wong Building, HKU
Speaker(s): Dr. Fatma Özcan
Remark(s): Abstract
"Large Language Models (LLMs) are redefining analysis across structured and unstructured data, leading to the emergence of two primary architectural paradigms: AI or semantic engines, and data agents. Despite distinct approaches, both architectures encounter pivotal challenges, particularly in optimizing AI operators, agentic pipelines, natural language data interfaces, and AI-powered search. Centrally, embeddings and similarity search are key building blocks. This talk first addresses optimization for semantic operators, presenting an extensive evaluation of proxy models for AI query approximation. The findings demonstrate a greater than 100x cost and latency reduction for semantic filtering (AI.IF) and significant gains for semantic ranking (AI.RANK). Next, the talk examines Filtered Vector Search (FVS), a key component for semantic search and Generative AI (GenAI) applications in modern database systems. A central insight is that optimal algorithm selection is not determined solely by distance‑metric computation costs; rather, system‑level overheads play a substantial and decisive role. Finally, the talk highlights the discovery of relevant data sources as a major bottleneck and introduces a metadata reasoner agent to address this challenge."
About the speaker
"Fatma Özcan is a Principal Engineer at Systems Research@Google. Her current research focuses on GenAI and data management, vector search, platforms and infra-structure for large-scale data analysis, and natural language interfaces to
data. Dr Özcan got her PhD degree in computer science from University of Maryland, College Park, and her BSc degree in computer engineering from METU, Ankara. Before joining Google, she was a Distinguished Research Staff Member and a senior manager at IBM Almaden Research Center. She has over 24 years of experience in industrial research, and has delivered core technologies into various IBM and Google products. She is the co-author of the book ""Heterogeneous Agent Systems"", and co-author of several conference papers and patents. She is an ACM Fellow and serves on the CRA board of directors, and she is the co-chair of CRA-Industry. She received the VLDB Women in Database Research Award in 2022."

|