Talk title: Integrative analysis of 16S marker-gene and shotgun metagenomic sequencing data improves efficiency of testing microbiome hypotheses
Speaker: Yijuan Hu (胡懿娟), School of Public Health, Emory University, USA
Time: June 2, 2023, 10:00-11:00
Venue: Room 201, Wenbo Building (文波楼)
Abstract: The most widely used technologies for profiling microbial communities are 16S marker-gene sequencing and shotgun metagenomic sequencing. Surprisingly, many microbiome studies have performed both experiments on the same cohort of samples. The two datasets often yield consistent patterns in taxonomic profiles, highlighting the potential for an integrative analysis to improve the power of testing these patterns. However, each dataset is subject to distinct experimental biases that systematically distort the measurements from their actual values in an experiment-specific manner. These experimental biases, together with partially overlapping samples and differential library sizes between the two datasets, pose tremendous challenges when combining the datasets. In this article, we introduce the first method, named LOCOM-I, for such an integrative analysis. The new method is based on our LOCOM model (Hu et al., 2022, PNAS), which employs logistic regression for testing differential abundance of taxa while remaining robust to experimental bias. Our new method combines data from both experiments for differential abundance tests, while accounting for differential experimental biases, assigning adaptive weights to each observation, and accommodating samples and taxa unique to an experiment. To benchmark the performance of the new method, we introduce two ad hoc approaches: applying LOCOM to pooled taxa count data, and combining the LOCOM p-values obtained from analyzing each dataset separately. We demonstrate the uniform superiority of the new method through extensive simulation studies. An application to two real studies uncovered scientifically plausible findings that would have been missed by analyzing the individual datasets.
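Speaker bio: Yijuan Hu is a professor in the Department of Biostatistics and Bioinformatics at the School of Public Health, Emory University, USA. Hu received a bachelor's degree from the Department of Probability and Statistics, School of Mathematical Sciences, Peking University (2005) and a PhD in biostatistics from the University of North Carolina at Chapel Hill (2011). Hu's research focuses on developing statistical theory and methods for high-dimensional, high-noise omics data in biostatistics, in particular high-dimensional hypothesis testing, robust inference, and missing or biased data in microbiome and genetic data. Representative work has been published in journals including the Journal of the American Statistical Association (JASA), Proceedings of the National Academy of Sciences (PNAS), Microbiome, and the American Journal of Human Genetics (AJHG).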