Predicting Incident Bacterial Vaginosis with Machine Learning
Longitudinal microbiome modeling to predict bacterial vaginosis before clinical onset.
Overview
Bacterial vaginosis (BV) is a common vaginal condition with high recurrence rates, and clinicians have limited ability to predict it before symptoms appear. This project developed machine-learning models that use longitudinal vaginal microbiome data to flag incident BV weeks before clinical diagnosis.
The focus was early risk stratification in real patients, rather than retrospective classification.
Approach
- Longitudinal cohort with dense, patient-level sampling before BV onset
- Vaginal microbiome profiling using 16S rRNA sequencing
- Artificial neural networks trained on temporal microbial features
- Built-in explainability using SHAP-based feature attribution
- Parallel analysis of cohort-level trends and individual patient trajectories
Models were designed to learn how microbiomes evolve over time leading into disease.
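As a rough illustration of what "temporal microbial features" can look like, the sketch below derives per-patient change-since-previous-visit columns from a long-format abundance table using pandas. The column names (`patient_id`, `day`, `taxon_a`, `taxon_b`), the values, and the helper `add_temporal_features` are all hypothetical and are not taken from the project's pipeline; the actual feature set may differ.

```python
import pandas as pd

# Hypothetical long-format table: one row per patient visit.
# Names and relative abundances are illustrative, not study data.
samples = pd.DataFrame({
    "patient_id": ["P1", "P1", "P1", "P2", "P2", "P2"],
    "day": [0, 7, 14, 0, 7, 14],
    "taxon_a": [0.90, 0.70, 0.40, 0.85, 0.86, 0.84],
    "taxon_b": [0.05, 0.20, 0.50, 0.10, 0.09, 0.11],
})


def add_temporal_features(df, taxa):
    """Append a change-since-previous-visit column for each taxon, per patient."""
    out = df.sort_values(["patient_id", "day"]).reset_index(drop=True)
    for taxon in taxa:
        # diff() runs within each patient; the first visit has no earlier
        # sample, so its delta is filled with 0.0.
        out[f"{taxon}_delta"] = out.groupby("patient_id")[taxon].diff().fillna(0.0)
    return out


features = add_temporal_features(samples, ["taxon_a", "taxon_b"])
print(features)
```

Here patient P1 shows a steady drop in `taxon_a` alongside a rise in `taxon_b`, while P2 stays flat. Deltas like these are one simple way to expose a trajectory, rather than a single snapshot, to a downstream model.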
Key Findings
- Predictive signal emerged well before clinical BV diagnosis
- Model performance remained robust across heterogeneous baseline microbiomes
- Feature attribution highlighted specific taxa and temporal shifts driving risk
- Patient-level explanations revealed multiple mechanistic paths to BV
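To make the idea of a patient-level explanation concrete, the sketch below computes exact Shapley values for a single sample by brute-force enumeration of feature coalitions. It does not use the SHAP library, which approximates these values efficiently for real models, and it is not the project's code. The scoring function `toy_risk`, the feature names, and all numbers are hypothetical stand-ins for a trained network and real data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if (j in coalition or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score standing in for a trained neural network."""
    taxon_a_delta, taxon_b_delta, diversity = features
    return 0.5 * taxon_b_delta - 0.3 * taxon_a_delta + 0.2 * taxon_b_delta * diversity


patient = [-0.3, 0.3, 1.0]      # one patient's features (illustrative)
cohort_mean = [0.0, 0.0, 0.5]   # baseline, e.g. a cohort average (illustrative)

phi = shapley_values(toy_risk, patient, cohort_mean)
for name, value in zip(["taxon_a_delta", "taxon_b_delta", "diversity"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property that lets each prediction be decomposed feature by feature. Two patients with similar scores can have very different decompositions, which is one way distinct paths to the same outcome can show up.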
Outputs
- U.S. Provisional Patent Application No. 63/778,989, "Predicting Bacterial Vaginosis Development Using Artificial Neural Networks" (inventors: Jacob Elnaggar, Christopher Taylor, John Lammons, Christina Muzny; filed March 27, 2025)
- First-author manuscripts in preparation
- Conference presentations and invited talks
- Modeling framework reused in downstream dysbiosis and cancer-related projects
Clinical Relevance
Early identification of BV risk enables preventive interventions and personalized monitoring strategies, and the modeling approach provides a generalizable framework for predicting other microbiome-associated diseases. This work established a foundation for translating longitudinal microbiome data into clinically actionable tools.
Tools & Methods
Python · TensorFlow · SHAP · scikit-learn · pandas · longitudinal modeling · reproducible pipelines