Artificial intelligence (AI) applied to single-cell data has the potential to transform our understanding of biological systems by revealing patterns and mechanisms that simpler traditional methods miss. Here, we develop a general-purpose, interpretable AI pipeline consisting of two deep learning models: the Multi- Input Set Transformer++ (MIST) model for prediction and the single-cell FastShap model for interpretability. We apply this pipeline to a large set of routine clinical data containing single-cell measurements of circulating red blood cells (RBC), white blood cells (WBC), and platelets (PLT) to study population fluxes and homeostatic hematological mechanisms. We find that MIST can use these single-cell measurements to explain 70-82% of the variation in blood cell population sizes among patients (RBC count, PLT count, WBC count), compared to 5-20% explained with current approaches. MIST’s accuracy implies that substantial information on cellular production and clearance is present in the single-cell measurements. MIST identified substantial crosstalk among RBC, WBC, and PLT populations, suggesting co-regulatory relationships that we validated and investigated using interpretability maps generated by single-cell FastShap. The maps identify granular single-cell subgroups most important for each population’s size, enabling generation of evidence-based hypotheses for co-regulatory mechanisms. The interpretability maps also enable rational discovery of a single-WBC biomarker, “Down Shift”, that complements an existing marker of inflammation and strengthens diagnostic associations with diseases including sepsis, heart disease, and diabetes. This study illustrates how single-cell data can be leveraged for mechanistic inference with potential clinical relevance and how this AI pipeline can be applied to power scientific discovery.
Competing Interest StatementThe authors have declared no competing interest.
Funding StatementThis work was supported by NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science, a DeepMind Fellowship, and NIH Awards R01HL148248, R01DK123330, and R01HD104756.
Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Mass General Brigham Institutional Review Board approved the study and waived the requirement for informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data AvailabilityData produced in the present study are available upon reasonable request to the authors. Data from patients is not available due to privacy reasons.
Comments (0)