Causal machine learning for single-cell genomics

  • McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).

  • van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  • Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).

  • Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).

  • Hetzel, L. et al. Predicting cellular responses to novel drug perturbations at a single-cell resolution. In Proceedings of the 36th International Conference on Neural Information Processing Systems 26711–26722 (Curran Associates, 2022).

  • Liu, J. et al. Towards out-of-distribution generalization: a survey. Preprint at (2021).

  • Sekhon, J. The Neyman–Rubin model of causal inference and estimation via matching methods. In The Oxford Handbook of Political Methodology (eds Box-Steffensmeier, J. M. et al.) Ch. 11 (Oxford Academic, 2008).

  • Imbens, G. W. & Rubin, D. B. Causal Inference in Statistics, Social, and Biomedical Sciences (Cambridge University Press, 2015).

  • Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 34, 166–176 (2003).

  • Qiao, L., Khalilimeybodi, A., Linden-Santangeli, N. J. & Rangamani, P. The evolution of systems biology and systems medicine: from mechanistic models to uncertainty quantification. Annu. Rev. Biomed. Eng. (2025).

  • Wen, Y. et al. Applying causal discovery to single-cell analyses using CausalCell. eLife 12, e81464 (2023).

  • Belyaeva, A., Squires, C. & Uhler, C. DCI: learning causal differences between gene regulatory networks. Bioinformatics 37, 3067–3069 (2021).

  • Tam, G. H. F., Chang, C. & Hung, Y. S. Gene regulatory network discovery using pairwise Granger causality. IET Syst. Biol. 7, 195–204 (2013).

  • Ke, N. R. et al. DiscoGen: learning to discover gene regulatory networks. Preprint at bioRxiv (2023).

  • Badia-i-Mompel, P. et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat. Rev. Genet. 24, 739–754 (2023).

  • Fleck, J. S. et al. Inferring and perturbing cell fate regulomes in human brain organoids. Nature 621, 365–372 (2023).

  • Bravo González-Blas, C. et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat. Methods 20, 1355–1367 (2023).

  • Santos-Zavaleta, A. et al. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Res. 47, D212–D220 (2019).

  • Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017).

  • Lopez, R., Hutter, J.-C., Pritchard, J. & Regev, A. Large-scale differentiable causal discovery of factor graphs. Adv. Neural Inf. Process. Syst. 35, 19290–19303 (2022).

  • Chevalley, M., Roohani, Y., Mehrjou, A., Leskovec, J. & Schwab, P. CausalBench: a large-scale benchmark for network inference from single-cell perturbation data. Preprint at (2022).

  • Wang, Y., Solus, L., Yang, K. D. & Uhler, C. Permutation-based causal inference algorithms with interventions. Adv. Neural Inf. Process. Syst. 30, 5822–5831 (2017).

  • Aliee, H., Kapl, F., Hediyeh-Zadeh, S. & Theis, F. J. Conditionally invariant representation learning for disentangling cellular heterogeneity. Preprint at (2023).

  • Levine, M. & Davidson, E. H. Gene regulatory networks for development. Proc. Natl Acad. Sci. USA 102, 4936–4942 (2005).

  • Lazar, N. H. et al. High-resolution genome-wide mapping of chromosome-arm-scale truncations induced by CRISPR–Cas9 editing. Nat. Genet. 56, 1482–1493 (2024).

  • Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).

  • Adikusuma, F. et al. Large deletions induced by Cas9 cleavage. Nature 560, E8–E9 (2018).

  • Tsuchida, C. A. et al. Mitigation of chromosome loss in clinical CRISPR–Cas9-engineered T cells. Cell 186, 4567–4582 (2023).

  • Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).

  • Bunne, C. et al. Learning single-cell perturbation responses using neural optimal transport. Nat. Methods 20, 1759–1768 (2023).

  • Heumos, L. et al. Pertpy: an end-to-end framework for perturbation analysis. Preprint at bioRxiv (2024).

  • Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).

  • Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).

  • Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575 (2022).

  • Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16, 715–721 (2019).

  • Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).

  • Tung, P.-Y. et al. Batch effects and the effective design of single-cell gene expression studies. Sci. Rep. 7, 39921 (2017).

  • Rainforth, T., Foster, A., Ivanova, D. R. & Bickford Smith, F. Modern Bayesian experimental design. Stat. Sci. 39, 100–114 (2024).

  • Jain, M. et al. GFlowNets for AI-driven scientific discovery. Digit. Discov. 2, 557–577 (2023).

  • Williams, C. & Rasmussen, C. Gaussian processes for regression. In Advances in Neural Information Processing Systems (eds Touretzky, D. et al.) 514–520 (MIT Press, 1995).

  • Gal, Y. & Ghahramani, Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. Proc. 33rd Intl Conf. Mach. Learn. 48, 1050–1059 (2016).

  • Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 30, 6402–6413 (2017).

  • Lahlou, S. et al. DEUP: direct epistemic uncertainty prediction. Trans. Mach. Learn. Res. (in the press).

  • Ke, N. R. et al. Learning neural causal models from unknown interventions. Preprint at (2019).

  • Deleu, T. et al. Bayesian structure learning with generative flow networks. In Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence 518–528 (2022).

  • Močkus, J. On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference Novosibirsk (ed. Marchuk, G. I.) 400–404 (Springer, 1975).

  • Toth, C. et al. Active Bayesian causal inference. Adv. Neural Inf. Process. Syst. 35, 16261–16275 (2022).

  • Scherrer, N. et al. Learning neural causal models with active interventions. Preprint at (2021).

  • Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).

  • Tran, K. et al. Computational catalyst discovery: active classification through myopic multiscale sampling. J. Chem. Phys. 154, 124118 (2021).

  • Kim, S. et al. Deep learning for Bayesian optimization of scientific problems with high-dimensional structure. Preprint at (2021).

  • Bertin, P. et al. RECOVER identifies synergistic drug combinations in vitro through sequential model optimization. Cell Rep. Methods 3, 100599 (2023).

  • Tosh, C. et al. A Bayesian active learning platform for scalable combination drug screens. Nat. Commun. 16, 156 (2025).

  • Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).

  • Türei, D., Korcsmáros, T. & Saez-Rodriguez, J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods 13, 966–967 (2016).

  • Lobentanzer, S. et al. Democratizing knowledge representation with BioCypher. Nat. Biotechnol. 41, 1056–1059 (2023).

  • Bertin, P. et al. Analysis of gene interaction graphs as prior knowledge for machine learning models. Preprint at (2019).

  • Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  • Stein-O’Brien, G. L. et al. Enter the matrix: factorization uncovers knowledge from omics. Trends Genet. 34, 790–805 (2018).

  • Piran, Z., Cohen, N., Hoshen, Y. & Nitzan, M. Disentanglement of single-cell data with biolord. Nat. Biotechnol. 42, 1678–1683 (2024).

  • Lotfollahi, M. et al. Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517 (2023).

  • Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).

  • Ahuja, K., Mahajan, D., Wang, Y. & Bengio, Y. Interventional causal representation learning. Proc. 40th Intl Conf. Mach. Learn. 202, 372–407 (2023).

  • Varici, B., Acarturk, E., Shanmugam, K., Kumar, A. & Tajer, A. Score-based causal representation learning with interventions. Preprint at (2023).

  • Bereket, M. & Karaletsos, T. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. Adv. Neural Inf. Process. Syst. 36, 1–12 (2023).

  • Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single-cell atlases. Nat. Cell Biol. 25, 337–350 (2023).

  • Lopez, R. et al. Learning causal representations of single cells via sparse mechanism shift modeling. Proc. Mach. Learn. Res. 213, 1–30 (2023).

  • Ahuja, K., Hartford, J. S. & Bengio, Y. Weakly supervised representation learning with sparse perturbations. Adv. Neural Inf. Process. Syst. 35, 15516–15528 (2022).

  • Peters, J., Bauer, S. & Pfister, N. Causal models for dynamical systems. In Probabilistic and Causal Inference: The Works of Judea Pearl 1st edn, 671–690 (Association for Computing Machinery, 2022).

  • Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).

  • Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

  • Moon, K. R. et al. Manifold learning-based methods for analyzing single-cell RNA-sequencing data. Curr. Opin. Syst. Biol. 7, 36–46 (2018).

  • Aliee, H., Theis, F. J. & Kilbertus, N. Beyond predictions in neural ODEs: identification and interventions. Preprint at (2021).

  • Aliee, H. et al. Sparsity in continuous-depth neural networks. Adv. Neural Inf. Process. Syst. 35, 901–914 (2022).

  • Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

  • Tong, A. et al. TrajectoryNet: a dynamic optimal transport network for modeling cellular dynamics. Proc. 37th Intl Conf. Mach. Learn. (PMLR, 2020).

  • Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).

  • Eyring, L. V. et al. Modeling single-cell dynamics using unbalanced parameterized Monge maps. Preprint at bioRxiv (2022).

  • Wu, Y. et al. PerturBench: benchmarking machine learning models for cellular perturbation analysis. Preprint at (2024).

  • Csendes, G., Szalay, K. Z. & Szalai, B. Benchmarking a foundational cell model for post-perturbation RNAseq prediction. Preprint at bioRxiv (2024).

  • Mehrjou, A. et al. GeneDisco: a benchmark for experimental design in drug discovery. Preprint at (2021).

  • Metzner, E., Southard, K. M. & Norman, T. M. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. Cell Syst. 16, 101161 (2025).

  • Sethuraman, M. G. et al. NODAGS-Flow: nonlinear cyclic causal structure learning. In International Conference on Artificial Intelligence and Statistics (eds Ruiz, F. et al.) 6371–6387 (PMLR, 2023).

  • Nguyen, T., Tong, A., Madan, K., Bengio, Y. & Liu, D. Causal inference in gene regulatory networks with GFlowNet: towards scalability in large systems. Preprint at (2023).

  • Tung, K.-F., Pan, C.-Y., Chen, C.-H. & Lin, W.-C. Top-ranked expressed gene transcripts of human protein-coding genes investigated with GTEx dataset. Sci. Rep. 10, 16245 (2020).

  • Dhamija, S. & Menon, M. B. Non-coding transcript variants of protein-coding genes — what are they good for? RNA Biol. 15, 1025–1031 (2018).

  • Aebersold, R. et al. How many human proteoforms are there? Nat. Chem. Biol. 14, 206–214 (2018).

  • Dubey, A. et al. The Llama 3 herd of models. Preprint at (2024).

  • Gavrilov, A. A. et al. Studying RNA–DNA interactome by Red-C identifies noncoding RNAs associated with various chromatin types and reveals transcription dynamics. Nucleic Acids Res. 48, 6699–6714 (2020).

  • Noh, J. Y. et al. CCIDB: a manually curated cell–cell interaction database with cell context information. Database 2023, baad057 (2023).

  • Pearce, A. C. et al. Vav1 and Vav3 have critical but redundant roles in mediating platelet activation by collagen. J. Biol. Chem. 279, 53955–53962 (2004).
