A Highly-Efficient, Scalable Pipeline for Fixed Feature Extraction from Large-Scale High-Content Imaging Screens
Gabriel Comolet, Neeloy Bose, Jeff Winchell, Alyssa Duren-Lubanski, Tom Rusielewicz, Jordan Goldberg, Grayson Horn, Daniel Paull, and Bianca Migliori
iScience, 2024
Applying artificial intelligence (AI) to image-based morphological profiling of cells offers significant potential for identifying disease states and drug responses in high-content imaging (HCI) screens. When differences between populations (e.g., healthy vs. diseased) are unknown or imperceptible to the human eye, large-scale HCI screens are essential: they provide the many replicates needed to build reliable models and to account for confounding factors such as donor and intra-experimental variation. As screen sizes grow, so does the challenge of analyzing high-dimensional datasets efficiently while preserving interpretable features and predictive power. Here, we introduce ScaleFEx℠, a memory-efficient, open-source Python pipeline that extracts biologically meaningful features from HCI datasets using either minimal computational resources or scalable cloud infrastructure. Used together with AI models, ScaleFEx successfully identifies phenotypic shifts in drug-treated cells and ranks interpretable features. It is also applicable to public datasets, highlighting its potential to accelerate the discovery of disease-associated phenotypes and new therapeutics.