Channel Embedding for Informative Protein Identification from Highly Multiplexed Images

Salma Abdel Magid¹ sabdelmagid@g.harvard.edu
Won-Dong Jang¹ wdjang@g.harvard.edu
Denis Schapiro² denis_schapiro@hms.harvard.edu
Donglai Wei¹ donglai@g.harvard.edu
James Tompkin³ james_tompkin@brown.edu
Peter Sorger² peter_sorger@hms.harvard.edu
Hanspeter Pfister¹ hpfister@g.harvard.edu

1: Dept. of Computer Science, Harvard University
2: Dept. of Systems Biology, Harvard Medical School
3: Dept. of Computer Science, Brown University

Code [GitHub] bioRxiv 2020 [Paper]

Figure 1: Informative channel identification. (a) Given highly multiplexed imaging data, (b) we train a neural network to encode a channel embedding and classify a label (e.g., tumor grade). (c) We then measure each channel's importance for the classification task by applying an interpretation method to the channel embedding. (d) We evaluate our system by comparing the predicted informative channels to expert knowledge, and provide new insights for clinicians and pathologists.

Abstract

Interest is growing rapidly in using deep learning to classify biomedical images, and interpreting these deep-learned models is necessary for life-critical decisions and scientific discovery. Effective interpretation techniques accelerate biomarker discovery and provide new insights into the etiology, diagnosis, and treatment of disease. Most interpretation techniques aim to discover spatially salient regions within images, but few techniques consider imagery with multiple channels of information. For instance, highly multiplexed tumor and tissue images have 30-100 channels and require interpretation methods that work across many channels to provide deep molecular insights. We propose a novel channel embedding method that extracts features from each channel. We then use these features to train a classifier for prediction. Using this channel embedding, we apply an interpretation method to rank the most discriminative channels. To validate our approach, we conduct an ablation study on a synthetic dataset. Moreover, we demonstrate that our method aligns with biological findings on highly multiplexed images of breast cancer cells while outperforming baseline pipelines.
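The pipeline described in the abstract (per-channel features, a classifier on those features, then an interpretation step that ranks channels) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the paper uses a neural network, whereas here the "embedding" is a hand-crafted stand-in (per-channel mean and standard deviation), the classifier is plain logistic regression, and the interpretation method is gradient-times-input. All function names, the synthetic data, and the hyperparameters are assumptions made for illustration.

```python
import numpy as np

def make_synthetic(n=200, channels=5, size=8, informative=2, seed=0):
    """Toy multi-channel images; only one channel depends on the label (assumed setup)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    images = rng.normal(0.0, 1.0, size=(n, channels, size, size))
    # Shift the intensity of the informative channel for positive samples.
    images[:, informative] += 2.0 * labels[:, None, None]
    return images, labels

def channel_embedding(images):
    """Stand-in for a learned per-channel encoder: (mean, std) for each channel."""
    mean = images.mean(axis=(2, 3))
    std = images.std(axis=(2, 3))
    return np.stack([mean, std], axis=-1)  # shape (n, channels, 2)

def train_logistic(features, labels, lr=0.1, steps=500):
    """Logistic regression fitted by batch gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        residual = p - labels
        w -= lr * (features.T @ residual) / n
        b -= lr * residual.mean()
    return w, b

def rank_channels(images, labels):
    """Rank channels by gradient-times-input attribution on the channel embedding."""
    emb = channel_embedding(images)
    n, c, k = emb.shape
    flat = emb.reshape(n, c * k)
    # Standardize features so attribution magnitudes are comparable.
    flat = (flat - flat.mean(axis=0)) / (flat.std(axis=0) + 1e-8)
    w, _ = train_logistic(flat, labels)
    # For a linear logit, the gradient w.r.t. the features is w itself,
    # so gradient-times-input reduces to feature * weight.
    attribution = np.abs(flat * w).reshape(n, c, k)
    importance = attribution.sum(axis=2).mean(axis=0)  # one score per channel
    order = np.argsort(importance)[::-1]  # most important channel first
    return order, importance

if __name__ == "__main__":
    images, labels = make_synthetic()
    order, importance = rank_channels(images, labels)
    print("channels ranked by importance:", order)
    print("importance scores:", np.round(importance, 3))
```

Because the attribution is computed on the per-channel embedding rather than on pixels, the score for each channel is obtained by summing over that channel's embedding dimensions; this is the property that lets an interpretation method rank channels instead of spatial regions.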

Highlights

(1) We introduce a novel system to automatically identify informative channels in highly multiplexed tissue images and to provide interpretable and potentially actionable insight for research and clinical applications.

(2) Our experiments show that our system outperforms conventional algorithms combined with interpretation techniques on the informative channel identification task for assessing tumor grade.

(3) The informative channels identified by our method align with findings from a single-cell data analysis, even though our approach does not require single-cell segmentation.
