Week 1: A Blank Space
A Blank Space
Hi, I’m Garv! Welcome to the first week of my senior project blog posts (and, unfortunately, also the first week of my embarrassingly bad Taylor Swift references). But without further ado, let’s dive into my project!
To begin, I’d like to provide a brief introduction to histopathology images, given that they form the basis of my project. During surgery on a patient, when surgeons discover a tissue region that they would like to analyze further outside of the operating room, they often extract a biopsy of that tissue segment from the patient, process it, and fix it onto glass slides. During processing, these tissue sections are dyed with one or more specialized stains and counter-stains to aid microscopic visualization.
Among available stain and counter-stain pairs, the gold standard is Hematoxylin-Eosin (H&E) staining, in which hematoxylin stains cell nuclei blue and eosin stains cytoplasm & connective tissue pink. This contrast is clearly visible in the sample histopathology image below.
While originally intended for use in clinical environments, scans of these histopathology slides have been widely used as a data source in research involving artificial intelligence and deep learning, which brings me to my own research background and my motivations for conducting this project.
Research Background & Project Intro
Since October 2021, I’ve been working at the University of Pennsylvania’s Artificial Intelligence in Biomedical Imaging Lab under Professor Spyridon Bakas and Dr. Bhakti Baheti, the latter of whom is my amazing external advisor for this project. During my work at the lab and with Dr. Baheti specifically, I’ve had the opportunity to collaborate on multiple research endeavors involving the use of whole slide histopathology images (WSIs). Indeed, it was through my work on a project where I was using WSIs of glioblastoma, a type of brain cancer, that I encountered my inspiration for this project.
Through a cursory analysis of the WSIs in my glioblastoma dataset, I discovered that over 10% of the slides (a not-insignificant amount!) contained some form of pen markings on top of the tissue. After exploring further, I found that these markings are drawn by pathologists to indicate regions of interest while slides are being scanned. While such markings may be useful from a clinical perspective, I realized through my research that they significantly interfere with the performance of computational deep learning models, as demonstrated below.
In this example, a raw slide is shown next to the same slide with a model-generated heatmap overlay, which indicates the regions the model found most predictive of tumor development (“high attention”). Here, we can see that the model assigned a red overlay to the blue pen markings on the slide (which is why the resulting color looks purple) and assigned relatively cooler colors to most tissue regions.
In short, this means that the model found the pen markings, instead of the actual tissue, to be most predictive of cancer progression. Given that this is not an isolated occurrence, this is very problematic and has detrimental effects on model performance.
I was further surprised to learn that while several researchers have attempted to tackle this issue, there is no effective solution to accurately and efficiently remove these pen markings from slides, leaving a blank space (I know, I cringed at it too) in existing research. As a result, researchers like me are forced to completely exclude slides with pen markings from their analyses, markedly reducing dataset size. For rare diseases and other conditions where data is limited, removing additional slides from already-small datasets could have drastic consequences for researchers’ ability to produce clinically meaningful findings.
Using stain deconvolution techniques, my goal is to create an end-to-end image processing pipeline that automatically obtains WSIs from publicly available sources, identifies the presence of pen markings on slides, removes markings from relevant slides while preserving tissue data, and prepares slides for use in deep learning models. Long story short (this is the last one for today, I promise), my project’s success will allow researchers across the globe to stop excluding entire slides just because of pen markings, potentially unlocking new medical findings and saving countless lives.
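For readers who haven’t met stain deconvolution before, here is a minimal, generic sketch of the idea. It is not the pipeline itself. Each pixel’s RGB value is converted to optical density (Beer-Lambert law) and then expressed as a mix of per-stain color vectors. The stain vectors below are the commonly quoted Ruifrok and Johnston values for hematoxylin, eosin, and a third (DAB) channel; they, along with the function names `rgb_to_od`, `unmix`, and `mix`, are assumptions made purely for illustration.

```python
import math

# Assumed optical-density vectors (R, G, B) per stain, for illustration only.
STAINS = [
    [0.65, 0.70, 0.29],  # hematoxylin
    [0.07, 0.99, 0.11],  # eosin
    [0.27, 0.57, 0.78],  # third channel (DAB in the original publication)
]


def rgb_to_od(rgb, background=255.0):
    """Convert an RGB pixel to optical density: OD = -log10(I / I0)."""
    # Clamp intensities to at least 1 so the logarithm stays finite.
    return [-math.log10(max(c, 1.0) / background) for c in rgb]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def unmix(rgb, stains=STAINS):
    """Solve OD = c0*s0 + c1*s1 + c2*s2 for (c0, c1, c2) via Cramer's rule."""
    od = rgb_to_od(rgb)
    # Columns of `a` are the stain vectors: a[i][j] = stains[j][i].
    a = [[stains[j][i] for j in range(3)] for i in range(3)]
    d = det3(a)
    conc = []
    for k in range(3):
        # Replace column k with the observed optical density.
        ak = [[od[i] if j == k else a[i][j] for j in range(3)]
              for i in range(3)]
        conc.append(det3(ak) / d)
    return conc


def mix(conc, stains=STAINS, background=255.0):
    """Inverse of unmix: build an RGB pixel from stain concentrations."""
    od = [sum(conc[j] * stains[j][i] for j in range(3)) for i in range(3)]
    return [background * 10 ** (-o) for o in od]


if __name__ == "__main__":
    pixel = mix([0.8, 0.3, 0.0])   # synthetic pixel: mostly hematoxylin
    print(unmix(pixel))            # recovers roughly [0.8, 0.3, 0.0]
```

In practice this would be run over whole image arrays (for example with NumPy) rather than one pixel at a time; the per-pixel version just keeps the arithmetic visible.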
How exactly will I do so? Well, you’ll have to stay tuned till my next blog post to find out!
- “Hematoxylin and Eosin (H&E).” MyPathologyReport.ca, 24 Nov. 2022, https://www.mypathologyreport.ca/pathology-dictionary/hematoxylin-and-eosin/.