Week 2: The Great (RAM) War
March 17, 2023
Hi! As you can probably tell from the title, my T.S. references are only going to go downhill from here. But hey, if you’re here, you must’ve been able to fearlessly get through my first post without seeing red, so that’s gotta count for something, right? Anyways, on to the good stuff:
Stain Deconvolution
As I mentioned in last week’s post, the histopathology slides I’ll be using in my project are stained with hematoxylin and eosin (H&E), a stain/counterstain pair in which hematoxylin stains cell nuclei purple/blue and eosin stains cytoplasm pink/red (though not quite maroon). While I was brainstorming potential methods to eliminate pen markings from histopathology images with my external advisor, she introduced me to the topic of stain deconvolution, and after doing some research of my own, I realized that the process was a perfect fit for my project.
Essentially, stain deconvolution involves digitally separating colored stains from images by transforming stained histopathology images from the RGB color space into multiple stain channels. In the specific context of my project, when applied to images with pen markings, this would allow me to separate the hematoxylin and eosin channels from the original image and recombine them into an image that (theoretically) has no pen markings. The resulting image would retain all vital tissue data from the original image and would be fit for use in deep learning models.
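To make the idea a bit more concrete, here’s a minimal NumPy sketch of one common way to do stain deconvolution: convert RGB to optical density, then unmix it with a matrix of stain color vectors. The H&E vectors below are commonly cited reference values, and the function names are my own placeholders, so treat this as an illustration under those assumptions rather than the exact algorithm any particular tool uses.

```python
import numpy as np

# Assumed reference stain vectors (RGB optical-density directions) for H&E.
# These are commonly cited values, not ones measured from my slides.
HEMATOXYLIN = np.array([0.65, 0.70, 0.29])
EOSIN = np.array([0.07, 0.99, 0.11])


def build_stain_matrix(h=HEMATOXYLIN, e=EOSIN):
    """Return a 3x3 matrix whose columns are unit vectors: H, E, residual."""
    h = h / np.linalg.norm(h)
    e = e / np.linalg.norm(e)
    r = np.cross(h, e)  # residual direction, orthogonal to both stains
    r = r / np.linalg.norm(r)
    return np.stack([h, e, r], axis=1)


def rgb_to_od(rgb, background=255.0):
    """Convert RGB intensities to optical density (Beer-Lambert law)."""
    rgb = np.maximum(rgb.astype(np.float64), 1.0)  # avoid log(0)
    return -np.log(rgb / background)


def od_to_rgb(od, background=255.0):
    """Convert optical density back to 8-bit RGB intensities."""
    rgb = background * np.exp(-od)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


def deconvolve(rgb, stain_matrix):
    """Return per-pixel stain amounts (H, E, residual), same shape as rgb."""
    od = rgb_to_od(rgb).reshape(-1, 3)
    conc = od @ np.linalg.inv(stain_matrix).T  # solve od = W @ c per pixel
    return conc.reshape(rgb.shape)


def recombine_h_and_e(conc, stain_matrix):
    """Rebuild an RGB image from H and E only, zeroing the residual channel."""
    kept = conc.copy()
    kept[..., 2] = 0.0
    od = kept.reshape(-1, 3) @ stain_matrix.T
    return od_to_rgb(od).reshape(conc.shape)
```

The hope is that pen ink, which matches neither stain color well, lands mostly in the residual channel, so dropping that channel before recombining removes most of it. Whether that holds for real pen markings is exactly what I need to test.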
Here’s an example of what the results of successful stain deconvolution would look like, with the original image on the left, hematoxylin on the top right, and eosin on the bottom right:
Data Source
Before I go into the progress I’ve made on this labyrinth of a project in the past week, I’d like to briefly discuss the dataset I’ll be using. All of the histopathology images I’ll be processing and analyzing in this project are slides of glioblastoma (a type of brain cancer) obtained from The Cancer Genome Atlas (TCGA). TCGA is a joint project of the National Cancer Institute and the National Human Genome Research Institute (both part of the National Institutes of Health) that provides a host of open-access data for 33 cancer types. As I noted in last week’s post, I was inspired to embark on this project while analyzing TCGA’s glioblastoma data, so I found it fitting to use the same dataset for this project.
Initial Exploratory Analysis
To begin exploring the applications of stain deconvolution in the context of my data, I loaded a glioblastoma slide with a particularly egregious number of pen markings into QuPath, an open-source bioimage analysis program. Here’s what the slide looked like:
After isolating the hematoxylin and eosin stain channels and viewing them separately, I visualized the residual channel, which holds whatever color information in the slide isn’t accounted for by the H&E stains (i.e., what’s left over after stain deconvolution). Here’s what I saw:
Success! I was able to isolate the pen markings on the slide. However, for my project, I intend to create a computational image pipeline for pen marking removal, so I wouldn’t be able to include existing tools like QuPath in my solution. With this in mind, and after discussing further with my external advisor, I started researching code-based approaches for stain deconvolution.
HistomicsTK & Memory Issues
HistomicsTK is a Python package used to analyze pathology images and is the main tool I chose to work with. I began by implementing the standard unsupervised color deconvolution algorithm in Google Colaboratory:
After verifying that it functioned properly on sample data provided by HistomicsTK, I began applying the algorithm to the glioblastoma data I’ll be using in this project. To do so, I downloaded a sample slide from TCGA and used pyvips (the Python binding for libvips, a popular image processing library) to convert the file (originally in the proprietary .svs format) to the more accessible TIFF file format:
Then, I used the tifffile library to load my newly created TIFF image into a NumPy array, which is the input format of the stain deconvolution algorithm I implemented above. Although I did face my fair share of bugs and issues (and dare I say, glitches) throughout this process, it had been relatively smooth sailing…that is, until I tried to perform stain deconvolution on the NumPy array representation of the glioblastoma slide.
I was originally running this code on a base Google Colab notebook, which gives me access to 12.68 GB of RAM. When I ran the stain deconvolution algorithm, I quickly realized that this amount of memory was not enough:
I sighed, switched to a high-RAM notebook with roughly double the memory (~25 GB), and re-ran my code, thinking that the extra memory would resolve the issue. Alas, this was not the case: I saw the dreaded out-of-RAM message again and my notebook crashed. Surprised and a tad frustrated, I resolved to connect to a high-RAM notebook with access to a premium GPU, giving me a whopping 83 GB of RAM. Confident that my code would successfully execute this time, I ran it again while anxiously monitoring RAM usage with eyes open (had to drop in this reference after the re-recorded version was released yesterday). But even that wasn’t enough, and my notebook crashed yet again, displaying the same error message that will now haunt my dreams. Thus, the great (RAM) war.
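For a rough sense of why even 83 GB might not be enough, here’s a back-of-the-envelope estimate. The slide dimensions below are made up for illustration (they aren’t the dimensions of my slide), and the float64 assumption is just NumPy’s default float type, not something I’ve confirmed about HistomicsTK’s internals.

```python
def array_gib(height, width, channels=3, bytes_per_value=1):
    """Size in GiB of a dense image array with the given shape and value size."""
    return height * width * channels * bytes_per_value / 2**30


# Hypothetical full-resolution slide dimensions, for illustration only.
height, width = 80_000, 100_000

uint8_gib = array_gib(height, width)                       # raw 8-bit RGB pixels
float64_gib = array_gib(height, width, bytes_per_value=8)  # one float64 copy

print(f"uint8 image:      {uint8_gib:.2f} GiB")
print(f"one float64 copy: {float64_gib:.2f} GiB")
```

Deconvolution takes a logarithm of every pixel value, so at some point the image has to be held as floats, and each float64 copy is 8x the size of the original 8-bit image. A couple of intermediate copies at that size would be enough to exhaust any of the notebooks I tried.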
After meeting with my external advisor and talking through this issue, I’ve decided that I will devote the next week to finding ways to circumvent this excessive RAM usage. First, I’ll try using an even smaller glioblastoma slide (smaller slide –> less RAM used to process it) to determine whether the same problem remains. If that isn’t successful, I will try separating a single slide into several smaller images (called “patches”) and try running my code on individual patches to verify its functionality. As a last resort, I could also try clean(ly) implementing raw stain deconvolution algorithms from scratch (without HistomicsTK) to see whether that method utilizes less memory.
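The patch idea can be sketched roughly like this. The function names and the 512-pixel patch size are placeholders I picked for illustration, and `fn` stands in for whatever per-patch processing (such as stain deconvolution) ends up being used.

```python
import numpy as np


def iter_patches(image, patch_size=512):
    """Yield (row, col, patch) for non-overlapping tiles of an (H, W, C) image.

    Tiles along the bottom and right edges may be smaller than patch_size.
    """
    height, width = image.shape[:2]
    for row in range(0, height, patch_size):
        for col in range(0, width, patch_size):
            yield row, col, image[row:row + patch_size, col:col + patch_size]


def process_in_patches(image, fn, patch_size=512):
    """Apply fn to each patch and stitch the results into a full-size image.

    fn must return an array with the same shape and dtype as its input patch.
    """
    out = np.empty_like(image)
    for row, col, patch in iter_patches(image, patch_size):
        out[row:row + patch.shape[0], col:col + patch.shape[1]] = fn(patch)
    return out
```

Because only one patch is converted to floats at a time, the peak memory for intermediate arrays scales with the patch size rather than the whole slide. The full input and output arrays still need to fit in memory in this version, so it addresses the intermediate copies but not the size of the slide itself.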
Next week, I hope to discuss which approach worked best for me and go over my current progress. See you then, and thanks for reading! :))
Citations
- Alsubaie, Najah, et al. “Stain Deconvolution Using Statistical Analysis of Multi-Resolution Stain Colour Representation.” PLOS ONE, vol. 12, no. 1, 2017, https://doi.org/10.1371/journal.pone.0169875.
- Bhattacharjee, Subrata, et al. “Cluster Analysis of Cell Nuclei in H&E-Stained Histological Sections of Prostate Cancer and Classification Based on Traditional and Modern Artificial Intelligence Techniques.” Diagnostics, vol. 12, no. 1, 2021, p. 15.
- “The Cancer Genome Atlas.” Genomic Data Commons, National Cancer Institute, https://portal.gdc.cancer.gov/.
- “Color Deconvolution.” HistomicsTK Documentation, HistomicsTK, https://digitalslidearchive.github.io/HistomicsTK/examples/color_deconvolution.html.