Week 8: My Lavender Haze Escapades
April 30, 2023
Before you read any further, I regret to inform you that while I am very proud of the (half?-) rhyming title of this post, it is probably nowhere near as exciting or captivating as the title may suggest. A few days ago, I earned my CPR certification, so if you do end up passing out from boredom while reading this (I promise I won’t take offense), let me know and I’ll come right over to practice my newfound skills.
But hey, you never know. After all, as they say, beauty is in the eye of the beholder.
Tied Together With a Smile (and a couple of frowns)
I must say, I’m not the biggest fan of the music in Taylor’s debut album, but it gave me this perfect section title, so I gotta give credit where credit is due.
Last week, I was able to finally achieve the goal I was working towards throughout this entire project: removing pen markings from an H&E slide while preserving the original data underneath the markings. This week, I was mainly working on tying everything together into a fully-fledged end-to-end pipeline. While I did encounter my fair share of issues while doing so, I found myself enthusiastically pushing through them while in the lavender haze of last week’s accomplishments.
The first step in my pipeline is to source slides from TCGA. My code for doing so is very long, so for the sake of preserving what little sanity I have left, I’ll just be discussing my specific use case for testing the pipeline. Here (most of) it is in all its glory:
While I was building and testing this pipeline, I tried using the slide that I was using in these past few weeks to create and perfect my pen marking removal code. Here’s how I used the above code I wrote to source this slide from TCGA’s database:
As you can see, my code prompts the user to provide the specific slides they want to download, the download path, and a confirmation to ensure that the previous inputs were accurate. In this case, I downloaded my slide to the tcga_raw folder in my virtual notebook environment.
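For anyone who wants something concrete, here's a minimal sketch of that prompt-and-confirm flow. It isn't the code from this post: the function names are made up for illustration, and using the public GDC data endpoint (`https://api.gdc.cancer.gov/data/`) with file UUIDs is an assumption about how the download could be done.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Public GDC endpoint for downloading a file by UUID. Using it here is an
# assumption; the post does not say which download mechanism was used.
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data/"


def gather_download_request(ask: Callable[[str], str] = input) -> Tuple[List[str], Path]:
    """Ask for slide file UUIDs and a download folder, then confirm both."""
    while True:
        raw_ids = ask("Slide file UUIDs (comma-separated): ")
        file_ids = [part.strip() for part in raw_ids.split(",") if part.strip()]
        folder = Path(ask("Download folder: ").strip())
        answer = ask(f"Download {len(file_ids)} slide(s) to {folder}? [y/n] ")
        if file_ids and answer.strip().lower() == "y":
            return file_ids, folder
        # Anything other than "y" (or an empty ID list) restarts the prompts.


def download_url(file_id: str) -> str:
    """Build the URL a downloader would fetch for one slide (hypothetical helper)."""
    return GDC_DATA_ENDPOINT + file_id
```

Passing the `ask` function in as a parameter keeps the prompts testable without a keyboard; in a notebook you'd just call `gather_download_request()` and answer the three questions.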
Now that I’ve downloaded the raw slide in SVS format, my next step is to convert it to the more accessible TIFF format to make the image data easier to work with. However, unlike the code I’ve been writing thus far, I needed to ensure that nothing was hardcoded, so that my code would generalize to any number of slides. To do so, here’s what I wrote:
In this code section, I’m iterating over all raw SVS files, converting each one to TIFF format, and saving them in a separate folder called tcga_tiff. In this case, since I only had one slide, I could’ve just converted it to TIFF in a couple of lines of code, but since I wanted to ensure that I’d be able to use the same code for testing with multiple slides, I made sure to make it as clean and generalizable to all use cases as possible.
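The "nothing hardcoded" idea can be sketched roughly like this. It's a minimal illustration rather than the code from this post: `convert_all_slides` and `convert_one` are hypothetical names, and the actual SVS-to-TIFF conversion is deliberately left as a function you pass in, since the post doesn't say which imaging library does that work.

```python
from pathlib import Path
from typing import Callable, List


def convert_all_slides(
    raw_dir: Path,
    tiff_dir: Path,
    convert_one: Callable[[Path, Path], None],
) -> List[Path]:
    """Convert every .svs file in raw_dir and return the output paths."""
    tiff_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for svs_path in sorted(raw_dir.glob("*.svs")):
        # Same file name, new folder, new extension.
        out_path = tiff_dir / (svs_path.stem + ".tiff")
        convert_one(svs_path, out_path)  # the actual SVS -> TIFF work happens here
        outputs.append(out_path)
    return outputs
```

Because the loop discovers files with `glob` instead of naming them, the same call works for one slide or a hundred, for example `convert_all_slides(Path("tcga_raw"), Path("tcga_tiff"), my_converter)`, where `my_converter` is whatever conversion routine you plug in.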
With my newly converted TIFF image saved, it was time for patch extraction. As I’ve mentioned in past posts, the tool I’ve been using for patch extraction is CLAM, a whole slide image processing toolkit created by a lab at Harvard that I’ve worked with in the past. And so that’s what I began implementing in this pipeline, starting with downloading CLAM and installing its required dependencies:
Then, I defined my settings for patch extraction and extracted patches from my histopathology image:
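As a rough sketch (not the exact settings used here), CLAM's patching step is driven by its `create_patches_fp.py` script, so defining the settings mostly means choosing its command-line flags. The helper name and the 256-pixel patch and step sizes below are assumptions for illustration.

```python
import sys
from typing import List


def build_clam_patch_command(source_dir: str, save_dir: str,
                             patch_size: int = 256, step_size: int = 256) -> List[str]:
    """Build the argument list for CLAM's create_patches_fp.py script."""
    return [
        sys.executable, "create_patches_fp.py",
        "--source", source_dir,            # folder holding the slide images
        "--save_dir", save_dir,            # where CLAM writes masks, patches, stitches
        "--patch_size", str(patch_size),   # assumed value, not taken from this post
        "--step_size", str(step_size),     # equal to patch_size -> no overlap
        "--seg", "--patch", "--stitch",    # segment tissue, extract patches, stitch preview
    ]
```

From the cloned CLAM folder, this list could then be handed to `subprocess.run(command, cwd="CLAM", check=True)`.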
As I’ve mentioned before, CLAM stores patches in an unusual way: to save computation and storage, it stores patch coordinates rather than the actual image patches themselves. Since I’ll need to run my pen marking removal algorithms on all patches in this image, I’ll need the full list of patch coordinates that were extracted from this image. So I wrote code to do just that:
Here, coords_dict is a Python dictionary that stores the full set of patch coordinates for each image being processed. Here’s a sample of what the patch coordinates look like for my particular slide:
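To make the idea concrete, here's a minimal sketch of how a dictionary like `coords_dict` could be built. It is an illustration under assumptions, not the code from this post: it assumes CLAM wrote one `.h5` file per slide containing a `coords` dataset of (x, y) pairs, and the function names are hypothetical. Reading the files requires the third-party `h5py` package.

```python
from pathlib import Path
from typing import Callable, Dict, List, Sequence, Tuple

Coord = Tuple[int, int]


def read_clam_coords(h5_path: Path) -> List[Coord]:
    """Read the "coords" dataset from one CLAM patch file (needs h5py)."""
    import h5py  # imported lazily so the rest of the sketch has no extra dependency

    with h5py.File(h5_path, "r") as handle:
        return [(int(x), int(y)) for x, y in handle["coords"][:]]


def build_coords_dict(
    patch_dir: Path,
    read_coords: Callable[[Path], Sequence[Coord]] = read_clam_coords,
) -> Dict[str, List[Coord]]:
    """Map each slide name (the .h5 file stem) to its list of patch coordinates."""
    return {
        h5_path.stem: list(read_coords(h5_path))
        for h5_path in sorted(patch_dir.glob("*.h5"))
    }
```

Each coordinate is the top-left corner of a patch, so the patch itself can be read back from the slide on demand instead of being stored.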
So then I thought: all I have to do is load each patch one by one using its coordinates, perform pen marking removal on it, and replace the original patch in the slide with the processed one, right? After writing the code to do so, it turns out this task isn’t as simple as it might seem. Below, I’ve run my pen marking removal code on the same patch I’ve been working on for the past few weeks, and here’s my resulting patch:
Now, here’s what happened when I tried to replace the region corresponding to this patch in the original image with the processed patch you’re seeing above:
Not exactly the result I was expecting. As I investigated this issue further, it became clear that I needed an alternate method of mass-processing image patches. After doing some more research, I discovered patchify, a library whose apparent sole function is to extract patches from images and recombine patches into images. After switching gears to determine how exactly patchify works, I was able to write code to extract patches, perform pen marking removal on all of them, and put them back together, as seen below:
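The split, process, and recombine idea can be sketched in plain NumPy. To be clear, this is a stand-in for what patchify's `patchify` and `unpatchify` functions do in the non-overlapping case, not the patchify library itself and not the code from this post; the function names are hypothetical, and it assumes the image height and width are exact multiples of the patch size.

```python
import numpy as np


def split_into_patches(image: np.ndarray, size: int) -> np.ndarray:
    """Split an (H, W, C) image into a (rows, cols, size, size, C) patch grid."""
    height, width, channels = image.shape
    if height % size or width % size:
        raise ValueError("image height and width must be multiples of the patch size")
    grid = image.reshape(height // size, size, width // size, size, channels)
    return grid.swapaxes(1, 2)  # bring the two grid axes to the front


def merge_patches(patches: np.ndarray) -> np.ndarray:
    """Inverse of split_into_patches: stitch the grid back into one image."""
    rows, cols, size, _, channels = patches.shape
    return patches.swapaxes(1, 2).reshape(rows * size, cols * size, channels)


def process_image(image: np.ndarray, size: int, process_patch) -> np.ndarray:
    """Apply process_patch to every patch and reassemble the full image."""
    patches = split_into_patches(image, size).copy()  # copy so the input is untouched
    for row in range(patches.shape[0]):
        for col in range(patches.shape[1]):
            patches[row, col] = process_patch(patches[row, col])
    return merge_patches(patches)
```

A handy sanity check for any pipeline shaped like this is to pass a do-nothing function, as in `process_image(image, 256, lambda patch: patch)`, and confirm the output equals the input. If that round trip fails, the problem is in the splitting or stitching rather than in the per-patch processing.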
Hopeful that this issue was resolved, I downloaded the resulting TIFF output image, only to be faced with the slowest loading circle I’ve ever seen. I waited, and waited, and waited some more, watching the circle move at an excruciatingly slow pace. After about an hour, the file finally downloaded, and I clicked on it with eager anticipation. That anticipation disappeared when I saw this:
Convinced that this was some sort of glitch with the default viewing software, I loaded this image back into my notebook and extracted a patch. But alas, the result didn’t change:
It’s worth mentioning that technically, everything I’m doing from this point onwards is an addition to the core of my project (which I finished last week), meant to produce a more complete (and dare I say, bejeweled) final package to display my results. As such, I’m not super worried about these issues I’m facing; if I’m able to figure out an alternative solution next week, then that’s awesome, but if not, I’ll still be able to present my existing results on their own.
Next week, I’ll mainly be focusing on finishing up this pipeline while resolving any existing issues with my code and documenting my project’s progress and results in the form of a final presentation draft. Honestly, it’s pretty crazy to think that there are only two weeks left in my project; it seems like just yesterday that I was anxiously preparing for my project defense. But what’s even crazier is that if you’re reading this, I assume that you actually took the time to read through the entirety of this post, and for that, thank you for indulging me :))
Sadly, it’s time to go, but until next week!
Citations
- Comic used under the Creative Commons CC-BY-NC-ND license from https://hildabastian.net.