Week 5: Back To (February)
April 6, 2023
Note: The cartoon above is published on this blog with written permission from the creator/copyright holder. Formal citation below.
When I started this project, I knew all too well that it was going to go by faster than I could ever imagine, but it’s still crazy to think that I’m already halfway done. To mark this momentous occasion (and to give myself a little break from endlessly scrolling through the names of all of Taylor Swift’s songs), today’s post will be relatively light on the puns. While my tears ricochet as I sorrowfully deliver this news to you, I must push on with this week’s updates.
Finally Discovering the Source of the Annoying Slide Artifacts
Last week, I developed a foolproof plan to determine where the weird artifacts on the slide were coming from, and this week I put that plan into action.
Given that I was observing the artifacts on deconvolved 256 x 256-pixel patches, my first step was to go back to determine whether the artifacts were present on the converted TIFF slide image. To do so, I zoomed into the sample TIFF slide that I was using, and here’s what I found:
In this zoomed-in section of my TIFF slide, it’s pretty evident that the artifacts are indeed present on the slide at this point in my pipeline. So I decided to take another step back and analyze the raw SVS slide itself to see whether these artifacts were generated during the SVS-to-TIFF conversion or whether they were present from the very beginning.
Since SVS files are notoriously difficult to work with in their raw state, I decided to use the built-in TCGA slide image viewer to accomplish this step. When I did so, here’s what I saw:
It may be a bit harder to see in this example since the TCGA slide viewer didn’t allow me to zoom in quite as much as I was able to with the TIFF slide, but I can clearly see artifacts present on this slide as well. Ultimately, this means there’s not much I can do to get rid of these artifacts, since they’re present on the raw slide itself. Thus, while not the most appealing option, I’ll be dancing with (my) hands tied as I have no choice but to attempt to progress through this project with the artifacts present in all of my data.
An Alternative to HistomicsTK in Action
After I discussed the artifact issues I was facing with my external advisor last week, she suggested that I try using built-in stain deconvolution algorithms in scikit-learn, a potentially viable alternative to HistomicsTK. This week’s findings mean that switching libraries won’t get rid of the artifacts, since they’re baked into the slides themselves, but my curiosity got the better of me and I tried going this route anyway.
First, I loaded my TIFF slide with OpenCV, after modifying an environment variable to override OpenCV’s max image size limit so that the slide would load without any issues. Reading the slide with OpenCV returns it as a NumPy array, which is exactly the format I need the slide to be in for stain deconvolution with scikit-learn.
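A minimal sketch of that loading step, under a few assumptions: the variable in question is taken to be `OPENCV_IO_MAX_IMAGE_PIXELS`, the limit value is an arbitrary large number, and `load_slide_rgb` is a hypothetical helper name. Two details are easy to miss: the variable only takes effect if it is set before OpenCV is loaded, and OpenCV returns channels in BGR order, so they need flipping to RGB before any stain deconvolution.

```python
import os

# OpenCV reads this limit when the library is first loaded, so it has to be
# set before cv2 is imported. The value below is an arbitrary large number.
os.environ["OPENCV_IO_MAX_IMAGE_PIXELS"] = str(2**40)


def load_slide_rgb(path):
    """Read a TIFF slide into an (H, W, 3) uint8 NumPy array in RGB order."""
    import cv2  # imported here so the environment variable is already set

    slide_bgr = cv2.imread(path, cv2.IMREAD_COLOR)
    if slide_bgr is None:
        # cv2.imread signals failure by returning None instead of raising
        raise FileNotFoundError(f"OpenCV could not read {path}")
    # OpenCV stores channels as BGR; stain deconvolution expects RGB
    return cv2.cvtColor(slide_bgr, cv2.COLOR_BGR2RGB)
```

Calling `load_slide_rgb("slide.tiff")` (a placeholder path) would then hand back the whole slide as one array, which is also why memory becomes the next concern.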
Then, I implemented stain deconvolution with scikit-learn:
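For readers who want to see what this kind of stain deconvolution does under the hood, here is a minimal sketch in plain NumPy. It is not the library call itself. To my knowledge, the color deconvolution routine in the scikit family lives in scikit-image as `skimage.color.rgb2hed`, and this sketch follows the same idea. The stain vectors are the commonly cited Ruifrok and Johnston values for hematoxylin, eosin, and DAB, and the name `separate_stains` is just a label for this example.

```python
import numpy as np

# Rows are the optical-density vectors for hematoxylin, eosin and DAB
# (commonly cited Ruifrok & Johnston values; assumed here, not estimated
# from any particular slide).
RGB_FROM_HED = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)


def separate_stains(rgb):
    """Turn an (H, W, 3) RGB image (0-255) into per-pixel H, E, DAB amounts."""
    # Scale to (0, 1] and clip away zeros so the logarithm stays finite
    intensity = np.clip(rgb.astype(np.float64) / 255.0, 1e-6, 1.0)
    # Beer-Lambert: optical density is the negative log of transmitted light
    optical_density = -np.log(intensity)
    # Unmix by multiplying each pixel's OD with the inverse stain matrix
    return optical_density @ HED_FROM_RGB
```

The output is a float64 array with the same height and width as the input, which matters for the memory issue that comes up next.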
After loading my slide, I was incredibly curious (and hopeful) to see whether I could run slide-level stain deconvolution with scikit-learn without running into memory issues. Those hopes quickly vanished as soon as I saw this wonderful message when I ran stain deconvolution:
At this point, this issue is nothing new to me. I’ve learned to just shake it off and move on.
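A quick back-of-the-envelope calculation shows why slide-level deconvolution runs out of memory so easily. The slide dimensions below are hypothetical, and the sketch assumes the deconvolved output is a three-channel float64 array, which takes eight times the space of the uint8 input.

```python
def array_gib(height, width, channels=3, bytes_per_value=8):
    """Size in GiB of a dense image array with the given shape and item size."""
    return height * width * channels * bytes_per_value / 2**30


# Hypothetical slide dimensions in pixels; substitute the real ones
height, width = 80_000, 100_000

print(f"uint8 input:    {array_gib(height, width, bytes_per_value=1):.1f} GiB")
print(f"float64 output: {array_gib(height, width):.1f} GiB")
```

Even before counting any intermediate arrays, the float64 result alone is far beyond what a typical laptop can hold, which is the practical argument for working patch by patch.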
Anyhow, with that out of the way, I began working on my main task: patch-level stain deconvolution with scikit-learn. I extracted two patches, with one exhibiting normal tissue and the other exhibiting pen markings, from my glioblastoma slide. These are the same patches I displayed in last week’s blog, but here they are again (with the coordinates for each in the code) for your reference:
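As a rough illustration of the extraction step, here is a sketch that slices a patch out of a slide held as a NumPy array. The coordinates and the stand-in `slide` array are made up for the example, and `extract_patch` is a hypothetical helper name rather than anything from my actual code.

```python
import numpy as np


def extract_patch(slide, x, y, size=256):
    """Return the size x size patch whose top-left corner is at column x, row y."""
    height, width = slide.shape[:2]
    if x < 0 or y < 0 or x + size > width or y + size > height:
        raise ValueError("patch falls outside the slide")
    # NumPy images are indexed [row, column], i.e. [y, x]
    return slide[y:y + size, x:x + size]


# Stand-in for the loaded slide, plus made-up coordinates for illustration
slide = np.zeros((1024, 2048, 3), dtype=np.uint8)
normal_patch = extract_patch(slide, x=512, y=256)
```

Because the slice is a view into the slide rather than a copy, extracting patches this way costs almost no extra memory. The `size` argument also makes it easy to try patches larger than 256 x 256 later.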
After extracting these patches, it was finally time for stain deconvolution. Without further ado, here are my results for the normal patch, followed by the pen marking patch:
There’s a lot to unpack here, but here are a couple of my main takeaways:
– Unlike HistomicsTK, these results are colored, which seems to have reduced the visibility of the artifacts on the patch. The stain deconvolution itself seems to have been performed successfully for both the normal and pen marking patches.
– It seems as though the blue tint on the pen marking patch altered the colors in its stain deconvolution results. However, for the separated hematoxylin and eosin stains, the colors appear to be the same as those in the normal patch, which is a great sign.
– Note: this is still not a perfect solution, as is most evident with the stark presence of artifacts on the eosin channel of the pen marking patch. But hey, I’ll tolerate it.
Bonus: Back to (February)
As part of my Advanced Java Topics capstone class, I began working on a data pipeline of sorts for this senior project. More specifically, I leveraged the TCGA API to efficiently query and download specified diagnostic slides and datasets from the TCGA database. This week, I spent some time editing it to fit the current specifications of my project.
Here’s a sample of how the existing pipeline works:
In this example, I’m entering the UUID (a unique identifier assigned to each individual slide in the TCGA database) of the glioblastoma slide I’ve been working with for the past couple of weeks, and as you can see, my program automatically matches the UUID to a slide, sources it from the TCGA database, and downloads a local copy of it for further processing.
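My program is written in Java, but the core idea fits in a few lines of Python. This is a sketch rather than my actual code: it assumes the public GDC data endpoint (the service that hosts TCGA files), it only works for open-access files, and the function names are made up for illustration.

```python
import shutil
import urllib.request

# Public GDC endpoint that serves file contents by UUID (assumed here)
GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def data_url(file_uuid):
    """Build the download URL for one file UUID."""
    return f"{GDC_DATA_ENDPOINT}/{file_uuid}"


def download_slide(file_uuid, destination):
    """Stream the file for this UUID to a local path (open-access files only)."""
    with urllib.request.urlopen(data_url(file_uuid)) as response, \
            open(destination, "wb") as out_file:
        # copyfileobj streams in chunks, so the slide never sits fully in memory
        shutil.copyfileobj(response, out_file)
    return destination
```

Streaming matters here because a single diagnostic slide can be very large, so reading the whole response into memory before writing it out would be wasteful.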
In the next few weeks, I will adapt and integrate this existing code into a fully fledged pipeline that sources slides from TCGA, extracts patches from the slides, runs stain deconvolution on the patches, and combines the separated stains for each patch and slide for use in research applications.
Apart from this main goal, I have a couple of other tasks I plan to accomplish over the next week:
– Extract larger patches to hopefully reduce the impact of slide artifacts on stain deconvolution. Since the artifacts become progressively more visible as I zoom further into the slide, I will try extracting patches larger than my current size (256 x 256 pixels, which I selected because it’s a commonly used patch size for deep learning) to see whether I could potentially obtain better results.
– Implement code to combine my separated stains and view the resulting image (that, if everything goes according to plan, should not have pen markings anymore).
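For the second task, here is a rough sketch of how separated stains could be recombined into an RGB image, again in plain NumPy. It assumes the stains were separated with the commonly cited Ruifrok and Johnston vectors for hematoxylin, eosin, and DAB, and that the unwanted signal sits in the third (residual) channel. Whether pen ink really ends up there is something I still need to verify, so treat `combine_stains` as an illustration of the idea rather than a finished solution.

```python
import numpy as np

# Same assumed stain vectors used for separation: rows are H, E and DAB
RGB_FROM_HED = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])


def combine_stains(stains, keep=(0, 1)):
    """Rebuild an RGB image from stain amounts, keeping only selected channels."""
    kept = np.zeros_like(stains, dtype=np.float64)
    kept[..., list(keep)] = stains[..., list(keep)]
    # Mix the kept stains back into optical density per RGB channel
    optical_density = kept @ RGB_FROM_HED
    # Invert Beer-Lambert to go from optical density back to intensity
    rgb = np.exp(-optical_density)
    return np.clip(np.rint(rgb * 255), 0, 255).astype(np.uint8)
```

With the default `keep=(0, 1)`, only hematoxylin and eosin contribute to the rebuilt image, so anything that landed in the third channel is dropped.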
You may be surprised at the utter lack of puns in today’s post. I bet you think about me (and my crazy personality and my lack of even crazier puns) all the time after reading this post and worry. But never fear: regardless of whether you’re elated or sorely disappointed by this, everything has (not) changed. Even after this reassurance, if you’re still screaming “I want you back” in your mind to me, I guess you’ll just have to wait till next week to figure out if I really am or if I’ve become a renegade for good.
Until then, have a great week :))