Week 4: Begin Again (And Again, And Again, And Again…)
Here follows a story of an anti-hero: one who continually, persistently, and valiantly jump(s) then fall(s), one who fights tooth and nail (and tips from Stack Overflow) against death by a thousand (software-induced) cuts, but also one who is a better man because of it. In other, decidedly less flowery words, I spent a lot of time trying a bunch of things that all ended up not working as intended. But before I get sidetracked by my impeccably constructed (sorry Mr. Brady, but they really are) puns that are my pride and joy, let’s embark on my tale of woe.
The Short-Lived Hope of HistomicsUI
Last week, I ran stain deconvolution on glioblastoma slide patches, only to discover that the deconvolved images were a bit blurry and contained artifacts. As I described in my last blog post, I discussed this issue with my external advisor and we agreed that I should try running stain deconvolution with HistomicsUI, an alleged alternative to HistomicsTK (the main library I’d been using so far).
I say alleged because upon exploring HistomicsUI’s GitHub repository, I saw this:
This basically means that the stain deconvolution algorithms available in HistomicsUI are the exact same algorithms that HistomicsTK has, which is what I’d already been using this whole time. I knew all too well (10 minute version >>>>) that switching libraries to HistomicsUI and trying to see whether I could get better results was pointless because I knew I’d just get the exact same results. And so, onto my next option.
Comparison is the Thief of Joy
Apparently, Teddy Roosevelt was the first to say this quote, and I gotta say–he was so right. I could truly feel the joy being sucked out of me as I tried relentlessly to work on my next task: comparing deconvolved patches with normal tissue to patches exhibiting pen markings.
This seemed relatively simple at first: I thought I’d just have to take one slide with pen markings and then run stain deconvolution on a normal patch and patch with pen markings before comparing the two. But regrettably, a straightforward journey was not in the cards for me; rather, it seemed to me as though it was enchanted with curse after curse.
First, I needed to find all slides in the glioblastoma dataset that contained pen markings. This was pretty simple for me because when I was working with this dataset over the summer, I had already gone through all of the slides and manually marked which ones contained pen markings because those were the slides that I would need to exclude during analysis (highlighting the relevance of this project). Here’s a sample of the spreadsheet in which I recorded these observations:
From this spreadsheet, I selected a random slide that I had noted contained pen markings. When I viewed it in the TCGA database, it turned out to be approximately 215 MB, which I innocently thought wouldn’t be an issue since I was analyzing these slides on the patch level anyways. I was wrong.
When I tried converting the SVS slide I had chosen into the TIFF format using the pyvips library (like I had been doing throughout this project), I was met with this error:
Upon doing some research into this issue, I found that if I set tile = True during the conversion process (which tells pyvips to write the TIFF as a grid of small tiles rather than as full-width strips), then I could bypass this issue and keep going with my project. And so I did:
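For readers who want to try a similar conversion, here is a minimal sketch of how the save options could be chosen. It is not the exact call used in this project: the helper names (tiff_save_options, convert_svs_to_tiff) and the 256-pixel tile size are hypothetical, and the BigTIFF switch is an assumption based on the error being size-related (classic TIFF uses 32-bit offsets, so it cannot address files beyond roughly 4 GiB). The tile, tile_width, tile_height, and bigtiff options are real arguments of pyvips' tiffsave.

```python
def tiff_save_options(width, height, bands=3, tile=True, tile_size=256):
    """Build keyword arguments for pyvips' Image.tiffsave (hypothetical helper)."""
    # Size of the decoded image, assuming 8-bit samples.
    uncompressed_bytes = width * height * bands
    options = {"tile": tile}
    if tile:
        options["tile_width"] = tile_size
        options["tile_height"] = tile_size
    # Classic TIFF stores 32-bit offsets, so anything that could pass
    # 4 GiB is written as BigTIFF instead (assumed cause of the error).
    options["bigtiff"] = uncompressed_bytes >= 2**32
    return options


def convert_svs_to_tiff(svs_path, tiff_path, **overrides):
    """Convert a slide to TIFF, returning the options that were used."""
    import pyvips  # imported lazily so the helper above works without it

    # Sequential access streams the slide instead of loading it all at once.
    image = pyvips.Image.new_from_file(svs_path, access="sequential")
    options = tiff_save_options(image.width, image.height, image.bands)
    options.update(overrides)
    image.tiffsave(tiff_path, **options)
    return options
```

Calling convert_svs_to_tiff("slide.svs", "slide.tiff", tile=False) would force a strip-based file, which makes it easy to save the same slide both ways and compare the outputs.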
I thought this was just a minor setback that I had been able to overcome, but just like if this was a movie, little did I know that my troubles were just beginning.
After running patch extraction on the slide and viewing the patch coordinate set (refer to my previous blog post for an explanation of what exactly this is), my goal was now to find one patch displaying normal tissue and the other showing tissue covered in pen markings. Finding the former was a fairly straightforward task: I just took the first random set of coordinates from the set, cropped the image, and got this as a result:
The latter, however, was an entirely different story altogether. The problem was that there were so few regions with tissue covered in pen markings on the slide that it made finding coordinates corresponding to these regions very difficult. For reference, here’s the full slide I was initially using:
Thus began my long journey to stumble upon the coordinates I needed to continue with my project. I selected a random set of coordinates, cropped the image accordingly, viewed the cropped patch, and was forced to begin again over and over again upon seeing that the patch did not contain both pen and tissue. I felt like I was the archer shooting arrows at my prey, only to miss time and time again. Hours dragged on, until I finally happened upon this patch:
That got me hopeful: although this patch was entirely composed of pen markings, if I could alter the coordinates a bit and navigate to where the pen intersected with tissue, I could finally have the coordinates of a patch exhibiting that overlap. After some more trial and (all-too-frequent) error, my eyes flew open as I saw the image I’d been waiting for for so long:
Success! I was finally done (or so I thought). I excitedly saved the coordinates of my two patches and ran stain deconvolution on both. The results, however, were not exactly what I expected:
As you can probably tell (especially so from the eosin channel visualization), there were weird tiling-like artifacts on both images that resulted in distortion and blurriness. Viewing these results, my mind immediately shot back to the issue I had with pyvips and how I set tile = True when processing the image to avoid issues with the image exceeding the maximum size requirements. With my heart sinking in my chest, that was the moment I knew that my attempts to circumvent that issue had likely resulted in the less-than-ideal results before my eyes, given that the artifacts looked a whole lot like tiles on the image.
Realizing I would have to start the entire process over again with another, smaller slide, I sighed and began organizing all of the slides with pen markings in the dataset by image size. After a bit of searching, I found another slide with pen markings that was only ~42 MB:
Granted, this was by no means the best slide to work with given how little the pen markings overlapped with the tissue, but its relatively small size made it extremely appealing. After I ran patch extraction and scanned over my new coordinate array, I realized that my comparison task would be significantly more difficult with this slide. While it would again be fairly easy to find a patch with only tissue, trying to figure out a coordinate set that corresponded to a patch in that small region at the top where the pen markings overlapped with the tissue would be like trying to find a needle in a haystack. I was not ready for it.
But I had no other choice, and so I (channeling the persona of Mr. Perfectly Fine) began the arduous journey of picking a random set of coordinates, comparing the resulting patch region to the overall slide, extrapolating where on the slide the coordinates corresponded to, and using that information to guesstimate a new set of coordinates that would (hopefully) be closer to my region of interest.
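The back-and-forth between patch coordinates and the slide overview can be written down as simple scaling. The sketch below is only an illustration under assumptions: the function names are hypothetical, coordinates are taken to be the top-left corner of a square patch at full resolution, and patches are assumed to lie on a grid aligned to multiples of the patch size.

```python
def patch_box_on_thumbnail(x, y, patch_size, slide_size, thumb_size):
    """Map a full-resolution patch onto thumbnail pixel coordinates.

    Returns (left, top, right, bottom) in thumbnail pixels.
    """
    slide_w, slide_h = slide_size
    thumb_w, thumb_h = thumb_size
    return (
        x * thumb_w / slide_w,
        y * thumb_h / slide_h,
        (x + patch_size) * thumb_w / slide_w,
        (y + patch_size) * thumb_h / slide_h,
    )


def thumbnail_point_to_patch(tx, ty, patch_size, slide_size, thumb_size):
    """Inverse mapping: top-left corner of the grid patch under a thumbnail point."""
    slide_w, slide_h = slide_size
    thumb_w, thumb_h = thumb_size
    # Scale the thumbnail point up to full resolution.
    x = int(tx * slide_w / thumb_w)
    y = int(ty * slide_h / thumb_h)
    # Snap down to the assumed patch grid.
    return (x // patch_size * patch_size, y // patch_size * patch_size)
```

With a mapping like this, the region of interest can be picked by eye on the overview image and converted straight into a candidate patch, rather than guessing new coordinates after every miss.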
After lots of squinting at the original slide to figure out where my patch was, crossing my fingers as I selected new coordinates, and endless cursing at the pathology image gods, I found it:
A rush of euphoria coursed through me as I quickly ran stain deconvolution on my selected patches once again and waited with anticipation as my results were generated–until I saw this:
I cradled my face in my hands in despair as I told myself, “breathe,” over and over again as I processed the fact that all that time and effort spent running everything again with tile = False with the smaller slide had produced…the exact same results. The artifacts were once again visible in both images.
As a final nail in the coffin, when I tried viewing the generated TIFF file with QuPath (the application I used in my first blog post to perform sample stain deconvolution) to see whether these artifacts were intrinsic to the TIFF image, I was greeted with this lovely message:
My bad blood with memory issues had returned with a vengeance.
After sharing these disappointing results with my external advisor during our weekly meeting and discussing next steps, I plan to do the following in the next week:
- Extract a 256 x 256 pixel patch from a normal slide and enlarge it to the same size at which I’ve been viewing the deconvolved images, to see whether any artifacts are present before stain deconvolution occurs. I’ll do this for both the raw SVS slide and the converted TIFF slide to determine the point at which these artifacts appear in my deconvolution pipeline.
- Perform stain deconvolution using a stain separation algorithm available in the scikit-image library. If successful, this could potentially act as an alternative to HistomicsTK and may resolve the issues I’m currently facing.
- If I’m able to determine and remove the source of the artifacts on the patches I’m using, I’ll run my pipeline again and hopefully obtain results that I can use to compare the effects of stain deconvolution on pen- and non-pen slides. At this point, I should be able to look into methods to combine the separated stains and determine how effective stain deconvolution techniques are at removing pen markings.
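To make the first two steps above more concrete, here is a rough NumPy-only sketch, not the pipeline used in this project. scikit-image exposes this kind of stain separation as skimage.color.rgb2hed; the code below re-implements the same underlying idea (color deconvolution in optical-density space) so that it runs without extra dependencies. The function names are hypothetical, and the stain vectors are the standard Ruifrok and Johnston reference values rather than values fitted to these slides.

```python
import numpy as np

# Reference optical-density vectors (rows: hematoxylin, eosin, DAB).
RGB_FROM_HED = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)


def upscale_nearest(patch, factor):
    """Enlarge a patch by repeating pixels, so no interpolation blur is added."""
    return np.repeat(np.repeat(patch, factor, axis=0), factor, axis=1)


def separate_stains(rgb_uint8):
    """Return per-pixel stain concentrations, shape (H, W, 3)."""
    # Scale to (0, 1]; the floor avoids log(0) on pure-black pixels.
    rgb = np.maximum(rgb_uint8.astype(np.float64) / 255.0, 1e-6)
    # Beer-Lambert: optical density is linear in stain concentration.
    optical_density = -np.log(rgb)
    # Solve OD = concentrations @ RGB_FROM_HED for the concentrations.
    return optical_density @ HED_FROM_RGB
```

Pixel repetition is used for the enlargement on purpose: any blockiness visible after upscale_nearest is already in the patch itself, whereas a smoothing resize could hide or create artifacts of its own.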
But alas, I’ve reached the conclusion of my narration of the sad beautiful tragic journey that was my work on this project over this past week. Although the getaway car I had jumped on with last week’s promising results seemed to run out of gasoline this week, I am determined to persist and succeed in the end game. Because after all, success after seemingly endless failure truly hits different.
See you next Friday for your weekly dose of more spectacular, amazing, jaw-dropping T.S. puns :))
- “HistomicsUI.” Digital Slide Archive, GitHub, https://github.com/DigitalSlideArchive/HistomicsUI.
- Jemsy Comics (?); No Attribution Found