Week 7: Msitakes
April 18, 2024
Hello itnllegint bineg. Cna yuo raed thsi? Yse? Ins’t tath crzay? Hwo yoru brani porceseess tpyos liek itss nthinog? Maeks me wnoder…do we raelyl nede to splle crrorectyl? Waht iff ew acectped hour typso adn msitakes in eevrydya comunication, fluly ebmracd oru flawsw? Wuoldnt’ teh wrold bee os muhc mroe coolrful adn welcming?
Um. So. You know people make mistakes from time to time. No one’s perfect. Our flaws are what make us all interesting, unique, and, most importantly, human. This week, instead of making progress, I decided to be quirky and different — I decided to make mistakes instead.
Mistake 1: Telescope data takes up space — space as in bytes on a hard disk. Even though I am working on a remote computer with 15 external drives, each with multiple terabytes of space, I somehow still managed to run out. And you know what you naturally do when you run out of storage space? You clean some stuff out! But you know what you don’t want to do when you clean stuff out? Remove important files.
Well um. Ok, technically, I didn’t remove any important files. I just deleted a few terabytes of telescope data that I thought I would never use. If I needed the data in the future, I would just download it again, no biggie. Little did I know that when I first processed the data, I had sorted the telescope exposures by the time the files were downloaded to the computer; that timestamp, and hence the order of the files, changes if I download the data again.
Now, this change in order wouldn’t be a problem if I had simply saved a list of the initial ordering. But you know I’m not smart or careful enough to do that!
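For the record (and for future me), here’s roughly what that would have looked like: a tiny sketch that sorts the exposures by download time once and writes the ordering to a plain text file. The directory and file pattern below are made up, not my actual setup.

```python
from pathlib import Path

# Hypothetical location and naming for the raw exposures (not my real paths).
data_dir = Path("/data/telescope/raw")
exposures = sorted(data_dir.glob("*.fits"), key=lambda p: p.stat().st_mtime)

# Record the ordering once, so re-downloading (which resets the timestamps)
# can never change how the files get processed.
manifest = data_dir / "exposure_order.txt"
manifest.write_text("\n".join(p.name for p in exposures) + "\n")

# Later runs process in the recorded order instead of re-sorting by timestamp.
ordered_names = manifest.read_text().splitlines()
```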
Turns out, I have to use some of the data that I deleted. Lucky for me, it’s nowhere to be found! So now I have to re-download the 1.5 terabytes of data, process it all again, and then continue where I left off. Re-downloading takes about 2 days and reprocessing about 3. THAT’S A WEEK GONE!! Anyways. I’m not mad about it, you are.
Mistake 2: Telescope data is complicated. Well, it’s not that bad, but today I’m venting all my frustrations. Essentially, before you can actually process an image, you need to subtract off the background variation, divide by the standard deviation of the background, and then clip off pixels that are either too bright or too dim (usually really bright stars or camera pixels that have gone haywire).
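In code, the steps look something like the sketch below. This is a rough stand-in, not my actual pipeline: I’m using a median filter as a placeholder for however you estimate the background variation, and the box size and clipping threshold are made-up numbers.

```python
import numpy as np
from scipy.ndimage import median_filter

def normalize_image(image, box=64, clip=5.0):
    """Rough sketch of the normalization steps described above."""
    # 1. Estimate the smooth background variation and subtract it off.
    background = median_filter(image, size=box)
    image = image - background

    # 2. Divide by the standard deviation of the background (estimated here
    #    from the background-subtracted pixels), so values end up roughly
    #    in units of "sigmas above the background".
    image = image / np.nanstd(image)

    # 3. Clip off pixels that are far too bright or too dim
    #    (saturated stars, dead or hot camera pixels).
    return np.clip(image, -clip, clip)
```

(Foreshadowing: the important part turns out to be that all of these steps live in one place.)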
Anyways, I was looking through my image normalization function and discovered that I forgot to subtract off the background variation! I figured I could significantly improve the accuracy of my algorithm by simply adding that subtraction. So I added it, then spent 2 days regenerating the training data and re-evaluating the model. The accuracy dropped by 20%. And not just any 20%: from 99.8% to 80%. Absolutely tragic.
How could this be? Turns out I had already subtracted the image background right before I called the normalization function, so I was effectively subtracting the background twice. Why wasn’t that subtraction inside the normalization function to begin with? Because of the coding etiquette that I never learned.
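The etiquette lesson, in toy form: let the normalization function own the background subtraction and hand it the raw image, so there’s no way to sneak in a second subtraction at the call site. (Again, this is a slimmed-down sketch, not my actual code.)

```python
import numpy as np

def normalize_image(image, clip=5.0):
    """Slimmed-down version of the earlier sketch; the background
    subtraction happens in here, and only in here."""
    image = image - np.nanmedian(image)   # background subtracted exactly once
    image = image / np.nanstd(image)
    return np.clip(image, -clip, clip)

raw = np.random.default_rng(0).normal(loc=300.0, scale=10.0, size=(128, 128))

# What I effectively had: the background removed at the call site AND
# again inside the function.
#   pre_subtracted = raw - np.nanmedian(raw)
#   normalized = normalize_image(pre_subtracted)

# The fix: pass the raw image straight in and let the function do its job.
normalized = normalize_image(raw)
```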
I’ll end my blog here — before the salty tears, streaming down my face onto my keyboard, short-circuit and explode my computer.