Week 4: Preliminary Generalization and a Problem
March 25, 2026
Welcome to Week 4! In case this is your first time tuning into my senior project blog posts, I’ll quickly summarize the last few weeks. I’ve finished my online ML course and compiled a literature review that explains my project in context; within it, I’ve also outlined the steps that need to be taken to stay on course for the final project deadline. Currently, I am in “dataset searching/requesting” mode and have a rough idea of which datasets I will use for the first week of preliminary training. These are publicly available datasets that do not require any access requests, which is why I chose them for the first “round” of training. They are as follows: NACC, OpenNeuro, Neu3Grid, OSF, and Centaur. I will briefly run these datasets through some conversion methods in order to gain experience with what this would look like at a larger scale, with more variable and complex datasets, in the near future.
But why must we do this? The formats we see on websites like Kaggle (JPEG/PNG) are vastly different from the ones we get from medical institutions (DICOM/NIfTI), but most have one thing in common: they are usually registered to the MNI152 standard brain (a brain imaging template), or can be easily converted to it. This lets us remove head size/shape differences and enables the voxel-wise comparisons needed for CNN generalization. However, a difference still remains in the file format itself, and this is where the converters I mentioned earlier come into play (dcm2niix and NiBabel). dcm2niix lets me convert raw DICOM files into the desired NIfTI files, which are then loaded into the CNN pipeline using NiBabel.
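To make the conversion step concrete, here is a minimal sketch of what this pipeline could look like. It is a sketch under assumptions, not my final pipeline: the directory names are hypothetical, and it assumes dcm2niix is installed on the system PATH and the nibabel package is available.

```python
# Sketch: convert a folder of raw DICOM files to NIfTI with dcm2niix,
# then load the resulting volumes with NiBabel for the CNN pipeline.
# Paths and filenames here are hypothetical placeholders.
import subprocess
from pathlib import Path


def dcm2niix_cmd(dicom_dir, out_dir):
    """Build the dcm2niix command line: gzipped NIfTI output plus a
    JSON sidecar carrying the scan metadata."""
    return [
        "dcm2niix",
        "-z", "y",           # compress output to .nii.gz
        "-b", "y",           # write a JSON sidecar with acquisition metadata
        "-f", "%p_%s",       # name files by protocol name + series number
        "-o", str(out_dir),  # where converted files land
        str(dicom_dir),      # input folder of DICOM files
    ]


def convert_and_load(dicom_dir, out_dir):
    """Run the conversion, then load each volume as a voxel array."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(dcm2niix_cmd(dicom_dir, out_dir), check=True)
    import nibabel as nib  # deferred so the command builder works standalone
    return [nib.load(p).get_fdata() for p in sorted(out_dir.glob("*.nii.gz"))]
```

The `get_fdata()` call is what hands the CNN a plain NumPy voxel grid; once every scan is registered to MNI152, those grids line up voxel-for-voxel across subjects.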
In the process of requesting, I ran into a few problems. The biggest was the ADNI data-use agreement, specifically restrictions on using AI in projects like mine. I had to pause preliminary testing/conversion to address this, double-checking that all of the publicly available datasets allowed AI use. The biggest concern with AI and tau-PET scans is the risk of data sharing, as these scans are still sensitive medical information. To get over this hump, I researched projects similar to mine (AD detection using tau-PET ML) and found that they did use datasets like ADNI despite the user agreement. With this in mind, my advisor emailed ADNI explaining our situation and asking for their outlook and decision on our case, and I emailed Google Colab to confirm that their platform does not involve data sharing whatsoever. We are still waiting to hear back; hopefully, when we do, we can go forward with this specific dataset. Worst-case scenario, I can still analyze the data without ML and then input those findings into the model without compromising any sensitive data. Going forward, I will continue to run some preliminary generalizations on the datasets, set up the ML model in Colab, and keep contacting medical institutions for more datasets. Until next time!
Comments
Hi Aditya. This was a great blog post in terms of explaining the details of why you’re doing all the steps in your project. You’ve clearly laid out the data sets you are using, why you need to process them as you are, and some of the hurdles you’ve had to address during the process. You’ve really captured the research experience well with this blog post. Keep up the good work!