Filling In Lines – Part I
Note: It became clear as I was writing this post that it had grown far too long. Since I was already summarizing heavily and needed to reach the present quickly, I decided to split the post in half. This first part is a short introduction to my initial bioinformatics work, which took roughly three weeks.
This is, I hope, my last post written in past tense.
Before I could get into engineering the actual product of my senior project (what I’m doing right now), I had to turn the general concept of my assay, meaning what it searches for and ultimately observes, into a test that is actually capable of making that observation. If that’s not entirely clear, consider a smoke alarm, which is effectively (though not really) an assay for fires. To detect a fire, it’s necessary, or at least useful, to be able to detect smoke. How one reliably and (as my project in particular demanded) cheaply detects smoke is another issue entirely. I was looking for a way to smell the smoke.
If you’ll recall, my project deals with regional variations in a species of malaria parasite called P. vivax. These geographically correlated, evolutionarily selected differences could in theory occur anywhere in the P. vivax genome, but thanks to my mentor’s extensive research on how P. vivax infects and reproduces in mosquitoes (malaria’s primary vector), we already had a strong idea of where to look: a single, especially high-leverage gene called PV47. The importance of this identification can’t be overstated. Without it, as I have alluded, I would have been searching for some n-tuple of differences scattered across the entire genome, an enormous undertaking. Even more importantly, whatever I found would have had effectively no discernible biological, or ultimately even physical, meaning. Because the high-leverage gene had already been identified, I knew, and still know, where these mutations come from evolutionarily and why they are necessary.
Anyway, what does it mean to have started with a gene? What I had was a little more than a thousand nucleotides of what’s called a reference sequence (at least it was the reference when I started my project). That sequence codes for a chain of amino acids, and amino acids are the building blocks of proteins, which in turn carry out an outsized share of the work an organism needs to do to function.
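To make the step from nucleotides to amino acids concrete, here is a minimal Python sketch of translation using the standard genetic code: the sequence is read three bases (one codon) at a time, and each codon maps to one amino acid or to a stop signal. This is an illustration only, not code from my project. The names CODON_TABLE and translate are hypothetical, and the toy sequence is made up rather than taken from PV47. In practice, libraries such as Biopython provide this as Seq.translate.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Reads codons in frame starting at position 0 and ignores trailing bases
    that do not form a full codon. Unrecognized codons (e.g. containing "N")
    become "X". If to_stop is True, translation ends at the first stop codon.
    """
    dna = dna.upper().replace("U", "T")  # also accept RNA input
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Toy example (not PV47): eight codons, the last of which is a stop.
print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```

A real gene of about a thousand nucleotides works the same way, just with roughly a third as many amino acids as it has bases.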
Using this reference, I needed to figure out a few things. First, and most easily answered: what similar sequences are out there? Second, what else can PV47 code for, and where are these variations found? And finally, how strongly can strains of malaria be grouped geographically according to these observed variations? Each of these tasks proved far more complicated than I had initially envisioned.
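The second question, where variations fall and whether they change what the gene codes for, can be illustrated by comparing a sample against the reference codon by codon and labeling each difference as synonymous (same amino acid) or non-synonymous (different amino acid). This is a simplified sketch under stated assumptions, not the workflow I actually used: it presumes the two sequences are already aligned, share a reading frame, and contain no insertions or deletions. The function name coding_differences and both toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_differences(reference: str, sample: str) -> list:
    """List codon-level differences between two aligned coding sequences.

    Assumes (hypothetically) equal-length, gap-free, in-frame sequences.
    Returns tuples of
    (codon number, reference codon, sample codon, ref aa, sample aa, kind).
    """
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for i in range(0, len(reference) - 2, 3):
        ref_codon, alt_codon = reference[i:i + 3], sample[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon, "X")
        alt_aa = CODON_TABLE.get(alt_codon, "X")
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        diffs.append((i // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return diffs


# Toy sequences: codon 2 changes silently (GCC -> GCT, both alanine),
# while codon 4 changes the protein (GTA -> CTA, valine -> leucine).
for row in coding_differences("ATGGCCATTGTA", "ATGGCTATTCTA"):
    print(row)
```

Non-synonymous differences are the ones that can alter the protein itself, which is why a comparison like this is a natural first filter before asking whether any of them cluster by region.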