Week 2: Computer Algorithms for Bioinformatics and Genetics
February 28, 2025
Hi everyone, welcome back to my blog! This week, I will be focusing on the basics behind the computer algorithms created for the bioinformatics and genetics field while further expanding on the potential algorithms that I will use for my senior project.
Traditional Algorithm or Machine Learning?
The buzzwords “machine learning” and “artificial intelligence” have worked their way into nearly every conversation about computer algorithms, but are they necessary for analyzing gene sequences? Gene sequences are currently stored in the NIH's GenBank database in a text-based format, where each record carries information such as the locus, definition, accession number, source, organism, a features table, and the actual nucleotide sequence. Once this data has been properly cleaned, the type of algorithm has to be determined.
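To make the record layout concrete, here is a rough Python sketch of how those fields could be pulled out of one text record. It assumes the records follow the GenBank flat-file layout (keyword in the first 12 columns, sequence after an ORIGIN line, record closed by //). The function name parse_genbank_record is my own invention, and the sketch ignores continuation lines and the features table, so it is an illustration rather than a full parser.

```python
def parse_genbank_record(text):
    """Extract a few header fields and the sequence from one flat-file record.

    Simplifying assumptions: keywords sit in the first 12 columns, only the
    first line of each field is kept, and the features table is skipped.
    """
    wanted = ("LOCUS", "DEFINITION", "ACCESSION", "SOURCE", "ORGANISM")
    record = {}
    sequence_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if in_origin:
            # Sequence lines mix position numbers, spaces, and bases;
            # keep only the letters.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        key = line[:12].strip()
        value = line[12:].strip()
        if key == "ORIGIN":
            in_origin = True             # everything after this is sequence
        elif key in wanted:
            record[key.lower()] = value
    record["sequence"] = "".join(sequence_parts).upper()
    return record
```

For real work, an established parser such as Biopython's Bio.SeqIO would likely be a safer choice than hand-written code like this.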
In the past, several different types of algorithms were used to compare gene sequences. One of the earliest was the Needleman-Wunsch algorithm, published in 1970, which performs a global alignment of two sequences (it was originally described for proteins, but it works just as well on nucleotide sequences). Rather than enumerating every possible alignment, it uses dynamic programming: it fills a scoring matrix based on substitution scores and gap penalties, which guarantees that the highest-scoring alignment is found. The algorithm follows three main steps: matrix initialization, scoring, and traceback to determine the optimal alignment. It is particularly useful for comparing sequences of similar length, such as homologous genes from different species, where global similarity is essential. However, its running time and memory use both grow with the product of the two sequence lengths, which makes it computationally expensive for long sequences or large database searches, so faster alternatives were developed.
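The three steps of Needleman-Wunsch can be sketched in a few lines of Python. This is a minimal version under my own assumptions: a flat match/mismatch score stands in for a real substitution matrix, every gap position costs the same, and the default values (1, -1, -2) are arbitrary picks for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: initialization. The first row and column hold pure gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 2: scoring. Each cell takes the best of three possible moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Step 3: traceback from the bottom-right corner to the top-left.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these default scores, needleman_wunsch("ACGT", "AGT") returns (1, "ACGT", "A-GT"): three matches and one gap. The nested loops over both sequences are where the cost proportional to the product of the two lengths comes from.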
The Basic Local Alignment Search Tool, or BLAST, was created in 1990 to improve on the speed of these algorithms by using a heuristic that approximates the best local alignments rather than guaranteeing the optimal one. It is more widely used for comparing nucleotide sequences against large databases because of its computational efficiency. Lastly, algorithms such as Clustal Omega and MUSCLE were created to compare three or more sequences at once.
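To give a feel for how a heuristic can trade optimality for speed, here is a toy "seed and extend" sketch in Python. It is not BLAST itself: the real tool uses scoring matrices, statistical significance estimates, and many optimizations that are left out here. The function names, the word size k, and the drop-off threshold are all my own assumptions for illustration.

```python
def _extend(query, subject, qi, sj, step, match, mismatch, drop):
    """Extend without gaps in one direction; stop once the running score
    falls `drop` below the best score seen. Returns (best_score, best_length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject) and best - score < drop:
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        qi += step
        sj += step
    return best, best_len


def seed_and_extend(query, subject, k=4, match=1, mismatch=-1, drop=2):
    """Find the best ungapped local hit that starts from an exact k-letter seed.

    Returns (score, query_start, subject_start, length), or None if the two
    sequences share no k-letter word.
    """
    # Index every k-letter word of the subject by its start position.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)

    best_hit = None
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            # Only positions that share an exact word get any further work.
            right, right_len = _extend(query, subject, i + k, j + k, 1, match, mismatch, drop)
            left, left_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, drop)
            score = k * match + left + right
            if best_hit is None or score > best_hit[0]:
                best_hit = (score, i - left_len, j - left_len, k + left_len + right_len)
    return best_hit
```

The speed-up comes from skipping every position pair that does not share an exact k-letter word, which is also why an approach like this can miss an alignment that a full scoring matrix would find.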
Conclusion
What I take away from diving deeper into these existing algorithms is that comparing gene sequences has been done successfully before. What they share, however, is that none of them relies on machine learning; researchers in this area of bioinformatics have tended to rely on deterministic, rule-based approaches instead. Therefore, I have concluded that machine learning would not have as great an impact on the comparison of sequences, so I will create a traditional computer algorithm for my project instead.
Comments
It’s really interesting how you mention the pre-existing algorithms. I am wondering whether your algorithm will be optimized for speed (like BLAST), accuracy (like Needleman-Wunsch), or a balance of both? Essentially, how will your senior project algorithm be an improvement compared to the pre-existing algorithms?