Week 10: Data Collection

May 9, 2025

Hi everyone, welcome back to my blog! In the last post, I discussed how I used BLAST in developing my algorithm. In this post, I will cover the different pieces of data that I collected from my project.

The first was pairwise alignment using the PairwiseAligner(). Essentially, I recorded the highest score for each alignment and then ordered the results from the highest score to the lowest. I was able to replicate this for all three strains being analyzed: E. coli K-12 MG1655, E. coli O157:H7, and E. coli O119:H4. The second was analyzing the alignment lengths output by the BLAST algorithm. For each strain, I generated both a query-alignment-length plot, which shows how much of each query aligns to its targets, and a plot of the total sequence length aligned for each locus tag.
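To make the "score, then sort" step concrete, here is a minimal sketch in plain Python. It is not my project code: the scoring function is a small stand-in (a basic Needleman-Wunsch global alignment score with assumed match, mismatch, and gap values), and the gene names and sequences are made up for illustration. With Biopython installed, a PairwiseAligner score could take the place of the stand-in function.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (score only, no traceback).

    The match/mismatch/gap values are assumptions chosen for illustration.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def rank_by_score(reference, candidates):
    """Return (name, score) pairs ordered from highest to lowest score."""
    scored = [
        (name, global_alignment_score(reference, seq))
        for name, seq in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy sequences, not data from the project.
reference = "ATGGCCATT"
candidates = {
    "gene_c": "ATGCATT",    # two bases deleted
    "gene_a": "ATGGCCATT",  # identical to the reference
    "gene_b": "ATGGCGATT",  # one substitution
}

ranking = rank_by_score(reference, candidates)
for name, score in ranking:
    print(name, score)
```

Sorting with `reverse=True` is what puts the best-scoring alignment at the top of the list, which is the ordering described above.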

The last round of data collection was for identifying types of mutations. There are two specific types of mutations that this algorithm targets: frameshift mutations and point mutations. Frameshift mutations include insertions, which add nucleotides to gene sequences, and deletions, which remove nucleotides from gene sequences. Point mutations include silent mutations, which do not change the encoded amino acid; missense mutations, which change an amino acid; and nonsense mutations, which introduce premature stop codons. I visualized this data in bar graphs that show the categorization of these mutations within each E. coli strain.
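The mutation categories above can be sketched as a small classifier. This is a simplified illustration rather than my actual algorithm: it assumes the reference and mutant are coding sequences that start in frame, it only looks at a length difference or a single substituted base, and the function name and example sequences are hypothetical.

```python
# Standard genetic code, built in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_mutation(reference, mutant):
    """Label the difference between two in-frame coding sequences."""
    diff = len(mutant) - len(reference)
    if diff != 0:
        kind = "insertion" if diff > 0 else "deletion"
        # A length change that is not a multiple of 3 shifts the reading frame.
        frame = "frameshift" if abs(diff) % 3 != 0 else "in-frame"
        return f"{frame} {kind}"

    changed = [i for i in range(len(reference)) if reference[i] != mutant[i]]
    if not changed:
        return "no change"
    if len(changed) > 1:
        return "multiple substitutions"  # out of scope for this sketch

    # Compare the amino acids encoded by the codon containing the change.
    start = (changed[0] // 3) * 3
    ref_aa = CODON_TABLE.get(reference[start:start + 3])
    mut_aa = CODON_TABLE.get(mutant[start:start + 3])
    if ref_aa is None or mut_aa is None:
        return "unclassified"  # incomplete codon or non-ACGT base
    if ref_aa == mut_aa:
        return "silent"
    if mut_aa == "*":
        return "nonsense"
    return "missense"


# Hypothetical reference: Met-Ala-Lys-stop.
ref = "ATGGCCAAATAA"
print(classify_mutation(ref, "ATGGCTAAATAA"))   # GCC -> GCT, still Ala
print(classify_mutation(ref, "ATGGTCAAATAA"))   # GCC -> GTC, Ala -> Val
print(classify_mutation(ref, "ATGGCCTAATAA"))   # AAA -> TAA, Lys -> stop
print(classify_mutation(ref, "ATGGGCCAAATAA"))  # one extra base
print(classify_mutation(ref, "ATGGCAAATAA"))    # one base removed
```

Counting the labels returned for each strain would give the kind of tallies that a bar graph of mutation categories is built from.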

These graphs were included in my final project report and will be further analyzed in my research paper. The paper will be available at my Senior Project Presentation later this month. Until next time!

View more of Sonya S.'s posts.
