Week 4: Refining My Scoring System and Unexpected Problems
March 27, 2024
Welcome to the fourth weekly update on my project. This week was mostly spent finishing the refinements to the scoring system I started last week and fixing a problem I discovered in my dataset.
First, the scoring system. Previous research on the morality of presidential speech pointed me toward care and fairness as two pillars of morality worth examining closely. This came from last week's reading, which showed that in multiple debates the candidate who eventually won the presidency drew on words from the care and fairness pillars. To account for this, I refined my scoring system to give these two pillars more weight.
Using the scoring code in the emfdScore (extended Moral Foundations Dictionary) GitHub repository, I can change how much each pillar contributes to the overall morality score and break down individual pillars as I see fit. Once I have run initial tests on each pillar, I will be able to tell whether these weight changes are actually needed.
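To make the reweighting idea concrete, here is a minimal sketch of a weighted overall score. It is not emfdScore's own API: it assumes I already have one score per foundation for a speech, and the names `FOUNDATIONS` and `weighted_moral_score` are hypothetical. It simply takes a weighted average, so raising the weights for care and fairness pulls the overall score toward those two pillars.

```python
# The five moral foundations scored by the eMFD.
FOUNDATIONS = ("care", "fairness", "loyalty", "authority", "sanctity")


def weighted_moral_score(scores, weights=None):
    """Combine per-foundation scores into one overall score (hypothetical helper).

    scores:  dict mapping each foundation name to its score for a speech
    weights: optional dict of foundation -> weight; any foundation left
             out gets a default weight of 1.0
    """
    weights = weights or {}
    weighted_sum = 0.0
    total_weight = 0.0
    for foundation in FOUNDATIONS:
        w = weights.get(foundation, 1.0)
        weighted_sum += w * scores[foundation]
        total_weight += w
    # Dividing by the total weight keeps the result on the same scale as the inputs.
    return weighted_sum / total_weight


# Made-up scores for one speech, with care and fairness weighted double.
example = {"care": 0.4, "fairness": 0.2, "loyalty": 0.1, "authority": 0.1, "sanctity": 0.2}
print(weighted_moral_score(example, {"care": 2.0, "fairness": 2.0}))
```

With no weights this is a plain average of the five pillars; doubling care and fairness makes each of them count twice as much as each of the other three.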
I also found a problem in my dataset that took up most of the week: the text was not divided evenly between the presidential candidates.
In my initial testing, words at each break between debaters bled into the next debater's text block. The cause was my scraping algorithm recording the “end index” (the position of the last word) of each debater's text block as a later index than it should have been. By Saturday I had traced this to the end index increasing incrementally from one block to the next, which shifted every division slightly; once I corrected it, the text blocks were divided evenly.
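My actual scraper is not shown here, but the idea behind the fix can be sketched as follows. This assumes a plain-text transcript where each turn begins with the speaker's name in capitals followed by a colon (an assumed format), and `split_by_speaker` is a hypothetical helper. Instead of carrying an end index forward from block to block, which is where drift can creep in, each block ends exactly where the next speaker label begins, so no words can cross a break.

```python
import re

# Assumed format: a speaker label like "SMITH:" at the start of a line.
SPEAKER = re.compile(r"^([A-Z][A-Z.' ]+):", re.MULTILINE)


def split_by_speaker(transcript):
    """Split a transcript into (speaker, text) blocks (hypothetical helper)."""
    matches = list(SPEAKER.finditer(transcript))
    blocks = []
    for i, match in enumerate(matches):
        start = match.end()
        # Each block ends where the next label starts (or at the end of the text),
        # rather than at an end index accumulated from earlier blocks.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        blocks.append((match.group(1).strip(), transcript[start:end].strip()))
    return blocks


sample = "MODERATOR: Welcome.\nSMITH: Thank you.\nJONES: Glad to be here.\n"
for speaker, text in split_by_speaker(sample):
    print(speaker, "->", text)
```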
After some final cleaning, including removing stop words (words such as “is”, “they”, and “are” that carry little meaning on their own) with the open-source library spaCy, my dataset was finally clean enough to run through the eMFD dictionary.
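For reference, a minimal sketch of stop-word removal with spaCy might look like this. It uses a blank English pipeline (`spacy.blank("en")`), which needs no downloaded model because the stop-word flags come from spaCy's built-in English stop list. The helper name `remove_stop_words` is hypothetical, and dropping punctuation as well is my assumption about what the cleaning step should include.

```python
import spacy

# A blank English pipeline provides the tokenizer and lexical flags such as
# is_stop and is_punct, without a trained model download.
nlp = spacy.blank("en")


def remove_stop_words(text):
    """Return the tokens of `text` that are not stop words, punctuation, or whitespace."""
    doc = nlp(text)
    return [
        token.text
        for token in doc
        if not (token.is_stop or token.is_punct or token.is_space)
    ]


print(remove_stop_words("They are debating health care policy."))
```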