Week 8: Full Dataset Graphs
April 12, 2025
Welcome back! It’s incredible to say we are nearing the last few weeks of the senior project! While my updates are relatively short this week, I have interesting findings to share with you all about what Reddit posts show regarding influenza activity.
Last week, I ran my BERT model on my Reddit sample from the 2024-2025 flu season. Because the graph had a nice shape that likely represented a seasonal trend in the proportion of influenza-like illness (ILI) posts, I wanted to see if the same could be shown for previous seasons. I graphed the proportion of ILI posts detected in every two-week period since October 2022:
While the graph is slightly blurry, the general trend in flu-related Reddit posts remains consistent throughout the 2022-2023 and 2023-2024 seasons, with peaks in ILI posts around December to January. However, the error margins for the past two seasons are noticeably larger than those for the most recent season, likely because PRAW was able to access fewer posts from previous years, which adds to the uncertainty. Another notable observation is that the proportion of posts indicating ILI has increased over the last three years. This is difficult to explain without looking into confounding factors such as renewed lockdowns or spikes in concern driven by news coverage, but one plausible explanation is that as the world returns to normalcy after the COVID-19 pandemic and takes fewer respiratory precautions, the number of influenza cases also rises.
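To illustrate why fewer posts lead to wider error margins, here is a minimal sketch. It is an illustration only: it uses the Wilson score interval, which is one common way to put a 95% interval on a proportion and not necessarily the method behind the graph, and the post counts are made up rather than taken from the Reddit data.

```python
import math


def wilson_interval(positives, total, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    z=1.96 corresponds to a 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Made-up counts: both bins have the same 5% share of ILI posts,
# but one two-week bin has ten times as many posts as the other.
small_bin = wilson_interval(positives=10, total=200)
large_bin = wilson_interval(positives=100, total=2000)

small_width = small_bin[1] - small_bin[0]
large_width = large_bin[1] - large_bin[0]

print(f"200 posts:  {small_bin[0]:.3f} to {small_bin[1]:.3f}")
print(f"2000 posts: {large_bin[0]:.3f} to {large_bin[1]:.3f}")
```

Even though both bins have the same underlying proportion, the bin with fewer posts gets a visibly wider interval, which matches the pattern described for the older seasons.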
My next step is to compare the Reddit flu rates to the official flu rates reported by the three English-speaking countries with the most Reddit users, to assess whether the timing of concern on Reddit correlates with real-world infection rates. Because most of the graphs published online are line graphs, I also created a line graph version of my bar graph, with a slightly different representation of the 95% confidence interval (shown as the shaded light blue area).
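One way the comparison with official flu rates mentioned above could be approached is a lagged correlation: shift one series by a few two-week periods and see which shift lines the two up best. The sketch below is only an assumption about how this might be done. The function names `pearson` and `lagged_correlations` are hypothetical, and both series are made-up numbers, not real Reddit or surveillance data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(reddit, official, max_lag):
    """Correlate reddit[t] with official[t + lag] for lag = 0..max_lag.

    A best lag above zero means the Reddit series moves before the
    official one by that many periods.
    """
    results = {}
    for lag in range(max_lag + 1):
        paired_reddit = reddit[: len(reddit) - lag]
        paired_official = official[lag:]
        results[lag] = pearson(paired_reddit, paired_official)
    return results


# Made-up two-week series. The official series is constructed as the
# Reddit series delayed by one period and rescaled.
reddit_share = [0.02, 0.03, 0.06, 0.09, 0.07, 0.04, 0.03, 0.02]
official_rate = [1.0, 1.0, 1.5, 3.0, 4.5, 3.5, 2.0, 1.5]

correlations = lagged_correlations(reddit_share, official_rate, max_lag=2)
best_lag = max(correlations, key=correlations.get)

for lag, r in correlations.items():
    print(f"lag {lag}: r = {r:.3f}")
print("best lag:", best_lag)
```

Because the toy official series is just a delayed copy of the Reddit series, a lag of one period comes out on top here. With real data the picture would be noisier, and a high correlation alone would not show that Reddit concern tracks actual infections.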
I will also be normalizing the flu rates on Reddit against the level of concern shown in Google Trends. By doing this, I aim to determine how well Reddit represents actual public interest and concern.
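Google Trends reports interest on a 0 to 100 scale relative to the peak in the selected time range, so one possible first step in the normalization described above is to put the Reddit series on the same kind of scale. The sketch below is a guess at what that could look like, not the final method. `scale_to_peak` is a hypothetical helper and all the numbers are made up.

```python
def scale_to_peak(values):
    """Rescale a series so that its largest value becomes 100."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("series needs a positive peak")
    return [v / peak * 100 for v in values]


# Made-up values for six consecutive two-week periods.
reddit_share = [0.02, 0.03, 0.06, 0.09, 0.07, 0.04]  # proportion of ILI posts
trends_interest = [20, 35, 80, 100, 60, 30]          # already on a 0-100 scale

reddit_index = scale_to_peak(reddit_share)

# Positive gap: Reddit shows relatively more concern than Google Trends
# in that period; negative gap: relatively less.
gaps = [r - t for r, t in zip(reddit_index, trends_interest)]

for period, (r, t, g) in enumerate(zip(reddit_index, trends_interest, gaps)):
    print(f"period {period}: reddit {r:.1f}, trends {t}, gap {g:+.1f}")
```

Once both series share a scale, the per-period gaps give a rough sense of where Reddit over- or under-represents the interest seen in search data.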
Although this week’s blog is short, I hope you found these findings just as interesting as I did! Be sure to look out for my next post next Friday!