Week 1: Laying the Foundation - Examining CrisisNLP
March 1, 2025
Welcome back!
This week I have mainly been going through the resources in CrisisNLP, the dataset collection I will use. CrisisNLP is a project that provides access to annotated social media datasets from natural disasters. Several other research projects have also used CrisisNLP; a notable one is “Automatic Identification of Eyewitness Messages on Twitter During Disasters” by Kiran Zahra, Muhammad Imran, and Frank Ostermann. Their project sorted tweets into categories such as direct eyewitnesses, indirect eyewitnesses, and vulnerable eyewitnesses, and the authors uploaded their annotated tweets to CrisisNLP. The paper has garnered 178 citations, which suggests other researchers have found it useful!
Now that I’ve established that CrisisNLP is a credible source, how does my project rely on it?
I am creating my benchmark dataset from real messages people tweeted during disasters. This past week, I reviewed Resource 2 in CrisisNLP and chose potential questions from it. This resource consists of human-labeled tweets collected during the 2012 Hurricane Sandy and the 2011 Joplin tornado.
Here is an example tweet I have chosen: “Where do people find these cans of food anyway? #mittstormtips #sandy” Of course, I will reword the question to make it clearer, but its main premise already comes through.
I have also started going through Resource 6: “This resource comprised of tweet-ids and a sample of raw tweets (50k) collected during three devastating hurricanes in 2017 namely Hurricane Harvey, Hurricane Irma, and Hurricane Maria.” However, a large chunk of it remains because of its sheer size (50k tweets).
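With a file this large, a quick first pass might help narrow things down before reading by hand. Below is a minimal Python sketch of one possible pre-filter: it keeps only tweets that contain a question mark. This is an illustration rather than my actual workflow, and the column name `tweet_text` and the helper `candidate_questions` are hypothetical; the real CrisisNLP files may be laid out differently.

```python
import csv
import io


def candidate_questions(rows, text_field="tweet_text"):
    """Yield tweet texts that contain a question mark.

    `rows` is any iterable of dicts (for example a csv.DictReader).
    `text_field` is an assumed column name, not taken from CrisisNLP.
    """
    for row in rows:
        text = (row.get(text_field) or "").strip()
        if "?" in text:
            yield text


# Tiny in-memory stand-in for a CSV file of raw tweets.
sample_csv = io.StringIO(
    "tweet_id,tweet_text\n"
    "1,Where do people find these cans of food anyway? #mittstormtips #sandy\n"
    "2,Stay safe everyone #sandy\n"
)

questions = list(candidate_questions(csv.DictReader(sample_csv)))
print(questions)
```

A question mark is a crude signal, so this would only shrink the pile; each remaining tweet would still need a manual read to decide whether it belongs in the benchmark.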
My next step is to review Resource 6 thoroughly and finalize my benchmark. I look forward to sharing my findings with you all next time!