
Rajat R. 2025 | BASIS Independent Fremont
- Project Title: Evaluating the Capabilities of Large Language Models to Provide Directions in English and Spanish During Disasters Through Creating a Novel Benchmark
- BASIS Independent Advisor: Ms. Rangoli
Climate change is increasing the frequency and intensity of disasters. A large share of the US population speaks Spanish at home, so disaster management systems should account for non-English speakers. My project creates a novel benchmark, the Hurricane-Tornado-Bilingual-Assistance-Dataset, designed to evaluate how well Large Language Models (LLMs) answer disaster-related questions in English and Spanish. I am focusing specifically on extreme wind events and plan to complete my project virtually, under the guidance of an Associate Engineer at MIT Lincoln Laboratory's Humanitarian Assistance and Disaster Relief Systems department. To create my benchmark, I will manually select relevant questions from tweets in the CrisisNLP collection that were sent during past extreme wind events. I will then use DeepL, an AI translation tool, to translate the questions from English to Spanish, and I will verify these translations with a native Spanish speaker. Finally, I will evaluate GPT-4-mini and Llama 3.1 on the benchmark. I predict that both models will be more likely to hallucinate and provide inaccurate answers for the Spanish questions. However, if the models perform well in both languages, they could be used to power a chatbot that FEMA can deploy during disasters.
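
The evaluation step described above could be organized roughly as in the following Python sketch. It is only an illustration under stated assumptions: the `BenchmarkItem` fields, the keyword-based `is_correct` check, and the `toy_model` stub are all hypothetical and are not taken from the project. The sample question is made up rather than drawn from CrisisNLP, and a real run would replace `toy_model` with calls to the models under test and would likely use a more careful grading method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkItem:
    """One question in both languages (hypothetical schema, not the project's)."""
    question_en: str
    question_es: str
    keywords_en: List[str]  # terms a correct English answer should mention
    keywords_es: List[str]  # terms a correct Spanish answer should mention


def is_correct(answer: str, keywords: List[str]) -> bool:
    """Crude stand-in for grading: every keyword must appear in the answer."""
    text = answer.lower()
    return all(keyword.lower() in text for keyword in keywords)


def evaluate(model: Callable[[str], str], items: List[BenchmarkItem]) -> Dict[str, float]:
    """Return the fraction of correctly answered questions per language."""
    correct = {"en": 0, "es": 0}
    for item in items:
        if is_correct(model(item.question_en), item.keywords_en):
            correct["en"] += 1
        if is_correct(model(item.question_es), item.keywords_es):
            correct["es"] += 1
    total = len(items)
    return {lang: (count / total if total else 0.0) for lang, count in correct.items()}


def toy_model(question: str) -> str:
    """Placeholder for a real LLM call; it only handles the English question."""
    if "shelter" in question.lower():
        return "Go to the nearest shelter and stay away from windows."
    return "No estoy seguro."


# Made-up example item for illustration only.
items = [
    BenchmarkItem(
        question_en="Where should I shelter during a tornado?",
        question_es="¿Dónde debo refugiarme durante un tornado?",
        keywords_en=["shelter"],
        keywords_es=["refugio"],
    ),
]

scores = evaluate(toy_model, items)
print(scores)
```

Reporting one score per language keeps the comparison between English and Spanish performance explicit, which is the quantity the prediction above is about.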