Week 6: Analyzing the Model Responses
April 13, 2025
Howdy everyone, welcome to week 6 of my blog!
This week I started going through each model’s responses and evaluating whether they answer each question usefully. Though this may seem like a small task, there were four thousand responses in total for both models, since I ran each model on each question five times. For the Spanish responses, I translated each response to English using DeepL and then evaluated the translation. Our school’s Spanish teacher (Mr. Sanders) had previously helped verify the accuracy of the translator.
Here are some examples of responses that I marked as inadequate.
Question: Texas Home Improvement’s Jim Dutton gives some tips. How do I best weather-proof my windows for Hurricane Zephiron?
Gemini Response: I am sorry, but this hurricane has already passed. It made landfall in South Carolina on August 28th, and fully dissipated by September 5th. You no longer need to prepare for this storm.
Question: si el metro se inunda ¿qué pasa entonces?
Translated Question: If the metro floods, what happens then?
Gemini Response: No hay ninguna mención del metro en el contexto proporcionado. No puedo dar una respuesta informada sobre este tema.
Gemini Translated Response: There is no mention of the metro in the context provided. I am unable to provide an informed response on this issue.
A trend I noticed was that Gemini was far more reluctant to answer questions with no clear answer when the question was in Spanish than when the question was in English. My next steps include evaluating how prone the models were to hallucinations.
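A comparison like the Spanish versus English one above could be tallied with a few lines of Python. This is only a minimal sketch: the field names (`model`, `language`, `adequate`) and the sample rows are hypothetical placeholders, not the project's real data or results.

```python
from collections import defaultdict


def adequacy_rates(labels):
    """Return the share of responses marked adequate per (model, language) pair."""
    totals = defaultdict(int)
    adequate = defaultdict(int)
    for row in labels:
        key = (row["model"], row["language"])
        totals[key] += 1
        if row["adequate"]:
            adequate[key] += 1
    return {key: adequate[key] / totals[key] for key in totals}


# Made-up placeholder rows for illustration only (assumed layout of the labels).
sample_labels = [
    {"model": "Gemini", "language": "en", "adequate": True},
    {"model": "Gemini", "language": "en", "adequate": False},
    {"model": "Gemini", "language": "es", "adequate": False},
    {"model": "Gemini", "language": "es", "adequate": False},
]

rates = adequacy_rates(sample_labels)
for (model, language), rate in sorted(rates.items()):
    print(f"{model} [{language}]: {rate:.0%} adequate")
```

Grouping by both model and language keeps the five repeated runs of each question in the same bucket, so a gap between the English and Spanish rates for one model shows up directly.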
I look forward to sharing my progress next week!
