Week 10: Preliminary Findings — Social Media

May 11, 2025

Hello everyone!

In this post, I’ll be sharing some of my observations from my social media dataset.

Expected Results

All three platforms seem to struggle most with incomplete and grammatically incorrect sentences. Overall accuracy is noticeably lower than on the ALT dataset. Hallucinations were rare in the ALT dataset, but in the social media posts they seem to be commonplace. Again, words with multiple meanings are often misinterpreted: Bing translated a reference to Tulsi Gabbard running (for office) literally, while Google translated it correctly, likely because the sentence mentions voting later on.

Unexpected Results

Bing Translate’s “casual” tone often adds emojis to posts or removes them. This has little effect on the accuracy of the translation itself, but it was definitely not something I expected.

All three platforms leave some words untranslated. In the ALT dataset, Google often did this with English names, and I didn’t deduct points for that. Here, however, words that clearly have Burmese equivalents, such as “fiancé” and “artist,” are being left untranslated, so I began deducting fluency points. In one instance, Bing transliterated “China” as “ချင်းနဲ့” (pronounced chin-neh and meaning “with ginger”) instead of using the actual Burmese word for China, so “CHINA HERE I COME!!!!” became “You didn’t come with the ginger?!!!!”

Bing also produced several strange text errors, the most egregious being random Thai characters mixed in with Burmese words. Microsoft has published several research papers on the off-target problem in multilingual models, where a model produces output in the wrong language (Chen et al., 2023), so perhaps I shouldn’t have been completely surprised, but it was still quite jarring to see in practice.
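
One rough way to flag both kinds of errors automatically, rather than by eye, is to look for characters from the wrong script in the Burmese output. The sketch below is a hypothetical helper, not part of the evaluation described above. It assumes Python 3.9+ and checks only the core Unicode blocks for Myanmar (U+1000 to U+109F) and Thai (U+0E00 to U+0E7F), grouping runs of Thai letters and Latin letters so that off-target characters and untranslated words like “fiancé” stand out. The names `script_of` and `flag_off_script` are made up for illustration.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Rough script label for one character (heuristic: core blocks only)."""
    cp = ord(ch)
    if 0x1000 <= cp <= 0x109F:  # core Myanmar block (extended blocks not checked)
        return "myanmar"
    if 0x0E00 <= cp <= 0x0E7F:  # Thai block
        return "thai"
    if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN"):
        return "latin"  # includes accented letters such as "é"
    return "other"  # spaces, digits, punctuation, emojis, other scripts


def flag_off_script(text: str) -> dict[str, list[str]]:
    """Collect runs of Thai or Latin characters in text that should be Burmese."""
    flagged: dict[str, list[str]] = {"thai": [], "latin": []}
    # groupby splits the text into consecutive runs sharing the same script label
    for script, chars in groupby(text, key=script_of):
        if script in flagged:
            flagged[script].append("".join(chars))
    return flagged


# Made-up example string (not real translator output):
print(flag_off_script("မြန်မာ artist สวัส"))
# → {'thai': ['สวัส'], 'latin': ['artist']}
```

Latin runs will also catch English names, which I didn’t penalize, so a check like this only produces candidates for a human to review, not a score.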

In my next post, I’ll be reflecting on my project as a whole and discussing further research and possible improvements. (For final results, you’ll have to watch my presentation!)

Citations:

Chen, L., Ma, S., Zhang, D., Wei, F., & Chang, B. (2023). “On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation.” arXiv. https://arxiv.org/abs/2401.12413

View more of Aindra T.'s posts.
