Week 7: Completing Live Transcription
April 17, 2026
Hello! This week, I generated full multilingual transcripts after live-playing the audio and worked on improving the script I built over the last few weeks. I also continued my reading on language segmentation and contrast to support the future comparison between Phase 1 and Phase 2 results.
To transcribe one continuous text, I improved the structure for multilingual recognition using Google speech-to-text. Repurposing the programs I wrote in Phase 1, I analyzed the dialogue word by word while differentiating between Telugu and English. The terminal tracks output per 30-second segment: first the raw speech-to-text, then the translated segments. My initial results were what I expected from testing “live” translation: there was enough lag that not all the sentences were captured. What I did not expect was that the same dialogue repeated several times, almost as if the program were hallucinating.
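One way to flag the repeated dialogue described above is to compare each new segment against the last few segments already kept and drop near-duplicates. The following is only a minimal sketch of that idea, not the script used in this project: the function name `drop_repeated_segments`, the similarity threshold of 0.9, and the look-back window of three segments are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def _normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def drop_repeated_segments(segments, threshold=0.9, window=3):
    """Return the segments with near-duplicate repeats removed.

    A segment counts as a repeat when its similarity to any of the last
    `window` kept segments is at least `threshold` (both values are
    illustrative assumptions, not tuned settings).
    """
    kept = []
    for text in segments:
        current = _normalize(text)
        if not current:
            continue  # skip empty segments
        is_repeat = any(
            SequenceMatcher(None, current, _normalize(previous)).ratio() >= threshold
            for previous in kept[-window:]
        )
        if not is_repeat:
            kept.append(text)
    return kept


if __name__ == "__main__":
    sample = ["How are you today", "how are you  today", "I am fine", "How are you today"]
    print(drop_repeated_segments(sample))
```

A filter like this only hides repeats after the fact; it does not explain why the recognizer produced them, so the raw output is still worth keeping for the comparison.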
I synced the speech-to-text outputs with a “live-session” spreadsheet that labels each interview being played and records its respective output. I continued with the full live transcriptions by playing the five interviews in order while analyzing program performance. The final scripts I developed throughout the week were organizational, mirroring the format in which I stored the generated scripts from Phase 1. This way, I can directly run my statistical comparison of protocol performance across file and live translation, as I intended at the start of the project.
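The live-session log described above could be kept as a CSV file with one row per 30-second segment. This is a hedged sketch using only Python's standard `csv` module; the column names, the `write_session_rows` helper, and the sample text are hypothetical and do not reflect the actual spreadsheet layout.

```python
import csv
import io

# Hypothetical column layout for the live-session log.
FIELDNAMES = ["interview", "segment_index", "start_seconds", "raw_text", "translated_text"]


def write_session_rows(handle, interview, segments, segment_seconds=30):
    """Write one row per (raw_text, translated_text) pair and return the row count."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for index, (raw_text, translated_text) in enumerate(segments):
        writer.writerow(
            {
                "interview": interview,
                "segment_index": index,
                "start_seconds": index * segment_seconds,
                "raw_text": raw_text,
                "translated_text": translated_text,
            }
        )
    return len(segments)


if __name__ == "__main__":
    # Written to an in-memory buffer here; a real run would open a file instead.
    buffer = io.StringIO()
    write_session_rows(buffer, "interview_1", [("namaste", "hello"), ("ela unnaru", "how are you")])
    print(buffer.getvalue())
```

Keeping the same columns for file-based and live runs would let both sets of transcripts be loaded and compared with the same code.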
For the rest of the week, I returned to the other methods I tried for live translation during Week 6. I researched the backend of macOS Live Captions and MacWhisper for ways to improve my final live-translation script further. One observation I made was the setup both methods rely on: the Apple Neural Engine. This hardware accelerator is designed to run machine-learning tasks faster and with greater energy efficiency. More importantly, it makes on-device tasks that require machine learning or adaptability much easier on macOS. Although my scripts tune to the correct sound settings and frequency, they will never match the performance of these other options since I’m running everything on a local system.
For the final results comparison, I am creating a comprehensive collection of transcriptions from all three sources so I can compare progress on multiple fronts. In the next few weeks, I will create the final results tables and complete my paper detailing the progress I made on this project. Until then, see you soon!
Comments

Hey Raghav! It sounds like you had a massive week wrapping up the live transcription phase. I totally relate to the frustration of building a custom script only to realize that big tech companies have a baked-in hardware advantage that is almost impossible to beat on a local machine. Still, engineering your own pipeline from scratch teaches you so much more about how the data actually moves. Good luck with the final paper, looking forward to seeing the final data!