
Kevin Z. 2025 | BASIS Independent Fremont
- Project Title: Visualizing Word Embeddings Using Dimensionality Reduction Techniques
- BASIS Independent Advisor: Ms. Apra
You’d think that ChatGPT, the thing that writes all your English essays, would be fluent in English. But it’s not. Transformer networks (GPT stands for Generative Pretrained Transformer) are entirely mathematical models, incapable of processing literal strings of text as input; the text first has to be converted into a language the model can understand: math. Word embeddings, mathematical representations of words in vector space, were created to solve this problem, culminating in an “embedding matrix” that acts as the Transformer’s “dictionary” for English words. These embeddings are known to capture some semantic meaning and relationships among words. For example, king - man + woman = queen demonstrates how the concept of gender is encoded across different words. However, word embeddings are high-dimensional vectors (lists of roughly 200-300, or even 800, numbers), and since we are limited to 2D and 3D representations, interpreting and visualizing them is incredibly difficult. This problem is often referred to as “the curse of dimensionality,” a term coined by Richard Bellman. As a result, the embeddings themselves are hard to interpret, and analogies like the king-queen example are not always consistent.

I hope to explore various “dimensionality reduction” techniques (reducing the list of numbers down to 2 or 3 while preserving as much information as possible), to gain insight into which methods are most effective at maintaining semantic relationships in lower-dimensional space, and to review what each technique can teach us about word embeddings, in the hope of making AI more intuitive.
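
To make the idea concrete, here is a minimal sketch (not the project’s actual pipeline) of both steps described above: checking the king - man + woman ≈ queen analogy on pretrained embeddings, and reducing a handful of word vectors to 2D with PCA, one of the simpler dimensionality reduction techniques. The specific pretrained model ("glove-wiki-gigaword-100") and word list are illustrative assumptions, not choices taken from the project itself.

```python
# Minimal sketch, assuming pretrained GloVe vectors are available via gensim's downloader.
import gensim.downloader as api
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load 100-dimensional GloVe vectors (model name is an illustrative assumption).
glove = api.load("glove-wiki-gigaword-100")

# The classic analogy: king - man + woman should land near "queen".
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Reduce a few related word vectors from 100 dimensions down to 2 with PCA.
words = ["king", "queen", "man", "woman", "prince", "princess"]
vectors = [glove[w] for w in words]
coords = PCA(n_components=2).fit_transform(vectors)

# Scatter-plot the 2D projection and label each point with its word.
fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    ax.annotate(word, (x, y))
plt.show()
```

PCA is only one of the techniques under consideration; nonlinear methods like t-SNE or UMAP could be swapped into the same spot to compare how each one preserves (or distorts) the relationships between the words.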