
Kevin Z. 2025 | BASIS Independent Fremont
- Project Title: Visualizing Word Embeddings Using Dimensionality Reduction Techniques
- BASIS Independent Advisor: Ms. Apra
You’d think that ChatGPT, the thing that writes all your English essays, would be fluent in English. But it’s not. Transformer networks (GPT stands for Generative Pretrained Transformer) are entirely mathematical models, incapable of processing literal strings of text as input; the text first has to be converted into a language the model can understand: math. Word embeddings, mathematical representations of words in vector space, were created to solve this problem, culminating in an “embedding matrix” that acts as the Transformer’s “dictionary” for English words. These embeddings are known to capture some semantic meaning and relationships among words. For example, king - man + woman = queen demonstrates how the concept of gender is encoded across different words. However, word embeddings are high-dimensional vectors (lists of roughly 200-300, or even 800, numbers), and since we are limited to 2D and 3D representations, interpreting and visualizing them is incredibly difficult. This problem is often referred to as “the curse of dimensionality,” a term coined by Richard Bellman. As a result, the embeddings themselves are hard to interpret, and analogies like the king-queen example are not always consistent.

I hope to explore various “dimensionality reduction” techniques (reducing the list of numbers down to 2 or 3 while preserving as much information as possible), to gain insight into which methods are most effective at maintaining semantic relationships in lower-dimensional space, and to review what each technique can teach us about word embeddings, in the hope of making AI more intuitive.
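
To make the idea concrete, here is a minimal sketch (not the project’s actual pipeline) of both steps described above: checking the king - man + woman ≈ queen analogy on pretrained embeddings, and reducing a handful of word vectors to 2D with PCA, one of the simpler dimensionality reduction techniques. The specific pretrained model ("glove-wiki-gigaword-100") and word list are illustrative assumptions, not choices taken from the project itself.

```python
# Minimal sketch, assuming pretrained GloVe vectors are available via gensim's downloader.
import gensim.downloader as api
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load 100-dimensional GloVe vectors (model name is an illustrative assumption).
glove = api.load("glove-wiki-gigaword-100")

# The classic analogy: king - man + woman should land near "queen".
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Reduce a few related word vectors from 100 dimensions down to 2 with PCA.
words = ["king", "queen", "man", "woman", "prince", "princess"]
vectors = [glove[w] for w in words]
coords = PCA(n_components=2).fit_transform(vectors)

# Scatter-plot the 2D projection and label each point with its word.
fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    ax.annotate(word, (x, y))
plt.show()
```

PCA is only one of the techniques under consideration; nonlinear methods like t-SNE or UMAP could be swapped into the same spot to compare how each one preserves (or distorts) the relationships between the words.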