Week 7 - writing
April 24, 2025
This past week, I shifted my focus from visualizations to the final product of my research: the literature review.
I began organizing my paper into three main sections:
Overview of Word Embeddings – covering models like Word2Vec and NumberBatch and how they encode semantic information. Building intuition for how these embeddings are formed in the first place is essential groundwork for the rest of the paper.
Dimensionality Reduction Techniques – walking through each technique and its mathematical foundations, first in a general setting and then in the specific context of word embeddings.
Evaluating Visualization Quality – looking at current methods for evaluating reduced-dimension projections, including metrics like cluster cohesion, neighborhood preservation, and interpretability. There's also a subjective linguistic question I have to keep in mind: what even counts as a semantic cluster? (A rough sketch of the neighborhood-preservation idea follows this list.)
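To make the neighborhood-preservation idea concrete, here's a minimal sketch of how such a score might be computed. It uses scikit-learn's trustworthiness metric as a stand-in for whatever metric I end up settling on, and random vectors as a placeholder for real word embeddings from a model like Word2Vec or NumberBatch:

```python
# Minimal sketch: neighborhood preservation via scikit-learn's trustworthiness.
# The random matrix X is only a placeholder for real word vectors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))             # 1,000 "word vectors", 300 dims
X_2d = PCA(n_components=2).fit_transform(X)  # any reduction method works here

# Trustworthiness scores how much each point's nearest neighbors in the 2-D
# projection were also neighbors in the original 300-D space
# (1.0 = no spurious neighbors introduced by the projection).
score = trustworthiness(X, X_2d, n_neighbors=10)
print(f"neighborhood preservation (trustworthiness): {score:.3f}")
```

The same call can be run on the output of any of the techniques I'm comparing, which is what makes a metric like this useful for a side-by-side analysis.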
While working on the paper, I revisited a key distinction between dimensionality reduction techniques that I'll have to consider in my analysis: linearity. PCA is a linear method, which has a precise linear-algebra definition but intuitively means that the low-dimensional coordinates are a linear map of the original axes, so global structure is preserved under that constraint. t-SNE, Kernel PCA, LLE, and especially UMAP are non-linear methods, meant to capture the more complex geometry of high-dimensional space, and they tend to preserve local semantic structure better than linear methods.
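As a quick illustration of that linear vs. non-linear contrast, here's a small sketch that projects the same vectors with PCA and with t-SNE (random vectors standing in for embeddings, and default-ish parameters rather than my final settings):

```python
# Sketch: project the same vectors with a linear method (PCA) and a
# non-linear one (t-SNE) to compare how each treats structure.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 300))  # placeholder for 500 word vectors

# Linear: each 2-D coordinate is a fixed linear combination of the original
# dimensions, so directions of greatest global variance dominate the plot.
pca_2d = PCA(n_components=2).fit_transform(X)

# Non-linear: t-SNE optimizes pairwise neighbor probabilities, so local
# neighborhoods (nearby word vectors) stay together even when global
# distances get distorted.
tsne_2d = TSNE(n_components=2, perplexity=30.0, init="pca",
               random_state=0).fit_transform(X)
```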
What’s been tricky—but interesting—is navigating conflicting opinions in the literature. Some researchers are skeptical of visualizations entirely, suggesting they can mislead more than they reveal. That tension has actually helped me sharpen the purpose of my project: not just making “pretty plots,” but understanding what they really say about the embeddings we rely on.
Next week, my plan is to finalize a full draft of the literature review and start systematically comparing how each reduction technique performs across a few controlled semantic tasks. I’ll probably include new visualizations that illustrate both strengths and weaknesses of each method.
Stay tuned—things are getting more rigorous and reflective, and I’m excited to see how this all comes together into a solid final paper!