Blog 5: LDA
May 17, 2025
Welcome back everyone! Last time we talked about the two distinct spectral groups I’ve found within Protium subserratum (Lobo and Blanco) and what that could mean for the success of my project’s greatest goal: a species-identifying algorithm. Lobo and Blanco show that, at least for some Protium (Burseraceae) species, spectra alone are enough to discriminate at the species level. Additionally, since Lobo and Blanco are found within one species, a cryptic species, their existence also shows that spectra alone are enough to discriminate between potential species within at least some Protium species complexes. This serves as a proof of concept for my project’s goal, and it raises the question of how this result can be expanded into a generalized algorithm. The simple answer is LDA (Linear Discriminant Analysis).
LDA is a fairly simple algorithm that becomes increasingly complex as more degrees of freedom are introduced. At its core, LDA is designed to maximize the separation between two or more known groups, which makes it seem like a perfect fit for my goal. It works by condensing a number of factors (variables) into a discriminant score, which can then be used to tell the classes apart. With two classes there is a single discriminant score; with more classes, LDA produces several, at most one fewer than the number of classes.
The discriminant score is calculated according to the following formula:
LD = a₁X₁ + a₂X₂ + … + aₙXₙ
Where “a” represents a coefficient calculated by the LDA algorithm, and “X” represents the value of a variable. The coefficients “weight” the importance of every variable. The resulting discriminant score has no direct real-world meaning, but it is the linear combination that best separates the known classes: it makes the differences between classes as large as possible relative to the spread within each class, given the rest of the data fed into the LDA algorithm.
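To make the formula above concrete, here is a minimal sketch of how a single discriminant score is computed once the coefficients are known. The coefficients and reflectance values are made up purely for illustration; they are not from my data, and the function name is just a placeholder.

```python
# Hypothetical coefficients and variable values, chosen only to illustrate
# the formula LD = a1*X1 + a2*X2 + ... + an*Xn. They are not real results.
coefficients = [2.0, -1.0, 0.5]   # a1, a2, a3
reflectance = [0.25, 0.5, 0.5]    # X1, X2, X3


def discriminant_score(a, x):
    """Return the weighted sum of the variables (one coefficient per variable)."""
    if len(a) != len(x):
        raise ValueError("need exactly one coefficient per variable")
    return sum(a_i * x_i for a_i, x_i in zip(a, x))


score = discriminant_score(coefficients, reflectance)
print(score)  # 2.0*0.25 - 1.0*0.5 + 0.5*0.5 = 0.25
```

Notice that the second variable has a negative coefficient: a high reading there pushes the score down, while a high reading on the first variable pushes it up.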
The way that these coefficients are calculated is a bit too complicated to explain in this brief overview, but I will explore it further in future blog posts where I discuss the issues with implementing LDA in my project. What is important to remember for this post is that LDA uses these coefficients to weight each variable by how much it contributes to separating the classes, and that the discriminant score is the resulting linear combination of all the variables with their “weighting”.
So far, the basic methodology that my project has been using for this type of analysis is a direct implementation of LDA, where each wavelength is its own variable and the reflectance reading at that wavelength is the value of that variable. By building a table of all these variables according to this convention and labeling each row with its class, which in this case is its species, I can calculate my weighted coefficients by running the table through an LDA function. With these coefficients, I can then generate a discriminant score for each of my specimens and plot them accordingly. In theory, this would allow me to discriminate between any number of Protium (Burseraceae) species, provided that I have enough measurements and that the species have statistically different reflectance measurements.
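The workflow above can be sketched in a few lines for the simplest possible case: two classes and two wavelengths. This is only an illustration under loud assumptions. The reflectance values are synthetic, the species labels and wavelengths are placeholders, and the coefficients come from the classic two-class Fisher formula (the inverse of the pooled within-class scatter times the difference in class means) rather than from whatever LDA function a real analysis would call.

```python
# Synthetic table: each row is one specimen, each column one wavelength.
# Columns: [reflectance at 550 nm, reflectance at 800 nm] (placeholder wavelengths).
spectra = {
    "species_a": [[0.10, 0.50], [0.12, 0.54], [0.11, 0.49], [0.13, 0.53]],
    "species_b": [[0.16, 0.47], [0.18, 0.50], [0.17, 0.45], [0.19, 0.49]],
}


def column_means(rows):
    """Mean reflectance at each wavelength for one class."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def scatter(rows, mean):
    """2x2 scatter matrix: summed outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


mean_a = column_means(spectra["species_a"])
mean_b = column_means(spectra["species_b"])
s_a = scatter(spectra["species_a"], mean_a)
s_b = scatter(spectra["species_b"], mean_b)

# Pooled within-class scatter (sum of the per-class scatter matrices).
sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]

# Two-class Fisher coefficients: inverse(sw) times (mean_b - mean_a).
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
coefficients = [
    (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
    (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
]

# One discriminant score per specimen, grouped by class for plotting.
scores = {
    name: [sum(a * x for a, x in zip(coefficients, row)) for row in rows]
    for name, rows in spectra.items()
}

for name, values in scores.items():
    print(name, [round(v, 2) for v in values])
```

With this toy table, every species_a score falls below every species_b score, so a single threshold on the discriminant score separates the two classes. Real spectra have hundreds or thousands of wavelengths and far more noise, which is where a library implementation and much more care become necessary.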
In my next post, I will cover why, even with a seemingly effective methodology like this, my project still hasn’t achieved a consistent discrimination algorithm. This mostly has to do with the sheer number of variables that I would have to account for and the general noise that comes from taking spectral scans, but let’s save that conversation for next time. I’ll see you then!