Week 5: Linearity
April 4, 2024
Hello everyone, welcome to my week 5 blog. We’re at the halfway point. This week’s goal was to create a Jupyter notebook to perform Linear Regression on our data, focusing on the Linearity Assumption.
To check linearity, we first needed to create residual plots for each of our datasets. A residual plot can help us understand whether the relationships between temperature and pH, temperature and TDS, and pH and TDS are linear. If the residuals scatter randomly around zero, the linearity assumption is reasonable; a clear pattern in the residuals suggests it may not hold. On Monday, I looked up how to create a residual plot in Python using built-in library functions. On Tuesday, I began using residual plots to analyze my datasets. All three plots displayed a downward trend, showing that as temperature decreases, pH also decreases, with the other comparisons following the same trend. From these graphs, I concluded that our first assumption holds in our Model of Goodness data analysis, although the visible trend means this reading deserves a second look.
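Here is a minimal sketch of how a residual plot can be built, in case it helps anyone following along. It is not the exact notebook code: the function names (`fit_line`, `residuals`, `plot_residuals`) and the temperature and pH numbers are made up for illustration, and the plotting step assumes matplotlib is installed.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(x, y):
    """Observed y minus the value predicted by the fitted line."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def plot_residuals(x, y, xlabel="x", title="Residual plot"):
    """Scatter the residuals against x with a reference line at zero."""
    import matplotlib.pyplot as plt  # imported here so the math above works without it

    fig, ax = plt.subplots()
    ax.scatter(x, residuals(x, y))
    ax.axhline(0, linestyle="--")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Residual")
    ax.set_title(title)
    return fig


# Hypothetical readings, NOT the project data.
temperature = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
ph = [7.9, 7.7, 7.6, 7.3, 7.2, 7.0]
res = residuals(temperature, ph)
```

Calling `plot_residuals(temperature, ph, xlabel="Temperature")` would draw the plot. Points scattered evenly above and below the dashed zero line support a linear fit, while a curve or trend in the points suggests the straight line is missing something.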
After discussing my progress with my external advisor, he suggested that I use the Durbin-Watson statistical test as another method to test linearity. So on Wednesday, I began learning about the Durbin-Watson test and how to apply it to my datasets using statsmodels. From what I read online, I learned that the Durbin-Watson statistic ranges from 0 to 4, and a value close to 2 indicates little or no autocorrelation in the residuals. A value less than 2 indicates positive autocorrelation. This means that a residual is likely to be followed by another residual of the same sign. A value greater than 2 indicates negative autocorrelation. This means that a residual is likely to be followed by a residual of the opposite sign.
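To make the 0 to 4 scale more concrete, here is a small sketch that computes the statistic by hand as the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The function name `durbin_watson_stat` and both residual lists are invented for illustration. In a real notebook, the `durbin_watson` function in `statsmodels.stats.stattools` computes the same quantity from a model's residuals.

```python
def durbin_watson_stat(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


# Hypothetical residuals, NOT the project data.
# Long runs of the same sign: positive autocorrelation, statistic below 2.
same_sign_runs = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Sign flips every step: negative autocorrelation, statistic above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_runs = durbin_watson_stat(same_sign_runs)
dw_alternating = durbin_watson_stat(alternating)
```

The order of the residuals matters here, because the statistic compares each residual with the one right before it.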
The results of the Durbin-Watson test were as follows:
Durbin-Watson Statistic for Temperature vs. pH: 0.054
Durbin-Watson Statistic for Temperature vs. TDS: 0.073
Durbin-Watson Statistic for pH vs. TDS: 0.066
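These values are all far below 2 and close to 0, which indicates strong positive autocorrelation in the residuals of all three models: each residual tends to be followed by another of the same sign. This is consistent with my residual plot analysis, where the plots followed a trend instead of scattering randomly. Note that the Durbin-Watson statistic describes the residuals of each model, not how strongly the two variables depend on each other, so it should be read alongside the residual plots when judging the linearity assumption.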
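One way to get a feel for how far below 2 these values are is the common rule of thumb that the Durbin-Watson statistic is roughly 2 × (1 − r), where r is the lag-1 autocorrelation of the residuals. The sketch below rearranges that to estimate r from the three statistics reported above. This is only an approximation, and the names `dw_results` and `approx_lag1_autocorrelation` are made up for this example.

```python
# Durbin-Watson statistics reported in this post.
dw_results = {
    "Temperature vs. pH": 0.054,
    "Temperature vs. TDS": 0.073,
    "pH vs. TDS": 0.066,
}


def approx_lag1_autocorrelation(dw):
    """Rearranges the rule of thumb DW ~ 2 * (1 - r) to r ~ 1 - DW / 2."""
    return 1 - dw / 2


approx_r = {
    name: approx_lag1_autocorrelation(dw) for name, dw in dw_results.items()
}

for name, r in approx_r.items():
    print(f"{name}: approximate lag-1 autocorrelation {r:.3f}")
```

All three estimates come out above 0.96, which is another way of saying that each residual is very similar to the one before it.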
Next week, I will start checking the Mean of Residuals Assumptions!