Week 2: Data Processing

March 8, 2024

Hello everyone! Welcome to my second week of blog posts. My goal this week was to remove outliers, correct errors, and standardize the scale of parameters like temperature and ORP (oxidation-reduction potential) so they are easier to compare. I will also apply transformations where necessary to normalize the distribution of values.
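To make the idea of putting temperature and ORP on a common scale concrete, here is a minimal sketch in plain Python. It assumes z-score standardization and a log(1 + x) transform, which are common choices but not necessarily the ones used in the notebook. The sample readings and the function names `standardize` and `log_transform` are made up for illustration.

```python
import math
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant column carries no spread, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative column."""
    return [math.log1p(v) for v in values]


# Made-up sample readings, for illustration only (not the project data).
temperature_c = [18.2, 19.1, 20.4, 21.0, 19.8]
orp_mv = [210.0, 250.0, 190.0, 305.0, 275.0]

temperature_z = standardize(temperature_c)
orp_z = standardize(orp_mv)
```

After this step both columns are unitless and centered at zero, so a value of 1.0 means "one standard deviation above average" for either parameter.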

At the beginning of the week, I continued transferring the data and finally fixed the problem from last week. I then removed the inconsistencies and outliers from the datasets. During data processing, I noticed a significant amount of missing or outlier data in the total dissolved solids (TDS) column, so I cleared those values and replaced them with zero. I also created a Jupyter Notebook file showing all the parameters, such as latitude, longitude, temperature, pH, and TDS, and sorted every data entry by its assigned ID name. I was able to find all the parameters for Lake Murray in San Diego, but I had trouble finding dissolved solids and pH values for Lake Elizabeth. I therefore decided to continue my analysis with Lake Murray only.
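The cleaning steps described above can be sketched roughly as follows, again in plain Python rather than the actual notebook code. The records, the ID format, and the 1.5 × IQR rule for spotting outliers are all assumptions for illustration. The `tds_filled` flag is also an addition of this sketch.

```python
import statistics

# Hypothetical records; IDs and TDS values (mg/L) are made up.
records = [
    {"id": "LM-007", "tds": 512.0},
    {"id": "LM-002", "tds": 498.0},
    {"id": "LM-005", "tds": None},    # missing reading
    {"id": "LM-001", "tds": 505.0},
    {"id": "LM-009", "tds": 5000.0},  # implausible spike
    {"id": "LM-003", "tds": 495.0},
    {"id": "LM-004", "tds": 510.0},
    {"id": "LM-006", "tds": 500.0},
    {"id": "LM-008", "tds": 515.0},
]


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits; values outside them count as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


present = [r["tds"] for r in records if r["tds"] is not None]
low, high = iqr_bounds(present)

cleaned = []
for r in records:
    tds = r["tds"]
    is_bad = tds is None or not (low <= tds <= high)
    cleaned.append({
        "id": r["id"],
        # Missing or outlier readings are replaced with zero.
        "tds": 0.0 if is_bad else tds,
        # Flag the replacement so a filled zero is not mistaken for a reading.
        "tds_filled": is_bad,
    })

# Sort every entry by its ID name.
cleaned.sort(key=lambda r: r["id"])
```

Keeping a flag next to each replaced value is one way to stop the zeros from quietly pulling down averages or regression fits later on, since they can be filtered out before any statistics are computed.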

When I met with my external advisor, Dr. Jay, he recommended that I use Flask during data analysis, so I plan to include it in week 6, when I start analyzing the data. Next week, I will start visualizing trends and patterns in the different parameters and drawing some basic conclusions from a linear regression summary report.

View more of Yao L.'s posts.
