Week 9: Tackling Inaccuracies in Water Usage Data: A Journey from K-Means to Gaussian Mixture Models
April 25, 2024
When analyzing water usage data for appliance detection, I found that the K-Means clustering algorithm didn’t quite meet my needs. K-Means assigns every data point to exactly one cluster, which made it difficult to classify points that could plausibly belong to more than one category. This isn’t reflective of real-world scenarios, where things are often more fluid. Thus, I started exploring Gaussian Mixture Models (GMM), which assign each point a probability of belonging to each cluster, as a potentially better solution. I haven’t seen the benefits of GMM in practice yet; my first step has been to adjust the data to better suit this new approach.
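To make the contrast concrete, here is a minimal sketch of hard versus soft assignment in one dimension. It is not my actual model: the two components, their weights, means, and spreads are hypothetical values picked for illustration, and the helper names `soft_assign` and `hard_assign` are my own.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def soft_assign(x, components):
    """GMM-style assignment: probability of each component given x.

    components is a list of (weight, mean, std) tuples.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

def hard_assign(x, centers):
    """K-Means-style assignment: index of the nearest center."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

# Hypothetical flow-rate clusters (arbitrary units), not fitted to real data.
components = [(0.5, 4.0, 1.0), (0.5, 8.0, 1.0)]
centers = [mean for _, mean, _ in components]

x = 6.1  # a value that sits between the two clusters
print(hard_assign(x, centers))     # one label, no notion of uncertainty
print(soft_assign(x, components))  # one probability per cluster
```

The hard assignment commits to a single cluster even for a borderline value, while the soft assignment keeps both options open with a probability for each.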
One key issue I faced was that I couldn’t distinguish different appliances based on raw meter readings. Without clear patterns, I couldn’t expect accurate model outcomes. The variation in reading frequency made it challenging to separate data into distinct clusters. For instance, some data points had only a few readings while others had over a thousand, and it’s hard to tell whether both should be considered “sink” appliances. To make sense of this, I shifted focus to flow rates, a more consistent measure. The flow rate per minute, along with the average flow rate, provides a more reliable basis for clustering.
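A rough sketch of how these two features could be computed is below. It assumes each event is a time-ordered list of `(timestamp_seconds, cumulative_volume)` pairs; that layout, the function names, and the sample values are assumptions for illustration, not my real data format.

```python
def flow_rates_per_minute(readings):
    """Flow rate (volume per minute) between each pair of consecutive readings.

    readings: list of (timestamp_seconds, cumulative_volume), sorted by time.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((v1 - v0) / dt * 60.0)
    return rates

def average_flow_rate(readings):
    """Total volume used divided by total duration in minutes."""
    if len(readings) < 2:
        return 0.0
    (t_first, v_first), (t_last, v_last) = readings[0], readings[-1]
    minutes = (t_last - t_first) / 60.0
    if minutes <= 0:
        return 0.0
    return (v_last - v_first) / minutes

# Two made-up events with the same underlying flow but very different lengths.
short_event = [(0, 100.0), (10, 100.5), (20, 101.0)]
long_event = [(i * 10, 100.0 + 0.5 * i) for i in range(1001)]

print(average_flow_rate(short_event))
print(average_flow_rate(long_event))
```

Because both features are normalized by time, an event with three readings and one with a thousand land on the same scale, which is the property the raw readings lacked.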
During this process, I discovered that my earlier meter readings were off because they lacked the decimal separator, so I divided all values by 10 to correct them. The mistake became evident when I calculated flow rates and noticed absurdly high numbers, far beyond anything you’d expect from a typical household appliance.
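The correction and the sanity check that exposed the problem can be sketched as follows. The factor of 10 comes from the fix described above; the plausibility ceiling and the sample values are hypothetical placeholders, not measured limits.

```python
SCALE = 10  # readings were missing one decimal place
MAX_PLAUSIBLE_RATE = 50.0  # hypothetical ceiling, in volume units per minute

def correct_readings(raw_readings):
    """Restore the missing decimal place by dividing every reading by SCALE."""
    return [value / SCALE for value in raw_readings]

def implausible_rates(rates, limit=MAX_PLAUSIBLE_RATE):
    """Return the flow rates that exceed the plausibility ceiling."""
    return [rate for rate in rates if rate > limit]

raw = [1005, 1010, 1015]      # made-up raw meter values without a decimal point
fixed = correct_readings(raw)
print(fixed)

suspect = implausible_rates([3.0, 300.0, 4.5])
print(suspect)                # rates worth a second look
```

A check like this is a cheap guard: a scaling error in the readings inflates every derived flow rate by the same factor, so it tends to show up as values above any reasonable ceiling.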
A funny incident happened while analyzing my data this week. We had recently implemented a new sprinkler plan, but I noticed a significant spike in water meter readings every 10 seconds in the middle of the night. Since nobody in my household was using water at that time, it pointed to a potential problem. It turned out that our sprinkler system was broken, leading to continuous water flow. After shutting it off, the anomaly disappeared. So, even if my model doesn’t achieve perfection, at least this project helped us identify a major issue with our sprinkler system.
Moving forward, my plan is to refine the data and further explore GMM to see if it can provide better clustering results. While the journey has its challenges, it has already proven useful in unexpected ways.