Week 8: Challenges in K-Means Clustering for Appliance Recognition
April 20, 2024
What initially appeared to be strides forward in the analysis have revealed themselves as missteps, caused by subtle yet significant communication gaps. This week's objectives centered on a comprehensive analysis, including graphical representations and a close examination of performance metrics. During this process, however, a critical issue came to light: discrepancies in the timings within the dataset.
To dig deeper into my final k-means model, I explored the individual data points within each cluster. This led to an automated function that identifies the appliance associated with each data point. The task then pivoted towards building a secondary model to help interpret the k-means results. Along the way, a confounding observation emerged: every cluster showed high accuracy in representing “flush” events. This was perplexing and called for a thorough investigation.
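One way to read such per-cluster percentages is against each label's overall share in the data: if a label is common everywhere, it can look dominant in every cluster without the clusters telling us much. The sketch below is only an illustration of that check, not the function used in this project. The names `cluster_label_shares` and `overall_label_shares`, and the toy labels, are assumptions made for the example.

```python
from collections import Counter, defaultdict


def cluster_label_shares(cluster_ids, labels):
    """Return {cluster: {label: share}} for two aligned sequences."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1
    return {
        cluster: {label: n / sum(counter.values()) for label, n in counter.items()}
        for cluster, counter in counts.items()
    }


def overall_label_shares(labels):
    """Return {label: share} over the whole dataset, used as a baseline."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}


# Hypothetical toy data: one cluster id and one appliance label per row.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["flush", "flush", "sink", "flush", "flush", "shower", "flush", "sink"]

shares = cluster_label_shares(cluster_ids, labels)
baseline = overall_label_shares(labels)

for cluster in sorted(shares):
    flush_share = shares[cluster].get("flush", 0.0)
    print(f"cluster {cluster}: flush share {flush_share:.2f} "
          f"(overall {baseline['flush']:.2f})")
```

A cluster whose “flush” share sits close to the overall share is not really separating that appliance from the others, even if the percentage looks high on its own.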
A close examination of the dataset followed, with an eye for recurring patterns or potential distortions. Although “flush” events are prevalent in the dataset, that alone did not explain why the accuracy percentages varied from cluster to cluster. The breakthrough came from revisiting the original dataset, which included timing data that had previously been disregarded. Scrutinizing the results of the revised k-means model made one thing clear: the clusters were influenced by time.
Using identical indexing for both models, I compared the time-inclusive k-means clustering with its time-omitting counterpart. The comparison yielded a pivotal insight: although the time feature had been deliberately left out of the latter, its clusters still followed chronological sequences. In other words, the temporal dynamics in the dataset persist even without explicit time-related features.
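A comparison of this kind can be sketched with two small checks: a contingency table of the two assignments over the same rows, and the fraction of consecutive rows that land in the same cluster. This is a minimal sketch, not the analysis actually run here. It assumes the rows are already in chronological order, and the function names and toy assignments are hypothetical.

```python
from collections import Counter


def contingency(assignments_a, assignments_b):
    """Count how often each (cluster_a, cluster_b) pair occurs on the same row."""
    if len(assignments_a) != len(assignments_b):
        raise ValueError("both assignments must cover the same rows")
    return Counter(zip(assignments_a, assignments_b))


def same_cluster_neighbor_rate(assignments):
    """Fraction of consecutive rows (in the given order) sharing a cluster."""
    pairs = list(zip(assignments, assignments[1:]))
    if not pairs:
        raise ValueError("need at least two rows")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical toy assignments, one entry per row, rows in time order.
with_time = [0, 0, 0, 1, 1, 1, 2, 2]
without_time = [1, 1, 1, 0, 0, 2, 2, 2]

table = contingency(with_time, without_time)
rate = same_cluster_neighbor_rate(without_time)

for (a, b), n in sorted(table.items()):
    print(f"with-time cluster {a} / without-time cluster {b}: {n} rows")
print(f"neighbor rate without time: {rate:.2f}")
```

Because cluster ids are arbitrary, the table matters more than whether the ids match: rows concentrating in a few cells mean the two models group the data similarly. A neighbor rate near 1 in the time-omitting model suggests its clusters still track position in the sequence.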
To summarize, three datasets are now being tested: the original data containing only the water meter readings (range: 331 to 1018257), the original data plus the timing of each water meter reading, and the change in the original data. Since last week's results weighed against including time, we eliminated the second dataset and focused on enhancing the first dataset.
Looking ahead, the focus shifts towards refining a k-means algorithm that capitalizes on the transitions in water meter readings, rather than relying solely on absolute values. This approach is expected to surface further patterns in the dataset and improve the precision and depth of the analysis.
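As a rough illustration of clustering on transitions, the sketch below turns cumulative readings into consecutive differences and groups them with a small one-dimensional k-means. It is a plain-Python stand-in written for this example, not the model used in the project; in practice a library implementation would normally be used. The function names, the choice of k = 3, the quantile-based starting centroids, and the toy readings are all assumptions.

```python
def reading_deltas(readings):
    """Consecutive differences of cumulative meter readings."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on 1-D data with deterministic starting centroids."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Start centroids at evenly spaced positions in the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        if new_assignments == assignments:
            break  # converged: no value changed cluster
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(values, assignments) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return assignments, centroids


# Hypothetical toy readings from a cumulative water meter.
readings = [331, 333, 335, 347, 359, 361, 401, 441, 443]
deltas = reading_deltas(readings)
assignments, centroids = kmeans_1d(deltas, k=3)

print("deltas:     ", deltas)
print("assignments:", assignments)
print("centroids:  ", centroids)
```

Working on differences removes the steadily growing absolute level of the meter, so events of similar size end up together regardless of when in the record they occur.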