1.3 Clustering time series based on probability distributions across temporal granularities

In Chapter 4, we address the problem of using clustering to discover patterns in a large number of univariate time series across multiple temporal granularities. Time series clustering research is gaining traction as more data are collected at finer temporal resolutions, over longer time periods, and for a larger number of individuals or entities. In many disciplines, time series are noisy, patchy, of unequal length, and asynchronous, which makes it difficult to search for similarities. We propose a method for overcoming these difficulties by calculating distances between time series based on their probability distributions at various temporal granularities. Because they are based on probability distributions rather than on individual observations, these distances are robust to missing or noisy data and aid in dimension reduction. When fed into a clustering algorithm, the distances can be used to divide large data sets into small pockets of similar repetitive behaviors. These subgroups can then be analyzed separately or used as distinct prototype behaviors in classification problems. We test the proposed method on a group of residential electricity consumers from the Australian smart meter data set to show that it can generate meaningful clusters. The chapter also includes a brief review of the literature on traditional time series clustering and, more specifically, on clustering residential smart meter data.
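The distance-then-cluster idea described above can be illustrated with a minimal sketch. It is not the implementation used in Chapter 4: the data are simulated, the only granularity considered is hour of day, and the distance (the mean absolute gap between matching quantiles, a rough approximation to the 1-Wasserstein distance, summed over categories) is an assumption chosen for simplicity. The names `quantile_profile`, `distribution_distance`, and `simulate` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def quantile_profile(values, hours, probs):
    """Summarise a series by its quantiles within each hour-of-day category.

    Returns a (24, len(probs)) array; missing observations are dropped, so
    gaps in the series do not prevent the summary from being computed.
    """
    profile = np.full((24, len(probs)), np.nan)
    for h in range(24):
        obs = values[hours == h]
        obs = obs[~np.isnan(obs)]
        if obs.size:
            profile[h] = np.quantile(obs, probs)
    return profile


def distribution_distance(p, q):
    """Assumed distance: mean absolute quantile gap, summed over categories."""
    per_category = np.abs(p - q).mean(axis=1)
    return float(np.nansum(per_category))  # categories with no data are skipped


# Simulated hourly data for 60 days (illustrative only, not the smart meter data).
rng = np.random.default_rng(0)
hours = np.tile(np.arange(24), 60)


def simulate(peak_hour):
    """Series with a daily peak at `peak_hour`, noise, and 10% missing values."""
    load = 1.0 + 3.0 * np.exp(-0.5 * ((hours - peak_hour) / 2.0) ** 2)
    series = load + rng.normal(0.0, 0.3, hours.size)
    series[rng.random(hours.size) < 0.1] = np.nan
    return series


# Three morning-peak and three evening-peak series.
series_list = [simulate(7) for _ in range(3)] + [simulate(19) for _ in range(3)]

probs = np.linspace(0.1, 0.9, 9)
profiles = [quantile_profile(s, hours, probs) for s in series_list]

# Pairwise distance matrix between the distributional summaries.
n = len(profiles)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = distribution_distance(profiles[i], profiles[j])

# Average-linkage hierarchical clustering, cut into two groups.
tree = linkage(squareform(dist), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Each series is reduced to a fixed-size summary (24 categories by 9 quantiles) regardless of its length or the position of its gaps, which is the sense in which a distribution-based distance tolerates missing values and reduces dimension. Any clustering algorithm that accepts a precomputed distance matrix could replace the hierarchical step.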