Chapter 1 Introduction
The Smart Grid, Smart City (SGSC) project (2010–2014), available through the Department of the Environment and Energy, provides half-hourly data for over 13,000 Australian electricity smart meter customers, with records distributed unevenly between October 2011 and March 2014. Such a diverse customer base exhibits large variation in behavior, which increases the uncertainty in the data. Behavioral patterns vary significantly due to differences in size, location, and amenities such as solar panels, central heating, and air conditioning. For example, some families use a dryer, while others hang their clothes to dry, which could be reflected in their weekly profile. Consumption may also vary on a monthly basis, with some customers using air conditioners or heaters more than others despite having comparable electrical equipment and weather conditions. Some customers are night owls, while others are morning larks, which may show up in their daily profile. Customers’ energy consumption on days off varies depending on whether they stay at home or go out.
With data becoming available at ever finer time scales, time series may need to be explored across both finer and coarser scales to draw useful inferences about the underlying process. To reduce the complexity of time, it is typical to divide it into years, months, weeks, days, and so on in a hierarchical manner (Aigner et al. 2011). These discrete abstractions of time are known as time granularities. Linear time granularities (Bettini et al. 1998), such as hours, days, weeks, and months, respect the linear progression of time and are non-repeating. Cyclic temporal granularities, which represent cyclical repetitions in time (such as hour-of-the-day or work-day/weekend), are effective for analyzing repetitive patterns in time series data.
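The distinction between linear and cyclic granularities can be illustrated with a minimal Python sketch. The function names (`linear_day`, `hour_of_day`, `day_of_week`, `is_weekend`) and the choice of 1 October 2011 as origin are assumptions made for illustration only; they are not part of the SGSC data or of any particular library.

```python
from datetime import datetime, timedelta

def linear_day(ts, origin):
    """Linear granularity: number of days since `origin`; never repeats."""
    return (ts.date() - origin.date()).days

def hour_of_day(ts):
    """Cyclic granularity: 0..23, repeating every day."""
    return ts.hour

def day_of_week(ts):
    """Cyclic granularity: 0 (Monday) .. 6 (Sunday), repeating every week."""
    return ts.weekday()

def is_weekend(ts):
    """Cyclic granularity with two levels: work-day (False) or weekend (True)."""
    return ts.weekday() >= 5

# Hypothetical half-hourly timestamps covering two weeks (48 per day).
origin = datetime(2011, 10, 1)
stamps = [origin + timedelta(minutes=30 * i) for i in range(48 * 14)]

linear_levels = {linear_day(ts, origin) for ts in stamps}
hour_levels = {hour_of_day(ts) for ts in stamps}
week_levels = {day_of_week(ts) for ts in stamps}
weekend_levels = {is_weekend(ts) for ts in stamps}
```

Over a longer time span, the set of linear levels keeps growing (one new level per day), whereas each cyclic granularity stays at a fixed number of levels (24, 7, and 2 here), so every cyclic level collects many observations.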
To acquire a comprehensive view of the repeated patterns, it is necessary to navigate through all of the conceivable cyclic granularities. This approach is consistent with the concept of exploratory data analysis (EDA) (Tukey 1977), which stresses the use of multiple perspectives on data to assist with formulating hypotheses before proceeding to hypothesis testing. This, however, is a challenging process, since it gives rise to a myriad of alternatives. Furthermore, the transition from linear to cyclic granularities restructures the data, so that each level of a cyclic granularity corresponds to multiple values of the observed variable. This motivates the research presented in this thesis, which aims to provide a platform for systematically exploring the probability distributions induced by these multiple observations, in order to support the discovery of regular patterns or anomalies, as well as the exploration of clusters of behaviors or the summarization of behavior.
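The restructuring described above can be sketched in a few lines of Python. The sketch groups a series by the levels of a cyclic granularity and summarizes each level's distribution by its quartiles. The data are synthetic and purely illustrative (they are not SGSC measurements), and the helper names `group_by_granularity` and `quartiles_by_level` are hypothetical; quartiles are only one of many possible distributional summaries.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import quantiles

def group_by_granularity(series, granularity):
    """Collect all observed values falling in each level of a cyclic granularity."""
    groups = defaultdict(list)
    for ts, value in series:
        groups[granularity(ts)].append(value)
    return dict(groups)

def quartiles_by_level(groups):
    """Summarize each level's distribution by its three quartile cut points."""
    return {level: tuple(quantiles(values, n=4))
            for level, values in sorted(groups.items())}

# Synthetic half-hourly series over 28 days (an assumption for illustration):
# higher usage in the evening, plus a small drift from day to day.
start = datetime(2011, 10, 1)
series = []
for i in range(48 * 28):
    ts = start + timedelta(minutes=30 * i)
    base = 1.0 if 17 <= ts.hour < 22 else 0.3
    series.append((ts, base + 0.01 * (i // 48)))

by_hour = group_by_granularity(series, lambda ts: ts.hour)
summary = quartiles_by_level(by_hour)
```

Each of the 24 hour-of-the-day levels now holds many observations rather than one, so comparing the per-level quartiles across levels is one simple way to look for a regular daily pattern.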