5.1 Original contributions

Exploratory time series analysis entails many iterations of identifying and summarizing temporal dependencies. It is common practice to divide time into years, months, weeks, days, and so on in order to make inferences at both finer and coarser scales. In the literature, the formalization of these temporal deconstructions (granularities) is limited to linear time granularities such as hours, days, weeks, and months, which respect the linear progression of time and are non-repeating. Cyclic granularities, which are repeating in nature, are useful for finding patterns in temporal data. They can be circular, quasi-circular, or aperiodic. Hour of the day and day of the week are examples of circular granularities; day of the month is an example of a quasi-circular granularity; and public holidays and school holidays are examples of aperiodic granularities. Additionally, time deconstructions can be based on a time hierarchy. Thus, single-order-up granularities such as second of minute, or multiple-order-up granularities such as second of hour, can be envisioned. The definitions and rules established in the literature for linear granularities are insufficient for describing the various types of cyclic granularities. Chapter 2 provides a formal characterization of cyclic granularities as well as tools for classifying and computing potential cyclic granularities from an ordered temporal index. It also allows single- and multiple-order-up time granularities to be manipulated via a cyclic calendar algebra. The approach generalizes to non-temporal hierarchical granularities with an ordered index.
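
As a minimal sketch of how circular granularities can be computed from an ordered index, the following assumes the index counts seconds from an arbitrary origin. The function names are hypothetical and chosen for illustration; they are not the interface developed in Chapter 2. The sketch shows two single-order-up granularities (second of minute, minute of hour) being combined into a multiple-order-up granularity (second of hour).

```python
def circular_granularity(index, unit, period):
    """Position of `index` within a cycle of `period` units.

    `unit` is the number of index steps in one unit of the granularity.
    Both names are illustrative assumptions, not notation from Chapter 2.
    """
    return (index // unit) % period


def second_of_minute(seconds):
    # Single-order-up: seconds within the next level up (minutes).
    return circular_granularity(seconds, unit=1, period=60)


def minute_of_hour(seconds):
    # Single-order-up: minutes within the next level up (hours).
    return circular_granularity(seconds, unit=60, period=60)


def second_of_hour(seconds):
    # Multiple-order-up: built from the two single-order-up granularities
    # rather than computed directly from the index.
    return second_of_minute(seconds) + 60 * minute_of_hour(seconds)


elapsed = 3725  # 1 hour, 2 minutes, 5 seconds after the origin
print(second_of_minute(elapsed), minute_of_hour(elapsed), second_of_hour(elapsed))
```

The modular form only applies to circular granularities, whose cycles all have the same length. Quasi-circular granularities such as day of the month need the varying cycle lengths to be supplied, and aperiodic granularities need an explicit list of events.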

Visualizing probability distributions conditional on one or more cyclic granularities is a powerful exploration tool. However, there may be too many cyclic granularities to examine manually in a comprehensive exploration, and not all pairs of granularities can be effectively explored together. Chapter 2 also provides a recommendation on whether a pair of granularities can be meaningfully plotted or analyzed together (a “harmony”) or not (a “clash” or “near-clash”).
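
The idea of screening pairs of granularities can be sketched as follows. The sketch assumes, for illustration only, that a pair is problematic when some combinations of categories never occur in the data; the formal definitions of harmony, clash, and near-clash are those given in Chapter 2, and the helper names below are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import product


def empty_combinations(timestamps, gran_a, gran_b):
    """Category pairs of two granularities that never occur together."""
    a_values = [gran_a(t) for t in timestamps]
    b_values = [gran_b(t) for t in timestamps]
    observed = set(zip(a_values, b_values))
    possible = set(product(set(a_values), set(b_values)))
    return possible - observed


def hour_of_day(t):
    return t.hour


def day_of_week(t):
    return t.weekday()  # 0 = Monday, ..., 6 = Sunday


def is_weekend(t):
    return t.weekday() >= 5


# Four weeks of hourly timestamps (an arbitrary illustrative index).
start = datetime(2021, 1, 4)
timestamps = [start + timedelta(hours=h) for h in range(24 * 28)]

# Every hour occurs on every day of the week: no empty combinations.
harmony_gaps = empty_combinations(timestamps, hour_of_day, day_of_week)

# A weekday can never fall on a weekend and vice versa: empty combinations.
clash_gaps = empty_combinations(timestamps, day_of_week, is_weekend)

print(len(harmony_gaps), len(clash_gaps))
```

An empirical check of this kind cannot tell a combination that is impossible by construction from one that is merely absent in a short sample, so it is only a screening device.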

Cyclic granularities can be used to create a wide range of displays. When there are numerous granularities to choose from, deciding which ones to display can be difficult, and only a few of them may be useful in revealing major patterns. In Chapter 3, the search for informative granularities is facilitated by selecting “significant” granularities. A cyclic granularity is referred to as “significant” if the distribution of the measured variable differs significantly between its categories. Chapter 3 defines a distance measure to quantify these distributional differences. A high value of the distance measure for a cyclic granularity or harmony suggests that it could be interesting for further investigation, whereas a low value indicates that nothing noteworthy is unfolding. A threshold and, consequently, a selection criterion are chosen using a permutation test, such that cyclic granularities with significant values of the distance measure are selected. In addition, the distance measure is adjusted so that it can be compared not only across cyclic granularities with different numbers of categories but also across a set of time series. As a result, it can also be used to rank the displays according to their ability to capture the greatest amount of variation across one or multiple time series.
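
The selection logic can be illustrated with a simplified stand-in. The sketch below does not reproduce the distance measure of Chapter 3 or its adjustment for the number of categories; it assumes a cruder statistic (the largest mean absolute difference between the deciles of any two categories) purely to show how a permutation test yields a threshold. All names and the simulated data are hypothetical.

```python
import random
from statistics import quantiles


def max_pairwise_distance(values, categories):
    """Largest gap between the decile profiles of any two categories.

    A simplified stand-in for a distributional distance measure.
    """
    groups = {}
    for value, category in zip(values, categories):
        groups.setdefault(category, []).append(value)
    profiles = [quantiles(group, n=10) for group in groups.values()]
    largest = 0.0
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            gap = sum(abs(a - b) for a, b in zip(profiles[i], profiles[j]))
            largest = max(largest, gap / len(profiles[i]))
    return largest


def permutation_threshold(values, categories, n_perm=200, level=0.95, seed=0):
    """Quantile of the statistic when values are shuffled across categories."""
    rng = random.Random(seed)
    shuffled = list(values)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between value and category
        null.append(max_pairwise_distance(shuffled, categories))
    null.sort()
    return null[min(int(level * n_perm), n_perm - 1)]


# Simulated series: four categories, with category 0 shifted upwards.
rng = random.Random(42)
categories = [t % 4 for t in range(400)]
values = [rng.gauss(0, 1) + (3 if c == 0 else 0) for c in categories]

observed = max_pairwise_distance(values, categories)
threshold = permutation_threshold(values, categories)
print(observed > threshold)  # the granularity is selected if this holds
```

Shuffling the values removes any association with the categories, so the threshold reflects the size of the statistic that arises by chance alone.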

The ideas in Chapters 2 and 3 can be used for studying patterns in individual time series or comparing a few time series together. This is extended in Chapter 4 to allow for the exploration of distributions for multiple time series at the same time using unsupervised clustering. In the time series clustering literature, probability distributions across cyclic granularities have not been considered in determining similarity. However, such a similarity measure can be useful for characterizing the inherent temporal data structure of long, unequal-length time series in a way that is robust to missing or noisy data while allowing for the detection of similar repeated patterns. Chapter 4 proposes two approaches for calculating distances between time series based on probability distributions across cyclic granularities. The first approach considers two time series to be similar if the distributions of each category of one or more cyclic granularities are similar. The second approach considers two time series to be similar if they have a similar significance of patterns across different granularities. A similar significance does not imply a similar pattern, which is where the second approach differs from the first. When the distances from these approaches are fed into a hierarchical clustering algorithm, they yield small groups of time series with similar distributions or similar significance over multiple granularities. Both approaches produce useful clusters, as demonstrated by testing on a range of validation data designs and a sample of residential smart meter consumers.
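
The difference between the two approaches can be sketched on simulated data. The code below is an illustration under simplifying assumptions, not the method of Chapter 4: it uses a single cyclic granularity, quartile profiles in place of the thesis's distance measure, and a naive complete-linkage clustering. All function names are hypothetical.

```python
import random
from statistics import quantiles


def category_quantiles(values, categories):
    """Quartile cut points of the values within each category."""
    groups = {}
    for value, category in zip(values, categories):
        groups.setdefault(category, []).append(value)
    return {c: quantiles(g, n=4) for c, g in groups.items()}


def profile_gap(p, q):
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def distribution_distance(a, b):
    """First approach: compare the two series category by category."""
    qa, qb = category_quantiles(*a), category_quantiles(*b)
    return sum(profile_gap(qa[c], qb[c]) for c in qa.keys() & qb.keys())


def significance(series):
    """Largest gap between any two categories within one series."""
    profiles = list(category_quantiles(*series).values())
    return max(profile_gap(p, q)
               for i, p in enumerate(profiles) for q in profiles[i + 1:])


def significance_distance(a, b):
    """Second approach: compare the strength of the pattern, not its shape."""
    return abs(significance(a) - significance(b))


def agglomerate(items, distance, k):
    """Naive complete-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        candidates = [
            (max(distance(items[a], items[b])
                 for a in clusters[i] for b in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(candidates)
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def make_series(length, peak, rng):
    """Simulated series whose values are shifted up in category `peak`."""
    categories = [t % 4 for t in range(length)]
    values = [rng.gauss(0, 1) + (3 if c == peak else 0) for c in categories]
    return values, categories


rng = random.Random(1)
# Unequal lengths: series 0 and 1 peak in category 0, series 2 and 3 in 2.
series = [make_series(n, peak, rng)
          for n, peak in [(200, 0), (240, 0), (280, 2), (320, 2)]]

clusters = agglomerate(series, distribution_distance, k=2)
print(sorted(sorted(c) for c in clusters))
print(significance_distance(series[0], series[2]),
      distribution_distance(series[0], series[2]))
```

Series 0 and 2 have equally strong patterns located in different categories, so they are far apart under the first distance but close under the second. With several granularities, the second distance would compare vectors of significance values rather than single numbers.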