2.4 Data structure

Effective exploration and visualization benefit from well-organized data structures. Wang, Cook, and Hyndman (2020a) introduced the tidy “tsibble” data structure to support exploration and modeling of temporal data. This forms the basis of the structure for cyclic granularities. A tsibble comprises an index, optional key(s), and measured variables. An index is a variable with inherent ordering from past to present and a key is a set of variables that define observational units over time. A linear granularity is a mapping of the index set to subsets of the time domain. For example, if the index of a tsibble is days, then a linear granularity might be weeks, months or years. A bottom granularity is represented by the index of the tsibble.
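The idea of a linear granularity as a grouping of the index can be sketched in a few lines. The following is a minimal Python illustration, not the tsibble implementation (tsibble is an R package); the helper `linear_granularity` and the labelling functions are hypothetical names introduced here, and the index is assumed to be daily.

```python
from collections import defaultdict
from datetime import date, timedelta

# Index: one observation per day of 2021, so the bottom granularity is "day".
index = [date(2021, 1, 1) + timedelta(days=i) for i in range(365)]

def linear_granularity(index, label):
    """Group index values into the subsets of time named by `label` (hypothetical helper)."""
    groups = defaultdict(list)
    for s in index:
        groups[label(s)].append(s)
    return dict(groups)

# Two coarser linear granularities built from the daily index.
months = linear_granularity(index, lambda s: (s.year, s.month))
years = linear_granularity(index, lambda s: s.year)

print(len(months), len(years))  # 12 1
```

Each value in `months` is the set of days falling in one particular month, which is the sense in which a linear granularity maps the index set to subsets of the time domain.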

All cyclic granularities can be expressed in terms of the index set, so the tsibble structure (index, key, measurements) can be augmented by columns of cyclic granularities. The total number of cyclic granularities depends on the number of linear granularities considered in the hierarchy table and the presence of any aperiodic cyclic granularities. For example, if we have \(n\) periodic linear granularities in the hierarchy table, then \(n(n-1)/2\) circular or quasi-circular cyclic granularities can be constructed, one for each pair of linear granularities. Let \(N_C\) be the total number of contextual circular, quasi-circular and aperiodic cyclic granularities that can originate from the underlying periodic and aperiodic linear granularities. Simultaneously encoding more than a few of these cyclic granularities when visualizing the data overwhelms human comprehension. Instead, we focus on visualizing the data split by pairs of cyclic granularities (\(C_i\), \(C_j\)). Data sets of the form <\(C_i\), \(C_j\), \(v\)> then allow exploration and analysis of the measured variable \(v\).
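The count \(n(n-1)/2\) can be checked by enumerating pairs of linear granularities. The sketch below assumes a hypothetical hierarchy of five periodic linear granularities; the names and the `finer-of-coarser` labels are illustrative only and are not taken from the hierarchy table itself.

```python
from itertools import combinations

# Hypothetical hierarchy table, ordered from finest to coarsest.
hierarchy = ["hour", "day", "week", "month", "year"]

# One cyclic granularity per (finer, coarser) pair of linear granularities.
cyclic = [f"{finer}-of-{coarser}" for finer, coarser in combinations(hierarchy, 2)]

n = len(hierarchy)
assert len(cyclic) == n * (n - 1) // 2

# Candidate (C_i, C_j) pairs for bivariate displays.
pairs = list(combinations(cyclic, 2))
print(len(cyclic), len(pairs))  # 10 45
```

Even this small hierarchy yields 45 candidate pairs, which suggests why it helps to screen pairs before plotting them.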

2.4.1 Harmonies and clashes

The way granularities are related is important when we consider data visualizations. Consider two cyclic granularities \(C_i\) and \(C_j\), such that \(C_i\) maps the index set to a set \(\{A_k \mid k=1,\dots,K\}\) and \(C_j\) maps the index set to a set \(\{B_\ell \mid \ell =1,\dots,L\}\). Here, \(A_k\) and \(B_\ell\) are the levels/categories corresponding to \(C_i\) and \(C_j\) respectively. Let \(S_{k\ell}\) be a subset of the index set such that for all \(s \in S_{k\ell}\), \(C_i(s) = A_k\) and \(C_j(s) = B_\ell\). There are \(KL\) such data subsets, one for each combination of levels (\(A_k\), \(B_\ell\)). Some of these sets may be empty due to the structure of the calendar, or because of the duration and location of events in a calendar. A pair of cyclic granularities with empty subsets is called a clash; a pair for which every subset is non-empty is called a harmony.
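The construction of the subsets \(S_{k\ell}\) amounts to grouping the index by the pair of levels it maps to. A minimal Python sketch follows, assuming a daily index and, purely for illustration, \(C_i\) = day-of-month and \(C_j\) = month-of-year; the function name `partition` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def partition(index, c_i, c_j):
    """Return {(A_k, B_l): S_kl}, storing only the non-empty subsets."""
    subsets = defaultdict(list)
    for s in index:
        subsets[(c_i(s), c_j(s))].append(s)
    return dict(subsets)

# Illustrative choice: C_i = day-of-month (K = 31), C_j = month-of-year (L = 12).
index = [date(2021, 1, 1) + timedelta(days=i) for i in range(365)]
subsets = partition(index, lambda s: s.day, lambda s: s.month)

K, L = 31, 12
n_empty = K * L - len(subsets)
print(len(subsets), n_empty)  # 365 non-empty subsets, 7 empty ones
```

The seven empty subsets are the impossible dates such as 31 April; they are absent from `subsets` because no index value maps to them.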

Structurally empty combinations can arise due to the structure of the calendar or hierarchy. For example, let \(C_i\) be day-of-month with 31 levels and \(C_j\) be day-of-year with 365 levels. There will be \(31\times 365=11315\) sets \(S_{k\ell}\) corresponding to possible combinations of \(C_i\) and \(C_j\). Many of these are empty. For example, \(S_{1,5}\) is empty because the first day of the month can never correspond to the fifth day of the year. Hence the pair (day-of-month, day-of-year) is a clash.
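The day-of-month and day-of-year clash can be confirmed empirically. The sketch below assumes a daily index over 2019 to 2022, a window chosen here only because it contains a leap year; day-of-year is therefore given 366 levels rather than the 365 used in the text.

```python
from datetime import date, timedelta

# Daily index spanning common years and one leap year (2020).
start, end = date(2019, 1, 1), date(2022, 12, 31)
index = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Observed (day-of-month, day-of-year) combinations.
observed = {(s.day, s.timetuple().tm_yday) for s in index}

possible = 31 * 366  # 366 levels because the window includes a leap year
print(f"{len(observed)} of {possible} combinations are non-empty")
print((1, 5) in observed)  # False: day 5 of the year is always 5 January
```

Only a small fraction of the possible combinations ever occurs, so a display faceted by this pair would consist mostly of empty panels.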

Event-driven empty combinations arise due to differences in event location or duration in a calendar. For example, let \(C_i\) be day-of-week with 7 levels and \(C_j\) be working-day/non-working-day with 2 levels. While potentially all of these 14 sets \(S_{k\ell}\) can be non-empty (it is possible to have a public holiday on any day-of-week), in practice many of these will probably have very few observations. For example, there are few (if any) public holidays on Wednesdays or Thursdays in any given year in Melbourne, Australia.

An example of harmony is where \(C_i\) and \(C_j\) denote day-of-week and month-of-year respectively. So \(C_i\) will have 7 levels while \(C_j\) will have 12 levels, giving \(12\times7=84\) sets \(S_{k\ell}\). All of these are non-empty because every day-of-week can occur in every month. Hence, the pair (day-of-week, month-of-year) is a harmony.
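The same kind of check shows that this pair has no empty combinations. A minimal sketch, assuming a daily index covering the single year 2021:

```python
from datetime import date, timedelta
from itertools import product

index = [date(2021, 1, 1) + timedelta(days=i) for i in range(365)]

observed = {(s.weekday(), s.month) for s in index}  # weekday(): Monday = 0
expected = set(product(range(7), range(1, 13)))     # all 7 x 12 = 84 cells

is_harmony = observed == expected
print(len(observed), is_harmony)  # 84 True
```

One year of daily data is already enough here, since every month contains at least 28 days and therefore every day of the week.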

2.4.2 Near-clashes

Suppose \(C_i\) denotes day-of-year and \(C_j\) denotes day-of-week. While any day of the week can occur on any day of the year, some combinations will be very rare. For example, the 366th day of the year occurs only in leap years, so it coincides with a Wednesday only about once every 28 years on average (a leap year roughly every 4 years, times 7 days of the week). We refer to such rare combinations as “near-clashes.”
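The 28-year figure can be verified by counting. The sketch below assumes the window 1904 to 2099, chosen here because it spans exactly 7 × 28 = 196 years and contains no Gregorian century exception, so every fourth year is a leap year.

```python
from calendar import isleap
from datetime import date

# 1904 to 2099 inclusive: 196 years in which every fourth year is a leap year.
years = range(1904, 2100)
WEDNESDAY = 2  # date.weekday() counts from Monday = 0

# Day 366 of a leap year is always 31 December.
hits = [y for y in years if isleap(y) and date(y, 12, 31).weekday() == WEDNESDAY]
print(len(hits), len(years) / len(hits))  # 7 28.0
```

With hourly or daily data over a few years, such a cell would hold at most one day of observations, which is too little to estimate a distribution.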