What is mixed frequency data?
Mixed frequency data refers to multivariate time series data in which observations are published at different intervals, or even at irregular intervals. As a simple example, consider a panel of US data that includes advance retail sales, published once a month, and the Redbook index, published once a week and indexed to Saturday. This panel presents two problems:
- The panel contains many missing observations, since retail sales is published only once a month and publication dates across series do not align.
- The number of weekly observations in a month is not constant, as the number of Saturdays in a month varies.
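To see the second point concretely, here is a small pandas sketch (the dates are illustrative, not tied to any particular dataset) counting the Saturdays in each month of early 2020:

```python
import pandas as pd

# Illustrative only: count the Saturdays per month to show that the number
# of weekly observations in a month is not constant.
saturdays = pd.date_range("2020-01-01", "2020-06-30", freq="W-SAT")
per_month = saturdays.to_series().groupby(saturdays.to_period("M")).size()
print(per_month.tolist())  # [4, 5, 4, 4, 5, 4]
```

January has four Saturdays, February 2020 has five, and so on, so a weekly series never maps cleanly onto a fixed number of monthly slots.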
The Wrong Approach
Typically, mixed frequency data is handled in one of two ways, neither of which is ideal. First, high frequency series may be aggregated to the lowest frequency in the data, in this case monthly. This approach isn't such a bad idea except at the end of the panel. The problem is that in order to have a full month's worth of weekly observations, we need to wait until the end of the month, which precludes nowcasting. This means we will not be using all available information, and thus are wasting valuable data. Second, missing low frequency values may be imputed. Aside from the fact that fabricating observations is bad practice (at the very least, the reported variance of your predictions will be misleading), we again run into trouble at the end of the sample. Carrying the last low frequency observation forward to the end of the sample is again misleading: look at the data depicted above, taken from the beginning of the coronavirus crisis. The Redbook index collapses. Carrying robust retail sales numbers forward will not capture current economic conditions!
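The end-of-panel cost of the first approach can be sketched in a few lines (the weekly series here is made up purely for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative only: a made-up weekly series ending mid-March. Aggregating to
# monthly frequency forces us to drop the partial final month, discarding the
# newest observations: exactly the end-of-panel problem described above.
weeks = pd.date_range("2020-01-04", "2020-03-14", freq="W-SAT")
weekly = pd.Series(np.arange(len(weeks), dtype=float), index=weeks)

by_month = weekly.groupby(weekly.index.to_period("M"))
monthly = by_month.mean()                  # March's mean uses only 2 weeks
complete = monthly[by_month.count() >= 4]  # keep only fully observed months
print(complete.tolist())  # [1.5, 6.0]: January and February survive, March is lost
```

The two most recent weekly observations exist, but the aggregated monthly panel cannot use them until the rest of March arrives.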
The Correct Approach
When working with mixed frequency data, we can still choose the frequency of our model. For example, using the above retail sales and Redbook data, our model could be either monthly (the same frequency as retail sales) or weekly (the frequency of the Redbook index). If we wish to model our data at low (monthly) frequency, one approach is to allow each week of observations to enter as a separate variable. For more detail on that approach, have a look at McCracken, Owyang, and Sekhposyan (2015). Alternatively, in a state space framework, we can adjust the variance of inputs to account for the fact that low frequency aggregates of high frequency variables may only be partially observed. For example, if we have observed only two of the four weeks in a month, we would double the variance of that observation. For more details, you can have a look at our own research on the subject here.
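A rough sketch of that variance adjustment looks as follows. The function name and form are our illustration of the idea, not OttoQuant's actual implementation:

```python
# Illustrative sketch of the variance adjustment: if only k of the n
# high-frequency periods behind a low-frequency aggregate have been
# observed, inflate that observation's variance by n / k.
def adjusted_variance(base_variance: float, periods_observed: int,
                      periods_in_aggregate: int) -> float:
    """Scale an observation's variance for a partially observed aggregate."""
    if periods_observed == 0:
        # Nothing observed yet: the observation carries no information.
        return float("inf")
    return base_variance * periods_in_aggregate / periods_observed

# Two of four weeks observed doubles the variance, as in the text.
print(adjusted_variance(1.0, 2, 4))  # 2.0
```

The less of the month we have seen, the noisier the partial aggregate is treated as being, so the filter leans on it less.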
Alternatively, again using a state space framework, we can model our monthly/weekly example at a weekly frequency. This has the advantage of capturing higher frequency dynamics in our system and may be more responsive when things change quickly. Doing so requires a rule for mapping our low frequency variable to our high frequency states — the underlying factors that drive our observed dynamics. Implementing these rules can get slightly complicated, particularly when the frequency of a variable is not constant. Luckily, all the details have been built into OttoQuant's Bayesian models (our maximum likelihood models are based on code from the NY Fed for replicability and thus do not include this variable frequency feature), so users needn't worry about details. If you'd like to learn more on the subject, Mariano and Murasawa (2003) is a good place to start.
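As a stylized illustration of such a mapping rule (simplified from the full treatment in Mariano and Murasawa, with made-up numbers), a monthly observation in a weekly model can load equally on the latent weekly states of its month:

```python
import numpy as np

# Stylized example: a monthly observation treated as the average of the
# latent weekly states in its month. With 4 Saturdays, each week gets
# weight 1/4; in a 5-Saturday month the loading vector would have length 5
# with weights 1/5, which is why variable frequency complicates the mapping.
weekly_states = np.array([2.0, 4.0, 6.0, 8.0])  # made-up latent weekly values
loading = np.full(weekly_states.size, 1.0 / weekly_states.size)
monthly_obs = loading @ weekly_states
print(monthly_obs)  # 5.0
```

The observation equation for the monthly series is nonzero only in the weeks when it is actually published; in all other weeks it is simply missing, which the filter handles naturally.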
Do Mixed Frequency Data Right
OttoQuant is built around mixed frequency or variable frequency data, missing observations, ragged edge data (different publication start and end dates), and a host of other issues that pertain to using real-world time series. We're moving data science forward. Join us.