One of the greatest advantages of state space models is their ability to handle noisy or missing observations. However, standard methods such as those used by the New York Federal Reserve still depend on the fact that, given the frequency of the data, not too many observations are missing. We recently had the opportunity to work on something different: very sparse satellite imagery data. Time series built from satellite data tend to be sparse because of both orbital trajectories and cloud cover. At a daily frequency, 98% of observations were missing. Even at a monthly frequency, 80% of observations were missing. The exercise provided a great example of how OttoQuant is able to adapt its core state space methodology to quickly meet clients’ needs at a fraction of the time and cost it would take to develop these models in-house.
Getting right into the technical details, our data had the following features:
- Observations were stocks (values at a point in time), not flows (totals over an elapsed period)
- Data were highly seasonal
- The resulting monthly index should be updated in real time, but not smoothed
- Monthly averages contained between 1 and 10 observations
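To make the last point concrete, here is a minimal sketch of how a sparse daily panel could be collapsed into monthly averages while keeping track of how many raw observations went into each average. The function name `monthly_mean_and_count` and the integer `month_index` encoding are assumptions made for illustration, not a description of our production code.

```python
import numpy as np

def monthly_mean_and_count(values, month_index, n_months):
    """Collapse a daily panel into monthly means and observation counts.

    values      : (D, N) daily panel, NaN where nothing was observed
    month_index : (D,) integers in 0..n_months-1 giving each day's month
    n_months    : number of months in the output
    Months with no observations get a NaN mean and a count of zero.
    """
    values = np.asarray(values, dtype=float)
    month_index = np.asarray(month_index)
    observed = ~np.isnan(values)
    sums = np.zeros((n_months, values.shape[1]))
    counts = np.zeros((n_months, values.shape[1]), dtype=int)
    # Accumulate observed values and their counts month by month
    np.add.at(sums, month_index, np.where(observed, values, 0.0))
    np.add.at(counts, month_index, observed.astype(int))
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    return means, counts
```

Keeping the counts alongside the means matters later on: an average built from ten observations should be trusted more than an average built from one.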
To address all these points, we modified a standard dynamic factor model framework to handle very sparse observations, estimating it by maximum likelihood using the EM algorithm proposed by Watson and Engle (1983), the same algorithm used by the New York Fed’s GDP nowcast. One great advantage of this algorithm is that each iteration should leave the likelihood at least as high as the one before, so errors in calculations are immediately apparent if the likelihood of the model does not improve at each iteration. It is a great check to make sure things are coded properly!
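The likelihood check is easy to automate. The sketch below assumes the log likelihood is recorded after every EM iteration; `check_em_monotone` and its tolerance are hypothetical names and values chosen for illustration.

```python
import numpy as np

def check_em_monotone(loglik_path, tol=1e-8):
    """Return the iterations at which the log likelihood fell.

    loglik_path holds one log likelihood value per EM iteration.
    An empty list is what a correctly coded EM loop should produce;
    tol allows for floating point noise, scaled by the likelihood's size.
    """
    ll = np.asarray(loglik_path, dtype=float)
    allowed = tol * np.maximum(1.0, np.abs(ll[:-1]))
    drops = np.diff(ll) < -allowed
    return (np.flatnonzero(drops) + 1).tolist()
```

Any index returned points at the first place to look for a bug, typically in the update that ran just before that iteration's likelihood was evaluated.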
Our modifications broke the problem down into two algorithms. Algorithm 1 proceeds as follows:
1. Construct an initial guess of the model loadings, i.e. the way in which the index relates to observations.
2. From the initial guess for loadings, estimate the index by least squares.
3. Estimate the low frequency trend and seasonal components of the index; this forms our prior in Algorithm 2.
4. Estimate the index via Algorithm 2.
5. Repeat steps 3 and 4 for a final estimate of the index.
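The least squares and trend/seasonal steps above can be sketched as follows. This is an illustration under stated assumptions rather than our exact implementation: the initial loadings are taken as given, the index is a single factor, and the trend and seasonal components are modeled as a polynomial in time plus monthly dummies with January as the baseline. Both function names are hypothetical.

```python
import numpy as np

def index_by_least_squares(Y, loadings):
    """Cross-sectional least squares for a single index, skipping NaNs.

    Y is (T, N) with NaN for missing values; loadings is (N,).
    Periods with no observed series are returned as NaN.
    """
    observed = ~np.isnan(Y)
    Y0 = np.where(observed, Y, 0.0)
    num = Y0 @ loadings                              # sum of lambda_i * y_it over observed i
    den = observed.astype(float) @ (loadings ** 2)   # sum of lambda_i^2 over observed i
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, num / den, np.nan)

def trend_and_seasonal(index, month, degree=1):
    """Regress the index on a polynomial trend plus monthly dummies.

    month holds integers 1..12. January is the omitted baseline (an
    assumption), so the seasonal component is measured relative to January.
    Returns (trend, seasonal), each of length T.
    """
    index = np.asarray(index, dtype=float)
    month = np.asarray(month)
    t = np.linspace(-1.0, 1.0, len(index))
    trend_X = np.vander(t, degree + 1, increasing=True)   # columns 1, t, t^2, ...
    season_X = (month[:, None] == np.arange(2, 13)[None, :]).astype(float)
    X = np.hstack([trend_X, season_X])
    ok = ~np.isnan(index)                                  # fit only where the index exists
    beta, *_ = np.linalg.lstsq(X[ok], index[ok], rcond=None)
    k = degree + 1
    return trend_X @ beta[:k], season_X @ beta[k:]
```

Because the regression is fit only on periods where the index exists but evaluated on every period, the fitted trend and seasonal components also cover the months with no data, which is what allows them to serve as a prior.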
Algorithm 2 is simply our implementation of the EM algorithm. However, we modified our factor model to account for the fact that monthly averages (when observed) contained between 1 and 10 observations.
1. Calculate the index and its variance via the Kalman filter, modifying the variance of shocks to observations according to the number of observations in the monthly average. Because our index is not smoothed, there is no need to then smooth our filtered results.
2. Estimate the variance of shocks to factors via Watson and Engle (1983).
3. Estimate the loadings and variance of shocks to observations, modifying Watson and Engle (1983) to include a weighting matrix based on the number of true observations in monthly averages.
4. Repeat steps 1 to 3 until the likelihood function converges.
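A minimal sketch of the filtering step and the weighted loadings update is given below. It rests on several simplifying assumptions that are ours for illustration only: a single index following an AR(1) process, independent shocks to observations, and a shock variance for each monthly average equal to a per-observation variance divided by the number of observations in that average. The names `filter_index` and `update_loadings` are hypothetical, and the update for the variance of shocks to factors is omitted.

```python
import numpy as np

def filter_index(Y, counts, loadings, obs_var, phi, state_var, f0=0.0, P0=1.0):
    """Scalar Kalman filter with missing data and count-scaled noise.

    Y, counts          : (T, N); Y is NaN where a monthly average is missing,
                         counts is the number of raw observations per average
    loadings, obs_var  : (N,) loadings and per-observation shock variances
    phi, state_var     : AR(1) coefficient and shock variance of the index
    Returns filtered means, filtered variances, and the log likelihood.
    """
    T = Y.shape[0]
    means, variances = np.empty(T), np.empty(T)
    f, P, loglik = f0, P0, 0.0
    for t in range(T):
        f, P = phi * f, phi ** 2 * P + state_var        # predict
        # Update one series at a time; valid because shocks are independent
        for i in np.flatnonzero(~np.isnan(Y[t])):
            r = obs_var[i] / counts[t, i]               # more observations, less noise
            v = Y[t, i] - loadings[i] * f               # innovation
            S = loadings[i] ** 2 * P + r                # innovation variance
            K = P * loadings[i] / S                     # Kalman gain
            f, P = f + K * v, (1.0 - K * loadings[i]) * P
            loglik += -0.5 * (np.log(2.0 * np.pi * S) + v ** 2 / S)
        means[t], variances[t] = f, P
    return means, variances, loglik

def update_loadings(Y, counts, means, variances):
    """Count-weighted update of loadings and observation shock variances.

    Each observed monthly average is weighted by its count. Assumes every
    series is observed at least once.
    """
    observed = ~np.isnan(Y)
    w = np.where(observed, counts, 0.0)                 # weight is zero where missing
    Y0 = np.where(observed, Y, 0.0)
    second_moment = means ** 2 + variances              # E[f_t^2] given the data
    loadings = (w * Y0).T @ means / (w.T @ second_moment)
    resid2 = (Y0 - np.outer(means, loadings)) ** 2 + np.outer(variances, loadings ** 2)
    obs_var = (w * resid2).sum(axis=0) / observed.sum(axis=0)
    return loadings, obs_var
```

In this sketch a month with no data simply carries the prediction forward with a larger variance, while a monthly average built from many observations pulls the index more strongly toward it than an average built from one.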
Following this approach, we are able to transform a panel of over 1000 very sparse observations, impossible to interpret in its raw form, into a simple index that conveys the most meaningful information in the data. The index is calculated in real time and is continuously updated as new data become available. Results include the seasonally adjusted index, low frequency trends, the seasonally unadjusted index, and the variance of the index, which decreases throughout the month as more data become available.