There are many types of models available to us (you have already seen several):
A range of choices from mechanistic to purely statistical.
Bayesian, frequentist, machine learning.
Stochastic vs. deterministic.
How should we decide which model(s) to incorporate for nowcasting and forecasting?
What outputs and features do we require of these models?
Often, what motivates one’s modeling choices are the trends and patterns observed in the data.
Autocorrelation and Partial Autocorrelation (ACF/PACF) plots measure the linear relationship between lagged values of a time series.
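The ACF plots the correlation between \(y_t\) and \(y_{t-k}\) as a function of the lag \(k\).
Here the correlations show a periodicity of about 52 weeks, i.e. an annual cycle.
The PACF plots the correlation between \(y_t\) and \(y_{t-k}\) as a function of \(k\), after adjusting for the shorter lags \(1, \dots, k-1\).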
Only a few partial correlations are significant.
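As a rough illustration of how these two quantities are computed, here is a minimal pure-Python sketch. The weekly series is synthetic (an assumed sinusoid with a 52-week period plus noise), not the data behind the plots, and the function names `acf` and `pacf` are hypothetical helpers. The PACF uses the Durbin-Levinson recursion, which is one standard approach among several.

```python
import math
import random

def acf(y, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(y, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(y, max_lag)
    out = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Synthetic weekly series (assumption): six years, annual cycle plus noise.
rng = random.Random(0)
n = 6 * 52
y = [100 + 50 * math.sin(2 * math.pi * t / 52) + rng.gauss(0, 5)
     for t in range(n)]

r = acf(y, 60)
p = pacf(y, 60)

# Approximate 95% bounds for a white-noise series.
bound = 1.96 / math.sqrt(n)
significant_pacf_lags = [k for k in range(1, 61) if abs(p[k]) > bound]
print("ACF at lag 26:", round(r[26], 2), "ACF at lag 52:", round(r[52], 2))
print("Lags with significant PACF:", significant_pacf_lags)
```

On a series like this, the ACF should dip near half a period and peak again near lag 52. In practice a library routine would normally be used instead of hand-written helpers.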
A time series is (weakly) stationary if its mean and variance do not change over time and the covariance between two observations depends only on the lag between them.
Stationarity means that there are no long-term predictable patterns in a time series, but there can still be predictability in the short term.
ARIMA (and other models) assume the data are stationary, in ARIMA's case after differencing, and will not work as well if they are not.
Seasonal patterns are common in many (but not all) endemic diseases. Seasonality can make a disease more predictable, but it also means the series is not stationary.
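Differencing is one common way to move a trending or seasonal series closer to stationarity. The sketch below uses an assumed, noise-free toy series (a linear trend plus a 52-week cycle) and a hypothetical helper named `difference`. With real surveillance data the result would be noisier, and differencing alone does not guarantee stationarity.

```python
import math

def difference(y, lag=1):
    """Return y_t - y_{t-lag} for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Toy weekly series (assumption): linear trend plus an annual cycle.
series = [100 + 0.5 * t + 30 * math.sin(2 * math.pi * t / 52)
          for t in range(4 * 52)]

# A lag-52 difference cancels the annual cycle and reduces the
# linear trend to a constant (0.5 per week * 52 weeks).
seasonal_diff = difference(series, lag=52)

# A further lag-1 difference removes that constant as well.
twice_diff = difference(seasonal_diff, lag=1)

print(min(seasonal_diff), max(seasonal_diff))
```

Each difference shortens the series by `lag` observations, which matters when only a few seasons of data are available.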
A key assumption of many models is constant variance over time. Transformations (for example, a log or square-root transform) often help stabilize the variance.
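To make the effect of a transformation concrete, here is a small sketch on simulated data. The series is an assumption: exponential growth with multiplicative noise, so its variability scales with its level. The comparison of week-to-week changes in the first and second halves is only an informal check, not a formal test.

```python
import math
import random
import statistics

rng = random.Random(0)

# Simulated series (assumption): growth with multiplicative noise.
n = 200
y = [math.exp(0.02 * t + rng.gauss(0, 0.2)) for t in range(n)]
log_y = [math.log(v) for v in y]

def diff_sd_ratio(x):
    """SD of week-to-week changes in the second half over the first half."""
    d = [x[t] - x[t - 1] for t in range(1, len(x))]
    half = len(d) // 2
    return statistics.stdev(d[half:]) / statistics.stdev(d[:half])

raw_ratio = diff_sd_ratio(y)      # well above 1: variance grows with level
log_ratio = diff_sd_ratio(log_y)  # close to 1: variance roughly constant

print("raw:", round(raw_ratio, 2), "log:", round(log_ratio, 2))
```

For count data that can be zero, a plain log is undefined, so an offset such as `log(y + 1)` or a square-root transform is often used instead.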
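ARIMA models have been widely used for decades!
Notation is \(\mathrm{ARIMA}(p, d, q)\), where \(p\) is the autoregressive order, \(d\) is the degree of differencing, and \(q\) is the moving-average order.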
Figuring out which \((p,d,q)\) to use is tricky!
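For reference, one common way to write the \(ARIMA(p, d, q)\) model is given here. The symbols are assumptions not defined in the text above: \(B\) is the backshift operator with \(B y_t = y_{t-1}\), \(\phi_i\) and \(\theta_j\) are the autoregressive and moving-average coefficients, \(c\) is a constant, and \(\varepsilon_t\) is white noise. Sign conventions for \(\theta_j\) vary between textbooks and software.

\[
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)(1 - B)^d \, y_t = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr)\varepsilon_t
\]

Setting \(d = 0\) gives an ARMA model for a series that is already stationary, while \(d = 1\) models the first differences \(y_t - y_{t-1}\).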
Improving Forecasting Models