So now we have two ideas. The first is that today's temperature depends on the temperature yesterday and the day before yesterday, through some linear combination of the two. The second is that temperature depends on the day of the year (between 1 and 366), for which a quadratic polynomial seemed the best fit. We can combine these ideas, but the question is how: with a multiplicative model or with an additive model.
Let’s choose the additive model, since it seems simpler. This means we assume that temperature is the sum of an autoregressive component and a cyclical component, which is easy to write down as a single equation. We will use the SciPy leastsq function to minimize the sum of the squared errors of this equation. Here is the code to achieve this:
    import sys
    import numpy as np
    import matplotlib.pyplot as plt
    from datetime import datetime as dt
    from scipy.optimize import leastsq

    to_dayofyear = lambda x: dt.strptime(x, "%Y%m%d").timetuple().tm_yday  # YYYYMMDD -> day of year
    days, temp = np.loadtxt(sys.argv[1], delimiter=',', usecols=(1, 11), unpack=True, converters={1: to_dayofyear}, encoding='utf-8')
    temp = .1 * temp  # rescale the raw values by 0.1
    cutoff = int(0.9 * len(temp))  # slice indices must be integers

    def error(p, d, t, lag2, lag1):
        l2, l1, d2, d1, d0 = p

        return t - (l2 * lag2 + l1 * lag1 + d2 * d ** 2 + d1 * d + d0)  # observed minus the whole model

    p0 = [-0.08293789, 1.06517683, -4.91072584e-04, 1.92682505e-01, -3.97182941e+00]
    params = leastsq(error, p0, args=(days[2:cutoff], temp[2:cutoff], temp[:cutoff - 2], temp[1:cutoff - 1]))[0]
    print(params)
    delta = np.abs(error(params, days[cutoff+1:], temp[cutoff+1:], temp[cutoff-1:-2], temp[cutoff:-1]))

    plt.hist(delta, bins=10, density=True)  # density replaces the removed normed argument
    plt.show()
- Lines 12 to 15 define the error function, which computes the difference between the observed temperature and the value predicted by our model.
- Line 17 gives an initial guess for all the parameters in our equation.
- Line 18 calls the leastsq function to fit the parameters on the data below the cutoff point (the first 90%).
- Line 20 calculates the absolute error of the fitted model on the data above the cutoff point.
- Line 22 plots the histogram of the error.
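Written as one equation, the additive model that the error function encodes could look like the following. The symbols T_t (temperature on day t), D_t (day of the year for day t) and \varepsilon_t (the error) are notation assumed here for illustration; the coefficient names l2, l1, d2, d1 and d0 follow the code.

```latex
T_t = l_2\,T_{t-2} + l_1\,T_{t-1} + d_2\,D_t^{2} + d_1\,D_t + d_0 + \varepsilon_t
```

The first two terms are the autoregressive component and the next three are the quadratic polynomial in the day of the year. leastsq searches for the coefficients that minimize the sum of the squared \varepsilon_t over the data below the cutoff.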
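The final parameters of the model are printed below. Compared with the initial guess, every parameter except the first has decreased in absolute size, and every parameter except the first has also changed sign. That is not a coincidence, and it has nothing to do with the order of the parameters. It is what happens when the residual is evaluated as t - l2 * lag2 + l1 * lag1 + d2 * d ** 2 + d1 * d + d0, without parentheses around the model terms: only the first term is subtracted from the temperature and the other four are added. The values below were obtained with the residual in that form, so the last four are the negatives of the coefficients of the additive model.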
    [ -1.52297691e-01 -9.89195783e-01 8.20879954e-05 -3.16870659e-02 6.06397834e-01]
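One thing worth checking in a residual function like error is operator precedence. The sketch below uses made-up synthetic data (not the temperature file) and two hypothetical helpers, residual_grouped and residual_ungrouped, to show what leastsq returns when the model terms are, or are not, wrapped in parentheses.

```python
import numpy as np
from scipy.optimize import leastsq

rng = np.random.default_rng(0)

# Synthetic data, assumed purely for illustration:
# y = 2*a - 3*b + 0.5 plus a little noise.
a = rng.normal(size=200)
b = rng.normal(size=200)
y = 2.0 * a - 3.0 * b + 0.5 + 0.01 * rng.normal(size=200)

def residual_grouped(p, y, a, b):
    ca, cb, c0 = p
    # Observed value minus the whole model.
    return y - (ca * a + cb * b + c0)

def residual_ungrouped(p, y, a, b):
    ca, cb, c0 = p
    # Only the first term is subtracted; the others are added.
    return y - ca * a + cb * b + c0

p0 = [1.0, 1.0, 1.0]
grouped = leastsq(residual_grouped, p0, args=(y, a, b))[0]
ungrouped = leastsq(residual_ungrouped, p0, args=(y, a, b))[0]

print("grouped:  ", grouped)
print("ungrouped:", ungrouped)
```

In the ungrouped version every coefficient after the first comes back with the opposite sign, while the residuals themselves are the same size, so an error histogram would not reveal the difference.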
The accuracy of the model doesn’t seem to be better than the simple autoregressive model with lag 2.
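To put a number on a comparison like this, one option is to compute the mean absolute error on the data above the cutoff for both models. The sketch below does that on a synthetic series, which is an assumption standing in for the real temperature file, so its output says nothing about the actual data. It also swaps leastsq for np.linalg.lstsq, which is enough here because both models are linear in their parameters. The helpers design and holdout_mae are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the temperature series (an assumption):
# a seasonal cycle plus autocorrelated noise, three years long.
n = 3 * 365
day = (np.arange(n) % 365 + 1).astype(float)
seasonal = 10.0 - 8.0 * np.cos(2 * np.pi * (day - 20) / 365)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.8 * noise[i - 1] + rng.normal(scale=1.5)
temp = seasonal + noise

cutoff = int(0.9 * n)

def design(lag2, lag1, d, with_day):
    """Build the regressor matrix: two lags, optionally plus d**2, d and 1."""
    cols = [lag2, lag1]
    if with_day:
        cols += [d ** 2, d, np.ones_like(d)]
    return np.column_stack(cols)

def holdout_mae(with_day):
    """Fit below the cutoff, return the mean absolute error above it."""
    X_train = design(temp[:cutoff - 2], temp[1:cutoff - 1],
                     day[2:cutoff], with_day)
    coef, *_ = np.linalg.lstsq(X_train, temp[2:cutoff], rcond=None)
    X_test = design(temp[cutoff - 1:-2], temp[cutoff:-1],
                    day[cutoff + 1:], with_day)
    return np.mean(np.abs(temp[cutoff + 1:] - X_test @ coef))

mae_ar2 = holdout_mae(with_day=False)      # lag-2 autoregressive model only
mae_combined = holdout_mae(with_day=True)  # lags plus quadratic in day of year

print("AR(2) MAE:   ", mae_ar2)
print("combined MAE:", mae_combined)
```

The slices mirror the ones used in the script above, so the same two numbers could be computed for the real data by replacing the synthetic temp and day arrays with the loaded ones.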




