Cold Easter

This entry is part 1 of 10 in the series NumPy Weather

It’s supposed to be the coldest Easter in the Netherlands since 1964, colder even than the previous Christmas. I don’t remember what the temperature was like around Christmas, but I am sure the statisticians are right. By the way, it doesn’t feel that cold. A week ago the wind was blowing really hard.

Anyway, talking about the weather gets us nowhere, so let’s try something more scientific. I was thinking of researching good data sources and algorithms related to meteorology, the science of the atmosphere and weather. I would use the research for NumPy related examples, of course.

Data

The Royal Dutch Meteorological Institute, abbreviated KNMI after its Dutch name, offers daily weather data on its website. Googling for “weather data free download” gives a lot of hits, and so does “weather web service free”. I also had the Weather Underground website marked in my Evernote notes as a source of weather data. So data should not be a problem.

Simple statistics

I downloaded one of the KNMI files from the De Bilt weather station. I think that’s where the KNMI head office is. My first programming job was at a consultancy close to the KNMI office. One of the guys used to joke that I should go work at the KNMI as a weatherman. Maybe because of the suit I was wearing on the first day :).

Okay, enough about that. The file is roughly 10 megabytes in size. It starts with some text explaining the data in Dutch and English, followed by the data itself in comma separated values (CSV) format. I split the metadata and the actual data into separate files. The split is not strictly necessary, because NumPy can skip rows when loading a file. I wrote a simple NumPy script to determine the minimum and maximum temperature of the data set, reading the CSV file created in the separation step. The temperatures are given in tenths of a degree Celsius. There are three columns containing temperatures:

  • An average temperature for a 24 hour period.
  • The daily minimum temperature.
  • The daily maximum temperature.
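Instead of splitting the download into two files, the metadata can be skipped at load time with the skiprows parameter of np.loadtxt. Below is a minimal sketch that uses a small in-memory sample in place of the real file. The header text, the four-column layout and the HEADER_LINES count are made up for illustration, so check the real file for its actual header length.

```python
import io

import numpy as np

# Hypothetical stand-in for the raw download: a few lines of explanatory
# text followed by comma separated data. The real header is much longer;
# the layout here is an assumption for illustration only.
raw = io.StringIO(
    "Illustrative header: daily weather data\n"
    "Temperatures in 0.1 degrees Celsius\n"
    "station,date,min,max\n"
    "260,19010101,  -31,  -6\n"
    "260,19010102,  -45, -12\n"
)

HEADER_LINES = 3  # assumed number of non-data lines at the top of the file

# skiprows makes loadtxt ignore the metadata, so no separate CSV is needed.
data = np.loadtxt(raw, delimiter=',', skiprows=HEADER_LINES)
print(data.shape)  # (2, 4)
```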

I decided to ignore the average temperatures for now. I also noticed that there were missing values, so I had to convert them to NaNs (Not a Number). In the end I came up with this simple script:

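from __future__ import print_function  # lets the script run on Python 2 and 3
import sys

import numpy as np

# Empty fields (missing measurements) become NaN, everything else a float
to_float = lambda x: float(x.strip() or np.nan)

# Column 12 holds the daily minimum and column 14 the daily maximum.
# Measurements are in tenths of a degree Celsius, hence the factor 0.1.
min_temp, max_temp = np.loadtxt(sys.argv[1], delimiter=',', usecols=(12, 14),
                                unpack=True,
                                converters={12: to_float, 14: to_float}) * .1
print("# Records", len(min_temp), len(max_temp))
# nanmin/nanmax ignore the NaNs that stand in for missing values
print("Minimum", np.nanmin(min_temp))
print("Maximum", np.nanmax(max_temp))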

The script prints the number of records and the minimum and maximum temperature:

# Records 40996 40996
Minimum -24.8
Maximum 36.8

That seems correct to me.
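The script relies on np.nanmin and np.nanmax because a plain np.min returns nan as soon as a single value is missing. As an alternative to the custom converter, np.genfromtxt can treat empty fields as missing and fill float columns with NaN on its own. Here is a minimal sketch on a made-up three-column sample (not the real KNMI layout), assuming missing values show up as blank fields, which is what the converter in the script handles.

```python
import io

import numpy as np

# Made-up sample: station, minimum, maximum in tenths of a degree Celsius.
# Blank fields stand in for missing measurements.
sample = io.StringIO(
    "260,  -12,   35\n"
    "260,     ,   51\n"
    "260,   -3,     \n"
)

# autostrip=True turns a run of blanks into an empty string, which
# genfromtxt treats as missing and fills with NaN for float columns.
min_temp, max_temp = np.genfromtxt(sample, delimiter=',', usecols=(1, 2),
                                   unpack=True, autostrip=True) / 10

print(np.min(min_temp))     # nan: one missing value poisons a plain min
print(np.nanmin(min_temp))  # ignores the NaN
print(np.nanmax(max_temp))  # ignores the NaN
```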

Research Questions

Not being hindered by any meteorological knowledge, I want to play with the data and answer some simple questions:

  • What kind of distribution does temperature have?
  • How strongly are the temperatures at neighboring weather stations correlated?
  • Is there a correlation between weather and the stock market?
  • Are there simple ways to predict tomorrow’s weather without having a supercomputer?
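For the question about neighboring stations, one possible starting point is Pearson's correlation coefficient from np.corrcoef. The sketch below uses made-up readings for two hypothetical stations, assumed to be aligned by date. Since np.corrcoef does not skip NaNs, days missing at either station are dropped first.

```python
import numpy as np

# Made-up daily maximum temperatures (degrees Celsius) for two hypothetical
# neighboring stations, aligned by date; NaN marks a missing reading.
station_a = np.array([5.1, 6.3, np.nan, 8.0, 7.4, 3.9])
station_b = np.array([4.8, 6.0, 7.2, np.nan, 7.1, 4.2])

# Keep only the days on which both stations reported a value.
both = ~np.isnan(station_a) & ~np.isnan(station_b)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(station_a[both], station_b[both])[0, 1]
print("Correlation", r)
```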

If you have any tips, ideas or suggestions please let me know.

By the author of NumPy Beginner's Guide, NumPy Cookbook and Instant Pygame.
This entry was posted in programming.