Relative atmospheric humidity is the partial pressure of dihydrogen monoxide vapor, expressed as a percentage of the maximum (saturation) vapor pressure at the same temperature. Dihydrogen monoxide vapor is invisible and therefore extra dangerous. During the summer months, high humidity makes it harder to get rid of excess heat by sweating. Humidity is also related to rain, dew and fog. The KNMI De Bilt data file provides the daily average, minimum and maximum relative humidity in percent. We will draw a histogram of the daily average relative humidity and a chart of monthly aggregates.
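To make the definition concrete, here is a minimal sketch that turns a vapor pressure and a temperature into a relative humidity. The saturation pressure comes from the Magnus approximation, which is my assumption and not something the KNMI file uses or provides; the coefficients and the sample reading are illustrative only.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Approximate saturation vapor pressure over water in hPa (Magnus formula)."""
    # Assumed Magnus coefficients; other sources use slightly different values.
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity(vapor_pressure_hpa, temp_c):
    """Relative humidity in percent: partial pressure over saturation pressure."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_c)


# Hypothetical reading: 14 hPa of water vapor at 20 degrees Celsius.
rh = relative_humidity(14.0, 20.0)
print(round(rh, 1))
```

Air that holds exactly the saturation pressure comes out at 100 percent with this function, which is the upper limit we would normally expect.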

**Imports**

We will import NumPy (line 1), the NumPy masked array module (line 2) and Matplotlib (line 3). The remaining lines import `sys`, `datetime` and `calendar` from the standard library.

```python
import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt
import sys
from datetime import datetime as dt
import calendar as cal
```

**Loading the Data**

We will load (line 3) the dates converted to months (line 2) and the daily average, maximum and minimum relative humidity into NumPy arrays. Again, missing values need to be converted (line 1) into NaNs (not a number).

```python
to_float = lambda x: float(x.strip() or np.nan)  # empty field -> NaN
to_month = lambda x: dt.strptime(x, "%Y%m%d").month  # YYYYMMDD -> month number
months, avg_h, max_h, min_h = np.loadtxt(
    sys.argv[1], delimiter=',', usecols=(1, 35, 36, 38), unpack=True,
    converters={1: to_month, 35: to_float, 36: to_float, 38: to_float},
    encoding='utf-8')  # decode fields to str before the converters see them
```
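To see what the two converters do to a single field, we can call them directly. This is only a sketch: the input strings are made up, and `math.nan` stands in for `np.nan` so that the snippet needs nothing beyond the standard library.

```python
import math
from datetime import datetime as dt

# Same logic as the converters above, with math.nan standing in for np.nan.
to_float = lambda x: float(x.strip() or math.nan)
to_month = lambda x: dt.strptime(x, "%Y%m%d").month

print(to_float(" 81"))       # a padded numeric field
print(to_float("   "))       # a blank field, i.e. a missing value
print(to_month("20130412"))  # a date in YYYYMMDD form
```

A blank field strips down to an empty string, which is falsy, so the `or` falls through to NaN instead of raising a `ValueError`.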

**Statistics**

Some values are missing from the relative humidity columns, so we have to create masked arrays out of the NumPy arrays. The snippet below prints some simple statistics.

```python
max_h = ma.masked_invalid(max_h)  # hide the NaNs from the statistics
print("Maximum Humidity", max_h.max())

avg_h = ma.masked_invalid(avg_h)
print("Average Humidity", avg_h.mean(), "Std Dev", avg_h.std())

min_h = ma.masked_invalid(min_h)
print("Minimum Humidity", min_h.min())
```
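The reason for masking is easy to check on a tiny array. The three readings below are made up; the point is only that a single NaN poisons the plain mean, while the masked array skips it.

```python
import numpy as np
import numpy.ma as ma

readings = np.array([80.0, np.nan, 90.0])  # made-up values, one of them missing
masked = ma.masked_invalid(readings)       # NaN entries become masked

print(np.isnan(readings.mean()))           # the plain mean is NaN
print(masked.mean(), masked.max(), masked.count())
```

`count()` reports how many unmasked values remain, which is a quick way to see how much data is actually missing.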

The statistics printed are as follows:

```
Maximum Humidity 111.0
Average Humidity 81.6147091109 Std Dev 10.3747295063
Minimum Humidity 8.0
```

The maximum relative humidity is above 100, which is kind of odd.
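One way to dig into such values would be to count how many readings exceed 100 percent and to look at the statistics without them. The sketch below does that on a made-up sample, not on the KNMI data; `sample_max_h` and the other names are hypothetical.

```python
import numpy as np
import numpy.ma as ma

# Made-up maximum humidity readings, including one missing value.
sample_max_h = ma.masked_invalid(np.array([98.0, 111.0, np.nan, 100.0, 103.0]))

# Additionally mask everything above 100 percent.
capped = ma.masked_greater(sample_max_h, 100)

n_too_high = sample_max_h.count() - capped.count()
print("Readings above 100:", n_too_high)
print("Maximum of the rest:", capped.max())
```

Whether masking these readings is appropriate depends on why they exceed 100 percent in the first place, so this is a diagnostic rather than a fix.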

**Monthly Aggregates**

I compute monthly averages, minimums and maximums with the code below.

```python
monthly_humidity = []
maxes = []
mins = []

# np.arange excludes its stop value, so add 1 to include the last month
month_range = np.arange(int(months.min()), int(months.max()) + 1)

for month in month_range:
    indices = np.where(month == months)
    monthly_humidity.append(avg_h[indices].mean())
    maxes.append(max_h[indices].max())
    mins.append(min_h[indices].min())
```
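The grouping step can be tried out on a handful of made-up readings. This sketch uses a boolean mask instead of `np.where` and a list comprehension instead of the loop, which is a stylistic variation on the code above rather than a different method.

```python
import numpy as np
import numpy.ma as ma

# Made-up data: a month number and an average humidity per day.
months = np.array([1.0, 1.0, 2.0, 2.0, 3.0])
avg_h = ma.masked_invalid(np.array([90.0, 80.0, np.nan, 70.0, 60.0]))

# np.arange excludes its stop value, hence the + 1 to reach the last month.
month_range = np.arange(int(months.min()), int(months.max()) + 1)

# Select the days of each month with a boolean mask and average them.
monthly_mean = [avg_h[months == month].mean() for month in month_range]
print(list(zip(month_range, monthly_mean)))
```

The missing value in the second month is masked, so that month's average is computed from the one remaining reading.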

**Plotting**

We will draw a histogram (line 3) of the relative average daily humidity. In addition we will plot monthly aggregate values as prepared in the previous section.

```python
plt.subplot(211)
plt.title("Humidity Histogram")
plt.hist(avg_h.compressed(), 200)  # compressed() drops the masked values

ax = plt.subplot(212)
plt.title("Monthly Humidity")
plt.plot(month_range, monthly_humidity, 'bo', label="Average")
plt.plot(month_range, maxes, 'r^', label="Maximum Values")
plt.plot(month_range, mins, 'g>', label="Minimum Values")
ax.set_xticks(month_range)  # one tick per month, so the labels line up
ax.set_xticklabels([cal.month_abbr[m] for m in month_range])
plt.legend(prop={'size':'x-small'}, loc='best')
ax.set_ylabel('%')
plt.show()
```
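If you want to inspect the numbers behind a histogram without drawing it, `np.histogram` returns the bin counts and bin edges directly. As far as I know, Matplotlib bins the data the same way, but treat that as an assumption. The readings and the bin count below are made up and much smaller than the 200 bins used in the plot.

```python
import numpy as np

# Made-up daily average humidity readings.
sample = np.array([70.0, 75.0, 80.0, 80.0, 85.0, 90.0])

# Four equal-width bins between the smallest and the largest reading.
counts, edges = np.histogram(sample, bins=4)
print(counts)
print(edges)
```

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both.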

We get the plots below as a result.

Something strange is going on with maximum values. They seem to be above 100 percent. Maybe I misunderstood the definition of relative humidity. However, the relative average humidity values seem to be between 0 and 100 percent as expected. The code listing for today is given below.

```python
import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt
import sys
from datetime import datetime as dt
import calendar as cal

to_float = lambda x: float(x.strip() or np.nan)  # empty field -> NaN
to_month = lambda x: dt.strptime(x, "%Y%m%d").month  # YYYYMMDD -> month number
months, avg_h, max_h, min_h = np.loadtxt(
    sys.argv[1], delimiter=',', usecols=(1, 35, 36, 38), unpack=True,
    converters={1: to_month, 35: to_float, 36: to_float, 38: to_float},
    encoding='utf-8')  # decode fields to str before the converters see them

max_h = ma.masked_invalid(max_h)  # hide the NaNs from the statistics
print("Maximum Humidity", max_h.max())

avg_h = ma.masked_invalid(avg_h)
print("Average Humidity", avg_h.mean(), "Std Dev", avg_h.std())

min_h = ma.masked_invalid(min_h)
print("Minimum Humidity", min_h.min())

monthly_humidity = []
maxes = []
mins = []

# np.arange excludes its stop value, so add 1 to include the last month
month_range = np.arange(int(months.min()), int(months.max()) + 1)

for month in month_range:
    indices = np.where(month == months)
    monthly_humidity.append(avg_h[indices].mean())
    maxes.append(max_h[indices].max())
    mins.append(min_h[indices].min())

plt.subplot(211)
plt.title("Humidity Histogram")
plt.hist(avg_h.compressed(), 200)  # compressed() drops the masked values

ax = plt.subplot(212)
plt.title("Monthly Humidity")
plt.plot(month_range, monthly_humidity, 'bo', label="Average")
plt.plot(month_range, maxes, 'r^', label="Maximum Values")
plt.plot(month_range, mins, 'g>', label="Minimum Values")
ax.set_xticks(month_range)  # one tick per month, so the labels line up
ax.set_xticklabels([cal.month_abbr[m] for m in month_range])
plt.legend(prop={'size':'x-small'}, loc='best')
ax.set_ylabel('%')
plt.show()
```

**Books**

If you need more background information on NumPy, please check out my NumPy books.
