NumPy Strategies 0.1.7
I haven’t blogged in a while because I am supposed to be working on a Big Secret Project (BSP). Obviously, I am not allowed to talk about it. The Product Owner/Manager of our FHF (Fantasy Hedge Fund) has come up with the following user story:
- Measure the margin of error of a small data sample.
This story is about the data we are using. Our Product Owner is worried that we don’t have enough data to do anything meaningful. One way to address this is statistical bootstrapping, specifically the type of bootstrapping called case resampling. We will apply this method to computing the mean of the AAPL stock price and the mean of a sample drawn from the normal distribution.
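Case resampling amounts to drawing, with replacement, as many observations as the original sample contains. A minimal sketch, assuming NumPy 1.17+ for the `numpy.random.default_rng` Generator API; `case_resample` is a hypothetical helper name and the five `data` values are made up for illustration:

```python
import numpy as np


def case_resample(sample, rng):
    """Draw one case resample: n observations picked uniformly with replacement."""
    n = len(sample)
    indices = rng.integers(0, n, size=n)  # n random positions in [0, n)
    return sample[indices]


rng = np.random.default_rng(42)
data = np.array([10.0, 12.0, 11.5, 9.8, 10.7])  # illustrative values only
resampled = case_resample(data, rng)
print(resampled)  # same length as data; values may repeat
```

Because the draw is with replacement, a resample usually repeats some observations and leaves others out, which is what lets the spread of many resamples stand in for the sampling variability of the original data.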
The steps of the algorithm are:
- Store the empirical distribution from our data.
- Generate random samples from this distribution of the same size as the original sample.
- Calculate and store the means of these samples.
- Determine in which percentile of the distribution of means the mean of the original sample lies.
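The four steps can also be written without a Python loop by drawing all resample indices at once. This is a minimal sketch, assuming NumPy 1.17+ (`default_rng`) and SciPy; `bootstrap_means` and `percentile_of_original_mean` are hypothetical helper names, and the normal sample is a stand-in for real price data:

```python
import numpy as np
from scipy import stats


def bootstrap_means(sample, n_resamples, rng):
    """Steps 1-3: means of n_resamples case resamples of the same size as sample."""
    n = len(sample)
    # One row of random indices per resample; every row has the original size n.
    indices = rng.integers(0, n, size=(n_resamples, n))
    return sample[indices].mean(axis=1)


def percentile_of_original_mean(sample, means):
    """Step 4: percentile rank (0-100) of the original mean among the bootstrap means."""
    return stats.percentileofscore(means, sample.mean())


rng = np.random.default_rng(0)
sample = rng.normal(size=250)  # hypothetical stand-in for a year of daily prices
means = bootstrap_means(sample, 400, rng)
print(means.shape)
print(percentile_of_original_mean(sample, means))
```

The index matrix has `n_resamples × n` entries, so for very large samples or many resamples a loop over resamples, as in the program below, keeps memory use lower.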
The code below, also available on GitHub, performs these steps. It expects two command-line arguments: the ticker symbol and the number of resamples.
```python
import sys
from datetime import date

import numpy
import matplotlib.pyplot
import scipy.stats
# Note: matplotlib.finance (including quotes_historical_yahoo) was removed from
# later matplotlib releases and Yahoo's old quote service no longer serves data,
# so this import only works with the older matplotlib this post was written for.
from matplotlib.finance import quotes_historical_yahoo


def random_indices(N):
    # N random indices in [0, N): sampling with replacement
    return numpy.random.randint(0, N, N)


def random_values(values):
    # One case resample, the same size as the original sample
    return numpy.take(values, random_indices(len(values)))


def generate_means(values):
    # The number of resamples comes from the second command-line argument
    NTRIES = int(sys.argv[2])
    means = numpy.zeros(NTRIES)

    for i in range(NTRIES):
        means[i] = random_values(values).mean()

    return means


def format_mean(values):
    return "Mean=%.3f" % (values.mean())


def plot_percentile(values, means):
    matplotlib.pyplot.hist(means)
    # Percentile rank of the original mean within the bootstrap means
    percentile = scipy.stats.percentileofscore(means, values.mean())
    matplotlib.pyplot.legend([format_mean(means), "Percentile=%.2f" % (percentile)])


def plot(values):
    matplotlib.pyplot.hist(values)
    matplotlib.pyplot.legend([format_mean(values)])


# One year of daily quotes for the ticker given as the first argument
today = date.today()
start = (today.year - 1, today.month, today.day)
quotes = quotes_historical_yahoo(sys.argv[1], start, today)
# Index 4 is taken as the close price; the tuple layout of these quotes varied
# between matplotlib versions, so check it against the version you use.
close = numpy.array([q[4] for q in quotes])
close_means = generate_means(close)

# A normally distributed sample of the same size, for comparison
normal_values = numpy.random.normal(size=len(close))
normal_means = generate_means(normal_values)

matplotlib.pyplot.subplot(221)
matplotlib.pyplot.title("Close Values")
plot(close)

matplotlib.pyplot.subplot(222)
matplotlib.pyplot.title("Normal Values")
plot(normal_values)

matplotlib.pyplot.subplot(223)
matplotlib.pyplot.title("Close Means")
plot_percentile(close, close_means)

matplotlib.pyplot.subplot(224)
matplotlib.pyplot.title("Normal Means")
plot_percentile(normal_values, normal_means)

matplotlib.pyplot.show()
```
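The program reports where the original mean falls among the bootstrap means. The user story asks for a margin of error, and one common way to read that off the same bootstrap means is their standard deviation (the bootstrap standard error) together with a percentile interval. This is a sketch of that extra step, not part of the original program; it uses synthetic normal data (mean 100, scale 5, 250 observations, 2000 resamples, all arbitrary assumptions) because the Yahoo download above no longer works, and `bootstrap_margin_of_error` is a hypothetical name:

```python
import numpy as np


def bootstrap_margin_of_error(sample, n_resamples=400, confidence=0.95, seed=None):
    """Bootstrap standard error and percentile interval for the sample mean."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    indices = rng.integers(0, n, size=(n_resamples, n))
    means = sample[indices].mean(axis=1)
    std_error = means.std(ddof=1)  # spread of the bootstrap means
    alpha = (1.0 - confidence) / 2.0
    low, high = np.percentile(means, [100 * alpha, 100 * (1 - alpha)])
    return std_error, (low, high)


sample = np.random.default_rng(1).normal(loc=100.0, scale=5.0, size=250)
se, (low, high) = bootstrap_margin_of_error(sample, n_resamples=2000, seed=2)
# Textbook standard error of the mean, s / sqrt(n), for comparison
analytic_se = sample.std(ddof=1) / np.sqrt(len(sample))
print(f"bootstrap SE={se:.3f}, analytic SE={analytic_se:.3f}")
print(f"95% interval for the mean: [{low:.2f}, {high:.2f}]")
```

For the mean of a well-behaved sample, the bootstrap standard error should land close to s/√n; the bootstrap becomes more useful for statistics that have no such simple formula.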
After running the program with AAPL as the ticker and 400 resamples, I get the following plots for the AAPL close prices and for the Gaussian sample.
If you liked this post and are interested in NumPy, check out NumPy Beginner’s Guide by yours truly.




