*Numpy Strategies 0.0.2*

I saw a PyCon presentation about Pandas, a Python data analysis library that works with timeseries data and handles missing data automatically. It is based on NumPy and should play well with, for instance, scikits.statsmodels. The plan for today is:

- Tweak the Numpy1 strategy.
- Create a new portfolio and do basic portfolio analysis with Pandas.
- Profit!

## Strategy tweaks

Last week I found out that sometimes the majority of points fall outside the bands I defined. As you can see in the Google Finance snapshots, the Numpy1 portfolio is doing fine despite these issues. Google Finance also added a new feature: we can now see how the market value of the portfolio changes over time. The portfolio gained quite a lot this week! This has nothing to do with what happened in the market this week – only with skill and the superiority of the system. Trust me, don’t believe the critics.

Obviously, the new and improved portfolio will perform even better. First, I thought I should look at the ratio of the number of points within the bands to the total number of points. Then it occurred to me to also calculate the R squared of the fit. When I tried to calculate it, it turned out that the NumPy lstsq function either gives you an empty residuals array or one that leads to a pretty decent R squared. So this became the new constraint – reject fits with an empty residuals array.
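
As a quick sanity check of this relationship between the lstsq residuals and R squared, here is a minimal sketch with made-up data (the variables `x` and `y` are invented for illustration): lstsq returns the sum of squared residuals, and since `n * var(y)` equals the total sum of squares, `1 - residuals / (n * var(y))` is the usual R squared.

```python
import numpy as np

# made-up sample data: a noisy line y = 2x + 1
x = np.arange(10.0)
y = 2 * x + 1 + np.array([0.1, -0.2, 0.05, 0.3, -0.1,
                          0.2, -0.3, 0.15, -0.05, 0.1])

# design matrix for a linear fit
A = np.vstack([x, np.ones(len(x))]).T
p, residuals, rank, s = np.linalg.lstsq(A, y, rcond=None)

# residuals holds the sum of squared residuals
# (it comes back empty for exact or rank-deficient fits)
r2 = 1 - residuals[0] / (len(y) * y.var())

# manual check: 1 - SS_res / SS_tot
pred = A.dot(p)
r2_manual = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Since `numpy.var` uses the population variance by default, the two computations agree exactly.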

```python
...
# linear fit through the tops; residuals is the sum of squared residuals
(p, residuals, rank, s) = linalg.lstsq(A, tops)
a, b = p
beforeLastValues = c[0:beforeLast]
sigma = std(beforeLastValues)
beforeLastVal = c[beforeLast]

# bottom band breakout: price at least 2.7 sigma below the fitted line
if beforeLastVal <= (a * beforeLast + b - 2.7 * sigma):
    # new constraint: reject fits with an empty residuals array
    if len(residuals) == 0:
        continue

    output = [file.replace('.csv', '')]
    devFactor = (a * beforeLast + b - beforeLastVal) / sigma
    output.append(str(devFactor))
    output.append(str(a))

    # R squared: 1 - SS_res / SS_tot, with SS_tot = n * var
    r2 = 1 - residuals / (beforeLast * var(beforeLastValues))
    output.append(str(r2[0]))
    output.append(str(within_bands_ratio(a, b, sigma, beforeLastValues)))

    # daily returns, their standard deviation and expected value
    dailyrets = diff(c) / c[0:len(c) - 1]
    retsigma = std(dailyrets)
    ev = expected_value(dailyrets)
    output.append(str(retsigma))
    output.append(str(ev))
    output.append(str(ev / retsigma))
    print ','.join(output)
...
```
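
The `within_bands_ratio` helper isn't shown in the listing. A hypothetical reconstruction, assuming it returns the fraction of points lying within `width * sigma` of the fitted line (the function body and its `width` default are my own guesses, not the original code):

```python
import numpy as np

def within_bands_ratio(a, b, sigma, values, width=2.7):
    # Hypothetical reconstruction: fraction of points lying within
    # width * sigma of the fitted line a * t + b.
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))
    fit = a * t + b
    within = np.abs(values - fit) <= width * sigma
    return within.sum() / float(len(values))

# points exactly on the line y = 2t + 1 are all inside the bands
print(within_bands_ratio(2, 1, 1.0, 2 * np.arange(10) + 1))  # 1.0
```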

### Expected value

As an aside, I calculated the expected value of the daily returns of the portfolio components. With a bit of persuasion, you can see this as the “reward” and the standard deviation of the returns as the risk. Unfortunately, what I got is a rough approximation at best. It might be necessary to look for a better alternative; maybe R offers one, for instance.

```python
...
def expected_value(arr):
    # estimate the expected value from a normalized histogram
    nbins = len(arr) - 1
    p, bins = histogram(arr, bins=nbins, normed=True)

    # split the last density value so p lines up with the bin edges
    last = len(p) - 1
    half = p[last] / 2
    p[last] = half
    p = append(p, half)
    ev = inner(bins, p)

    return ev
...
class ExpectedValueTestCase(unittest.TestCase):
    def runTest(self):
        actual = expected_value([1, 2, 3, 4, 5, 6])
        assert actual == 3.5, 'incorrect expected value ' + str(actual)

if __name__ == "__main__":
    unittest.main()
...
```
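
A slightly more conventional variant of the same idea, offered as my own sketch rather than the post's function, weights each bin's midpoint by its probability mass. Its estimate stays within half a bin width of the sample mean:

```python
import numpy as np

def expected_value_mid(arr, nbins=None):
    # Midpoint-weighted histogram estimate of the expected value:
    # probability mass of each bin times the bin's midpoint.
    arr = np.asarray(arr, dtype=float)
    if nbins is None:
        nbins = len(arr) - 1
    counts, edges = np.histogram(arr, bins=nbins)
    probs = counts / float(counts.sum())
    mids = (edges[:-1] + edges[1:]) / 2.0
    return np.inner(mids, probs)
```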

## Pandas correlation

A Pandas DataFrame is a matrix- and dictionary-like data structure. In fact, it is the central data structure in Pandas, and you can apply all kinds of timeseries operations to it. It is quite common to have a look at the correlation matrix of a portfolio, so I did that, although one can argue that it is a bit pointless. First, I created a DataFrame for each symbol’s daily returns. Then I joined these on the date. At the end, the correlation matrix was printed and a plot shown.

```python
...
for symbol in symbols:
    # load dates and closing prices from the symbol's CSV file
    dates, close = loadtxt(fileDir + '/' + symbol + '.csv',
                           delimiter=',', usecols=(1, 6), unpack=True,
                           converters={1: datestr2num})
    last = len(close) - 1

    # daily returns indexed by date
    data = {symbol: diff(close) / close[:last]}
    newdates = dates[:last]
    dates = Index([datetime.fromordinal(int(d)) for d in newdates])
    df = DataFrame(data, index=dates)

    # join each symbol's DataFrame on the date index
    if len(all) == 0:
        all = df
    else:
        all = all.join(df)

print all.corr()
all.plot()
legend(symbols)
show()
...
```
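
For comparison, the same kind of correlation matrix can be computed without Pandas using `numpy.corrcoef`. The returns below are made-up illustration data, one row per symbol:

```python
import numpy as np

# hypothetical daily returns for three symbols, one row per symbol
rets = np.array([
    [0.010, -0.020, 0.015, 0.000, -0.010],
    [0.012, -0.018, 0.010, 0.002, -0.008],
    [-0.005, 0.010, -0.020, 0.003, 0.007],
])

# symmetric correlation matrix between the rows, with ones on the diagonal
corr = np.corrcoef(rets)
print(corr)
```

Pandas still earns its keep here by aligning the series on dates before correlating, which plain `corrcoef` does not do.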

## A Jedi Numpy2 portfolio (copyrighted story)

“Careful you must be when sensing the future. The fear of loss is a path to the dark side.”

Many, many eons ago, in a galaxy far, far away in a parallel universe, a Jedi using the Profit, I mean the Force, created a portfolio of Galactic stocks. His master told him to use a harmonic model screening method and bet on reversion to the mean after a bottom band breakout. The bet size per trade was set to $1,000. The Jedi had his doubts, but his master said:

“Size matters not. Look at me. Judge me by my size do you?”

Unclean data and the evil compliance rules forced the Jedi to use discretionary tactics as well, in order to convince his fellow Jedi that he had not crossed over to the dark side of insider trading and necromancy. The stocks were bought at the open on a Friday, and at the end of the trading day these plots were made.

So after just one day the portfolio made enough profit to buy a second-hand lightsaber on the black market. However, the Jedi’s master had always warned him not to fear loss, as more profit was possible. Well, actually, what he said was:

“Fear is the path to the dark side. Fear leads to anger, anger leads to hate, hate leads to suffering.”

The Jedi decided to use what in common parlance is called a Christmas exit strategy. Christmas happened to be just five standard Galactic weeks away. The Jedi also put all the relevant data in a spreadsheet.

Then the Jedi asked his master what he thought about it. The master replied:

“Difficult to see. Always in motion is the future. To the Force, look for guidance. Accept what fate has placed before us.”

May the Profit be with you.

THE END

## Python links of interest

- scikits.learn Machine learning.
- Tabular for tabular data.
- scikit.timeseries
- PyTables manages large hierarchical datasets.

## Random links of interest

- Thalesians thinktank.
- Java Hotspot flags nobody knows. I heard good things about CompileThreshold=1.
- Sawzall language.
- OpenFast
- Jetm – Java Execution Time Measurement Library
- Javasimon – simple monitoring API.