There is more than one way to get data

This entry is part of 23 in the series Numpy Strategies

Numpy Strategies 0.0.1

Happy Halloween, everybody! So the plan for today is:

  1. Get as much historical data as possible.
  2. Filter the data with a market scanner.
  3. Profit!!!

Data retrieval

This week I had fun retrieving data with Perl. The script I made reads a file with symbols and retrieves historical end-of-day data from the NASDAQ website. I use curl to get the data. I am aware that there are Perl APIs that can do this, but hey, that’s just another exercise for the reader :). I read good things about lftp as well.

#!/usr/bin/perl
 
open SYMBOLS, $ARGV[0] or die "Can't open file: $!\n";
my @symbols = <SYMBOLS>;
my $i = 0;
 
foreach my $symbol (@symbols) {
   chomp($symbol);
 
   $outfile="${symbol}.csv";
   $i++;
 
   if(-e $outfile) {
      print "$i $outfile already exists\n";
      next;
   }
 
   print "$i $outfile\n";
   open(OUTFILE,">$outfile") || die("Cannot Open File");
 
   $data=`curl -s --compressed "http://charting.nasdaq.com/ext/charts.dll?2-1-14-0-0-5120-03NA000000$symbol-&SF:4|5-WD=539-HT=395--XTBL-"`;
   my @lines =  split('\n',$data);
   my @csvLines;
   my $csvLine;
 
   foreach my $line (@lines) {
      if($line =~ /CLASS="DrillDownDate"/) {
         my($MM, $dd, $yyyy) = $line =~ m/(\d{2})\/(\d{2})\/(\d{4})/;
         $csvLine = "$symbol,$dd-$MM-$yyyy, ,";
      }
      if($line =~ /CLASS="DrillDownData"/) {
         if($line =~ /(\d*\.\d*)/) {
            $csvLine = $csvLine."$1,";
         } elsif($line=~ m/(\>[0-9,]*\<)/){
            $val = $1;
            $val =~ s/,//g;
            $val =~ s/\>//g;
            $val =~ s/\<//g;
            $csvLine = $csvLine."$val\n";
            push(@csvLines, $csvLine);
         } elsif($line=~ m/(\>[0-9]*\<)/){
            $val = $1;
            $val =~ s/\>//g;
            $val =~ s/\<//g;
            $csvLine = $csvLine."$val\n";
            push(@csvLines, $csvLine);
         } else {
            # No number found in this cell: record the missing value as 0.
            $csvLine = "${csvLine}0,";
         }
      }
 
   }
 
   print OUTFILE reverse(@csvLines);
   close(OUTFILE);
}
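
For readers who prefer Python, the parsing step of the script above can be sketched roughly as follows. This is only a sketch under assumptions: the function name `html_to_csv_rows` and the sample HTML are made up for illustration, and the real page layout may differ from the sample. The price pattern is also slightly stricter than the Perl one.

```python
import re

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")   # MM/dd/yyyy
PRICE_RE = re.compile(r"\d*\.\d+")                 # a decimal number (price)
VOLUME_RE = re.compile(r">([0-9,]+)<")             # an integer with optional commas


def html_to_csv_rows(symbol, html):
    """Build CSV rows from the date/data cells, oldest row first."""
    rows = []
    current = None
    for line in html.splitlines():
        if 'CLASS="DrillDownDate"' in line:
            match = DATE_RE.search(line)
            if match:
                mm, dd, yyyy = match.groups()
                # The third field is left blank, as in the Perl script.
                current = [symbol, f"{dd}-{mm}-{yyyy}", " "]
        elif 'CLASS="DrillDownData"' in line and current is not None:
            price = PRICE_RE.search(line)
            volume = VOLUME_RE.search(line)
            if price:
                current.append(price.group(0))
            elif volume:
                # The volume is the last cell, so the row is complete.
                current.append(volume.group(1).replace(",", ""))
                rows.append(",".join(current))
                current = None
            else:
                current.append("0")  # missing value
    rows.reverse()  # the sample lists the newest date first
    return rows


# Hypothetical sample input, not taken from the real website.
sample_html = """<TD CLASS="DrillDownDate">10/28/2011</TD>
<TD CLASS="DrillDownData">10.50</TD>
<TD CLASS="DrillDownData">11.00</TD>
<TD CLASS="DrillDownData">10.25</TD>
<TD CLASS="DrillDownData">10.75</TD>
<TD CLASS="DrillDownData">1,234,567</TD>
<TD CLASS="DrillDownDate">10/27/2011</TD>
<TD CLASS="DrillDownData">10.00</TD>
<TD CLASS="DrillDownData">10.60</TD>
<TD CLASS="DrillDownData">&nbsp;</TD>
<TD CLASS="DrillDownData">10.40</TD>
<TD CLASS="DrillDownData">900</TD>"""

rows = html_to_csv_rows("XYZ", sample_html)
for row in rows:
    print(row)
```

Each row has eight fields (symbol, date, blank, open, high, low, close, volume), which is the layout the later scripts expect.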

Data checks

Once the data is there, you realize that its quality varies. Sometimes there are gaps, such as a missing high price. Not only that, but for certain equities we have much less data than for others, for obvious reasons. So I made another Perl script that does some sanity checks. It collects every CSV file that contains a row with fewer than eight fields as a candidate for deletion.

...
opendir (DIR, $directory) or die $!;
my @todelete;
 
while (my $file = readdir(DIR)) {
        if($file =~ /\.csv$/) {
           open(FILE,"$directory/$file") || die("Cannot Open File $file $!");
           my @lines = <FILE>;
 
           my $i = 0;
 
           foreach my $line (@lines) {
               $i++;
               my @fields =  split(',',$line);
               chomp($line);
 
               if(!defined($fields[7])) {
                  print "$i $line $file\n";
                  push(@todelete, "$directory/$file");
                  last;
               }
           }
           close(FILE);
        }
}
 
closedir(DIR);
...
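
The same check can be sketched in Python. The function name `find_incomplete_csvs`, the parameter `n_fields` and the demo files are assumptions made for illustration. The sketch only reports the files and does not delete anything.

```python
import csv
import os
import tempfile


def find_incomplete_csvs(directory, n_fields=8):
    """Return paths of .csv files that contain a row with fewer than n_fields fields."""
    incomplete = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".csv"):
            continue
        path = os.path.join(directory, name)
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if len(row) < n_fields:
                    incomplete.append(path)
                    break  # one bad row is enough to flag the file
    return incomplete


# Demo with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "GOOD.csv"), "w") as f:
        f.write("GOOD,28-10-2011, ,1.0,2.0,0.5,1.5,1000\n")
    with open(os.path.join(demo_dir, "BAD.csv"), "w") as f:
        f.write("BAD,28-10-2011, ,1.0,2.0\n")
    flagged = [os.path.basename(p) for p in find_incomplete_csvs(demo_dir)]

print(flagged)
```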

Market Scanner

Based on the simple harmonic oscillator model, I created a market scanner with NumPy that selects equities on the premise that “what is down, must come up”. And, obviously, “what is really down, must really come up”. I screened based on:

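  1. Number of records. We need lots of data for accurate statistics, so equities with fewer than 660 closing prices are skipped.
  2. Missing data. Equities where more than 0.5% of the closing prices are missing are excluded.
  3. Distance from the trend. Select equities whose price lies below the fitted trend line by at least a factor times the standard deviation.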
...
files = os.listdir(fileDir)
 
def calc_k(relchange, arr):
   diffs2 = diff(relchange,n=2)
   ks = []
 
   for i in range(0, len(arr)):
      if arr[i] != 0:
         ks.append(diffs2[i]/arr[i])
 
   return abs(average(ks))
 
def float_converter(x):
   if x == ' ':
      return 0
   else:
      return x
 
def count_zeroes(arr):
   count = 0
 
   for i in range(0, len(arr)):
      if(arr[i] == 0):
         count += 1
 
   return count
 
 
for file in files:
   if(not file.endswith('.csv') ):
      print file + ' skipped'
      continue
 
   o,h,l,c,v = loadtxt(fileDir + '/' +  file, delimiter=',', usecols=(3,4,5,6,7), unpack=True, 
   converters = {3: float_converter, 4: float_converter, 5: float_converter, 6: float_converter})
 
   if len(c) < 660:
      continue
 
   # Force float division; integer division would always give 0 here.
   if float(count_zeroes(c))/len(c) > 0.005:
      continue
 
   indices22 = arange(0,len(c), 22) 
   c22 = take(c, indices22)
   relchange22 = diff(c22)/c22[:len(c22)-1]
   K22 = calc_k(relchange22, relchange22[2:])
 
   if K22 == 0:
      continue
 
   T22 = 2 * pi / sqrt(K22)
 
   dailyPeriod = int(T22 * 22)
 
   if dailyPeriod == 0:
      continue
 
   beforeLast = len(c) -2
   indicesTops = arange(0, beforeLast, dailyPeriod)
   tops = take(c[0:beforeLast], indicesTops)
   A = vstack([indicesTops, ones(len(indicesTops))]).T
   a, b = linalg.lstsq(A, tops)[0]
   sigma = std(c[0:beforeLast])
   beforeLastVal = c[beforeLast]
 
   if beforeLastVal <= (a * beforeLast + b - 2.0 * sigma):
      print file.replace('.csv','') 
...
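
To see why the period follows from the second differences, here is a small sketch on synthetic data. It assumes the oscillator relation x'' = -k x, so that T = 2π/√k. The name `estimate_period`, the `eps` threshold and the sine wave are illustrative assumptions. Note also that this sketch divides each second difference by the sample it is centred on, whereas `calc_k` above divides by `relchange[2:]`.

```python
import numpy as np


def estimate_period(x, eps=1e-9):
    """Estimate the period of x, assuming x'' = -k * x with a unit time step."""
    x = np.asarray(x, dtype=float)
    d2 = np.diff(x, n=2)         # d2[i] = x[i+2] - 2*x[i+1] + x[i]
    centre = x[1:-1]             # the sample each second difference is centred on
    mask = np.abs(centre) > eps  # skip samples that are (nearly) zero
    k = abs(np.mean(d2[mask] / centre[mask]))
    return 2 * np.pi / np.sqrt(k)


# Synthetic sine wave with a known period of 50 samples.
t = np.arange(500)
wave = np.sin(2 * np.pi * t / 50 + 0.3)
period = estimate_period(wave)
print(period)
```

On a clean sine wave the estimate lands close to the true period. Real price data is far noisier, which is one reason the scanner above works on closes sampled every 22 days.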

Here are the results based on different standard deviation factors. As you can see, the higher the factor, the tighter the selection becomes.
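
The effect of the factor can be illustrated with a small sketch of the selection rule. This is a simplified Python 3 restatement of the last step of the scanner, not the exact script: `below_lower_band` and the synthetic series are assumptions for illustration, and `rcond=None` is passed to `lstsq` as newer NumPy versions expect.

```python
import numpy as np


def below_lower_band(close, period, factor=2.0):
    """True if the second-to-last close is below the fitted line minus factor * sigma."""
    close = np.asarray(close, dtype=float)
    before_last = len(close) - 2
    idx = np.arange(0, before_last, period)   # one sample per estimated period
    samples = close[idx]
    A = np.vstack([idx, np.ones(len(idx))]).T
    a, b = np.linalg.lstsq(A, samples, rcond=None)[0]  # least-squares line
    sigma = np.std(close[:before_last])
    return bool(close[before_last] <= a * before_last + b - factor * sigma)


# Synthetic series: oscillation around 100 with a dip at the second-to-last point.
t = np.arange(200)
close = 100 + 5 * np.sin(2 * np.pi * t / 40)
close[198] = 91.0

selected_at_2 = below_lower_band(close, period=40, factor=2.0)
selected_at_3 = below_lower_band(close, period=40, factor=3.0)
print(selected_at_2, selected_at_3)
```

The dip is deep enough to pass with a factor of 2 but not with a factor of 3.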

And just to make sure, I made plots of the 5 survivors/candidates.

[Figure: numpyStrategies001Charts]

So I guess I need to build in a rule that at least a majority of the points need to be within the bands. For MEMS and ULCM, this is certainly not the case!
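
Such a rule might look something like the sketch below. It is not part of the scanner yet. The function names, the 0.5 threshold for "majority" and the toy numbers are all assumptions, and the trend line and sigma are taken as given.

```python
import numpy as np


def fraction_within_bands(close, slope, intercept, sigma, factor=2.0):
    """Fraction of points within factor * sigma of the line slope * t + intercept."""
    close = np.asarray(close, dtype=float)
    t = np.arange(len(close))
    centre = slope * t + intercept
    inside = np.abs(close - centre) <= factor * sigma
    return inside.mean()


def mostly_within_bands(close, slope, intercept, sigma, factor=2.0, threshold=0.5):
    """True if more than `threshold` of the points lie within the bands."""
    return bool(fraction_within_bands(close, slope, intercept, sigma, factor) > threshold)


# Toy data: a flat line at 10 with sigma = 1, so the bands are 8 and 12.
calm = [10, 11, 9, 10, 30, 10]    # one outlier
wild = [10, 20, 0, 25, 10, -5]    # mostly outside the bands

calm_fraction = fraction_within_bands(calm, slope=0.0, intercept=10.0, sigma=1.0)
calm_ok = mostly_within_bands(calm, slope=0.0, intercept=10.0, sigma=1.0)
wild_ok = mostly_within_bands(wild, slope=0.0, intercept=10.0, sigma=1.0)
print(calm_fraction, calm_ok, wild_ok)
```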

Paper trading

I took the results of the screener and created a fantasy portfolio called Numpy1 on Google Finance. The last date in the current dataset is 28 October. Each market entry is at the open price on the most recent date, with an approximate bet size of $1000 for each equity. Below is a screenshot of the portfolio.

[Figure: numpy1Portfolio]

Wow, almost 1% profit. The exit strategy for now is to wait six weeks and then sell. I will keep you posted.
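
The bet size is only approximate because shares come in whole numbers. A minimal sketch of that arithmetic, with made-up prices and hypothetical helper names:

```python
def shares_for_bet(open_price, bet_size=1000.0):
    """Whole number of shares that can be bought at open_price without exceeding bet_size."""
    if open_price <= 0:
        raise ValueError("open_price must be positive")
    return int(bet_size // open_price)


def percent_return(entry_price, current_price):
    """Percentage gain or loss relative to the entry price."""
    return 100.0 * (current_price - entry_price) / entry_price


# Made-up prices, not the actual portfolio.
shares = shares_for_bet(12.50)
invested = shares * 12.50
gain = percent_return(12.50, 12.625)
print(shares, invested, gain)
```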

Random links of interest

If you liked this post and are interested in NumPy, check out NumPy Beginner’s Guide by yours truly.

By the author of NumPy Beginner's Guide, NumPy Cookbook and Instant Pygame.