Neural network diary #7: Converting data to usable form

Now that we have the pairs built, it is time to look at the inputs that I am going to use for the network. Let’s start with the main method for the dataline and then go through the components in more detail.

require_relative 'schema'
require_relative 'helpers'
require_relative 'netbuilder'
require 'ruby-progressbar'

# Select the pairs that are used for learning 
pairs = Pair.where(:race_id => Run.where(:dataset => "learning").where.not(:draw => 0).pluck(:race_id).uniq)

# Initialize progressbar and required variables
@all = pairs.size
@progressbar = ProgressBar.create
@progress = 0

# Loop through all pairs and check for progress
pairs.each_with_index do |pair, index|
  build_dataline(pair)
  progress?(index)
end

@progressbar.finish

As you can see, I did the progress bar a bit differently this time: I made the variables it requires instance variables so that other methods can access them as well. Also, I am excluding pairs that include a runner with a draw of zero. Strictly speaking, the query keeps every race in the learning dataset that has at least one runner with a non-zero draw, which amounts to the same thing as long as draws are missing for whole races. This is because there was an error in my data source and draws were missing from races run in the first half of 2013. Unfortunately they were not in the original imports from the data supplier to Racing Dossier either, so I need to find an alternate source for those draws, as I don’t want to finalize my network training until I have all of the data. Other than those two comments the code is pretty self-explanatory: we choose the pairs and loop through them.

def build_dataline(pair)
  # Select the runs and build the dataline
  inside = Run.find(pair.inside_runner)
  outside = Run.find(pair.outside_runner)
  dataline = dataline_normal(inside, outside)
  pair.input = dataline
  pair.output = determine_output(inside.position, outside.position)
  pair.save!
end

Now that I look at the code above, I realize that there is an extra line: I don’t need to place the dataline into the variable dataline before assigning it to the pair.input field. Oh well, I guess I can refactor as I go.

def dataline_normal(inside, outside)
  # Declare the array
  dataline = []
  # Convert distance in yards to number less than 1.0, one input
  dataline << inside.distance / 10000.0
  # Check if race is handicap or not, two inputs
  # Handicap = [1,0], non-handicap = [0,1]
  dataline << determine_handicap(inside.handicap)
  # Convert going to thermometer style input, five inputs
  # Slow = [1,0,0,0,0] -> Fast = [1,1,1,1,1]
  dataline << determine_going(inside.going)
  # Check how far away horses are from each other and scale to number less than 1.0, one input
  dataline << diff(inside.draw, outside.draw, 100)
  # Check difference of spdfiglr, one input
  dataline << diff(inside.spdfiglr, outside.spdfiglr, 100)
  # Check difference of shorpro, one input
  dataline << diff(inside.shorpro, outside.shorpro, 100)
  # Check difference of pfp, one input
  dataline << diff(inside.pfp, outside.pfp, 100)
  # Check difference of shoravd, one input
  dataline << diff(inside.shoravd, outside.shoravd, 100)
  # Check difference of raiform, one input
  dataline << diff(inside.raiform, outside.raiform, 1000)
  # Check if runners are moving up or down in money class, four inputs
  # No movement = [0,0,0,0] Inside up, outside down = [1,0,0,1]
  dataline << determine_mcls(inside.mclslr, outside.mclslr)
  # Check difference of acecl, one input
  dataline << diff(inside.acecl, outside.acecl, 1)
  # Check if either runner is a course and/or distance winner or same race winner, eight inputs
  # [c, d, cd, cds] + [c, d, cd, cds]
  dataline << determine_cdwinner(inside.cdwinner, outside.cdwinner)
  # Remove any sub arrays from dataline array
  dataline.flatten.join(",")
end

As many comments as there is code 🙂 Basically I create the inputs one by one, either by calculating the difference between the two ratings as explained in an earlier post or by creating an array of ones and zeros, and append the results to the dataline array. Below you can see the helper methods used in the above piece of code.

def determine_handicap(handicap)
  o = Array.new(2,0)
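  # Only boolean true or the string "true" counts as a handicap;
  # a bare truthiness check would also accept the string "false"
  if handicap == true || handicap == "true"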
    o[0] = 1
  else
    o[1] = 1
  end
  o
end

def determine_going(going)
  arr = Array.new(5, 0)
  goings = ["Slow","Standard To Slow","Standard","Standard To Fast","Fast"]
  # An unknown going leaves all inputs at zero instead of raising or looping forever
  (0..(goings.index(going) || -1)).each do |i|
    arr[i] = 1
  end
  arr
end

def diff(inside, outside, divisor)
  o = begin (inside - outside) / divisor.to_f rescue 0 end
  if o > 1
    1
  elsif o < -1
    -1
  else
    o
  end
end

def determine_mcls(inside, outside)
  o = Array.new(4,0)
  unless inside.nil?
    o[0] = 1 if inside > 1.07
    o[1] = 1 if inside < 0.93
  end
  unless outside.nil?
    o[2] = 1 if outside > 1.07
    o[3] = 1 if outside < 0.93
  end
  o
end

def determine_cdwinner(inside, outside)
  o = Array.new(8,0)
  o[inside - 1] = 1 if inside > 0
  o[outside + 3] = 1 if outside > 0
  o
end

def determine_output(inside, outside)
  if inside < outside
    1
  else
    -1
  end
end

def progress?(index)
  # Check if there is progress made
  if @progress < (index.to_f / @all * 100).round
    @progressbar.increment
    @progress += 1
  end
end
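
To make two of these encodings concrete, here is a small standalone sketch of the thermometer input and the clamped difference. It is not the code I run: the names thermometer_going and clamped_diff are made up for this example, and returning all zeros for an unknown going is an assumption of the sketch.

```ruby
# Hypothetical, self-contained restatement of two of the encodings above.
GOINGS = ["Slow", "Standard To Slow", "Standard", "Standard To Fast", "Fast"].freeze

# Thermometer encoding: every position up to and including the matching
# level is 1, the rest are 0. An unknown going yields all zeros (assumption).
def thermometer_going(going)
  idx = GOINGS.index(going)
  return Array.new(GOINGS.size, 0) if idx.nil?
  Array.new(GOINGS.size) { |i| i <= idx ? 1 : 0 }
end

# Scaled difference clamped to the range -1..1.
# A missing rating on either side counts as no difference.
def clamped_diff(inside, outside, divisor)
  return 0 if inside.nil? || outside.nil?
  ((inside - outside) / divisor.to_f).clamp(-1.0, 1.0)
end

p thermometer_going("Standard")   # => [1, 1, 1, 0, 0]
p clamped_diff(3, 7, 100)         # => -0.04
p clamped_diff(250, 20, 100)      # => 1.0
```

The clamp is what keeps an outlier rating from pushing an input outside the range the rest of the inputs live in; the divisor only decides how quickly the difference saturates.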

These are all pretty simple, basically calculations that I prefer not to repeat and/or that are nice to keep out of the main blocks of code. That was it for this time. In the next post we actually start looking at the code used in training the network, or at least the code where I tell FANN to learn 🙂

Racing Dossier review

I realised that I haven’t written a product review of a tool that I use daily in my betting, namely the Racing Dossier. I have mentioned it before and even linked to a walkthrough that shows a little of what it is about.

I ended up checking what other reviewers are saying about the product, but there isn’t much out there, and what little there is misses the point of the Racing Dossier in my opinion. For example, this review at Betting System Truths takes a couple of example race cards and looks at their profitability, while they were never intended as plug’n’play systems that one just takes and uses without thinking about it at all.

So if they missed the point, what is it about? It is a collection of ratings, that’s what. And how any single punter uses those ratings determines the profits or losses incurred. So it does need effort on the user’s part to make it work, but so does everything that has even the remotest chance of success.

Earlier, access to the ratings was provided through an Adobe AIR app, but earlier this year they rolled out a new web-based version of the system, which improved usability. The system is organized around race cards that you create yourself. Basically, race cards are collections of ratings of your choosing that you use to analyse races. Below is a screenshot showing an example with only a handful of ratings.

[Screenshot: Racing Dossier example]

All in all there are close to 700 different ratings to take advantage of. Granted, a bunch of those are ones that I would call helper ratings, for example those that show the difference from the top rated. As an example, there is a rating called PFP (Current form class level of horse, this rating starts at 1500), which is a Glicko/Elo-derived form rating, and it is accompanied by DiffTpPFP, or PFP Difference From Top Rated – Current form class level of horse, as the official description goes. And it does get more complicated than that: for example, there is the rating DiffTpBEPFP, which is BEPFP Difference From Top Rated – Best ever form based PFP rating horse has achieved. All in all there are 152 ratings that begin with Diff, meaning that they are some kind of difference from some base rating.

In addition to form ratings like PFP, there is also a big variety of approaches to estimating the speed characteristics of horses. There is SPDFIGLr, or Speed rating in last race, but there is also SHorPro, or Projected speed rating in today’s race, and SHorBE30 for Best speed rating achieved in the last 30 days. I think you get the point. Naturally, different looks at jockey and trainer strike rates are included as well, all the way from Percentage of winners trainer has had to Percentage of wins jockey/trainer have had at today’s course and distance.

Many of the ratings also have a ranking counterpart. For example, PFP is paired with RnkPFP, which is simply a horse’s ranking in a race based on its PFP score. In total there are 152 ranking ratings, the same number as there are Diff ratings.

I don’t have the success rates of different ratings at hand, and in any case it really depends on the race niche you are looking at. The constant advice for newcomers on the forums is to really narrow it down, to concentrate on a subset of a subset and become a specialist in that before expanding. But a while back I did take a look at how a couple of different speed ratings stack up in All Weather racing. Take a look at that comparison here.

And analysing races one at a time is not the only way to use Racing Dossier. You could do as I do: download a CSV export of the next day’s races, import it into a database of your own and do the analysis there. Naturally the previous day’s results are downloadable as well. There are occasional hiccups with results, especially BSPs, and sometimes it takes until the day after to have results available. One can also look at that for the winter meeting at Lingfield in 2013 if one so wishes. Exporting is also possible over a maximum of 30 days if one wants to have a bit more reference data.

To summarize, Racing Dossier is not for everyone, as it does take effort to learn to utilize the information. Luckily the Race Advisor forums are helpful, and questions and inquiries are answered by other users as well as staff. But if you invest the time the rewards are there, and it is well worth the price of £49.75 per quarter.

I have been using Racing Dossier for close to two years now and I am not planning on stopping anytime soon, if that is any indication of how I feel about it. And hopefully this review also gave you, the reader, a slightly better idea of what the system is all about.

You can get access to Racing Dossier by clicking this link.

Disclosure: I write for Smartsigger magazine, for which I am compensated, and Smartsigger is published by the same company as Racing Dossier. The link above is an affiliate link.

Neural Network Diary #6: Building the pairs

Now that we have the data split into different datasets, it is dueling time. As the plan is to look at each race as a bunch of duels between two horses, we need to do this pairing up, and now that I have the database it makes sense to save this information as well, so there is no need to redo the pairing each time the pairs are needed for network teaching purposes.

I am not going to pair each horse in a race with all the others, but only those that were really challenging for the win with all the others. As the criterion for challenging for the win I use finishing within three lengths of the winner. This means that the complete database of 20 thousand runs transforms into a database of 70 thousand pairs, a slightly more respectable dataset for teaching the network.

I started by creating a new table in my database called pairs, which has five fields: race_id to refer to the race in question, then the id of the inside runner (the one with the lower draw) and the id of the outside runner. The remaining two fields are reserved for the input and output lines, which I will go through in the next post.

The code for doing the pairing is pretty simple and I have tried to explain it in the comments below. I have used a completely optional gem for a progress bar here to give me some indication of whether anything is happening when running the code.

def create_pairs
  # Initialize the progress bar
  progressbar = ProgressBar.create
  progress = 0
  # Get ids for all races to go through them one at a time
  race_ids = Run.all.pluck(:race_id).uniq
  # Total number of races, needed by progress? below
  all = race_ids.size
  race_ids.each_with_index do |race_id, index|
    # Select the top contenders in a race
    top3 = Run.where(:race_id => race_id, :distance_to_winner => 0..3)
    top3.each do |top|
      # Select all other horses from the race
      opponents = Run.where(:race_id => race_id).where.not(:run_id => top.run_id)
      opponents.each do |opponent|
        # Determine which horse is on inside and which is running outside
        inside = 0
        outside = 0
        if top.draw < opponent.draw
          inside = top.run_id
          outside = opponent.run_id
        else
          inside = opponent.run_id
          outside = top.run_id
        end
        # Create the pair and save it to the database, raising if the save fails
        Pair.create!(:race_id => race_id, :inside_runner => inside, :outside_runner => outside)
      end
    end
    # Check if progress has moved one percentage point and increment the bar if so
    if progress?(index, progress, all)
      progressbar.increment
      progress += 1
    end
  end
end

def progress?(index, progress, all)
  # Check if there is progress made
  progress < (index.to_f / all * 100).round
end

Running this code will create the pairs and save them to the database for later use. You will notice that I created pairs for all races and not just those that are put into the learning dataset. I did this with the idea that I might shuffle the data at some point, and I don’t want to have to build the pairs again at that point.
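
To see the pairing rule in isolation, without a database, here is a rough sketch in plain Ruby. RunRow and pairs_for_race are names invented for this example, and the three runners are made-up data; only the rule itself (contenders within three lengths, lower draw on the inside) comes from the code above. Equal draws are not given any special treatment here.

```ruby
# Hypothetical stand-in for a database row; field names follow the post.
RunRow = Struct.new(:run_id, :draw, :distance_to_winner)

# Pair every contender (within max_lengths of the winner) with every other
# runner in the race, putting the runner with the lower draw on the inside.
# Returns an array of [inside_run_id, outside_run_id] pairs.
def pairs_for_race(runs, max_lengths = 3)
  contenders = runs.select { |r| (0..max_lengths).cover?(r.distance_to_winner) }
  contenders.flat_map do |top|
    runs.reject { |r| r.run_id == top.run_id }.map do |opponent|
      inside, outside = [top, opponent].sort_by(&:draw)
      [inside.run_id, outside.run_id]
    end
  end
end

race = [
  RunRow.new(1, 4, 0.0),  # winner, drawn 4
  RunRow.new(2, 1, 1.5),  # within three lengths, drawn 1
  RunRow.new(3, 7, 8.0)   # well beaten, drawn 7
]

p pairs_for_race(race)    # => [[2, 1], [1, 3], [2, 1], [2, 3]]
```

Note that the pair [2, 1] shows up twice, because both runners are contenders and each one meets the other as an opponent. The loop above behaves the same way as far as I can tell from reading it; calling .uniq on the result would be one way to drop the repeats if they are not wanted.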

That is all for this time around. In the next post I am going to build the dataline used as input for one pair.