Neural Network Diary #6: Building the pairs

Now that we have the data split into different datasets, it is dueling time. The plan is to look at each race as a bunch of duels between two horses, so the runners need to be paired up. Now that I have the database, it makes sense to save the pairs there as well, so they don't have to be rebuilt every time they are needed for teaching the network.

I am not going to pair each horse in a race with all the others. Instead I only pair the horses that were really challenging for the win against everyone else in the race. As the criterion for challenging for the win I use finishing within three lengths of the winner. This means that the complete database of 20 thousand runs transforms into a database of 70 thousand pairs, a slightly more respectable dataset for teaching the network.
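
To make the selection rule concrete, here is a minimal in-memory sketch of how one race turns into duels. The `Runner` struct and the sample race are made up for illustration; only the three-length rule comes from the text above.

```ruby
# Hypothetical in-memory stand-in for one race; the real data lives in the database.
Runner = Struct.new(:run_id, :draw, :distance_to_winner)

race = [
  Runner.new(1, 4, 0.0),   # the winner
  Runner.new(2, 1, 1.5),   # within three lengths
  Runner.new(3, 6, 2.75),  # within three lengths
  Runner.new(4, 2, 5.0),
  Runner.new(5, 3, 12.0)
]

# Contenders are the runners that finished within three lengths of the winner.
contenders = race.select { |r| (0..3).cover?(r.distance_to_winner) }

# Each contender duels every other runner in the race.
duels = contenders.flat_map do |top|
  race.reject { |r| r.run_id == top.run_id }.map { |opponent| [top.run_id, opponent.run_id] }
end

puts contenders.size  # 3
puts duels.size       # 12, i.e. 3 contenders times 4 opponents each
```

Note that in this sketch two contenders meet twice, once from each side, so the duel count is simply contenders times (field size minus one).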

I started by creating a new table in my database called pairs, which has five fields: race_id to refer to the race in question, then the id of the inside runner (the one with the lower draw) and the id of the outside runner. The remaining two fields are reserved for the input and output lines, which I will go through in the next post.
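
As a rough picture of what one row in that table holds, here is a plain Ruby sketch. The struct name, the helper, and the names of the last two fields (`input`, `output`) are placeholders of mine, not the actual schema; only the inside/outside rule is taken from the description above.

```ruby
# Assumed shape of one row in the pairs table. The last two field names
# are placeholders; they stay empty until the input and output lines are built.
PairRow = Struct.new(:race_id, :inside_runner, :outside_runner, :input, :output)

# Hypothetical helper: the runner with the lower draw becomes the inside runner.
def build_pair_row(race_id, first, second)
  inside, outside = [first, second].sort_by { |runner| runner[:draw] }
  PairRow.new(race_id, inside[:run_id], outside[:run_id], nil, nil)
end

row = build_pair_row(101, { run_id: 7, draw: 9 }, { run_id: 8, draw: 2 })
puts row.inside_runner   # 8
puts row.outside_runner  # 7
```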

The code for doing the pairing is pretty simple, and I have tried to explain it in the comments below. I have used a completely optional progress bar gem here to give me some indication of whether anything is happening while the code runs.

def create_pairs
  # Initialize the progress bar
  progressbar = ProgressBar.create
  progress = 0
  # Get ids for all races to go through them one at a time
  race_ids = Run.all.pluck(:race_id).uniq
  # Total number of races, needed for the percentage calculation
  all = race_ids.size
  race_ids.each_with_index do |race_id, index|
    # Select the top contenders in a race (within three lengths of the winner)
    top3 = Run.where(:race_id => race_id, :distance_to_winner => 0..3)
    top3.each do |top|
      # Select all other horses from the race
      opponents = Run.where(:race_id => race_id).where.not(:run_id => top.run_id)
      opponents.each do |opponent|
        # Determine which horse is on inside and which is running outside
        if top.draw < opponent.draw
          inside = top.run_id
          outside = opponent.run_id
        else
          inside = opponent.run_id
          outside = top.run_id
        end
        # Create the pair in the database and save it (raises if the save fails)
        Pair.create!(:race_id => race_id, :inside_runner => inside, :outside_runner => outside)
      end
    end
    # Check if progress has moved one percentage point and increment the bar if so
    if progress?(index, progress, all)
      progress += 1
      progressbar.increment
    end
  end
end

def progress?(index, progress, all)
  # Check if there is progress made
  progress < (index.to_f / all * 100).round
end

Running this code will create the pairs and save them to the database for later use. You will notice that I created pairs for all races and not just those that are in the learning dataset. I did this with the idea that I might shuffle the data at some point, and I don't want to have to build the pairs again when that happens.
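
As a side note on the progress? helper in the listing above, the following standalone sketch counts how many times the bar would advance for a given number of races. The `ticks_for` method is a hypothetical helper added only for this illustration.

```ruby
# Same check as the progress? helper in the listing above.
def progress?(index, progress, all)
  progress < (index.to_f / all * 100).round
end

# Hypothetical helper: simulate the loop and count how often the bar advances.
def ticks_for(all)
  progress = 0
  all.times do |index|
    progress += 1 if progress?(index, progress, all)
  end
  progress
end

puts ticks_for(250)  # 100
puts ticks_for(40)   # 39
```

With more than a hundred races the bar reaches 100 steps. With fewer races it can advance at most once per race, so it stops short of the end, which is harmless for a bar that only shows that something is happening.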

That is all for this time. In the next post I am going to build the data line used as input for one pair.