Neural Network Diary #6: Building the pairs
Now that the data is split into separate datasets, it is dueling time. The plan is to treat each race as a series of duels between two horses, so the runners need to be paired up. Since I now have the database, it makes sense to save the pairs there as well, so the pairing does not have to be redone every time the pairs are needed for teaching the network.
I am not going to pair each horse in a race with every other horse. Instead, I only pair the horses that were really challenging for the win against all the others in the race. As the criterion for challenging for the win I use finishing within three lengths of the winner. This turns the complete database of 20 thousand runs into a database of 70 thousand pairs, a slightly more respectable dataset for teaching the network.
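To make the criterion concrete, here is a small sketch in plain Ruby that works on an in-memory race instead of the database. The `Runner` struct, the helper names `contenders` and `duels`, and the sample values are all made up for illustration; only the three-length cutoff and the inside/outside ordering by draw come from this post.

```ruby
# Hypothetical in-memory stand-in for one row of the runs table.
Runner = Struct.new(:run_id, :draw, :distance_to_winner)

# Contenders are the runners that finished within `max_lengths` of the winner
# (the winner itself has a distance of 0).
def contenders(runners, max_lengths = 3)
  runners.select { |r| r.distance_to_winner.between?(0, max_lengths) }
end

# Pair every contender with every other runner in the race.
# Each duel is returned as [inside_run_id, outside_run_id],
# where the inside runner is the one with the lower draw.
def duels(runners, max_lengths = 3)
  contenders(runners, max_lengths).flat_map do |top|
    runners.reject { |r| r.run_id == top.run_id }.map do |opponent|
      if top.draw < opponent.draw
        [top.run_id, opponent.run_id]
      else
        [opponent.run_id, top.run_id]
      end
    end
  end
end

race = [
  Runner.new(1, 4, 0.0),  # winner
  Runner.new(2, 1, 1.5),  # within three lengths
  Runner.new(3, 2, 6.0),  # well beaten
  Runner.new(4, 3, 12.0)  # well beaten
]

duels(race).each { |inside, outside| puts "inside #{inside} vs outside #{outside}" }
```

In this made-up race two of the four runners are contenders, so the sketch produces six duels. Note that the duel between the two contenders shows up twice, once from each contender's side.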
I started by creating a new table called pairs in my database. It has five fields: race_id refers to the race in question, and two more hold the id of the inside runner (the one with the lower draw) and the id of the outside runner. The remaining two fields are reserved for the input and output lines, which I will go through in the next post.
The code for doing the pairing is pretty simple, and I have tried to explain it in the comments below. I have used a completely optional progress bar gem to give me some indication of whether anything is happening while the code runs.
    # Optional: assuming the progress bar comes from the ruby-progressbar gem
    require 'ruby-progressbar'

    def create_pairs
      # Initialize the progress bar
      progressbar = ProgressBar.create
      progress = 0

      # Get the ids of all races to go through them one at a time
      race_ids = Run.pluck(:race_id).uniq
      total = race_ids.size

      race_ids.each_with_index do |race_id, index|
        # Select the top contenders in a race (within three lengths of the winner)
        top3 = Run.where(:race_id => race_id, :distance_to_winner => 0..3)

        top3.each do |top|
          # Select all other horses from the race
          opponents = Run.where(:race_id => race_id).where.not(:run_id => top.run_id)

          opponents.each do |opponent|
            # Determine which horse is on the inside and which is running outside
            if top.draw < opponent.draw
              inside = top.run_id
              outside = opponent.run_id
            else
              inside = opponent.run_id
              outside = top.run_id
            end

            # Create the pair and save it to the database (raises if saving fails)
            Pair.create!(:race_id => race_id,
                         :inside_runner => inside,
                         :outside_runner => outside)
          end
        end

        # Check if progress has moved one percentage point and increment the bar if so
        if progress?(index, progress, total)
          progressbar.increment
          progress += 1
        end
      end
    end

    def progress?(index, progress, total)
      # True when the rounded percentage done is ahead of what the bar shows
      progress < (index.to_f / total * 100).round
    end
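The progress check is easy to try out on its own, without the database or the progress bar gem. The sketch below repeats the percentage comparison and counts how many times the bar would be incremented for a given number of races. The helper `ticks_for` is made up for this illustration, and the third argument of `progress?` (called `total` here) is the number of races.

```ruby
# True when the rounded percentage for `index` is ahead of the ticks shown so far.
def progress?(index, progress, total)
  progress < (index.to_f / total * 100).round
end

# Hypothetical helper: count how many ticks a loop over `total` races produces.
def ticks_for(total)
  progress = 0
  total.times do |index|
    # At most one tick per race, exactly as in the pairing loop
    progress += 1 if progress?(index, progress, total)
  end
  progress
end

puts ticks_for(1000)
puts ticks_for(50)
```

Because the bar moves at most one step per race, it only fills up completely when there are at least as many races as percentage points. With a small number of races it would finish short of 100, which is harmless for an indicator that is only there to show that something is happening.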
Running this code creates the pairs and saves them to the database for later use. You will notice that I created pairs for all races and not just those in the learning dataset. I did this because I might shuffle the data at some point, and I don't want to have to build the pairs again when that happens.
That is all for this time. In the next post I am going to build the data line used as input for one pair.