Neural Network Diary #5: Splitting the datasets

As I initially thought, I ended up going the database route for handling the data. As the amount of data is fairly moderate, I decided to use an SQLite database. It is a really convenient, simple database where the data is stored in a single file and it doesn’t require heavy background processes. To make edits to the database I am using Sqlitebrowser, a really handy app for building the tables as well as browsing the data.

Creating the database and uploading the data is just a matter of choosing the save location for the database file and then importing the CSV file as a table. It takes only a few minutes (naturally depending on the amount of data).

I am really bad at writing SQL, so I tend to use a Ruby gem called Active Record. While it is primarily meant to be used with the Ruby on Rails web framework, it is perfectly usable standalone as well.

To use Active Record I have a couple of files that my main interaction file depends on. First I have a file named dbconfig.rb

require 'rubygems'
require 'active_record'

# Connect Active Record to the SQLite database file.
ActiveRecord::Base.establish_connection(
  :adapter => "sqlite3",
  :database => "/path/to/database.sqlite3"
)

I require that file in schema.rb, which is used to define the models that I am using. There isn’t much there yet, though.

require_relative 'dbconfig'

# Maps to the "runs" table by Active Record naming convention.
class Run < ActiveRecord::Base
end

And finally the file that I used to divide the data into three datasets: three fifths to learning and one fifth each to test and unseen. The split is roughly evenly distributed between all courses and distances as well as time periods.

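require_relative 'schema'

# Race ids ordered so that neighbouring races share course, distance and period.
race_ids = Run.all.order(course: :asc, distance: :asc, date: :asc).pluck(:race_id).uniq
i = 1
race_ids.each do |race_id|
  runners = Run.where(:race_id => race_id)
  if i < 4
    # Races 1 to 3 of each cycle of five go to learning.
    runners.update_all(:dataset => "learning")
    i += 1
  elsif i < 5
    # Race 4 goes to test.
    runners.update_all(:dataset => "test")
    i += 1
  else
    # Race 5 goes to unseen, then the cycle restarts from 1
    # (restarting from 0 would give four learning races per cycle).
    runners.update_all(:dataset => "unseen")
    i = 1
  end
end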
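
The same three/one/one rotation can also be written without a running counter. This is only a minimal sketch in plain Ruby that does not touch the database: `PATTERN` and `assign_datasets` are names made up for the illustration, and the race ids are dummy integers standing in for the sorted ids.

```ruby
# Hypothetical rotation: three races to learning, then one to test, one to unseen.
PATTERN = %w[learning learning learning test unseen].freeze

# Maps each race id to a dataset label based on its position in the
# (already sorted) list. The name assign_datasets is made up for this sketch.
def assign_datasets(race_ids)
  race_ids.each_with_index.map do |race_id, index|
    [race_id, PATTERN[index % PATTERN.size]]
  end.to_h
end

# Dummy race ids standing in for the sorted ids pulled from the database.
assignments = assign_datasets((1..10).to_a)

# Count how many races landed in each dataset.
counts = Hash.new(0)
assignments.each_value { |dataset| counts[dataset] += 1 }
puts counts.inspect
```

Because the label depends only on the position in the sorted list, every run of five neighbouring races contributes three to learning and one each to test and unseen.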

This is the first time I am posting code, and I am not really sure how much of it is beneficial and how much more people would like to read. Naturally there is a metric ton of material available on the internet if one wants to learn more about Ruby and/or Active Record. But if you have an opinion, please let me know in the comments.

Handicapping 2.0

I haven’t posted links to other blogs that much, but this post from the other side of the Atlantic is a good one. It is not necessarily groundbreaking stuff, but it acts as a good reminder. Coming from the US, not everything is readily applicable, but enough is, in my opinion, to warrant a read.

It briefly covers some new aspects in addition to basic form, speed, pace and class, and the author promises to follow up with a Wagering 2.0 article as well.

On a related note, there were some good thoughts about data in a recent article at

Neural Network Diary #4: Some more thoughts about inputs and data

Last time I was thinking about how to handle the negative values possible in the speed ratings provided by Racing Dossier. Luckily that is not an issue; it is just a matter of using an activation function that supports values from -1 to 1. The activation functions available in FANN can be seen here, and the ones to use in my case are either



FANN_SIGMOID_SYMMETRIC

Symmetric sigmoid activation function, AKA tanh. One of the most used activation functions.

This activation function gives output that is between -1 and 1.

or

FANN_SIGMOID_SYMMETRIC_STEPWISE

Stepwise linear approximation to symmetric sigmoid. Faster than symmetric sigmoid but a bit less precise.

This activation function gives output that is between -1 and 1.
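
To see the -1 to 1 range in practice, here is a small plain-Ruby sketch of a symmetric sigmoid built on `Math.tanh`. The function name and the `steepness` default are assumptions for the illustration; FANN itself is a C library and this snippet does not call it.

```ruby
# Symmetric sigmoid (tanh) with an assumed steepness factor.
# Negative inputs give negative outputs, so negative ratings are not a problem.
def symmetric_sigmoid(x, steepness = 0.5)
  Math.tanh(steepness * x)
end

[-10.0, -1.0, 0.0, 1.0, 10.0].each do |x|
  puts format("%6.1f -> %.4f", x, symmetric_sigmoid(x))
end
```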

And from those I am going to start with the first one. When thinking about this I also had a new idea on how to handle the presentation of the values. Initially I was planning on using normalised values and two fields, one for each runner. Then I thought about using the actual values and adjusting them to be between 0 and 1 (or -1 and 1). The current idea is to use only one field for each rating, holding the difference between the two runners’ ratings, and one field for the network’s output, where 1 means the inside horse came ahead and -1 means the outside horse came ahead.
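
As a rough sketch of that idea, one training example could be built from a pair of runners like this. Everything named here is an assumption for the illustration: the rating keys, the `scale` divisor of 100, the clamping to -1..1 and the use of finishing position to decide which horse came ahead.

```ruby
# Rating names used as inputs; the keys are placeholders for this sketch.
RATING_KEYS = %i[speed_projected speed_last_race].freeze

# Builds one training example from two runners in the same race.
# scale is an assumed divisor that keeps the differences roughly within -1..1.
def pair_example(inside, outside, scale: 100.0)
  inputs = RATING_KEYS.map do |key|
    ((inside[key] - outside[key]) / scale).clamp(-1.0, 1.0)
  end
  # A lower finishing position means the horse came ahead.
  target = inside[:position] < outside[:position] ? 1.0 : -1.0
  [inputs, target]
end

# Invented runners, not real data.
inside  = { speed_projected: 62.0, speed_last_race: 55.0, position: 2 }
outside = { speed_projected: 70.0, speed_last_race: 40.0, position: 5 }

inputs, target = pair_example(inside, outside)
puts inputs.inspect
puts target
```

With one difference per rating, the number of inputs is half of what two fields per rating would need, and the sign of each input already says which runner is rated higher.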

Datawise, I have the dataset for the races that I am going to use in development, from 1st of June 2012 to 31st of May 2015. I excluded maidens and selling or claiming races but included both handicaps and non-handicaps. As I am concentrating on races run over less than 8 furlongs, I had a total of almost 22 000 runs worth of data to use. Next up is dividing them into learning, testing and unseen datasets so that all courses and all distances are evenly represented in each.


Results for July Baseball bets

Again one more month has passed and the baseball season has turned towards the end, with four months behind and three to go. July included the All-Star Game break of almost a week, which meant a slightly lower number of selections last month.

Anyway, we are back on track and adjusted stakes seem to do the trick, although all bets were in profit in July as well.

[Table: results for All selections, Adjusted flat stakes and Adjusted stakes]

I think the following chart is quite informative: it shows cumulative P/L on a daily level for all bets as well as adjusted stakes.

And we can also see that, overall, adjusted stakes crept ahead of all selections in early July. This was mainly due to a dry spell starting around 22nd of June, during which the all selections P/L dropped to just a little more than 6 points. Adjusted stakes suffered as well, but not nearly as much as all selections. Overall, adjusted stakes are currently standing at around 5% return on turnover while all selections are at a little over 2% return.
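
For anyone wanting to track the same two numbers, cumulative P/L and return on turnover are simple to compute. The sketch below uses invented stakes and results, not my actual bets, and assumes return on turnover means total profit divided by total stakes.

```ruby
# Each bet is [stake, profit_or_loss] in points; the values are invented.
bets = [[1.0, 0.75], [1.0, -1.0], [2.0, 1.5], [1.0, -1.0], [1.0, 1.0]]

# Running total of profit/loss, the series a cumulative P/L chart would plot.
running = 0.0
cumulative = bets.map { |_stake, pl| running += pl }

# Return on turnover: total profit divided by total amount staked.
turnover = bets.sum { |stake, _pl| stake }
return_on_turnover = cumulative.last / turnover

puts cumulative.inspect
puts format("%.1f%%", return_on_turnover * 100)
```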

All in all, it would seem that Zcode is offering some pretty decent selections, and I have started thinking that maybe I should do a similar test run with NHL hockey bets during the winter.

Neural Network Diary #3: Thoughts about inputs and ratings

Recently I have been thinking about the inputs that I would use in the neural network and, as mentioned earlier, most will come from the Racing Dossier service. I don’t want to include too many, but then again not too few either. Currently I am planning to include the following list of ratings.

  • Shorpro – Projected speed rating in today’s race
  • SpdfigLR – Speed rating in last race
  • SHorAvD – Average speed rating at today’s race distance
  • PFP – Current form class level of horse, this rating starts at 1500
  • MClSLr – Money Class Shift From Last Race. Prize money of today’s race divided by prize money of last race. Anything greater than 1.07 is a shift up in class, anything less than 0.93 is a drop in class.
  • Raiform – Rating assessing last three races
  • Course, Distance or Course/Distance winner
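
The Money Class Shift rule in the list above can be sketched in a few lines of Ruby. The thresholds come from the rating description; the function name, the returned symbols and the prize amounts are made up for the illustration.

```ruby
# Thresholds taken from the MClSLr description above.
UP_THRESHOLD = 1.07
DOWN_THRESHOLD = 0.93

# Classifies the class shift from the prize money of today's and the last race.
def class_shift(prize_today, prize_last)
  ratio = prize_today.to_f / prize_last
  if ratio > UP_THRESHOLD
    :up
  elsif ratio < DOWN_THRESHOLD
    :down
  else
    :same
  end
end

# Invented prize money figures.
puts class_shift(6000, 5000)
puts class_shift(4000, 5000)
puts class_shift(5000, 5000)
```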

I am still thinking that I might add something measuring how successful a horse has been when it comes to prize money.

Originally I was planning on normalising the ratings, but that was before I came up with that list. Now that I think of it, I might just as well use them as they are and divide them by a suitably big number to bring them below one. Money Class Shift and Course/Distance winner I am putting in as boolean values.
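
A minimal sketch of that scaling could look like the following. The divisors are pure guesses that would need checking against the real range of each rating, and the key names and helper names are invented for the example.

```ruby
# Assumed divisors per rating; the real ranges would need checking against the data.
DIVISORS = { pfp: 10_000.0, speed_last_race: 200.0 }.freeze

# Divides a raw rating by its (assumed) divisor to bring it below one.
def scale_rating(name, value)
  value / DIVISORS.fetch(name)
end

# Turns a boolean input into a number the network can consume.
def flag(value)
  value ? 1.0 : 0.0
end

puts scale_rating(:pfp, 1500)
puts flag(true)
```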

The only problem with that is the fact that the speed figures above can be less than zero; I need to find a way to handle that.
