Neural Network Diary #8: Getting back in the saddle

Rather lengthy hiatus, sorry about that. Moving to another country and day job requirements has throttled down my betting related activities to bare minimum. Now I hope that I have a little bit more time to invest in this project in the near future.

Let’s get started from where we left of at the end of previous post and start looking at the code I use to instruct FANN to learn. As FANN does most of the heavy lifting we only need to tell it what data to use and how to handle it.

  # input_qty = number of input nodes
  # output_qty = number of input nodes
  # savefile = name of the file network is saved to
  # neurons = how many neurons will be trained by cascade function

def train_cascade(input_qty, output_qty, savefile, neurons)
  # Create the basis for the network, define number of inputs and outputs
  net = RubyFann::Shortcut.new(:num_inputs=> input_qty, :num_outputs=> output_qty)

  # As our inputs and outputs can have a value of -1 to 1 we can only use sigmoid_symmetric
  # and that needs to be specified
  net.set_cascade_activation_functions([:sigmoid_symmetric])
  net.set_activation_function_output(:sigmoid_symmetric)
  
  # Search for pairs to be used for training
  pairs = Pair.where(:race_id => Run.where(:dataset => "learning").where.not(:draw => nil).pluck(:race_id).uniq).where.not(:input => nil)
  
  # I have saved precalculated inputs to the datase as one field, thus it needs splitting
  # before they are usable 
  inputs = []
  pairs.each do |pair|
    inputs << pair.input.split(",").map(&:to_f)
  end

  # And same done to outputs. Output also needs to be an array, even if it is only one value
  outputs = []
  pairs.pluck(:output).each do |o|
    outputs << Array.new(1,o)
  end

  # Once we have arrays of inputs and outputs we can combine them into form that FANN
  # can understand
  train = RubyFann::TrainData.new(:inputs => inputs, :desired_outputs => outputs)
  
  # Finally it is time for some training. This will take a while, depending naturally
  # on how much data one is using.
  net.cascadetrain_on_data(train, neurons, 1, 0.05)

  # After training it is important to remember to save the network into a file
  # for further use
  net.save(savefile)
end

That is pretty simple isn’t it?

Obviously it does need a single line in some another file to actually call this function. Now we can proceed to training and finding out if there actually is something at the end of this exercise.

Neural Network Diary #5: Splitting the datasets

As I initially thought I ended up with database route for handling the data. As there is relatively moderate amount of data I decided to use Sqlite database. It is really convenient simple database where data is stored in only one file and it doesn’t require heavy background processes. To make edits to the database I am using Sqlitebrowser, really handy app to build the tables as well as browse the data.

Creating the database and uploading the data is just a matter of choosing the save location for the database file and then importing csv-file as a table. Takes only few minutes (naturally depending on the amount of data).

I am really bad in writing SQL so I tend to use Ruby gem called Active Record. While it is primarily meant to be used with web framework Ruby on Rails it is perfectly usable stand alone as well.

To use use Active Record I am using couple files that I am depending on in my main interaction file. First I have file named dbconfig.rb

require 'rubygems'
require 'active_record'

ActiveRecord::Base.establish_connection(
:adapter => "sqlite3",
:database => "/path/to/database.sqlite3"
)

I require that file in schema.rb which is used to define models that I am using. Though there isn’t much there yet.

require_relative 'dbconfig'

class Run < ActiveRecord::Base

end

And finally the file that I used to divide the data to three datasets, three fifths to learning and fifth to both test and unseen. And roughly evenly distributed between all courses and distances as well as time periods.

require_relative 'schema'
race_ids = Run.all.order(course: :asc, distance: :asc, date: :asc).pluck(:race_id).uniq
i = 1 
race_ids.each do |race_id| 
  runners = Run.where(:race_id => race_id) 
  if i < 4 
    runners.update_all(:dataset => "learning") 
    i += 1 
  elsif i < 5 
    runners.update_all(:dataset => "test") 
    i += 1 
  else 
    runners.update_all(:dataset => "unseen") 
    i = 0 
  end 
end

This is first time me posting code and I am not really sure how much of is beneficial and how much more people would like to read. Naturally there is metric ton of material available on internet if one wants to learn more about Ruby and/or Active Record. But if you have an opinion, please let me know in the comments.

Neural Network Diary #2: Tools & Data

Before we get to actually build the neural network I am going to go through the tools that I am planning to use during the project. This list is obviously subject to change but this is what I feel at this point that I will need to complete this.

I will need to do a fair bit of modifying of data and for that I am using Ruby. Naturally one can use any programming language they wish but I am most familiar with Ruby and I like how readable and natural language like the scripts are. When it is relevant I am going to post the code or at least snippets of it in the blog as well. If you are new to Ruby it might be worthwhile to look at this quick start at Ruby official site or this pretty throughout tutorial at Tutorials Point. in the end though, what is needed is pretty simple and beginner level stuff, some calculations and loops mostly.

One could build the Neural Network software from ground up, but I am going to rely on existing library for this purpose. Earlier I have been using AI4R but as I mentioned in my post telling about new version of Raiform I have moved on to FANN or Fast Artificial Neural Network. It seems to be doing a bit better job even with same kind of network topology but what I especially like is feature called Cascade2. It dynamically builds and trains the topology and that is what I used to build the network for Raiform 2.0.

Neural networks are a pretty advanced topic and while it does help if you understand how they work it is till possible to utilize them even if most of the underlying math is left untouched. FANN has Ruby bindings (In addition to several other languages) and I am using Ruby gem called ruby-fann to take advantage of it. FANN has several graphical interfaces as well but I find it a lot easier to work in command line (Command line in windows is pain to work with so be warned or use a proper OS like Linux 🙂 ). If you wish to get a primer about Neural networks you could read for example this.

Last big building block is data. I am going to use data starting from beginning of 2012 and all of my data is originated from Racing Dossier. I have the data in a database so it is easy for me to fetch data with required filters as needed. Actual ratings that I am planning to use I will cover later on. I haven’t decided yet, but it might make sense to build a working database to handle the training and testing data. In the past I have just used csv files for this purpose.