Before we get to actually building the neural network, I am going to go through the tools that I am planning to use during the project. This list is obviously subject to change, but this is what I feel at this point I will need to complete it.
I will need to do a fair bit of data manipulation, and for that I am using Ruby. Naturally one can use any programming language they wish, but I am most familiar with Ruby and I like how readable and natural-language-like the scripts are. When it is relevant I am going to post the code, or at least snippets of it, in the blog as well. If you are new to Ruby it might be worthwhile to look at this quick start at the official Ruby site or this pretty thorough tutorial at Tutorials Point. In the end, though, what is needed is pretty simple, beginner-level stuff: some calculations and loops mostly.
One could build the neural network software from the ground up, but I am going to rely on an existing library for this purpose. Earlier I used AI4R, but as I mentioned in my post about the new version of Raiform, I have moved on to FANN, or Fast Artificial Neural Network. It seems to do a slightly better job even with the same kind of network topology, but what I especially like is a feature called Cascade2. It builds and trains the topology dynamically, and that is what I used to build the network for Raiform 2.0.
Neural networks are a pretty advanced topic, and while it does help if you understand how they work, it is still possible to utilize them even if most of the underlying math is left untouched. FANN has Ruby bindings (in addition to bindings for several other languages) and I am using a Ruby gem called ruby-fann to take advantage of it. FANN has several graphical interfaces as well, but I find it a lot easier to work on the command line (the command line in Windows is a pain to work with, so be warned, or use a proper OS like Linux 🙂 ). If you wish to get a primer on neural networks you could read for example this.
The last big building block is data. I am going to use data starting from the beginning of 2012, and all of my data originates from Racing Dossier. I have the data in a database, so it is easy for me to fetch it with the required filters as needed. The actual ratings that I am planning to use I will cover later on. I haven’t decided yet, but it might make sense to build a working database to handle the training and testing data. In the past I have just used CSV files for this purpose.
For a while now I have been planning on combining some ideas that I have used in the past and things that I have wanted to learn more about. And I have decided to write a diary of sorts which would serve a dual purpose of documenting this for my own benefit and potentially acting as a tutorial of sorts for others interested in pursuing similar ends.
My plan is pretty simple. I plan to create a neural network whose output is then further adjusted with a Monte Carlo simulation. The end result of this combination should be the most likely winner and the likelihood of it winning, so that in addition to the selection a value price can be calculated for it as well.
I am going to concentrate on 5-7 furlong All Weather races run in the UK and Ireland. The idea is to structure the network so that a pair of runners is modeled as one row of data (this is lifted from an old Smartsig article, the reference to which I need to dig up). The winner of a future race is then predicted by comparing all pairs in the race and finding out which runner wins most of these virtual duels.
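To make the pairing idea concrete, here is a minimal Ruby sketch. It is not the method from the Smartsig article. The `Runner` struct, the two made-up ratings per runner and the use of unordered pairs are all my own assumptions for illustration, and the block passed to `duel_wins` is only a placeholder for the trained network.

```ruby
# Hypothetical runner record: a name plus a list of numeric ratings.
Runner = Struct.new(:name, :ratings)

# Build one row per pair of runners: ratings of A followed by ratings of B.
def pair_rows(runners)
  runners.combination(2).map do |a, b|
    { a: a.name, b: b.name, inputs: a.ratings + b.ratings }
  end
end

# Count virtual duel wins. The block stands in for the trained network and
# must return true when the first runner of the pair is predicted to win.
def duel_wins(runners)
  wins = {}
  runners.each { |r| wins[r.name] = 0 }
  pair_rows(runners).each do |row|
    winner = yield(row[:inputs]) ? row[:a] : row[:b]
    wins[winner] += 1
  end
  wins
end

# Made-up example data, not real ratings.
runners = [
  Runner.new("Alpha",   [80.0, 0.6]),
  Runner.new("Bravo",   [75.0, 0.9]),
  Runner.new("Charlie", [70.0, 0.4])
]

# Placeholder "network": the runner with the higher first rating wins the duel.
wins = duel_wins(runners) { |inputs| inputs[0] > inputs[inputs.size / 2] }
predicted = wins.max_by { |_, count| count }.first

puts wins.inspect
puts "Predicted winner: #{predicted}"
```

With three runners this produces three rows, and a race with n runners produces n(n-1)/2 of them, so the number of training rows grows quickly with field size.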
This is also where I plan to utilize Monte Carlo simulation: instead of one run through the network I am going to do it ten thousand times, or whatever figure seems reasonable when I get to that point.
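How exactly the simulation will plug into the network is still open, so the following is only a rough sketch under my own assumptions. It assumes each duel can be expressed as a probability that the first runner beats the second, that ties on duel wins are broken at random, and that the value price is simply the break-even decimal odds (1 divided by the win probability). The probabilities in `DUEL_PROBS` are made up.

```ruby
# Assumed input: for each pair, the probability that the first runner beats
# the second. These numbers are invented for illustration only.
DUEL_PROBS = {
  %w[Alpha Bravo]   => 0.60,
  %w[Alpha Charlie] => 0.70,
  %w[Bravo Charlie] => 0.55
}.freeze

# Simulate one race: sample every duel once and return the runner with the
# most duel wins. Ties are broken at random (an assumption).
def simulate_race(duel_probs, rng)
  wins = Hash.new(0)
  duel_probs.each do |(a, b), p_a|
    wins[rng.rand < p_a ? a : b] += 1
  end
  best = wins.values.max
  wins.select { |_, w| w == best }.keys.sample(random: rng)
end

# Repeat the race many times and turn win counts into win frequencies.
def monte_carlo(duel_probs, runs: 10_000, seed: 42)
  rng = Random.new(seed)
  tally = Hash.new(0)
  runs.times { tally[simulate_race(duel_probs, rng)] += 1 }
  tally.transform_values { |count| count.to_f / runs }
end

win_probs = monte_carlo(DUEL_PROBS)
selection, prob = win_probs.max_by { |_, p| p }
fair_price = 1.0 / prob # decimal odds at which the bet breaks even

puts format("%s: win probability %.3f, fair decimal odds %.2f",
            selection, prob, fair_price)
```

Any price on offer above `fair_price` would then count as value, at least to the extent that the simulated probability can be trusted.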
As I am basically learning by doing here I welcome all comments and suggestions any reader might have.
I have been meaning to write some kind of description or review of Racing Dossier, the tool that I use as the source for all of my data. But the author of the tool, Michael Wilding, has beaten me to it and made a walk-through video of how the tool works.
So, if you are not quite clear what this tool is about, or if you have earlier seen the previous Adobe Air version of the tool, I suggest that you take a look at the new browser-based iteration.
The June issue of SmartSigger magazine has been published. If you are a subscriber, check your members area, and if you are not, then I would advise you to check it out. There is a 30-day trial period which includes access to the archive of past issues. This should give you an idea of the content available if you were to subscribe.
My article this month is about normalising ratings in order to get a view of how the race is shaped for each rating. So it answers not only which runner is better but also by how much.
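As a small illustration of the general idea (not necessarily the approach taken in the article), here is a Ruby sketch that scales one rating across the runners of a single race to the 0..1 range. Min-max scaling is just one common choice, and the runner names and rating values are made up.

```ruby
# Scale one rating across the runners of a single race to the 0..1 range.
# The top-rated runner gets 1.0, the lowest-rated 0.0, the rest in between.
def normalise(ratings)
  min, max = ratings.values.minmax
  span = max - min
  # If every runner has the same rating there is nothing to separate them.
  return ratings.transform_values { 0.0 } if span.zero?

  ratings.transform_values { |v| (v - min).to_f / span }
end

# Made-up ratings for one race.
race = { "Alpha" => 80, "Bravo" => 75, "Charlie" => 60 }
scaled = normalise(race)

puts scaled.inspect
# => {"Alpha"=>1.0, "Bravo"=>0.75, "Charlie"=>0.0}
```

Here Bravo's 0.75 shows it is much closer to Alpha than to Charlie, which the raw ranking of first, second and third would hide.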