Recently I have been thinking about the inputs that I would use in the neural network and, as mentioned earlier, most will come from the Racing Dossier service. I don’t want to include too many, but then again not too few either. Currently I am planning to include the following list of ratings.
- Shorpro – Projected speed rating in today’s race
- SpdfigLR – Speed rating in last race
- SHorAvD – Average speed rating at today’s race distance
- PFP – Current form class level of the horse; this rating starts at 1500
- MClSLr – Money Class Shift From Last Race. Prize money of today’s race divided by prize money of last race. Anything greater than 1.07 is a shift up in class, anything less than 0.93 is a drop in class.
- Raiform – Rating assessing last three races
- Course, Distance or Course/Distance winner
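The Money Class Shift rule in the list above is simple enough to sketch in Ruby. This is only an illustration: the method name, the constant names and the symbols it returns are my own and not anything Racing Dossier provides; only the 1.07 and 0.93 thresholds come from the rating description.

```ruby
# Thresholds taken from the rating description above.
SHIFT_UP_ABOVE = 1.07
DROP_BELOW = 0.93

# Hypothetical helper (name and return values are illustrative only).
# Returns :up, :down or :same, or nil when last race prize money is unknown.
def money_class_shift(todays_prize, last_prize)
  return nil if last_prize.nil? || last_prize.zero?

  ratio = todays_prize.to_f / last_prize
  if ratio > SHIFT_UP_ABOVE
    :up
  elsif ratio < DROP_BELOW
    :down
  else
    :same
  end
end

puts money_class_shift(5000, 4000) # ratio 1.25, prints "up"
puts money_class_shift(3000, 4000) # ratio 0.75, prints "down"
```

How the three outcomes are turned into network inputs is a separate decision; one possibility would be a pair of true/false flags, one for a shift up and one for a drop.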
I am still thinking that I might add something measuring how successful the horse has been when it comes to prize money.
Originally I was planning on normalising the ratings, but that was before I came up with that list. Now that I think of it, I might just as well use them as they are and divide them by a suitably big number to bring them below one. Money Class Shift and Course/Distance winner I am putting in as boolean values.
The only problem with that is that the speed figures above can be less than zero, so I need to find a way to handle that.
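A minimal sketch of the scaling idea, plus one possible way around the negative speed figures. The divisor and the minimum and maximum bounds below are made-up numbers for illustration, not values taken from the data, and min-max scaling is just one option I could use, not a decision.

```ruby
# Divide a rating by a suitably big number, as described above.
# The divisor is an assumption and has to be larger than any rating seen.
def scale_by_divisor(value, divisor)
  value.to_f / divisor
end

# One possible way to handle ratings that can go below zero:
# shift by an assumed minimum and divide by the assumed range,
# then clamp so that outliers still land between 0.0 and 1.0.
def scale_min_max(value, min, max)
  scaled = (value - min).to_f / (max - min)
  scaled.clamp(0.0, 1.0)
end

# Boolean inputs become 1.0 or 0.0.
def bool_input(flag)
  flag ? 1.0 : 0.0
end

puts scale_by_divisor(1500, 2000)  # prints 0.75
puts scale_min_max(-10, -50, 150)  # prints 0.2
puts bool_input(true)              # prints 1.0
```

The clamp means that a figure outside the assumed bounds is flattened to 0.0 or 1.0 instead of producing an input outside the expected range.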
Before we actually get to building the neural network, I am going to go through the tools that I am planning to use during the project. This list is obviously subject to change, but this is what I feel at this point I will need to complete it.
I will need to do a fair bit of data modification, and for that I am using Ruby. Naturally one can use any programming language one wishes, but I am most familiar with Ruby and I like how readable and close to natural language the scripts are. When it is relevant I am going to post the code, or at least snippets of it, in the blog as well. If you are new to Ruby, it might be worthwhile to look at this quick start at the official Ruby site or this pretty thorough tutorial at Tutorials Point. In the end, though, what is needed is pretty simple, beginner-level stuff: some calculations and loops, mostly.
One could build the neural network software from the ground up, but I am going to rely on an existing library for this purpose. Earlier I used AI4R but, as I mentioned in my post about the new version of Raiform, I have moved on to FANN (Fast Artificial Neural Network). It seems to do a somewhat better job even with the same kind of network topology, but what I especially like is a feature called Cascade2. It dynamically builds and trains the topology, and that is what I used to build the network for Raiform 2.0.
Neural networks are a pretty advanced topic, and while it does help if you understand how they work, it is still possible to utilize them even if most of the underlying math is left untouched. FANN has bindings for Ruby (as well as several other languages) and I am using a Ruby gem called ruby-fann to take advantage of it. FANN has several graphical interfaces as well, but I find it a lot easier to work on the command line (the command line in Windows is a pain to work with, so be warned, or use a proper OS like Linux 🙂 ). If you wish to get a primer on neural networks you could read, for example, this.
The last big building block is data. I am going to use data starting from the beginning of 2012, and all of my data originates from Racing Dossier. I have the data in a database, so it is easy for me to fetch it with the required filters as needed. The actual ratings that I am planning to use I will cover later on. I haven’t decided yet, but it might make sense to build a working database to handle the training and testing data. In the past I have just used CSV files for this purpose.
For a while now I have been planning on combining some ideas that I have used in the past and things that I have wanted to learn more about. And I have decided to write a diary of sorts which would serve a dual purpose of documenting this for my own benefit and potentially acting as a tutorial of sorts for others interested in pursuing similar ends.
My plan is pretty simple. I plan to create a neural network whose output would be further adjusted with a Monte Carlo simulation. The end result of this combination should be the most likely winner and the likelihood of it winning, so that in addition to the selection a value price can be calculated for it as well.
I am going to concentrate on 5-7 furlong All Weather races run in the UK and Ireland. The idea is to structure the network so that a pair of runners is modeled as one row of data (this is lifted from an old Smartsig article, the reference to which I need to dig up). The winner of a future race is then predicted by comparing all pairs in the race and finding out which runner wins most of these virtual duels.
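The virtual duel idea can be sketched roughly like this. The runner hashes, the `:rating` field and the `STAND_IN_DUEL` lambda are all hypothetical; the lambda simply stands in for the trained network, which does not exist yet, and is assumed to return a number of 0.5 or more when the first runner of the pair is expected to beat the second.

```ruby
# Stand-in for the trained network (an assumption, not the real model):
# returns 1.0 when the first runner has the higher rating, otherwise 0.0.
STAND_IN_DUEL = lambda { |a, b| a[:rating] > b[:rating] ? 1.0 : 0.0 }

# Run every pair of runners through the predictor once and count duel wins.
def duel_wins(runners, predictor)
  wins = {}
  runners.each { |runner| wins[runner[:name]] = 0 }
  runners.combination(2) do |a, b|
    winner = predictor.call(a, b) >= 0.5 ? a : b
    wins[winner[:name]] += 1
  end
  wins
end

# The predicted winner is the runner with the most duel wins.
# On a tie, the runner listed first is returned.
def predicted_winner(runners, predictor)
  duel_wins(runners, predictor).max_by { |_name, count| count }.first
end

runners = [
  { name: "Horse A", rating: 80 },
  { name: "Horse B", rating: 95 },
  { name: "Horse C", rating: 70 }
]
puts predicted_winner(runners, STAND_IN_DUEL) # prints "Horse B"
```

With three runners there are three duels; a race with n runners needs n * (n - 1) / 2 of them.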
This is also where I plan to utilize Monte Carlo simulation, so instead of one run through the network I am going to do it ten thousand times, or whatever figure seems reasonable when I get to that point.
As I am basically learning by doing here, I welcome all comments and suggestions any reader might have.
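Here is a rough sketch of what the repeated runs could look like. What exactly gets randomised on each run is not decided yet, so adding Gaussian noise to a single hypothetical `:rating` per runner is purely my assumption for the example, and the rating comparison again stands in for the network. The price is taken as decimal odds of 1 divided by the estimated win probability.

```ruby
# Draw from a normal distribution using the Box-Muller transform.
def gaussian(rng, mean, sd)
  u1 = 1.0 - rng.rand # in (0, 1], so the logarithm is always defined
  u2 = rng.rand
  mean + sd * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Run the race `runs` times with noisy ratings (an assumed noise model)
# and return each runner's share of simulated wins.
def simulate_race(runners, runs:, noise_sd:, rng: Random.new)
  race_wins = Hash.new(0)
  runs.times do
    noisy = runners.map do |r|
      { name: r[:name], rating: gaussian(rng, r[:rating], noise_sd) }
    end
    # Round robin of duels; the rating comparison stands in for the network.
    duel_counts = Hash.new(0)
    noisy.combination(2) do |a, b|
      winner = a[:rating] > b[:rating] ? a : b
      duel_counts[winner[:name]] += 1
    end
    race_wins[duel_counts.max_by { |_name, count| count }.first] += 1
  end
  runners.to_h { |r| [r[:name], race_wins[r[:name]].to_f / runs] }
end

# Decimal odds at which a bet breaks even for the given win probability.
def fair_price(probability)
  probability.zero? ? nil : 1.0 / probability
end

runners = [
  { name: "Horse A", rating: 80 },
  { name: "Horse B", rating: 95 },
  { name: "Horse C", rating: 70 }
]
probabilities = simulate_race(runners, runs: 10_000, noise_sd: 10.0)
probabilities.each do |name, p|
  puts "#{name}: probability #{p.round(3)}, fair price #{fair_price(p)&.round(2)}"
end
```

Passing a seeded generator, for example `rng: Random.new(42)`, makes a simulation repeatable, which helps when comparing different settings.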