Neural Network Diary #4: Some more thoughts about inputs and data

Last time I was thinking about how handle the negative values possible in the speed ratings provided by Racing Dossier. Luckily that is not an issue, it is just a matter of using a activation function that supports values -1 to 1. Activation functions available in FANN can be seen here and the ones to use in my case are either

FANN_SIGMOID_SYMMETRIC

 

Symmetric sigmoid activation function, AKA tanh. One of the most used activation functions.

This activation function gives output that is between -1 and 1.

or

FANN_SIGMOID_SYMMETRIC_STEPWISE

 

Stepwise linear approximation to symmetric sigmoid. Faster than symmetric sigmoid but a bit less precise.

This activation function gives output that is between -1 and 1.

And from those I am going to start with the first one. When thinking about this I also had a new idea on how to handle theĀ  presentation of the values. Initially I was planning on using normalised values and two fields, one for each runner. Then I just thought about using the actual values and adjusting them to be between 0 and 1 (or -1 and 1). And now the current idea is that I am going to use only field for each rating and calculate the difference between the ratings there and also using one field for networks output where 1 is when inside horse came ahead and -1 when outside horse came ahead.

Datawise, I have the dataset for the races that I am going to use in development. From 1st of June 2012 to 31st of May 2015. I did exclude maidens and selling or claiming races but have included both handicaps and non handicaps. And as I am concentrating on races ran over lengths less than 8 furlongs I had total of almost 22 000 runs worth of data to use. Next up is dividing them evenly into learning, testing and unseen datasets. So that all courses and all distances are evenly represented in all datasets.