Neural Network Diary #5: Splitting the datasets
As I initially thought, I ended up taking the database route for handling the data. Since the amount of data is moderate, I decided to use an SQLite database. It is a really convenient, simple database: all the data is stored in a single file and it doesn't require heavy background processes. To edit the database I am using Sqlitebrowser, a really handy app for building tables as well as browsing the data.
Creating the database and uploading the data is just a matter of choosing a save location for the database file and then importing the CSV file as a table. It takes only a few minutes (naturally depending on the amount of data).
I am really bad at writing SQL, so I tend to use a Ruby gem called Active Record. While it is primarily meant to be used with the web framework Ruby on Rails, it is perfectly usable standalone as well.
To use Active Record, I have a couple of files that my main interaction file depends on. First, there is a file named dbconfig.rb:
require 'rubygems'
require 'active_record'
ActiveRecord::Base.establish_connection(
  :adapter => "sqlite3",
  :database => "/path/to/database.sqlite3"
)
I require that file in schema.rb, which defines the models I am using, though there isn't much there yet:
require_relative 'dbconfig'
class Run < ActiveRecord::Base
end
And finally, the file I used to divide the data into three datasets: three fifths for learning and one fifth each for test and unseen, roughly evenly distributed across all courses and distances as well as time periods.
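require_relative 'schema' # loads dbconfig.rb and the Run model
# Race ids sorted by course, distance and date; uniq keeps the first occurrence of each
race_ids = Run.all.order(course: :asc, distance: :asc, date: :asc).pluck(:race_id).uniq
# Cycle through races in groups of five: 3 learning, 1 test, 1 unseen
i = 1
race_ids.each do |race_id|
  runners = Run.where(:race_id => race_id)
  if i < 4
    runners.update_all(:dataset => "learning")
    i += 1
  elsif i < 5
    runners.update_all(:dataset => "test")
    i += 1
  else
    runners.update_all(:dataset => "unseen")
    i = 1 # restart the cycle at 1; restarting at 0 would give 4 learning races per cycle
  end
end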
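To check the split logic without touching the database, here is a minimal sketch in plain Ruby. It assumes each runner row is a hash with the same race_id, course, distance and date fields as the runs table, and the helper names dataset_for, ordered_race_ids and split_races are hypothetical, not part of Active Record. Instead of a hand-maintained counter it uses the race's position modulo 5, so every five consecutive races split 3:1:1.

```ruby
# Sketch: the 3:1:1 split on plain Ruby hashes instead of the runs table.
# Row fields (race_id, course, distance, date) mirror the columns in the post;
# the helper names below are hypothetical.

# Dataset label for the n-th race (0-based) in a repeating cycle of five:
# positions 0-2 -> "learning", 3 -> "test", 4 -> "unseen".
def dataset_for(index)
  case index % 5
  when 0..2 then "learning"
  when 3 then "test"
  else "unseen"
  end
end

# In-memory counterpart of
#   Run.order(course: :asc, distance: :asc, date: :asc).pluck(:race_id).uniq
# race_id is added as a final sort key only to make ties deterministic.
# Array#uniq keeps the first occurrence, so the sorted order survives.
def ordered_race_ids(rows)
  rows.sort_by { |r| [r[:course], r[:distance], r[:date], r[:race_id]] }
      .map { |r| r[:race_id] }
      .uniq
end

# Map every race id to its dataset label.
def split_races(rows)
  ordered_race_ids(rows).each_with_index.map { |race_id, i| [race_id, dataset_for(i)] }.to_h
end

# Example: two courses, ten races each, three runners per race.
# Dates are ISO strings so that string comparison sorts them chronologically.
example_rows = []
%w[A B].each do |course|
  10.times do |n|
    3.times do
      example_rows << { race_id: "#{course}#{n}", course: course,
                        distance: 1600, date: format("2016-01-%02d", n + 1) }
    end
  end
end

labels = split_races(example_rows)
puts labels.values.tally.inspect # 12 learning, 4 test, 4 unseen
```

Because races are numbered only after sorting by course, distance and date, each course/distance group occupies consecutive positions in the five-race cycle. Any five consecutive positions cover every label slot exactly once, so a group whose race count is a multiple of five gets exactly 3:1:1, and other groups get as close to it as their size allows.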
This is the first time I am posting code, and I am not really sure how much of it is beneficial or how much more people would like to read. Naturally there is a metric ton of material available on the internet if one wants to learn more about Ruby and/or Active Record. But if you have an opinion, please let me know in the comments.