Refactoring with Class
28 Oct 2013If you ever want to know if you’re making progress on a skill, go back and look at your earlier work compared to your recent stuff. I was reviewing some of my old scripts on Github and I was surprised how simple and linear they are. They got the job done, but only just barely…
Now I’m not a full time coder, so I haven’t exactly gone pro since then, but I have moved from the ultra simple scripts of yore, to more intelligently formatted and reusable code.
Oh and classes, classes are cool.
Let’s look at a script I wrote not that long ago. It concatenates some CSV files together and gets them into the format I need them.
require 'csv'
courses = "courses.csv"
altered = []
CSV.foreach(courses, { :headers => true, :header_converters => :symbol }) do |course|
altered << course.headers unless altered.size > 0
course[:start_date] += "T01:00:01-6:00" unless course[:start_date].match(/T\d\d:\d\d:\d\d-/)
course[:end_date] += "T22:59:59-6:00" unless course[:end_date].match(/T\d\d:\d\d:\d\d-/)
altered << course
end
CSV.open(courses, "w+") do |csv|
altered.each do |row|
csv << row
end
end
teachers = "enrollments_t.csv"
students = "enrollments.csv"
CSV.open(students, "a") do |csv|
CSV.foreach(teachers, { :headers => true, :header_converters => :symbol }) do |teacher|
csv << teacher
end
end
sections = "sections.csv"
altered = []
CSV.foreach(sections, { :headers => true, :header_converters => :symbol }) do |section|
altered << section.headers unless altered.size > 0
section[:start_date] = ""
section[:end_date] = ""
altered << section
end
CSV.open(sections, "w+") do |csv|
altered.each do |row|
csv << row
end
end
teachers = "users_t.csv"
students = "users.csv"
CSV.open(students, "a") do |csv|
CSV.foreach(teachers, { :headers => true, :header_converters => :symbol }) do |teacher|
csv << teacher
end
end
Oh the horror. It does the job, but it has a lot of smells (and probably more then I even realize.) The most obvious is the duplication. Second (but related) is the complete lack of functions; if I need to concatenate more than just the four files I’d have to extend the script each time. Next is the most up to date version.
require 'csv'
FILE_TYPES = ["courses", "sections", "users", "enrollments"]
class CSVFixer
attr_reader :rows
def initialize(csvs_name)
@csvs, @rows = [], []
@csvs_name = csvs_name
@csvs = get_csvs
read_csvs(@csvs)
end
def get_csvs
@csvs = Dir.glob("csvs/*#{@csvs_name}*", File::FNM_CASEFOLD)
end
def read_csvs(csvs)
csvs.each do |csv|
CSV.foreach(csv, { :headers => true,
:header_converters => :symbol,
:encoding => "windows-1251:utf-8" }) do |row|
@rows << row
end
end
end
def write_csvs(folder, file_name)
CSV.open("#{folder}/#{file_name}.csv", "w",
:write_headers => true, :headers => @rows[0].headers) do |csv|
@rows.each do |row|
remove_section_dates(row) if file_name == "sections"
extend_course_end_date(row) if file_name == "courses" && row[:end_date] == "2013-12-14"
csv << row
end
end
end
def remove_section_dates(row)
row[:start_date] = ""
row[:end_date] = ""
end
def extend_course_end_date(row)
# Horrible magic number change for Fall 2013
row[:end_date] = "2013-12-25"
end
end
def make_folder
folder_name = Time.now.strftime("%Y-%m-%d %H%M%S")
while File.exists?(folder_name)
sleep 1
folder_name = Time.now.strftime("%Y-%m-%d %H%M%S")
end
Dir::mkdir(folder_name)
return folder_name
end
output = make_folder
FILE_TYPES.each do |type|
fix = CSVFixer.new(type)
next if fix.rows.empty?
fix.write_csvs(output, type)
puts "Processed #{fix.rows.length} rows for #{type}.csv"
end
So this version is 20 lines longer, but it adds so much goodness in those 20 lines. First, a class is used to declare and make accessible the commonly needed functions, and a CONSTANT holds the names of files (that are now DIR::glob-ed for). This means that I can now make the entire tool also process groups.csv but just adding "groups"
to the FILE_TYPES
array. I can also concatenate any number of partial files, so if I have users.csv, users1.csv, users2.csv and users3.csv, it will read in all the lines and cram them into one (UTF-8!!) file.
This little tool still isn’t perfect. It assumes the files are formatted correctly (they come from a machine source so that’s fairly likely) and it does use a magic number, but all in all it’s infinitely more extensible and reusable the the original iteration.
The time I spent writing this little utility didn’t just save me many hours of boring (and error prone) manual concatenation of files, but my coworkers got to benefit from using it while I was away from the office.