Naresh Jain recently posted a refactoring teaser. The original code was in Java, but I thought I'd have a go at refactoring it in Ruby instead. I deliberately didn't look at Naresh's solution beforehand, so mine goes in a rather different direction.
My code's here (with the specs in the same file for simplicity), and you can step through the history to see the state at each step of the refactoring. The links in the text below take you to the relevant version of the file, and the Δs next to them link to the diffs.
Firstly, I converted the code to Ruby, and made sure the tests (which I converted to RSpec) still passed. It's a fairly straight conversion, although I made a couple of changes while I was at it – mainly turning it into a module which I mixed into String. I changed the name of the method to phrases, to avoid conflicting with the built-in String#split. The regular expression to split on is much simpler too, because Ruby didn't understand the original, and I had no idea what it was supposed to do anyway.
Here's my initial Ruby version:
-
#!/usr/bin/env ruby
-
#
-
#See http://blogs.agilefaqs.com/2009/07/08/refactoring-teaser-part-1/
-
-
require 'spec'
-
-
module StringExtensions
-
REGEX_TO_SPLIT_ALONG_WHITESPACES = /\s+/
-
-
def phrases(number)
-
list_of_keywords = ""
-
count = 0
-
strings = split(REGEX_TO_SPLIT_ALONG_WHITESPACES)
-
all_strings = single_double_triple_words(strings)
-
size = all_strings.size
-
all_strings.each do |phrase|
-
break if count == number
-
list_of_keywords += "'" + phrase + "'"
-
count += 1
-
if (count <size && count <number)
-
list_of_keywords += ", "
-
end
-
end
-
return list_of_keywords
-
end
-
-
private
-
-
def single_double_triple_words(strings)
-
all_strings = []
-
num_words = strings.size
-
-
return all_strings unless has_enough_words(num_words)
-
-
# Extracting single words. Total size of words == num_words
-
-
# Extracting single-word phrases.
-
(0...num_words).each do |i|
-
all_strings <<strings[i]
-
end
-
-
# Extracting double-word phrases
-
(0...num_words - 1).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]}"
-
end
-
-
# Extracting triple-word phrases
-
(0...num_words - 2).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]} #{strings[i + 2]}"
-
end
-
return all_strings
-
end
-
-
def has_enough_words(num_words)
-
num_words>= 3
-
end
-
end
-
-
String.send(:include, StringExtensions)
-
-
describe StringExtensions do
-
it 'finds all phrases' do
-
'Hello World Ruby'.phrases(6).should == "'Hello', 'World', 'Ruby', 'Hello World', 'World Ruby', 'Hello World Ruby'"
-
end
-
-
it 'returns all phrases when asked for more than exist' do
-
'Hello World Ruby'.phrases(10).should == "'Hello', 'World', 'Ruby', 'Hello World', 'World Ruby', 'Hello World Ruby'"
-
end
-
-
it 'returns the first n phrases when asked for fewer than exist' do
-
'Hello World Ruby'.phrases(4).should == "'Hello', 'World', 'Ruby', 'Hello World'"
-
end
-
-
it 'returns the first word when asked for one phrase' do
-
'Hello World Ruby'.phrases(1).should == "'Hello'"
-
end
-
end
I didn't change the specs at all during the refactoring (because I didn't change the API or add any new public methods or classes), and made sure they all passed at each step.
The first thing Δ I changed was to simplify that big iterator in phrases that loops through the list of phrases, formatting the output string. Basically all this does is to put each phrase in quotes, then stitch them all together separated by a comma and a space. The first of those tasks is a simple map, and the second is a join. The whole method collapses down to this (ruby methods automatically return the result of their last statement):
-
def phrases(number)
-
strings = split(REGEX_TO_SPLIT_ALONG_WHITESPACES)
-
all_strings = single_double_triple_words(strings)
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
Next Δ I remembered that by default String#split splits at whitespace anyway, so I did away with the regular expression. Then Δ I renamed the strings variable to words to make its purpose a little clearer, leaving the phrases method looking like this:
-
def phrases(number)
-
words = split
-
all_strings = single_double_triple_words(words)
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
The section of single_double_triple_words that extracted the single words seemed redundant, as we already had that list – the original words. I removed it Δ, and initialised all_strings to the word list instead (not forgetting to rename single_double_triple_words to match its new behaviour):
-
module StringExtensions
-
def phrases(number)
-
words = split
-
all_strings = words
-
all_strings += double_triple_words(words)
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
-
-
private
-
-
def double_triple_words(strings)
-
all_strings = []
-
num_words = strings.size
-
-
return all_strings unless has_enough_words(num_words)
-
-
# Extracting double-word phrases
-
(0...num_words - 1).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]}"
-
end
-
-
# Extracting triple-word phrases
-
(0...num_words - 2).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]} #{strings[i + 2]}"
-
end
-
return all_strings
-
end
-
-
def has_enough_words(num_words)
-
num_words>= 3
-
end
-
end
That has_enough_words method seemed a bit odd – particularly the way it was only called once, rather than after extracting each set of phrases. I decided it was probably a premature and incomplete attempt at optimisation, and removed it Δ for now.
My next target was the duplication in the blocks that calculate double- and triple-word phrases. First Δ I extracted them into separate methods:
-
module StringExtensions
-
def phrases(number)
-
words = split
-
all_strings = words
-
all_strings += double_words(words)
-
all_strings += triple_words(words)
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
-
-
private
-
-
def double_words(strings)
-
all_strings = []
-
num_words = strings.size
-
-
# Extracting double-word phrases
-
(0...num_words - 1).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]}"
-
end
-
return all_strings
-
end
-
-
def triple_words(strings)
-
all_strings = []
-
num_words = strings.size
-
-
(0...num_words - 2).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]} #{strings[i + 2]}"
-
end
-
return all_strings
-
end
-
end
I decided that the num_words variable in the two new methods wasn't really necessary (it was only used once, and I think strings.size expresses intent perfectly clearly), so I inlined it Δ (and the same in triple_words):
-
def double_words(strings)
-
all_strings = []
-
-
# Extracting double-word phrases
-
(0...strings.size - 1).each do |i|
-
all_strings <<"#{strings[i]} #{strings[i + 1]}"
-
end
-
return all_strings
-
end
Most of the code in double_words and triple_words was obviously very similar, so I created Δ a general extract_phrases method, and called it from both. The new method uses the start position and length version of Array#[] to extract the appropriate number of words, then Array#join to string them together separated by spaces:
-
def double_words(strings)
-
extract_phrases(strings, 2)
-
end
-
-
def triple_words(strings)
-
extract_phrases(strings, 3)
-
end
-
-
def extract_phrases(strings, number_of_words)
-
result = []
-
(0...strings.size - number_of_words + 1).each do |i|
-
phrase = strings[i, number_of_words].join(' ')
-
result <<phrase
-
end
-
result
-
end
At this point double_words and triple_words have become just dumb wrappers around extract_phrases, so I removed them Δ and just called extract_phrases directly:
-
def phrases(number)
-
words = split
-
all_strings = words
-
all_strings += extract_phrases(words, 2)
-
all_strings += extract_phrases(words, 3)
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
Rather than hardcoding the calls for two and three words, I changed it Δ to use a loop:
-
def phrases(number)
-
words = split
-
all_strings = words
-
(2..words.size).each do |number_of_words|
-
all_strings += extract_phrases(words, number_of_words)
-
end
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
I decided this was the point to put the optimisation back Δ and stop looking for phrases once we had enough:
-
def phrases(number)
-
words = split
-
all_strings = words
-
(2..words.size).each do |number_of_words|
-
break if all_strings.length>= number
-
all_strings += extract_phrases(words, number_of_words)
-
end
-
all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
-
end
I decided that number_of_words wasn't particularly clear, so changed it Δ to phrase_length, then Δ made the iterator in extract_phrases more ruby-like by using map:
-
def extract_phrases(strings, phrase_length)
-
(0...strings.size - phrase_length + 1).map do |i|
-
strings[i, phrase_length].join(' ')
-
end
-
end
I then noticed that I hadn't been consistent in changing strings to words, so fixed that Δ.
Lastly Δ, I decided that even though it was only one line, the code that formats the output deserved pulling out into a method.
Here's my final version of the code:
-
module StringExtensions
-
def phrases(number)
-
words = split
-
all_strings = words
-
(2..words.size).each do |phrase_length|
-
break if all_strings.length>= number
-
all_strings += extract_phrases(words, phrase_length)
-
end
-
format_output(all_strings[0, number])
-
end
-
-
private
-
-
def extract_phrases(words, phrase_length)
-
(0...words.size - phrase_length + 1).map do |i|
-
words[i, phrase_length].join(' ')
-
end
-
end
-
-
def format_output(phrases)
-
phrases.map {|s| "'#{s}'"}.join(', ')
-
end
-
end