Grouping things together in Ruby

January 29, 2017

Say you are writing a Ruby program and you need to group the elements of a collection according to some rule.

Let's start with a simple example. We have a collection of numbers and we want to separate them into odd and even.

numbers = [1, 2, 6, 100, 7, 32, 56, 3, 11, 8, 654, 123, 543]

To group them we use the group_by method, provided by the Enumerable module.

This method accepts a block, which is called once per element and decides which group that element belongs to, and returns a hash. Let's see it in action:

numbers = [1, 2, 6, 100, 7, 32, 56, 3, 11, 8, 654, 123, 543]

numbers.group_by { |n| n.odd? ? :odd : :even }
# => {:odd=>[1, 7, 3, 11, 123, 543], :even=>[2, 6, 100, 32, 56, 8, 654]}

The returned hash uses the block's return values as keys. Each value is an array of the elements for which the block returned that key.

Now say we have some text and we want to group its words by length. The mechanics are very similar.
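To make the shape of that hash concrete, here is a rough hand-rolled equivalent built on each_with_object. It is only a sketch of the idea, not how Ruby implements group_by internally, and the helper name my_group_by is made up for illustration.

```ruby
# Hypothetical helper: a hand-rolled approximation of Enumerable#group_by.
def my_group_by(collection)
  collection.each_with_object({}) do |element, groups|
    key = yield(element)             # the block's return value becomes the key
    (groups[key] ||= []) << element  # elements keep their original order
  end
end

numbers = [1, 2, 6, 100, 7, 32, 56, 3, 11, 8, 654, 123, 543]

by_hand  = my_group_by(numbers) { |n| n.odd? ? :odd : :even }
built_in = numbers.group_by { |n| n.odd? ? :odd : :even }

puts by_hand == built_in
```

Both calls should produce the same hash, which shows that the keys come straight from the block and nothing else.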

text  = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
words = text.split(/\W+/)
# => ["Lorem",
#     "ipsum",
#     "dolor",
#     "sit",
#     "amet",
#     "consectetur",
#     "adipiscing",
#     "elit"]

words.group_by { |word| word.size }
# => {5=>["Lorem", "ipsum", "dolor"],
#     3=>["sit"],
#     4=>["amet", "elit"],
#     11=>["consectetur"],
#     10=>["adipiscing"]}

First we get all the words by splitting the string on any run of non-word characters.
Then we pass a block to group_by that returns each word's number of characters, using the size method.

We can also use symbol to proc here:

words.group_by(&:size)
# => {5=>["Lorem", "ipsum", "dolor"],
#     3=>["sit"],
#     4=>["amet", "elit"],
#     11=>["consectetur"],
#     10=>["adipiscing"]}

And it even reads beautifully: “Words group by size.” OK, it’s not perfect English, but it explains what the code is doing very well.
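Once the words are grouped, the hash can be walked like any other. As a small sketch (the variable names words_by_size and sorted are only illustrative), this orders the groups by word length and prints a short summary of each one:

```ruby
text  = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
words = text.split(/\W+/)

words_by_size = words.group_by(&:size)

# sort_by on a hash yields [key, value] pairs; here we order them by word length.
sorted = words_by_size.sort_by { |size, _group| size }

sorted.each do |size, group|
  puts "#{size} letters (#{group.count}): #{group.join(', ')}"
end
```

The grouping itself does not sort anything: keys appear in the order they were first seen, so any ordering has to be applied afterwards.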

That’s the way you can group anything in Ruby.

We could apply this approach in a variety of situations. Let's list a couple of them:

Categorization

We have an array of products and we want to categorize them.

products_by_category = products.group_by { |product| product.category.to_s }
# => {"Ships"=>[Product[TARDIS], Product[SS Madame de Pompadour]],
#     "Tools"=>[Product[Sonic Screwdriver]],
#     "Weapons"=>[Product[Katana], Product[Light Saber]],
#     "Abilities"=>[Product[Regeneration Factor], Product[X-Ray Vision]]}


products_by_category.each do |category, products|
  puts "* #{category}: "
  products.each do |product|
    puts "   - #{product}"
  end
end

# >> * Ships: 
# >>    - TARDIS
# >>    - SS Madame de Pompadour
# >> * Tools: 
# >>    - Sonic Screwdriver
# >> * Weapons: 
# >>    - Katana
# >>    - Light Saber
# >> * Abilities: 
# >>    - Regeneration Factor
# >>    - X-Ray Vision

Here we even formatted the output to make it human-readable.
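The snippet above assumes a products collection already exists. For anyone who wants to run it, here is a minimal self-contained sketch. The product_class struct and its name and category fields are assumptions made for illustration, since the post does not show the real Product class.

```ruby
# Hypothetical stand-in for the Product class, which the post does not show.
product_class = Struct.new(:name, :category)

products = [
  product_class.new("TARDIS", :Ships),
  product_class.new("Sonic Screwdriver", :Tools),
  product_class.new("Katana", :Weapons),
  product_class.new("SS Madame de Pompadour", :Ships),
  product_class.new("Light Saber", :Weapons)
]

# Same grouping as in the post: the category, as a string, becomes the key.
products_by_category = products.group_by { |product| product.category.to_s }

products_by_category.each do |category, items|
  puts "* #{category}: #{items.map(&:name).join(', ')}"
end
```

Note that the categories come out in the order they first appear in the array, and products keep their relative order inside each category.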

Words with similar distances

The motivation for this post came from this Stack Overflow question.

TL;DR: the asker needed to group words by shortest word distance. They provided a list of gems that could calculate the distance, but not group by it.

For this, I picked one of those gems at random: fuzzy-string-match.

Here’s how to use the gem:

require 'fuzzystringmatch'

# Create the matcher
jarow = FuzzyStringMatch::JaroWinkler.create( :native )

# Get the distance
jarow.getDistance(  "jones",      "johnson" )
# => 0.8323809523809523

# Round it
jarow.getDistance(  "jones",      "johnson" ).round(2)
# => 0.83

And then, since it returns a float, we can use that as the criterion for grouping, even choosing the desired precision.

require 'fuzzystringmatch'

jarow = FuzzyStringMatch::JaroWinkler.create( :native )

target = "jones"
precision = 2
candidates = [ "Jessica Jones", "Jones", "Johnson", "thompson", "john", "thompsen" ]

distances = candidates.group_by { |candidate|
  jarow.getDistance( target, candidate ).round(precision)
}

distances
# => {0.52=>["Jessica Jones"],
#     0.87=>["Jones"],
#     0.68=>["Johnson"],
#     0.55=>["thompson", "thompsen"],
#     0.83=>["john"]}
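With the groups in hand, picking the closest matches is a matter of ordering the keys. The sketch below copies the scores from the output above so it runs without the gem, and it assumes that a higher Jaro-Winkler score means a closer match (identical strings score 1.0).

```ruby
# Scores copied from the output above; no gem is needed for this step.
distances = {
  0.52 => ["Jessica Jones"],
  0.87 => ["Jones"],
  0.68 => ["Johnson"],
  0.55 => ["thompson", "thompsen"],
  0.83 => ["john"]
}

# Assumption: a higher score means the candidate is closer to the target.
best_score, best_matches = distances.max_by { |score, _words| score }
puts "Closest (#{best_score}): #{best_matches.join(', ')}"

# Rank every group from closest to farthest.
ranked = distances.sort_by { |score, _words| -score }
ranked.each { |score, group| puts "#{score}: #{group.join(', ')}" }
```

If the library you use returns a true distance, where lower means closer, swap max_by for min_by and drop the negation in sort_by.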
