Set Intersection in Ruby
How to see commonalities between two data sets using Ruby
Let’s talk about Set Intersection in Ruby.
There are times when you have two or more groups of data and you want to know what is common between them. When this happens it is good to know about set intersection.
Set Intersection
The intersection of two sets is the set of elements that belong to both of them. It is easiest to picture as the overlapping region of a Venn diagram. For example, the intersection of {1, 2, 3} and {2, 3, 4} is {2, 3}.
The mathematical notation for intersection is ∩. So if you wanted to write the intersection of A and B you would write A ∩ B.
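As a rough sketch of that definition, the intersection can be spelled out by hand: keep each element of the first collection that also appears in the second, then drop duplicates. The method name naive_intersection is made up for illustration, and this is not how Ruby implements its own intersection.

```ruby
# Hypothetical helper, written only to illustrate the definition.
# It keeps the elements of `a` that also appear in `b`, without duplicates.
def naive_intersection(a, b)
  a.select { |element| b.include?(element) }.uniq
end

naive_intersection([1, 2, 3], [2, 3, 4]) #=> [2, 3]
```

Note that include? compares with ==, which is a little looser than the comparison Ruby's built-in intersection uses.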
Set Intersection in Ruby
There are a couple of ways you can do intersections in Ruby, and I will talk you through both of them. The syntax is similar in both, so don’t worry about having to learn lots of different things.
Set Intersection with Arrays in Ruby
I will cover arrays first because this is one of the most common objects found in Ruby code.
The syntax is array & array, so much like we had A ∩ B we would do A & B.
A working example:
[1, 1, 5, 5] & [1, 2, 5] #=> [ 1, 5 ]
When we do an intersection, each array is treated like a set, and in a set every element is unique. So if we changed the second array to include another 1, we would still get the same result:
[1, 1, 5, 5] & [1, 1, 2, 5] #=> [ 1, 5 ]
The intersection is not destructive: it doesn’t change either array, it creates a new one.
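A small sketch makes this visible. The variable names are only for illustration.

```ruby
first  = [1, 1, 5, 5]
second = [1, 2, 5]

common = first & second

common #=> [1, 5]
first  #=> [1, 1, 5, 5] (unchanged)
second #=> [1, 2, 5] (unchanged)

# `&=` is shorthand for `first = first & second`. It points the variable at
# the new array; the original array object is still not modified.
original = first
first &= second
first    #=> [1, 5]
original #=> [1, 1, 5, 5]
```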
You can chain intersections together if you want to compare more than two arrays:
[1, 1, 5, 5] & [1, 1, 2, 5] & [5] #=> [ 5 ]
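When the arrays live in a list whose length is not known in advance, the same chaining can be sketched with reduce. The variable name arrays is just for illustration, and Array#intersection assumes Ruby 2.7 or later.

```ruby
arrays = [
  [1, 1, 5, 5],
  [1, 1, 2, 5],
  [5]
]

# Fold the list with `&`, equivalent to arrays[0] & arrays[1] & arrays[2].
arrays.reduce(:&) #=> [5]

# Array#intersection (Ruby 2.7 and later) accepts several arrays at once.
[1, 1, 5, 5].intersection([1, 1, 2, 5], [5]) #=> [5]
```

One thing to watch for: reduce returns nil for an empty list, so that case may need handling separately.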
The ordering of the final array is dictated by the order of the first array, so if we change our first example so that the 5s come first we would get:
[5, 5, 1, 1] & [1, 2, 5] #=> [ 5, 1 ]
This is good to know because it means there is no need to sort the arrays before performing an intersection. Only the order of the first array affects the order of the result.
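The final thing to say about intersection is that it is performant. Under the hood it compares elements using eql? together with their hash value (not to be confused with a Hash object). One consequence is that two elements only match if they are eql?, which is stricter than ==.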
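The eql? comparison has a couple of visible effects, sketched below. The Point class is a hypothetical value object made up for this example; the point is that custom objects need both eql? and hash defined before an intersection will treat equal-looking instances as the same element.

```ruby
# 1 == 1.0 is true, but 1.eql?(1.0) is false, so they do not match.
[1, 2] & [1.0, 2] #=> [2]

# Strings with the same content are eql? and share a hash value.
["a", "b"] & ["b", "c"] #=> ["b"]

# Hypothetical value object: without eql? and hash, two Points with the
# same coordinates would be treated as different elements.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias_method :==, :eql?

  def hash
    [x, y].hash
  end
end

([Point.new(0, 0), Point.new(1, 1)] & [Point.new(1, 1)]).length #=> 1
```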
Set Intersection with Sets in Ruby
If you are treating your array like a set then maybe it should be a set?
Access to the Set object is just a require away (and from Ruby 3.2 onward it is available without the require). It acts like a hybrid between the usability of arrays and the fast lookups of hashes.
To do a basic intersection with a set in Ruby you would do something like:
require "set"
x = Set.new([1,2,3])
y = Set.new([1,2])
x & y #=> #<Set: {1, 2}>
Set also gives you access to a more friendly looking intersection method, so we could have written our final line like:
x.intersection y #=> #<Set: {1, 2}>
This arguably expresses intent a bit more clearly.
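A few related Set behaviours are sketched below, reusing the same x and y. The results are written as comparisons so they do not depend on how a particular Ruby version prints a Set.

```ruby
require "set"

x = Set.new([1, 2, 3])
y = Set.new([1, 2])

# intersection is an alias for &, so both return an equal set.
(x & y) == x.intersection(y) #=> true

# Set#& accepts any enumerable, so an array works on the right-hand side.
(x & [2, 3, 4]) == Set.new([2, 3]) #=> true

# to_set converts an array, dropping duplicates along the way.
[1, 1, 5, 5].to_set == Set.new([1, 5]) #=> true

# intersect? only answers whether anything is shared.
x.intersect?(y)            #=> true
x.intersect?(Set.new([9])) #=> false
```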
If you enjoyed this article, you may enjoy my write-ups on what a proc is or what a Gemfile is.