I have stumbled upon an interesting problem about finding decimal dominants in an array. Here are some solutions and explanations.
Given an array with n keys, design an algorithm to find all values that occur more than n/10 times. The expected running time of your algorithm should be linear.
A number in an array is ‘decimal dominant’ if it occurs more than n/10 times in the array. Therefore, the problem is to find all decimal dominants in an array of size n.
Naively, we could just count the number of occurrences of each elements.
- define a hash map from the values to the number of occurrence of those values in the array.
- iterate through the array and populate the hash map.
- iterate through the hash map and get keys with values greater than or equal to n/10.
This takes ~N time and ~N space.
We can optimize the first solution to reduce the memory usage. The key point leading to this optimization is that if the array is of size n, there can be at most 9 elements that occur more than n/10 times. If we suppose that the array had 10 elements that occur more than n/10 times, then we will have more than n elements in the array, which is a contradiction.
With this realization in mind, we can solve the problem using 9 buckets with Boyer-Moore majority vote algorithm.
Basically, we can remember 9 elements along with the number of times they have been seen so far in a loop. When a remembered element is seen, its count is incremented. If an element which is not remembered is seen, we fit it into a slot if one is free. If no slot is free, subtract 1 from all counts. If the count is 0, forget the element.
This takes ~N time and ~1 space.
We can also use a selection algorithm to solve the problem in a linear time. If we imagine the array was sorted in a descending order, we can narrow our candidates to 9 elements, namely (n/10)-th, (2n/10)-th, … (9n/10)-th elements.
In this imaginary sorted version of the array, any elements left to (n/10)-th array cannot occur more than n/10 times because there won’t be enough room.
Using QuickSelect, we can get (n/10)-th largest element from the array without sorting the entire array. After checking if (n/10)-th largest element is decimal dominant, we apply the same procedure to the array including and to the right side of (n/10)-th largest element. This means we check for (2n/10)-th largest element, and so on.
This takes ~N time (9 calls to QuickSelect) and ~1 space.
It was interseting to see how selection algorithm could be applied to solve this kind of problem. I have not really verified any of these solutions. Please let me know if you spot an error.