Section 2.4 An Anagram Detection Example

A good example problem for showing algorithms with different orders of magnitude is the classic anagram detection problem for strings. One string is an anagram of another if the second is simply a rearrangement of the first. For example, heart and earth are anagrams. The strings taster and treats are anagrams as well. For the sake of simplicity, we will assume that the two strings in question are of equal length and that they are made up of symbols from the set of 26 lowercase alphabetic characters. Our goal is to write a boolean function that will take two strings and return whether they are anagrams.

Subsection 2.4.1 Anagram Detection Solution 1: Checking Off

Our first solution to the anagram problem will check the lengths of the strings and then check to see that each character in the first string actually occurs in the second. If it is possible to check off each character, then the two strings must be anagrams. Checking off a character will be accomplished by replacing the matched character in the second string with - (a hyphen is not a letter, so it will never match a letter in the first string).

However, since strings in Java are immutable, the first step in the process will be to convert the second string to an array of characters with the data type char. The char data type stores exactly one character, and we can use == to compare them directly (unlike Strings).

We will get each character from the first string by using the charAt() method and check it against the characters in the array formed from the second string. If a letter is found, we check it off by setting the array element to -. Listing 2.4.1 shows this method.

Listing 2.4.1.

public class Anagrams1 {

    public static boolean anagramSolution1(String s1, String s2) {
        boolean isAnagram = true;

        if (s1.length() != s2.length()) {
            isAnagram = false;
        } else {
            int pos1 = 0;
            // Mutable copy of s2, so matched characters can be checked off.
            char[] s2Array = s2.toCharArray();
            while (pos1 < s1.length() && isAnagram) {
                int pos2 = 0;
                boolean found = false;
                // Scan the array for the current character of s1.
                while (pos2 < s2.length() && !found) {
                    if (s1.charAt(pos1) == s2Array[pos2]) {
                        found = true;
                    } else {
                        pos2 = pos2 + 1;
                    }
                }
                if (found) {
                    // Check off the match so it cannot be used again.
                    s2Array[pos2] = '-';
                } else {
                    // No match left for this character: not anagrams.
                    isAnagram = false;
                }
                pos1 = pos1 + 1;
            }
        }
        return isAnagram;
    }

    public static void main(String[] args) {
        System.out.println(anagramSolution1("taster", "treats")); // expected: true
        System.out.println(anagramSolution1("abcd", "dcab")); // expected: true
        System.out.println(anagramSolution1("abcd", "dcda"));  // expected: false
    }
}

To analyze this algorithm, note that each of the n characters in s1 causes an iteration through up to n characters in the array built from s2. When the strings are anagrams, each of the n positions in the s2 array is checked off exactly once, and finding a match at position p (counting from 1) takes p visits. The number of visits then becomes the sum of the integers from 1 to n. We stated earlier that this can be written as

\begin{align*} \sum_{i=1}^{n} i &= \frac {n(n+1)}{2}\\ &= \frac {1}{2}n^{2} + \frac {1}{2}n \end{align*}

As \(n\) gets large, the \(n^{2}\) term will dominate the \(n\) term and the \(\frac {1}{2}\) can be ignored. Therefore, this solution is \(O(n^{2})\text{.}\)

This is a worst-case analysis. The full count above applies whenever the strings are anagrams, because the algorithm must find a match for every character of s1. If the strings are not anagrams, the algorithm exits as soon as some character of s1 has no match, which is often well before all the comparisons are done.
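The visit count can be checked empirically. The sketch below is an instrumented copy of the checking-off algorithm that returns how many array elements were examined instead of a boolean. The class name VisitCounter and the method name countVisits are introduced here for illustration only and are not part of Listing 2.4.1.

```java
public class VisitCounter {

    // Hypothetical helper: runs the checking-off algorithm and returns
    // the number of s2Array elements examined along the way.
    public static int countVisits(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return 0; // lengths differ: no characters are examined
        }
        int visits = 0;
        char[] s2Array = s2.toCharArray();
        for (int pos1 = 0; pos1 < s1.length(); pos1++) {
            boolean found = false;
            for (int pos2 = 0; pos2 < s2Array.length && !found; pos2++) {
                visits++;
                if (s1.charAt(pos1) == s2Array[pos2]) {
                    s2Array[pos2] = '-'; // check off the match
                    found = true;
                }
            }
            if (!found) {
                break; // not anagrams: stop early, as the listing does
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println(countVisits("abcd", "dcba"));     // 4 + 3 + 2 + 1 = 10
        System.out.println(countVisits("taster", "treats")); // 6 * 7 / 2 = 21
    }
}
```

For anagrams of length n the total is always n(n+1)/2, in agreement with the sum above.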

Subsection 2.4.2 Anagram Detection Solution 2: Sort and Compare

Another solution to the anagram problem will make use of the fact that even though s1 and s2 are different, they are anagrams only if they consist of exactly the same characters. So if we begin by sorting each string alphabetically from a to z, we will end up with the same string if the original two strings are anagrams. Listing 2.4.2 shows this solution. Again, in Java we can use a sort method on arrays by converting each string to an array at the start.

Listing 2.4.2.
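A minimal sketch of the sort-and-compare approach described above. The class name Anagrams2 and the method name anagramSolution2 are assumptions, chosen by analogy with the other listings in this section. The sketch uses java.util.Arrays.sort on char arrays and then a single pass to compare the sorted characters.

```java
import java.util.Arrays;

public class Anagrams2 {

    public static boolean anagramSolution2(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }

        // Strings are immutable, so sort copies of their characters.
        char[] chars1 = s1.toCharArray();
        char[] chars2 = s2.toCharArray();
        Arrays.sort(chars1);
        Arrays.sort(chars2);

        // One pass over the sorted arrays: any difference means not anagrams.
        int pos = 0;
        boolean matches = true;
        while (pos < chars1.length && matches) {
            if (chars1[pos] == chars2[pos]) {
                pos = pos + 1;
            } else {
                matches = false;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(anagramSolution2("taster", "treats")); // expected: true
        System.out.println(anagramSolution2("abcd", "dcab")); // expected: true
        System.out.println(anagramSolution2("abcd", "dcda"));  // expected: false
    }
}
```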

At first glance you may be tempted to think that this algorithm is \(O(n)\text{,}\) since there is one simple iteration to compare the n characters after the sorting process. However, the two calls to the Java sort method are not without their own cost. As we will see in Chapter 5, sorting is typically either \(O(n^{2})\) or \(O(n\log n)\text{,}\) so the sorting operations dominate the iteration. In the end, this algorithm will have the same order of magnitude as that of the sorting process.

Subsection 2.4.3 Anagram Detection Solution 3: Brute Force

A brute force technique for solving a problem typically tries to exhaust all possibilities. For the anagram detection problem, we can simply generate a list of all possible strings using the characters from s1 and then see if s2 occurs. However, there is a problem with this approach. When generating all possible strings from s1, there are n possible first characters, \(n - 1\) possible characters for the second position, \(n - 2\) for the third, and so on. The total number of candidate strings is \(n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1\text{,}\) which is \(n!\text{.}\) Although some of the strings may be duplicates, the program cannot know this ahead of time and so it will still generate all \(n!\) strings.
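The brute force idea can be sketched as follows, for very short strings only. The names Anagrams3, anagramSolution3, allArrangements, and arrange are hypothetical, introduced here for illustration. The sketch builds every arrangement of s1 recursively and then checks whether s2 is in the list.

```java
import java.util.ArrayList;
import java.util.List;

public class Anagrams3 {

    // Appends to 'out' every arrangement of the characters in 'remaining',
    // each one prefixed by 'prefix'.
    private static void arrange(String prefix, String remaining, List<String> out) {
        if (remaining.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < remaining.length(); i++) {
            // Use character i next, then arrange the characters that are left.
            String rest = remaining.substring(0, i) + remaining.substring(i + 1);
            arrange(prefix + remaining.charAt(i), rest, out);
        }
    }

    // Returns all n! arrangements of s, duplicates included.
    public static List<String> allArrangements(String s) {
        List<String> out = new ArrayList<>();
        arrange("", s, out);
        return out;
    }

    public static boolean anagramSolution3(String s1, String s2) {
        return allArrangements(s1).contains(s2);
    }

    public static void main(String[] args) {
        System.out.println(allArrangements("abcd").size());   // 4! = 24
        System.out.println(anagramSolution3("abcd", "dcab")); // expected: true
        System.out.println(anagramSolution3("abcd", "dcda")); // expected: false
    }
}
```

Even when s1 contains repeated letters, as in "aab", this sketch still produces all 3! = 6 strings, which is the behavior the paragraph above describes.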

It turns out that \(n!\) grows even faster than \(2^{n}\) as n gets large. In fact, if s1 were 20 characters long, there would be \(20! = 2,432,902,008,176,640,000\) possible candidate strings. If we processed one possibility every second, it would still take us 77,146,816,596 years to go through the entire list. This is probably not going to be a good solution.
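The figures above can be reproduced with a few lines of arithmetic. This sketch assumes a 365-day year; the class and method names are illustrative only.

```java
public class FactorialGrowth {

    // Computes n! with long arithmetic. Valid only for 0 <= n <= 20,
    // because 21! no longer fits in a long.
    public static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = result * i;
        }
        return result;
    }

    // Whole years needed to process 'candidates' items at one per second,
    // assuming 365-day years.
    public static long yearsAtOnePerSecond(long candidates) {
        long secondsPerYear = 365L * 24 * 60 * 60;
        return candidates / secondsPerYear;
    }

    public static void main(String[] args) {
        long candidates = factorial(20);
        System.out.println(candidates);                      // 2432902008176640000
        System.out.println(yearsAtOnePerSecond(candidates)); // 77146816596
    }
}
```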

Subsection 2.4.4 Anagram Detection Solution 4: Count and Compare

Our final solution to the anagram problem takes advantage of the fact that any two anagrams will have the same number of a’s, the same number of b’s, the same number of c’s, and so on. In order to decide whether two strings are anagrams, we will first count the number of times each character occurs. Since there are 26 possible characters, we can use an array of 26 counters, one for each possible character. Each time we see a particular character, we will increment the counter at that position. In the end, if the two lists of counters are identical, the strings must be anagrams. We will take advantage of the fact that char variables in Java are stored as integers, and Java allows us to evaluate an expression like 'c' - 'a', which evaluates to 2. Listing 2.4.3 shows this solution.

Listing 2.4.3.

import java.util.Arrays;

public class Anagrams4 {

    public static boolean anagramSolution4(String s1, String s2) {
        int[] count1 = new int[26]; // initialized to all zeros
        int[] count2 = new int[26];

        // Count each letter of s1; 'a' maps to index 0, 'z' to index 25.
        for (int i = 0; i < s1.length(); i++) {
            int index = s1.charAt(i) - 'a';
            count1[index] = count1[index] + 1;
        }

        // Count each letter of s2 the same way.
        for (int i = 0; i < s2.length(); i++) {
            int index = s2.charAt(i) - 'a';
            count2[index] = count2[index] + 1;
        }

        // Compare the two sets of counts, stopping at the first difference.
        int j = 0;
        boolean isAnagram = true;
        while (j < 26 && isAnagram) {
            if (count1[j] == count2[j]) {
                j = j + 1;
            } else {
                isAnagram = false;
            }
        }
        return isAnagram;
    }

    public static void main(String[] args) {
        System.out.println(anagramSolution4("taster", "treats")); // expected: true
        System.out.println(anagramSolution4("abcd", "dcab")); // expected: true
        System.out.println(anagramSolution4("abcd", "dcda"));  // expected: false
    }
}

Again, the solution has a number of iterations. However, unlike the first solution, none of them are nested. The first two iterations used to count the characters are both based on n. The third iteration, comparing the two lists of counts, always takes 26 steps since there are 26 possible characters in the strings. Adding it all up gives us \(T(n)=2n+26\) steps. That is \(O(n)\text{.}\) We have found a linear order of magnitude algorithm for solving this problem.

Before leaving this example, we need to say something about space requirements. Although the last solution was able to run in linear time, it could only do so by using additional storage to keep the two lists of character counts. In other words, this algorithm sacrificed space in order to gain time.

This is a common occurrence. On many occasions you will need to make decisions between time and space trade-offs. In this case, the amount of extra space is not significant. However, if the underlying alphabet had millions of characters, there would be more concern. As a computer scientist, when given a choice of algorithms, it will be up to you to determine the best use of computing resources given a particular problem.
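As one illustration of this trade-off, a hash map can replace the fixed array of counters, so the extra space grows with the number of distinct characters actually seen rather than with the size of the alphabet. This is a sketch under assumptions, not a listing from this section: the names AnagramsMap and anagramWithMap are hypothetical, and the method works on Java char values.

```java
import java.util.HashMap;
import java.util.Map;

public class AnagramsMap {

    public static boolean anagramWithMap(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }

        // Count how many times each character occurs in s1.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s1.length(); i++) {
            counts.merge(s1.charAt(i), 1, Integer::sum);
        }

        // Use up one count for each character of s2.
        for (int i = 0; i < s2.length(); i++) {
            char c = s2.charAt(i);
            Integer remaining = counts.get(c);
            if (remaining == null || remaining == 0) {
                return false; // s2 has a character that s1 cannot supply
            }
            counts.put(c, remaining - 1);
        }

        // Lengths are equal and every character of s2 was matched,
        // so every count has returned to zero.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(anagramWithMap("taster", "treats")); // expected: true
        System.out.println(anagramWithMap("abcd", "dcda"));     // expected: false
    }
}
```

The running time is still linear in n on average, but each counter update now costs a hash lookup instead of an array index, which is another small exchange of time for space.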

Exercises Self Check

1.

Q-4: Given the following code fragment, what is its Big O running time? (Presume that variable n has been declared elsewhere in the program.)

int test = 0;
for (int i = 0; i < n; i++) {
   for (int j = 0; j < n; j++) {
      test = test + i * j;
   }
}

O(n)
In an example like this you want to count the nested loops, especially the loops that are dependent on the same variable, in this case, n.
O(n^2)
A singly nested loop like this is O(n^2).
O(log n)
log n typically is indicated when the problem is iteratively made smaller.
O(n^3)
In an example like this you want to count the nested loops, especially the loops that are dependent on the same variable, in this case, n.

2.

Q-5: Given the following code fragment what is its Big O running time? (Presume that n has been declared elsewhere in the program.)

int test = 0;
for (int i = 0; i < n; i++) {
   test = test + i;
}

for (int j = 0; j < n; j++) {
   test = test - 1;
}

O(n)
Even though there are two loops they are not nested. You might think of this as O(2n) but we can ignore the constant 2.
O(n^2)
Be careful, in counting loops you want to make sure the loops are nested.
O(log n)
log n typically is indicated when the problem is iteratively made smaller.
O(n^3)
Be careful, in counting loops you want to make sure the loops are nested.

3.

Q-6: Given the following code fragment what is its Big O running time? (Presume n has been declared elsewhere in the program.)

int i = n;
while (i > 0) {
   int k = 2 + 2;
   i = i / 2;
}

O(n)
Look carefully at the loop variable i. Notice that the value of i is cut in half each time through the loop. This is a big hint that the performance is better than O(n).
O(n^2)
Check again, is this a nested loop?
O(log n)
The value of i is cut in half each time through the loop so it will only take log n iterations.
O(n^3)
Check again, is this a nested loop?
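The halving behavior in Q-6 can be observed by counting loop iterations directly. The class name HalvingCount and the method name iterations are illustrative only.

```java
public class HalvingCount {

    // Returns how many times the loop body from Q-6 runs for a given n.
    public static int iterations(int n) {
        int count = 0;
        int i = n;
        while (i > 0) {
            count = count + 1;
            i = i / 2; // integer division halves i each time through
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(iterations(1024));    // 11
        System.out.println(iterations(1000000)); // 20
    }
}
```

Increasing n from 1,024 to 1,000,000 adds only nine iterations, which is the slow growth that \(O(\log n)\) describes.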
