3.3. Algorithm Analysis Examples

3.3.1. An Anagram Detection Example

A good example problem for showing algorithms with different orders of magnitude is the classic anagram detection problem for strings sometimes called the anagram detection problem. One string is an anagram of another if the second is simply a rearrangement of the first. For example, "heart" and "earth" are anagrams. The strings "python" and "typhon" are anagrams as well. For the sake of simplicity, we will assume that the two strings in question are of equal length and that they are made up of symbols from the set of 26 lowercase alphabetic characters. Our goal is to write a Boolean function that will take two strings and return whether they are anagrams. Solution 1: Checking Off

Our first solution to the anagram problem will check the lengths of the strings and then to see that each character in the first string actually occurs in the second. If it is possible to “checkoff” each character, then the two strings must be anagrams. The first step in the process will be to convert the second string to a local second string for checking off. Each character from the first string can be checked against the characters in the local second string and if found, checked off by replacement. ActiveCode 1 shows this function.

To analyze this algorithm, we need to note that each of the n characters in s1 will cause an iteration through up to n characters in the array from s2. Each of the n positions in the array will be visited once to match a character from s1. The number of visits then becomes the sum of the integers from 1 to n. We stated earlier that this can be written as:

\[\begin{split}\sum_{i=1}^{n} i &= \frac {n(n+1)}{2} \\ &= \frac {1}{2}n^{2} + \frac {1}{2}n\end{split}\]

As \(n\) gets large, the \(n^{2}\) term will dominate the \(n\) term and the \(\frac {1}{2}\) can be ignored. Therefore, this solution is \(O(n^{2})\). Solution 2: Sort and Compare

Another solution to the anagram problem will make use of the fact that even though s1 and s2 are different, they are anagrams only if they consist of exactly the same characters. So, if we begin by sorting each string alphabetically, from a to z, we will end up with the same string if the original two strings are anagrams. ActiveCode 2 shows this solution.

At first glance you may be tempted to think that this algorithm is \(O(n)\), since there are three consecutive simple iterations: the first two to convert strings to char arrays and the last to compare the n characters after the sorting process. Sorting is typically either \(O(n^{2})\) or \(O(n\log n)\), so the sorting operations dominate the iteration. In the end, this algorithm will have the same order of magnitude as that of the sorting process. Solution 3: Brute Force

A brute force technique for solving a problem typically tries to exhaust all possibilities. For the anagram detection problem, we can simply generate an array of all possible strings using the characters from s1 and then see if s2 occurs. However, there is a difficulty with this approach. When generating all possible strings from s1, there are n possible first characters, \(n-1\) possible characters for the second position, \(n-2\) for the third, and so on. The total number of candidate strings is \(n*(n-1)*(n-2)*...*3*2*1\), which is \(n!\). Although some of the strings may be duplicates, the program cannot know this ahead of time and so it will still generate \(n!\) different strings.

It turns out that \(n!\) grows even faster than \(2^{n}\) as n gets large. In fact, if s1 were 20 characters long, there would be \(20!=2,432,902,008,176,640,000\) possible candidate strings. If we processed one possibility every second, it would take us 77,146,816,596 years to go through the entire array. This is probably not going to be a good solution. Solution 4: Count and Compare

Our final solution to the anagram problem takes advantage of the fact that any two anagrams will have the same number of a’s, the same number of b’s, the same number of c’s, and so on. In order to decide whether two strings are anagrams, we will first count the number of times each character occurs. Since there are 26 possible characters, we can use an array of 26 counters, one for each possible character. Each time we see a particular character, we will increment the counter at that position. In the end, if the two arrays of counters are identical, the strings must be anagrams. ActiveCode 3 shows this solution.

Again, the solution has a number of iterations. However, unlike the first solution, none of them are nested. The first two iterations used to count the characters are both based on n. The third iteration, comparing the two arrays of counts, always takes 26 steps since there are 26 possible characters in the strings. Adding it all up gives us \(T(n)=2n+26\) steps. That is \(O(n)\). We have found a linear order of magnitude algorithm for solving this problem.

Before leaving this example, we need to say something about space requirements. Although the last solution was able to run in linear time, it could only do so by using additional storage to keep the two arrays of character counts. In other words, this algorithm sacrificed space in order to gain time.

This is a common occurrence. On many occasions you will need to make decisions between time and space trade-offs. In this case, the amount of extra space is not significant. However, if the underlying alphabet had millions of characters, there would be more concern. As a computer scientist, when given a choice of algorithms, it will be up to you to determine the best use of computing resources given a particular problem.

3.3.2. The Traveling Salesperson Problem

Let’s consider a famous problem in computer science for a bit and let’s return to the brute force method. Imagine that a salesperson needs to travel to a set of places and find the shortest path to do so.

The Traveling Salesperson problem (TSP) has numerous direct applications in a number of fields, including transportation and logistics. The example of arranging school bus routes to pick up the children in a school district is of important historical significance since it provided motivation for Merrill Flood to do pioneering of TSP research in the 1940s. More current applications involve the scheduling of service calls or the delivery of packages or meals.

Although transportation applications are clearly a natural setting for TSP, there are applications in other areas such as the scheduling of a machine to drill holes in a circuit board. If the time it takes to move the head of the drill is a significant portion of the overall manufacturing process, then the TSP is important in reducing costs.

To be concrete, let’s imagine that a salesperson needs to travel to each country in the European Union and find the shortest path to do so. At the time of this writing, there were 28 European countries are members of the EU. Without the UK, there will be 27.

Applying the Brute Force solution to the Traveling Salesperson problem is a really terrible idea. an algorithm is said to scale well or be scalable if it is suitably efficient and practical when applied to an input with a large n, and brute force does not scale at all well because n = 28 is quite a small number.

As we know a brute force technique for solving a problem typically tries to exhaust all possibilities for our salesperson, that means trying every set of routes and checking the path distance. Like the anagram detection example, the total number of paths that a salesperson could try is \(n*(n-1)*(n-2)*...*3*2*1\), which is \(n!\) because they have to choose a first city from the \(n\) choices, then there are only \(n-1\) choices for the next city. Then \(n-2\) choices for the third etc. That is \(O(n!)\).

Using brute force to solve this problem requires we check the 27 factorial ways for the salesperson to consider traveling to each country. This means that there are 10,888,869,450,418,352,160,768,000,000 possible paths for the salesperson to check to travel when using the brute force solution.

Some of the fastest readily available processors currently are around 5GHz, where 1GHz represents 1 billion cycles per second. If you could do two computations in a cycle, then a computation would take 2/5,000,000,000 seconds which equals 0.0000000004 seconds. We can call this computation rate. Note that processors typically take more than one cycle to complete an instruction, but for the last decade or so most processors have been multicore… So, this is a rough estimate. Next you should take, computation rate times number of paths to give the amount of time in seconds, then divide that by the number of seconds in a year \(((24*60)*60)*365\). It would take 345,283,785,211,134,961,972.6 years for the Brute Force Solution to find the shortest path for our salesperson. If you have 6 or 12 cores, you can just multiply by three or six, but that is clearly not going to help us a lot.

What if we use a super computer? Summit is the fastest supercomputer in the world and can deliver as much as 200 petaflops at peak. This is equivalent to 200 quadrillion floating-point operations per second. Since a quadrillion is a million billion, Summit is 40 million times faster than the fastest regular processors. However, 8,632,094,630,278.3 years is clearly still far too many to wait.

And this was just to visit the 28 European Union Countries. How long would it take to find the shortest path to visit all of the 48 states in the continental United States? Using brute force on problems with numbers even as small as 28 is clearly unworkable. And, unfortunately for our salesperson, there is no known tractable solution for finding the best route for TSP, so solutions are used that are not best but are good enough. These are called heuristics.

The moral of the story here is that algorithms matter and algorithm analysis can help you decide not to choose a particular algorithm to use or not to use.

Self Check

You have attempted of activities on this page