# 7.5. Searching Algorithms¶

Computers store vast amounts of data. One of the strengths of computers is their ability to find things quickly. This ability is called searching. For the AP CS A exam you will need to know both linear (sequential) search and binary search algorithms.

The following video is also on YouTube at https://youtu.be/DHLCXXX1OtE. It introduces the concept of searching including sequential search and binary search.

Video: (DHLCXXX1OtE)

• Sequential or Linear search typically starts at the first element in an array or ArrayList and looks through all the items one by one until it either finds the desired value and then it returns the index it found the value at or if it searches the entire array or list without finding the value it returns -1.

• Binary search can only be used on data that has been sorted or stored in order. It checks the middle of the data to see if that middle value is less than, equal, or greater than the desired value and then based on the results of that it narrows the search. It cuts the search space in half each time.

If binary search requires the values in an array or list to be sorted, how can you do that? There are many sorting algorithms which are covered in the next lesson.

## 7.5.3. Runtimes¶

How do we choose between two algorithms that solve the same problem? They usually have different characteristics and runtimes which measures how fast they run. For the searching problem, it depends on your data.

Binary search is much faster than linear search, especially on large data sets, but it can only be used on sorted data. Often with runtimes, computer scientist think about the worst case behavior. With searching, the worst case is usually if you cannot find the item. With linear search, you would have to go through the whole array before realizing that it is not there, but binary search is much faster even in this case because it eliminates half the data set in each step. We can measure an informal runtime by just counting the number of steps.

Here is a table that compares the worst case runtime of each search algorithm given an array of n elements. The runtime here is measured as the number of times the loop runs in each algorithm or the number of elements we need to check in the worst case when we don’t find the item we are looking for. Notice that with linear search, the worst case runtime is the size of the array n, because it has to look through the whole array. For the binary search runtime, we can calculate the number of times you can divide n in half until you get to 1. So, for example 8 elements can be divided in half to narrow down to 4 elements, which can be further divided in half to narrow down to 2 elements, which can be further divided in half to get down to 1 element, and then if that is wrong, to 0 elements, so that is 4 divisions or guesses to get the answer (8->4->2->1->0). In the table below, every time we double the size of N, we need at most one more guess or comparison with binary search. It’s much faster than linear search!

N

Linear Search

Binary Search

2

2 comparisons

2 comparisons

4

4

3

8

8

4

16

16

5

100

100

7

Runtimes can be described with mathematical functions. For an array of size n, linear search runtime is a linear function, and binary search runtime is a function of log base 2 of n (or log n + 1 comparisons). This is called the big-O runtime function in computer science, for example O(log n) vs. O(n). You can compare the growth of functions like n and log2n as n, the data size, grows and see that binary search runs much faster for any n. You don’t need to know the log n runtime growth function for the AP exam, but you should be able to calculate how many steps binary search takes for a given n by counting how many times you can divide it in half. Or you can start at 1 and keep a count of how many times you can double it with the powers of two (1, 2, 4, 8, 16, 32, 64, 128, 512, 1024, etc.) until you reach a number that is slightly above n.

7-5-3: Which will cause the shortest execution of a binary search looking for a value in an array of integers?

• The value is the first one in the array
• This would be true for sequential search, not binary.
• The value is in the middle of the array
• If the value is in the middle of the array the binary search will return after one iteration of the loop.
• The value is the last one in the array
• How would that be the shortest in a binary search?
• The value isn't in the array
• This is true for the longest execution time, but we are looking for the shortest.

7-5-4: Which of the following conditions must be true in order to search for a value using binary search?

I. The values in the array must be integers.
II. The values in the array must be in sorted order.
III. The array must not contain duplicate values.

• I only
• You can use a binary search on any type of data that can be compared, but the data must be in order.
• I and II
• You can use a binary search on any type of data that can be compared.
• II only
• The only requirement for using a Binary Search is that the values must be ordered.
• II and III
• The array can contain duplicate values.

7-5-5: How many times would the loop in the binary search run for an array int[] arr = {2, 10, 23, 31, 55, 86} with binarySearch(arr,55)?

• 2
• It will first compare with the value at index 2 and then index 4 and then return 4.
• 1
• This would be true if we were looking for 23.
• 3
• This would be true if we were looking for 31.

7-5-6: If you had an ordered array of size 500, what is the maximum number of iterations required to find an element with binary search?

• approximately 15 times
• How many times can you divide 500 in half?
• approximately 9 times
• You can divide 500 in half, 9 times, or you can observe that 2^9 = 512 which is slightly bigger than 500.
• 500 times
• How many times can you divide 500 in half?
• 2 times
• How many times can you divide 500 in half?

## 7.5.4. Programming Challenge : Search Runtimes¶

Let’s go back to the spell checker that we programmed in Unit 6. Remember that it used linear search to find a word in the dictionary. The dictionary file was actually in alphabetical order though, so we could have used a much faster binary search.

Here is a version of the spellchecker on repl.it that uses an ArrayList for the dictionary and a linear search method. Notice that get(i) is used instead of [] to get an element in the ArrayList dictionary at index i. The search also prints out the index where it found the word. This is an informal runtime that tells us how many words it had to check. Run the code in the window below or on repl.it with the following test cases and record the runtime for each word in this Google document (do File/Make a Copy) also seen below to record your answers.

Now, login to repl and start changing the code to save the repl.it as your own project. The SpellChecker.java file also has a binarySpellCheck(word) method defined, but it does not print out the number of words checked. Looking at the linearSpellCheck(word) method as a guide, add in a counter variable, and increment it in the binary search loop after finding the middle of the list, and print it out before returning true or false. Change the Main.java code to call the binarySpellCheck method instead of the linearSpellCheck method, and try all the same test case words again. Record the runtimes for binary search and compare with the linear search times. What do you notice? Which one was faster in general? Were there some cases where each was faster? How fast were they with misspelled words? Record your answers in the window below.

## 7.5.5. Summary¶

• There are standard algorithms for searching.

• Sequential/linear search algorithms check each element in order until the desired value is found or all elements in the array or ArrayList have been checked.

• The binary search algorithm starts at the middle of a sorted array or ArrayList and eliminates half of the array or ArrayList in each iteration until the desired value is found or all elements have been eliminated.

• Data must be in sorted order to use the binary search algorithm. This algorithm will be covered more in Unit 10.

• Informal run-time comparisons of program code segments can be made using statement execution counts.