3.1 Search Algorithms

Linear Search

Linear search is an easily-understood algorithm to check if a value is in an array. The logic behind it is quite simple:

Start at the beginning of the array. If the value there is equal to the value that you’re trying to find, stop. If not, go to the next value and repeat until you’ve reached the end of the array.

For example, if we were trying to see if 3 was in the array [1, 6, 7, 2, 3, 56]:

index stored value equal to 3?
0 1 No
1 6 No
2 7 No
3 1 No
4 3 Yes

Pretty intuitive, right? Linear Search is very intuitive, and thus very easy to code, as we see below.

def linear_search(arr, val) -> int:
    """
    Search the provided array for the provided value
    and get the index, if found
    Arguments:
        arr - the array to search in
        val - the value to search for
    Returns:
        int - the index of the value if it was found, else -1
    """
    for i in range(len(arr)):
        if arr[i] == val:
            return i
    return -1

View code on GitHub.

Linear Search - Pros and Cons

Cons - Linear Search has O(n) time complexity. This is because, in cases where the item is not found, it will search all the items in the list.

Pros - Linear Search works on all lists and all datatypes. It doesn’t matter if the list is sorted or not since Linear Search will just check if it is equal and, if not, keep checking every other item

Binary Search

Let’s say you have an array containing many values, and you want to check whether a value is present in the array. If the value is present in the array, then return the index of the value, else return a value of -1. Binary Search is a fast way to quickly find objects in ordered arrays or confirm that the value isn’t present in the array. We’ll get to why the array has to be ordered later. A simple way to learn how Binary Search works is by playing a simple game.

I’m thinking of a random number between 1 and 100. Your job is to guess the number I’m thinking of in the least possible number of guesses, and every time you guess a number, I’ll tell you whether your guess is larger or smaller than the number I’m thinking of. So, what’s your first guess?

Many people would guess 50 presented the problem above. And it makes the most sense, because after I say whether your guess is too high or too low, you’ve trimmed the number of possible numbers I’m thinking of in half. Let’s continue playing our game: my number is larger than 50.

The next logical guess most people would pick is 75. Since we know from our last guess that the number I’m thinking of must be between 50 and 100, we want to cut our possibilities in half again. The number directly in the middle of 50 and 100 is 75, so once I say whether my guess is higher or lower, you’ve once again cut the number of possible numbers in half. Continuing our game: my number is smaller than 75, good guess.

This process would repeat again and again until you get the correct answer. Since we know that the number is between 50 and 75, we have an upper and lower bound (both are exclusive). Once again our strategy is to pick the number exactly in between the upper and lower bound. In this case, 62.5 is the number in the middle, but since my number is a whole number, we can round up our guess to 63. The game would continue as follows:

Guess # Boundaries Guess Result
3 50 - 75 63 my number is larger than 63
4 63 - 75 69 my number is larger than 69
5 69 - 75 72 my number is larger than 72
6 72 - 75 73 my number is larger than 73
7 73 - 75 74 you got it!

In this way, we were able to guess the number in 7 guesses. This was actually the worst case scenario, because every time we made a guess between our upper and lower bound, we had a small chance of that being the correct answer. In our case, we had to narrow it down to exactly one possibility before we got the answer. Even so, 7 guesses is quite small considering we had 100 possible answers in the beginning of the game. Let’s see how this principle of trimming our possibilities in half after every guess can be applied to checking for a value in an array as fast as possible.