Improve your Python: explain yield and generators

Source of the original text: Jeff Knupp    Source of the translation: oschina   

Before starting a course, I ask students to fill out a questionnaire rating their understanding of various Python concepts. Some topics ("if/else control flow" or "defining and using functions") are no problem for most students. But there are a handful of topics that most students have had little or no exposure to, especially "generators and the yield keyword". I suspect the same is true for most novice Python programmers.

As it turns out, even after considerable effort on my part, some people still do not understand generators and the yield keyword. I want to fix that. In this article, I will explain what the yield keyword is, why it is useful, and how to use it.

Note: In recent years, generators have become more and more powerful as features have been added through PEPs. In my next article, I will use coroutines, cooperative multitasking, and asynchronous I/O (especially their use in the "tulip" prototype implementation that GvR is working on) to show the true power of yield. Before that, though, we need a solid understanding of generators and yield.

Coroutines and subroutines

When we call an ordinary Python function, we usually start execution from the first line of code of the function, and end at the return statement, exception or the end of the function (which can be seen as an implicit return of None). Once the function returns control to the caller, it means all is over. All the work done in the function and the data saved in local variables will be lost. When this function is called again, everything will be created from scratch.

For the functions discussed in computer programming, this is a very standard process. Such a function can only return one value, but sometimes it is helpful to be able to create a function that produces a sequence. To do this, such functions need to be able to "save their work."

I say "produce a sequence" because our function does not return in the usual sense. return implies that the function is handing control of execution back to the point where it was called. yield, by contrast, implies that the transfer of control is temporary and voluntary, and that our function expects to regain control in the future.

In Python, a "function" with this ability is called a generator, and it is very useful. Generators (and yield statements) were originally introduced to make it easier for programmers to write code that generates sequences of values. In the past, to implement something similar to a random number generator, it was necessary to implement a class or a module to keep track of the state between each call while generating data. After the introduction of the generator, this became very simple.
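As a preview of that simplification, here is a minimal sketch of the same sequence produced both ways. The names CountUp and count_up are hypothetical, chosen only for illustration, and a simple counter stands in for the random number generator mentioned above. The yield syntax itself is explained later in this article.

```python
class CountUp:
    """Class-based version: state between calls is stored on the instance."""

    def __init__(self, start):
        self.current = start

    def next_value(self):
        value = self.current
        self.current += 1
        return value


def count_up(start):
    """Generator version: the local variable itself carries the state."""
    current = start
    while True:
        yield current
        current += 1


counter = CountUp(5)
from_class = [counter.next_value() for _ in range(3)]

gen = count_up(5)
from_generator = [next(gen) for _ in range(3)]

print(from_class)      # [5, 6, 7]
print(from_generator)  # [5, 6, 7]
```

Both versions produce the same values; the generator simply needs no explicit bookkeeping object.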

To better understand the problem that generators solve, let's look at an example. As we work through it, keep in mind the core problem we are trying to solve: generating a sequence of values.

Note: Outside of Python, all but the simplest generators would be called coroutines. I will use that term later in this article. Remember that, in Python, everything described here as a coroutine is still a generator. Python formally defines the term generator; coroutine is used here for discussion only and has no formal definition at the language level.

Example: Interesting prime numbers

Suppose your boss asks you to write a function that takes a list of ints and returns an iterable containing those elements that are prime numbers.

Remember, an iterable is just an object capable of returning its members one at a time.
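To illustrate "one member at a time", here is a small sketch using a plain list and the built-in iter() and next() functions. The variable names are only for illustration.

```python
numbers = [2, 3, 4]

# iter() asks the iterable for an iterator; next() pulls one member at a time.
iterator = iter(numbers)
first = next(iterator)
second = next(iterator)
third = next(iterator)

print(first, second, third)  # 2 3 4

# A for loop (or a comprehension) performs the same steps behind the scenes.
collected = [n for n in numbers]
print(collected)  # [2, 3, 4]
```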

You must think "this is very simple", and then quickly write the following code:

import math

def get_primes(input_list):
    result_list = list()
    for element in input_list:
        if is_prime(element):
            result_list.append(element)

    return result_list

# Or, more concisely, as a generator expression...

def get_primes(input_list):
    return (element for element in input_list if is_prime(element))

# One possible implementation of is_prime...

def is_prime(number):
    if number > 1:
        if number == 2:
            return True
        if number % 2 == 0:
            return False
        # Only odd divisors up to the square root need to be checked
        for current in range(3, int(math.sqrt(number) + 1), 2):
            if number % current == 0:
                return False
        return True
    return False

Either implementation of get_primes above fulfills the requirements, so we tell the boss we are done. She reports that our function works and is exactly what she wanted.
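As a quick check, the list-based approach can be exercised as follows. This is a self-contained sketch: it repeats a trial-division is_prime so that it runs on its own, and the sample list is an arbitrary choice made for illustration.

```python
import math


def is_prime(number):
    """Trial division by odd numbers up to the square root."""
    if number < 2:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for current in range(3, int(math.sqrt(number)) + 1, 2):
        if number % current == 0:
            return False
    return True


def get_primes(input_list):
    return [element for element in input_list if is_prime(element)]


sample = [1, 2, 3, 4, 9, 11, 15, 17, 25]
primes = get_primes(sample)
print(primes)  # [2, 3, 11, 17]
```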

Handling infinite sequences

Well, not quite. A few days later the boss came back and told us she had run into a small problem: she wants to use our get_primes function on a very large list of numbers. In fact, the list is so large that merely creating it would use up all of the system's memory. To work around this, she wants to call get_primes with a start parameter and get back all primes greater than start (perhaps she is solving Project Euler problem 10).

Let's look at this new requirement. Clearly, a simple modification of get_primes is not enough. We cannot return a list of all primes from start to infinity (although operating on infinite sequences has a wide range of useful applications). The chances of solving this with an ordinary function look slim.

Before we give up, let's identify the core obstacle that prevents us from writing a function that meets the boss's new requirement. After some thought, we arrive at this: a function gets only one chance to return results, so it must return all of them at once. That may seem like a pointless conclusion; "functions just work that way," we usually think. But it pays to ask, "What if they didn't?"

Imagine if get_primes could simply return the next value instead of returning all the values at once, what can we do? We no longer need to create a list. Without the list, there is no memory problem. Since the boss told us that she only needs to traverse the results, she will not know the difference in our implementation.

Unfortunately, this seems unlikely. Even if we have a magic function that allows us to traverse from n to infinity, we will get stuck after returning the first value:

def get_primes(start):
    for element in magical_infinite_range(start):
        if is_prime(element):
            return element 

Suppose we call get_primes like this:

def solve_number_10():
    # She *is* working on Project Euler #10, I knew it!
    total = 2
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            total += next_prime
        else:
            print(total)
            return 

Clearly, get_primes hits the case of 3 immediately and returns on the fourth line of the function. Instead of a plain return, what we need is a way to hand back one value and, when asked for the next one, pick up where we left off.

But functions cannot do this. When a function returns, it is finished for good. We can certainly call the function again, but we have no way of telling it, "Uh, this time start from line 4, where you exited last time, instead of from the first line as usual." A function has a single entry point: its first line of code.

Enter the generator

This type of problem is so common that a construct was added to Python specifically to solve it: the generator. A generator "generates" values. Creating one is made as simple as possible through the concept of a generator function.

The definition of a generator function looks much like that of a normal function, except that whenever it wants to generate a value it uses the yield keyword instead of return. If the body of a def contains yield, the function automatically becomes a generator function (even if it also contains a return statement). Nothing else is needed to create one.
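The rule can be checked with the standard inspect module. In this sketch the function names plain and with_yield are hypothetical; the point is that the presence of yield alone changes what the def creates, and that calling a generator function does not run its body right away.

```python
import inspect


def plain():
    return 1


def with_yield():
    print("body started")
    yield 1
    return  # a bare return simply ends the generator


print(inspect.isgeneratorfunction(plain))       # False
print(inspect.isgeneratorfunction(with_yield))  # True

# Calling a generator function does not execute its body yet;
# it only creates a generator object.
gen = with_yield()
print(inspect.isgenerator(gen))  # True

# The body runs only when a value is requested.
value = next(gen)  # prints "body started" and produces 1
print(value)
```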

A generator function returns a generator iterator. This may be the last time you see the term "generator iterator", because they are usually just called "generators". Note that a generator is a special type of iterator. As an iterator, a generator must define a few methods, one of which is __next__(). To get the next value from a generator, we use the same built-in function as for iterators: next().

(next() takes care of calling the generator's __next__() method.) Since a generator is a type of iterator, it can be used in a for loop.

So whenever next() is called on a generator, the generator is responsible for passing a value back to whoever called next(). It does so by executing yield along with the value to be passed back (for example, yield 7). The easiest way to remember what yield does is to think of it as a special return (with a little magic) for generator functions.

Yield is the return (with a little magic) specially used for generators.

Here is a simple generator function:

Python
>>> def simple_generator_function():
...     yield 1
...     yield 2
...     yield 3

There are two easy ways to use it:

Python
>>> for value in simple_generator_function():
...     print(value)
1
2
3
>>> our_generator = simple_generator_function()
>>> next(our_generator)
1
>>> next(our_generator)
2
>>> next(our_generator)
3 
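Because a generator is an iterator, the two styles of use can also be mixed. The following sketch (variable names are illustrative) checks the iterator properties directly and shows a loop continuing from where next() stopped.

```python
def simple_generator_function():
    yield 1
    yield 2
    yield 3


gen = simple_generator_function()

# An iterator returns itself from iter() and exposes __next__().
is_own_iterator = iter(gen) is gen
has_next = hasattr(gen, "__next__")

first = next(gen)        # equivalent to gen.__next__()
rest = [v for v in gen]  # iteration continues from where next() stopped

print(is_own_iterator, has_next, first, rest)  # True True 1 [2, 3]
```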

Magic?

So where is the magic part? I'm glad you asked! When a generator function executes yield, the "state" of the generator function is frozen: the values of all variables are saved, and the next line of code to be executed is recorded, until next() is called again. Once it is, the generator function simply resumes where it left off. If next() is never called again, the state saved at the yield is eventually discarded.
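The freezing and resuming can be made visible with a few print calls. This is a minimal sketch; countdown is a hypothetical generator used only to show that the local variable remaining survives between calls to next().

```python
def countdown(start):
    remaining = start
    while remaining > 0:
        print("generator: about to yield", remaining)
        yield remaining
        # Execution resumes here on the next call to next(),
        # with `remaining` still holding its previous value.
        remaining -= 1


gen = countdown(2)
first = next(gen)   # runs up to the first yield
print("caller: got", first)
second = next(gen)  # resumes after the yield, loops, yields again
print("caller: got", second)
```

The output alternates between the generator and the caller, which shows that control moves back and forth rather than the function restarting from its first line.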

Let's rewrite the get_primes() function, this time as a generator. Note that we no longer need the magical_infinite_range function. Using a simple while loop, we have created our own infinite sequence.

Python
def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1 

If a generator function executes return, or reaches the end of its body, a StopIteration exception is raised. This tells whoever called next() that the generator has no more values (this is normal iterator behavior). That is why get_primes() contains a while True loop: without it, the second call to next() would run off the end of the function and raise StopIteration. Once a generator is exhausted, every further call to next() raises StopIteration, so a given generator can be consumed only once. The following does not work:

Python
>>> our_generator = simple_generator_function()
>>> for value in our_generator:
...     print(value)

>>> # our_generator has been exhausted...
>>> print(next(our_generator))
Traceback (most recent call last):
  File "<ipython-input-13-7e48a609051a>", line 1, in <module>
    next(our_generator)
StopIteration

>>> # ...but calling the generator function again
>>> # always gives us a fresh generator

>>> new_generator = simple_generator_function()
>>> print(next(new_generator))  # perfectly valid
1

So the while loop is there to make sure the generator function never reaches its end. As long as next() keeps being called, the generator keeps producing values. This is a common idiom for dealing with infinite sequences (and this kind of generator is very common).
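Two standard-library tools are handy here, shown in the sketch below with hypothetical generators naturals and finite. itertools.islice takes a bounded number of values from an infinite generator, and passing a default to next() avoids the StopIteration exception on an exhausted one.

```python
import itertools


def naturals(start):
    """Infinite generator: the while loop never lets the function end."""
    number = start
    while True:
        yield number
        number += 1


# islice stops asking after five values, so the infinite loop is harmless.
first_five = list(itertools.islice(naturals(10), 5))
print(first_five)  # [10, 11, 12, 13, 14]


def finite():
    yield 1
    # Falling off the end here raises StopIteration in the caller.


gen = finite()
only_value = next(gen)

# A default argument to next() is returned instead of raising StopIteration.
after_end = next(gen, "exhausted")
print(only_value, after_end)  # 1 exhausted
```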

Tracing the execution

Let's go back to where get_primes is called: solve_number_10.

Python
def solve_number_10():
    # She *is* working on Project Euler #10, I knew it!
    total = 2
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            total += next_prime
        else:
            print(total)
            return 

Let's take a look at the call to get_primes in the for loop of solve_number_10 and observe how the first few elements are created to help our understanding. When the for loop requests the first value from get_primes, we enter get_primes, which is no different from entering a normal function.

  1. We enter the while loop on line 2
  2. The if condition holds (3 is a prime number)
  3. yield passes the value 3, and control, back to solve_number_10

Next, back in solve_number_10:

  1. The for loop gets the return value 3
  2. The for loop assigns it to next_prime
  3. next_prime is added to total
  4. The for loop requests the next value from get_primes

This time, when we entered get_primes, we did not execute it from the beginning. We continued execution from line 5, which is where we left last time.

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1 # <<<<<<<<<< 

Most importantly, number still holds the value it had when we executed yield (namely 3). Remember, yield both passes a value to whoever called next() and saves the "state" of the generator function. Next, number is incremented to 4, we return to the top of the while loop, and number keeps incrementing until the next prime (5) is reached. Again we yield the value of number to the for loop in solve_number_10. This cycle continues until the for loop stops (at the first prime greater than 2,000,000).
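The same flow can be written compactly with itertools.takewhile, which stops requesting values at the first one that fails the condition. This is a sketch rather than the article's own solution: sum_primes_below is a hypothetical helper, is_prime is restated so the block runs on its own, and small limits are used so that it finishes instantly.

```python
import itertools
import math


def is_prime(number):
    if number < 2:
        return False
    if number % 2 == 0:
        return number == 2
    # all() over an empty range is True, which covers 3, 5 and 7.
    return all(number % d for d in range(3, int(math.sqrt(number)) + 1, 2))


def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1


def sum_primes_below(limit):
    # takewhile stops pulling values at the first prime >= limit.
    return sum(itertools.takewhile(lambda p: p < limit, get_primes(2)))


print(sum_primes_below(10))   # 2 + 3 + 5 + 7 = 17
print(sum_primes_below(100))  # 1060
```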

More awesome

PEP 342 added support for passing values into generators. It gave generators the ability to yield a value (as before), receive a value, or both yield a value and receive a (possibly different) value in a single statement.

We will use our prime number function once more to show how a value is passed to a generator. This time, rather than simply producing every prime greater than some number, we find the smallest prime greater than each successive power of a number (for a base of 10, the smallest prime greater than 10, then 100, then 1000, and so on). We start from get_primes:

Python
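def print_successive_primes(iterations, base=10):
    # Calling a generator function returns a generator,
    # which can be assigned to a variable like any other object

    prime_generator = get_primes(base)
    # missing code...
    for power in range(iterations):
        # missing code...

def get_primes(number):
    while True:
        if is_prime(number):
        # ... what goes here?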

The next line of get_primes needs some explanation. While yield number would simply yield the value of number, a statement of the form other = yield foo means, "yield foo and, when a value is sent to me, set other to that value." You can "send" values to a generator using the generator's send method.
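A tiny generator makes the two directions of other = yield foo easier to see. In this sketch, echo is a hypothetical generator that yields back whatever was last sent to it.

```python
def echo():
    received = None
    while True:
        # Yield the last value received, then wait for the next one.
        received = yield received


gen = echo()
first = next(gen)           # run to the first yield; it yields None
second = gen.send("hello")  # "hello" becomes the value of the yield expression
third = gen.send(42)

print(first, second, third)  # None hello 42
```

Each send both delivers a value into the paused yield expression and returns the next value the generator yields.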

Python
def get_primes(number):
    while True:
        if is_prime(number):
            number = yield number
        number += 1 

In this way, we can set a different value for number each time the yield is executed. Now we can fill in the missing part of the code in print_successive_primes:

Python
def print_successive_primes(iterations, base=10):
    prime_generator = get_primes(base)
    prime_generator.send(None)
    for power in range(iterations):
        print(prime_generator.send(base ** power)) 

There are two things to note here. First, we print the result of prime_generator.send. This works because send both sends a value to the generator and returns the value the generator yields next (mirroring how yield works inside the generator function).

Second, look at the line prime_generator.send(None). When you use send to "start" a generator (that is, to run it from the first line of the generator function up to the first yield statement), you must send None. This makes sense: by definition, the generator has not yet reached its first yield, so if we sent a real value there would be nothing to "receive" it. Once the generator has been started, we can send values as we do above.
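The rule about starting a generator can be observed directly. In this sketch, get_value is a hypothetical generator. Sending a real value before the generator has started raises a TypeError, after which the generator can still be started normally with send(None) (at least in CPython, where the failed send leaves the generator unstarted).

```python
def get_value():
    received = yield "ready"
    yield received


gen = get_value()

try:
    gen.send("too early")  # nothing is paused at a yield yet
except TypeError as error:
    early_error = str(error)
    print("TypeError:", early_error)

# The generator has still not started, so it can be primed as usual.
primed = gen.send(None)  # same effect as next(gen)
echoed = gen.send("now it works")
print(primed, echoed)
```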

Summary

In the second half of this series of articles, we will discuss some advanced usage and effects of yield. Yield has become one of Python's most powerful keywords. Now that we have a full understanding of how yield works, we have the necessary knowledge to understand some of the more "puzzling" application scenarios of yield.

Believe it or not, we have only scratched the surface of what yield can do. For example, while send does work as described above, it is almost never used when generating a simple sequence as in our example. Below is a piece of code that shows a more typical use of send. I won't say any more about how or why it works; it will serve as a good warm-up for the second part.

Python
import random

def get_data():
    """Return 3 random integers between 0 and 9"""
    return random.sample(range(10), 3)

def consume():
    """Display a running average across the lists of integers sent to it"""
    running_sum = 0
    data_items_seen = 0

    while True:
        data = yield
        data_items_seen += len(data)
        running_sum += sum(data)
        print('The running average is {}'.format(running_sum/float(data_items_seen)))

def produce(consumer):
    """Produce a set of values and forward them to the given consumer"""
    while True:
        data = get_data()
        print('Produced {}'.format(data))
        consumer.send(data)
        yield

if __name__ == '__main__':
    consumer = consume()
    consumer.send(None)
    producer = produce(consumer)

    for _ in range(10):
        print('Producing...')
        next(producer) 

Please remember...

I hope you can get some key ideas from the discussion in this article:

  • Generators are used to generate a series of values
  • yield is like the return of a generator function
  • The only other thing yield does is save the "state" of a generator function
  • A generator is a special type of iterator
  • As with iterators, we can get the next value from a generator using next()
  • A for loop gets its values by calling next() implicitly

I hope this article is useful. If you have never heard of a generator, I hope now you can understand what it is and why it is useful, and understand how to use it. If you are already familiar with generators to some extent, I hope this article can now clear up some confusion about generators.

As always, if any section is unclear (or, more importantly, contains errors), please let me know. You can leave a comment below, send an email to jeff@jeffknupp.com, or reach @jeffknupp on Twitter.