Python Enhanced Generators: Coroutines

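This article introduces the enhanced generator in Python, that is, the coroutine: its basic syntax, use cases, caveats, and how it compares with coroutine implementations in other languages.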

enhanced generator

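The previous article, "Python Yield Generator Explained", covered the use cases and caveats of yield and generators, but it used only the generator's next method. Generators can in fact do more. PEP 342 added a set of methods that make a generator behave more like a coroutine. The main change is that the original yield could only return a value (acting as a data producer), whereas the new send method lets a generator consume a value when it resumes. The caller (whoever invokes the generator) can also use throw to raise an exception at the point where the generator is suspended.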

back_data = yield cur_ret 

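This statement means: when execution reaches it, cur_ret is returned to the caller. When the generator is later resumed through send(some_data), some_data is assigned to back_data; if it is resumed through next(), back_data is None. For example: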

def gen(data):
    print 'before yield', data
    back_data = yield data
    print 'after resume', back_data

if __name__ == '__main__':
    g = gen(1)
    print g.next()
    try:
        g.send(0)
    except StopIteration:
        pass

Output:

before yield 1
1
after resume 0

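Two points to note:

next() is equivalent to send(None).
The first call must be next() or send(None). Sending a non-None value on the first call raises an error (a TypeError), because no yield expression is waiting yet to receive the value.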
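
The two notes above can be checked with a small experiment. This is a minimal sketch written in Python 3 syntax (an assumption; the article's own listings target Python 2.7, where next(g) is spelled g.next()). The echo generator and the variable names are hypothetical and exist only for illustration.

```python
def echo():
    """Yield back whatever value the caller sends in."""
    received = None
    while True:
        received = yield received


g = echo()

# A just-created generator has not reached any yield yet, so there is
# nothing to receive a value: sending non-None raises TypeError.
try:
    g.send("too early")
    first_send_error = None
except TypeError as exc:
    first_send_error = exc

# Priming with send(None) does the same job as next(g).
primed = g.send(None)      # runs to the first yield, which yields None
echoed = g.send("hello")   # resumes; the yield expression evaluates to "hello"

print(type(first_send_error).__name__, primed, echoed)
```

The failed send leaves the generator unstarted, which is why it can still be primed afterwards.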

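Use cases

Once a generator can accept data (when it resumes from suspension) rather than only return data, it gains the ability to consume data that is pushed to it. The following example, borrowed from another source, shows this: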

word_map = {}

def consume_data_from_file(file_name, consumer):
    for line in file(file_name):
        consumer.send(line)

def consume_words(consumer):
    while True:
        line = yield
        for word in (w for w in line.split() if w.strip()):
            consumer.send(word)

def count_words_consumer():
    while True:
        word = yield
        if word not in word_map:
            word_map[word] = 0
        word_map[word] += 1
    print word_map

if __name__ == '__main__':
    cons = count_words_consumer()
    cons.next()
    cons_inner = consume_words(cons)
    cons_inner.next()
    c = consume_data_from_file('test.txt', cons_inner)
    print word_map

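In the code above, the real data consumer is count_words_consumer and the original data producer is consume_data_from_file; data is pushed from the producer towards the consumer. Note that the main block has to call next() twice (cons.next() and cons_inner.next()) to prime the two generators. This step can be wrapped in a decorator.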

def consumer(func):
    def wrapper(*args,**kw):
        gen = func(*args, **kw)
        gen.next()
        return gen
    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__  = func.__doc__
    return wrapper

The revised code:

def consumer(func):
    def wrapper(*args,**kw):
        gen = func(*args, **kw)
        gen.next()
        return gen
    wrapper.__name__ = func.__name__
    wrapper.__dict__ = func.__dict__
    wrapper.__doc__  = func.__doc__
    return wrapper

word_map = {}

def consume_data_from_file(file_name, consumer):
    for line in file(file_name):
        consumer.send(line)

@consumer
def consume_words(consumer):
    while True:
        line = yield
        for word in (w for w in line.split() if w.strip()):
            consumer.send(word)

@consumer
def count_words_consumer():
    while True:
        word = yield
        if word not in word_map:
            word_map[word] = 0
        word_map[word] += 1
    print word_map

if __name__ == '__main__':
    cons = count_words_consumer()
    cons_inner = consume_words(cons)
    c = consume_data_from_file('test.txt', cons_inner)
    print word_map
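
For readers on Python 3, the same push pipeline could look roughly like the sketch below. Several things here are assumptions rather than the article's code: it uses Python 3 syntax, feeds an in-memory list instead of reading test.txt, swaps in functools.wraps for the manual attribute copying, and uses collections.Counter instead of a plain dict. The names push_lines, split_words and count_words are hypothetical.

```python
import functools
from collections import Counter


def consumer(func):
    """Prime a generator-based consumer so it is ready for send()."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield
        return gen
    return wrapper


@consumer
def count_words(counts):
    """Final consumer: tally each word it receives into `counts`."""
    while True:
        word = yield
        counts[word] += 1


@consumer
def split_words(target):
    """Middle stage: split each received line and push the words on."""
    while True:
        line = yield
        for word in line.split():
            target.send(word)


def push_lines(lines, target):
    """Producer: push every line into the pipeline."""
    for line in lines:
        target.send(line)


word_counts = Counter()
pipeline = split_words(count_words(word_counts))
push_lines(["a b a", "b c"], pipeline)
pipeline.close()  # stop the middle stage once the input is exhausted
print(dict(word_counts))
```

Passing the counter in as an argument, rather than using a module-level dict, is just one design choice; it keeps each stage free of global state.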

generator throw

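Besides next and send, a generator provides two more useful methods, throw and close, which give the caller more control over the generator. send passes a value into the generator, throw raises an exception at the point where the generator is suspended, and close ends the generator normally (after that, calling next or send only raises StopIteration). The throw method is described in detail below.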

throw(type[, value[, traceback]]) 

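throw raises an exception of the given type at the point where the generator is suspended on yield, and returns the next value the generator yields. If the generator does not catch the exception, it propagates to the caller. If the generator catches it but then exits without yielding another value, StopIteration is raised to the caller instead: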

@consumer
def gen_throw():
    value = yield
    try:
        yield value
    except Exception, e:
        yield str(e)  # if this line is commented out, StopIteration is raised

if __name__ == '__main__':
    g = gen_throw()
    assert g.send(5) == 5
    assert g.throw(Exception, 'throw Exception') == 'throw Exception'

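The first call, send(5), makes the generator yield value (5) and suspend at the yield value statement inside the try block. The subsequent throw raises the exception at that point, where the except clause catches it. If the except clause did not yield again, the generator would finish and the throw call would raise StopIteration.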
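
The three possible outcomes of throw can be put side by side. This is a minimal sketch in Python 3 syntax (an assumption; the listing above is Python 2), using the single-argument form throw(exc). The generator names are hypothetical.

```python
def recovering():
    """Catches the thrown exception and yields a replacement value."""
    try:
        yield "ready"
    except ValueError as exc:
        yield "recovered: %s" % exc


def not_recovering():
    """Catches the thrown exception but never yields again."""
    try:
        yield "ready"
    except ValueError:
        pass


def not_catching():
    """Does not handle the thrown exception at all."""
    yield "ready"


g1 = recovering()
next(g1)
result = g1.throw(ValueError("boom"))   # the next yielded value is returned

g2 = not_recovering()
next(g2)
try:
    g2.throw(ValueError("boom"))
    finished = False
except StopIteration:
    finished = True    # the generator ended without yielding again

g3 = not_catching()
next(g3)
try:
    g3.throw(ValueError("boom"))
    propagated = False
except ValueError:
    propagated = True  # the uncaught exception reaches the caller

print(result, finished, propagated)
```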

Caveats

If a generator has started running through send, it cannot be re-entered from another generator until it has yielded again.

@consumer
def funcA():
    while True:
        data = yield
        print 'funcA recevie', data
        fb.send(data * 2)

@consumer
def funcB():
    while True:
        data = yield
        print 'funcB recevie', data
        fa.send(data * 2)

fa = funcA()
fb = funcB()

if __name__ == '__main__':
    fa.send(10)

Output:

funcA recevie 10
funcB recevie 20
ValueError: generator already executing
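
One possible way around this restriction, offered here as an assumption rather than something the article prescribes, is to stop the generators from calling send on each other. Each generator instead yields a (target name, data) pair, and a plain loop, often called a trampoline, performs every send. The sketch is in Python 3 syntax; ping, pong, run and max_hops are hypothetical names.

```python
def ping(log):
    """Record each value received, then ask for `pong` to get double."""
    data = yield
    while True:
        log.append(("ping", data))
        data = yield ("pong", data * 2)


def pong(log):
    """Record each value received, then ask for `ping` to get double."""
    data = yield
    while True:
        log.append(("pong", data))
        data = yield ("ping", data * 2)


def run(coroutines, start, data, max_hops):
    """Trampoline: only this loop ever calls send()."""
    target = start
    for _ in range(max_hops):
        # Every generator is suspended at a yield when it is resumed here,
        # so "generator already executing" cannot occur.
        target, data = coroutines[target].send(data)


log = []
coroutines = {"ping": ping(log), "pong": pong(log)}
for gen in coroutines.values():
    next(gen)  # prime each generator to its first yield

run(coroutines, "ping", 10, max_hops=4)
print(log)
```

Because control always returns to the loop between hops, the two generators can hand work back and forth without ever being re-entered while running.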

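Generator and Coroutine

As for coroutines, see the Wikipedia article for a full explanation. My own understanding is simpler (and perhaps one-sided): a coroutine is a concurrent flow of control that the programmer schedules. Switching between processes or threads is scheduled by the operating system, whereas with coroutines the programmer decides when to switch out and when to switch back in. Coroutines are much lighter than processes and threads and reduce context-switch overhead. Because the programmer controls the scheduling, a task is also, to some extent, protected from being interrupted midway. Where are coroutines useful? I would summarize it as scenarios that need non-blocking waits, such as game programming, asynchronous I/O, and event-driven code.

In Python, the send and throw methods make a generator look a lot like a coroutine, but a generator is only a semicoroutine. The Python documentation describes it as follows: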

“All of this makes generator functions quite similar to coroutines; they yield multiple times, they have more than one entry point and their execution can be suspended. The only difference is that a generator function cannot control where should the execution continue after it yields; the control is always transferred to the generator’s caller.”

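Even so, enhanced generators enable more powerful patterns. Take the yield_dec example from the previous article: it can only wait passively until the time is up before continuing. In some situations, for instance when an event fires, we want to resume the flow immediately, and we also care which event it was; that is when send is needed. In another situation we need to terminate the flow, in which case we can call close and do some cleanup inside the generator. Pseudocode: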

@yield_dec
def do(a):
    print 'do', a
    try:
        event = yield 5  # resumed by send(event), or ended by close()
        print 'post_do', a, event
    finally:
        print 'do sth'  # cleanup runs in both cases
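
The pseudocode depends on the yield_dec decorator from the previous article, which is not reproduced here. To show just the send and close behaviour, the sketch below drives a similar generator by hand. It is written in Python 3 syntax (an assumption), and task, log, t1 and t2 are hypothetical names.

```python
def task(name, log):
    """Wait at a yield for an event; clean up even if closed early."""
    log.append("start " + name)
    try:
        event = yield 5           # suspended here until send() or close()
        log.append("%s got %s" % (name, event))
    finally:
        log.append("cleanup " + name)


log = []

# Case 1: the caller resumes the task with an event.
t1 = task("t1", log)
delay = next(t1)                  # the yielded value, 5
try:
    t1.send("click")
except StopIteration:
    pass                          # the task ran to completion

# Case 2: the caller cancels the task while it is suspended.
t2 = task("t2", log)
next(t2)
t2.close()                        # raises GeneratorExit at the yield

print(log)
```

In the second case the line after the yield never runs, but the finally block still does, which is what makes close suitable for cancellation with cleanup.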

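The other example mentioned earlier, asynchronous calls between services (processes), is also a very good fit for coroutines. Callbacks fragment the code, scattering one piece of logic across several functions; coroutines are much better in this respect, at least for readability. Other languages, such as C# and Go, support coroutines as a standard feature, and in Go in particular they are the foundation of high concurrency. Python 3.x also added coroutine support through asyncio and async/await. In the 2.7 environment I use, greenlet is another option, which a later post will cover.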
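
To give a rough idea of what the async/await style looks like, here is a minimal Python 3 sketch (it needs Python 3.7 or later for asyncio.run). The fetch coroutine is a hypothetical stand-in for a remote call and simply sleeps; it is not an API of any real service.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for an asynchronous remote call (hypothetical)."""
    await asyncio.sleep(delay)    # hands control back to the event loop
    return name + " done"


async def main():
    # Both calls are in flight at once, yet the logic reads top to bottom
    # with no callbacks.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)
```

asyncio.gather returns results in the order the awaitables were passed in, regardless of which one finishes first.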

References

https://www.python.org/dev/peps/pep-0342/
http://www.dabeaz.com/coroutines/
https://en.wikipedia.org/wiki/Coroutine#Implementations_for_Python