Multithreading and Multiprocessing: Eight Introductory Guides to Python Concurrent Programming - 51CTO.COM

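With advances in computer hardware, especially the spread of multi-core processors, making effective use of system resources has become an important problem in software development. Concurrent programming addresses it by letting a program switch efficiently between multiple tasks, which improves overall performance. This article covers the basic concepts of concurrency, the concurrency mechanisms in Python, and how to use multithreading and multiprocessing to make programs more efficient.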

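1. What Is Concurrency?

Concurrency is the ability of multiple tasks or programs to appear to run at the same time. When tasks actually execute at the same instant on different cores, that is parallelism. In the multi-core era, concurrency lets a program use system resources more efficiently.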

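2. The GIL (Global Interpreter Lock) in Python

Python (more precisely CPython, the standard interpreter) has a mechanism called the Global Interpreter Lock (GIL), which ensures that only one thread executes Python bytecode at any given time. This keeps the interpreter's internals simple and costs little on a single-core processor, but on a multi-core processor it limits the performance of CPU-bound multithreaded code.

# Example code: show how the GIL affects thread execution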
import threading
import time

def count(n):
    while n > 0:
        n -= 1

thread1 = threading.Thread(target=count, args=(100000000,))
thread2 = threading.Thread(target=count, args=(100000000,))

start_time = time.time()
thread1.start()
thread2.start()
thread1.join()
thread2.join()
end_time = time.time()

print(f"Time taken: {end_time - start_time} seconds")

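Output:

Time taken: 2.07 seconds

The exact time depends on the machine. In this example two threads are running, yet because of the GIL they do not execute in parallel, so the total is about the same as running the two count calls one after the other.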
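The effect of the GIL is easier to see with a baseline to compare against. The following is a minimal sketch, assuming a standard CPython build with the GIL enabled; the helper names run_sequential and run_threaded and the smaller value of n are assumptions made for illustration, not part of the original example.

```python
import threading
import time

def count(n):
    while n > 0:
        n -= 1

def run_sequential(n):
    """Run count(n) twice, one call after the other, and return elapsed seconds."""
    start = time.perf_counter()
    count(n)
    count(n)
    return time.perf_counter() - start

def run_threaded(n):
    """Run count(n) in two threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=count, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5_000_000  # assumed value, smaller than the article's so the demo finishes quickly
    print(f"Sequential:  {run_sequential(n):.2f} seconds")
    print(f"Two threads: {run_threaded(n):.2f} seconds")
```

On a GIL-enabled interpreter the two timings should come out roughly equal, which is the sign that the threads took turns instead of running side by side.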

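3. Multithreading Basics

Multithreading is one way to achieve concurrency and is well suited to I/O-bound tasks.

# Example code: create a simple multithreaded application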
import threading
import time

def worker(num):
    """Task executed by each thread"""
    print(f"Thread {num}: starting")
    time.sleep(2)
    print(f"Thread {num}: finishing")

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()

Output:

Thread 0: starting
Thread 1: starting
Thread 2: starting
Thread 3: starting
Thread 4: starting
Thread 0: finishing
Thread 1: finishing
Thread 2: finishing
Thread 3: finishing
Thread 4: finishing

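The five threads start one after another. time.sleep releases the GIL while a thread waits, so all five waits overlap and the whole program finishes in about 2 seconds rather than 10. The order of the finishing lines can vary from run to run. The GIL only prevents threads from executing Python bytecode in parallel, which is why threads suit I/O-bound work like this.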
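To check how sleeping (or otherwise blocked) threads behave, it helps to time the whole run. This is a small sketch, not part of the original article; timed_run is a hypothetical helper, and time.sleep stands in for blocking I/O.

```python
import threading
import time

def worker(delay):
    # time.sleep releases the GIL while waiting, much like blocking I/O does
    time.sleep(delay)

def timed_run(num_threads, delay):
    """Start num_threads sleeping threads and return the total elapsed seconds."""
    threads = [threading.Thread(target=worker, args=(delay,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = timed_run(num_threads=5, delay=2)
    print(f"Five threads took {elapsed:.2f} seconds")
```

If the waits were serialized, five 2-second sleeps would take about 10 seconds; a total close to 2 seconds indicates that the waits overlapped.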

4. Simplifying Multithreading with the concurrent.futures Module

concurrent.futures provides a high-level interface for executing function calls asynchronously.

from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    print(f"Task {n} is running")
    time.sleep(2)
    return f"Task {n} finished"

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(task, i) for i in range(5)]
    
    for future in futures:
        print(future.result())

Output:

Task 0 is running
Task 1 is running
Task 2 is running
Task 3 is running
Task 4 is running
Task 0 finished
Task 1 finished
Task 2 finished
Task 3 finished
Task 4 finished

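This example uses ThreadPoolExecutor to simplify thread management and submits tasks with the submit method. Calling future.result() on each future in turn blocks until that task is done, so the results are printed in submission order.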
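When tasks take different amounts of time, it is often more useful to handle each result as soon as it is ready. A possible sketch uses as_completed from the same module; the function run_in_completion_order and the delay values are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(n, delay):
    time.sleep(delay)
    return f"Task {n} finished"

def run_in_completion_order(delays):
    """Submit one task per delay and return results in the order they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(task, i, d) for i, d in enumerate(delays)]
        # as_completed yields each future as soon as its task is done
        for future in as_completed(futures):
            results.append(future.result())
    return results

if __name__ == "__main__":
    # Task 0 sleeps longest, so it is expected to finish last
    for line in run_in_completion_order([0.6, 0.4, 0.2]):
        print(line)
```

Iterating over the futures list directly, as in the example above, keeps submission order; as_completed trades that ordering for earlier access to fast results.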

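5. Multiprocessing Basics

Multiprocessing sidesteps the GIL and is the way to achieve truly parallel computation: each process has its own interpreter and its own GIL.

# Example code: create a simple multiprocess application
from multiprocessing import Process
import time

def process_task(num):
    """Task executed by each process"""
    print(f"Process {num}: starting")
    time.sleep(2)
    print(f"Process {num}: finishing")

# The guard is required on platforms that start processes with "spawn"
# (Windows, macOS), because each child re-imports this module
if __name__ == "__main__":
    processes = []
    for i in range(5):
        p = Process(target=process_task, args=(i,))
        processes.append(p)
        p.start()

    # Wait for all processes to finish
    for p in processes:
        p.join()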

Output:

Process 0: starting
Process 1: starting
Process 2: starting
Process 3: starting
Process 4: starting
Process 0: finishing
Process 1: finishing
Process 2: finishing
Process 3: finishing
Process 4: finishing

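The five processes start almost simultaneously. Each runs in its own interpreter with its own GIL, so CPU-bound work in them can run truly in parallel, given enough cores.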
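Sleeping tasks do not show much about parallel computation, so here is a rough sketch that runs the CPU-bound count function from section 2 in separate processes. The helper run_in_processes and the value of n are assumptions; actual timings depend on the machine and the number of cores.

```python
from multiprocessing import Process
import time

def count(n):
    while n > 0:
        n -= 1

def run_in_processes(n, num_processes):
    """Run count(n) in num_processes separate processes; return elapsed seconds."""
    processes = [Process(target=count, args=(n,)) for _ in range(num_processes)]
    start = time.perf_counter()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5_000_000  # assumed value, chosen so the demo finishes quickly
    print(f"Two processes: {run_in_processes(n, 2):.2f} seconds")
```

On a machine with at least two free cores, this should take noticeably less time than two threads doing the same work, minus the cost of starting the processes.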

6. Simplifying Multiprocessing with multiprocessing.Pool

multiprocessing.Pool provides a simple way to run tasks in parallel.

from multiprocessing import Pool
import time

def pool_task(n):
    print(f"Task {n} is running")
    time.sleep(2)
    return f"Task {n} finished"

if __name__ == "__main__":
    with Pool(processes=5) as pool:
        results = pool.map(pool_task, range(5))
        
    for result in results:
        print(result)

Output:

Task 0 is running
Task 1 is running
Task 2 is running
Task 3 is running
Task 4 is running
Task 0 finished
Task 1 finished
Task 2 finished
Task 3 finished
Task 4 finished

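This code shows how to use Pool to run tasks in parallel and collect the results. pool.map returns them in the same order as the inputs.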
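A pool pays off mainly for CPU-bound functions. The sketch below applies pool.map to a small computation; sum_of_squares and parallel_sums are hypothetical names introduced only for this illustration.

```python
from multiprocessing import Pool

def sum_of_squares(n):
    """CPU-bound work: add up i*i for every i below n."""
    return sum(i * i for i in range(n))

def parallel_sums(sizes, processes=4):
    """Compute sum_of_squares for each size in a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # map() returns results in the same order as the inputs
        return pool.map(sum_of_squares, sizes)

if __name__ == "__main__":
    sizes = [10, 100, 1000]
    print(parallel_sums(sizes))
```

The worker function is defined at module level on purpose: the pool sends it to worker processes by reference, so it has to be importable from there.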

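7. Inter-Process Communication

In multiprocess programs, processes often need to share data or coordinate their actions. Python provides several ways to communicate between processes, such as pipes (Pipe) and queues (Queue).

(1) Communicating through a pipe

A pipe is a simple and effective way for two processes to communicate.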

from multiprocessing import Process, Pipe
import time

def send_message(conn, message):
    conn.send(message)
    conn.close()

def receive_message(conn):
    print(f"Received message: {conn.recv()}")

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()

    sender = Process(target=send_message, args=(child_conn, "Hello from child!"))
    receiver = Process(target=receive_message, args=(parent_conn,))

    sender.start()
    receiver.start()

    sender.join()
    receiver.join()

Output:

Received message: Hello from child!

In this example we create a pipe and use its two ends in the sender and receiver processes to send and receive a message.
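A pipe created with Pipe() is duplex by default, so the same pair of connections can carry a request and its reply. The following sketch shows one way to do that; echo_worker, make_reply, and the use of None as a stop signal are assumptions made for this example.

```python
from multiprocessing import Process, Pipe

def make_reply(text):
    """Hypothetical helper: build the reply the worker sends back."""
    return text.upper()

def echo_worker(conn):
    # Keep answering until the other end sends None as a stop signal
    while True:
        message = conn.recv()
        if message is None:
            break
        conn.send(make_reply(message))
    conn.close()

if __name__ == "__main__":
    # Duplex by default: both ends can send and receive
    parent_conn, child_conn = Pipe()
    worker = Process(target=echo_worker, args=(child_conn,))
    worker.start()

    for text in ["hello", "pipe"]:
        parent_conn.send(text)
        print(f"Reply: {parent_conn.recv()}")

    parent_conn.send(None)  # ask the worker to stop
    worker.join()
```

Each end of a pipe should be used by only one process at a time; for more than two participants a queue is usually the safer choice.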

(2) Communicating through a queue

A queue is a more general mechanism that supports multiple producers and consumers.

from multiprocessing import Process, Queue
import time

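def put_items(queue):
    items = ['item1', 'item2', 'item3']
    for item in items:
        queue.put(item)
        time.sleep(1)
    # Sentinel: tells the consumer that no more items will arrive
    queue.put(None)

def get_items(queue):
    while True:
        # get() blocks until an item is available; checking empty() instead
        # would let the consumer exit while the producer is still sleeping
        item = queue.get()
        if item is None:
            break
        print(f"Received: {item}")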

if __name__ == "__main__":
    queue = Queue()

    producer = Process(target=put_items, args=(queue,))
    consumer = Process(target=get_items, args=(queue,))

    producer.start()
    consumer.start()

    producer.join()
    consumer.join()

Output:

Received: item1
Received: item2
Received: item3

This example shows producer-consumer communication through a queue.
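With several producers and consumers, a common pattern is to send one stop marker per consumer once all producers are done. This is a sketch under that assumption; run_pipeline, the results queue, and the item naming scheme are hypothetical and not part of the original example.

```python
from multiprocessing import Process, Queue

def producer(queue, name, count):
    for i in range(count):
        queue.put(f"{name}-item{i}")

def consumer(queue, results):
    while True:
        item = queue.get()   # blocks until something is available
        if item is None:     # sentinel: stop this consumer
            break
        results.put(item)

def run_pipeline(num_producers=2, num_consumers=2, items_each=3):
    """Run producers and consumers in separate processes; return all items, sorted."""
    queue, results = Queue(), Queue()
    producers = [Process(target=producer, args=(queue, f"p{i}", items_each))
                 for i in range(num_producers)]
    consumers = [Process(target=consumer, args=(queue, results))
                 for _ in range(num_consumers)]
    for p in producers + consumers:
        p.start()
    for p in producers:
        p.join()
    # All items are queued now: send one sentinel per consumer
    for _ in consumers:
        queue.put(None)
    # Drain the results before joining the consumers
    collected = [results.get() for _ in range(num_producers * items_each)]
    for c in consumers:
        c.join()
    return sorted(collected)

if __name__ == "__main__":
    print(run_pipeline())
```

Which consumer handles which item is not deterministic, which is why the sketch sorts the collected items before returning them.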

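8. Practical Example: Downloading Images in Parallel

Suppose we need to download a large number of images from the network and save them to the local file system. We can use multithreading or multiprocessing to speed up the downloads.

(1) Define the download function

First, define a function that downloads the image at a given URL and saves it locally.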

import requests
import os

def download_image(url, filename):
    # A timeout keeps a stalled connection from blocking the worker forever
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        with open(filename, 'wb') as file:
            file.write(response.content)
        print(f"Downloaded {filename}")
    else:
        print(f"Failed to download {url}")

(2) Downloading with multiple threads

Next, we use multiple threads to download the images in parallel.

import threading

def download_images_threading(urls, folder):
    os.makedirs(folder, exist_ok=True)

    def download(url):
        filename = os.path.join(folder, url.split('/')[-1])
        download_image(url, filename)

    threads = []
    for url in urls:
        thread = threading.Thread(target=download, args=(url,))
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg",
    "https://example.com/image4.jpg",
    "https://example.com/image5.jpg"
]

folder = "images_threading"
download_images_threading(urls, folder)

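Output:

Downloaded images_threading/image1.jpg
Downloaded images_threading/image2.jpg
Downloaded images_threading/image3.jpg
Downloaded images_threading/image4.jpg
Downloaded images_threading/image5.jpg

This example shows how to download images in parallel with multiple threads. The example.com URLs are placeholders, so substitute real image URLs to get this output. Because the threads run concurrently, the order of the lines can vary.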
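Starting one thread per URL works for five images but does not scale to a large number of downloads. One option is to cap the number of threads with ThreadPoolExecutor from section 4. The sketch below is an illustration under assumptions: download_all, its fetch parameter, max_workers=4, and the fake_fetch stand-in are all hypothetical names; in real use fetch would be the download_image function defined earlier.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, folder, fetch, max_workers=4):
    """Download every URL with at most max_workers threads.

    fetch(url, filename) is any callable that saves one URL to one file.
    Returns the target filenames in the same order as urls.
    """
    os.makedirs(folder, exist_ok=True)
    filenames = [os.path.join(folder, url.split('/')[-1]) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the iterator so an exception raised in a worker surfaces here
        list(executor.map(fetch, urls, filenames))
    return filenames

def fake_fetch(url, filename):
    # Stand-in for download_image so the sketch runs without network access
    with open(filename, 'wb') as file:
        file.write(url.encode())

if __name__ == "__main__":
    urls = [f"https://example.com/image{i}.jpg" for i in range(1, 6)]
    with tempfile.TemporaryDirectory() as folder:
        for name in download_all(urls, folder, fake_fetch):
            print(os.path.basename(name))
```

With the real function the call would look like download_all(urls, "images_pool", download_image). The pool reuses its threads, so at most max_workers requests are in flight at any moment.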

(3) Downloading with multiple processes

Now we use multiple processes to accomplish the same task.

from multiprocessing import Process

def download_images_multiprocessing(urls, folder):
    os.makedirs(folder, exist_ok=True)

    processes = []
    for url in urls:
        filename = os.path.join(folder, url.split('/')[-1])
        # Pass the module-level download_image directly: a nested function
        # cannot be pickled when processes are started with "spawn"
        process = Process(target=download_image, args=(url, filename))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

if __name__ == "__main__":
    folder = "images_multiprocessing"
    download_images_multiprocessing(urls, folder)

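Output:

Downloaded images_multiprocessing/image1.jpg
Downloaded images_multiprocessing/image2.jpg
Downloaded images_multiprocessing/image3.jpg
Downloaded images_multiprocessing/image4.jpg
Downloaded images_multiprocessing/image5.jpg

This example shows how to download images in parallel with multiple processes. Downloading is I/O-bound, so processes bring little benefit over threads here and cost more to start.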

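Summary

This article introduced the basic concepts of concurrency and examined Python's concurrency mechanisms, including multithreading and multiprocessing. Example code showed how the concurrent.futures and multiprocessing modules simplify concurrent programming. Finally, a practical example showed how to download images in parallel with multiple threads and multiple processes. With these techniques, developers can make better use of modern multi-core processors and improve program efficiency.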