Five Essential Python Libraries for Scientific Computing - 51CTO.COM

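Python is a powerful programming language that is widely used in scientific computing. This article looks at five libraries commonly used for scientific computing in Python: NumPy, Pandas, Matplotlib, SciPy, and Scikit-learn.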

1. NumPy

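NumPy is the foundational library for numerical data in Python. It provides an efficient N-dimensional array object and a wide range of mathematical functions, which makes numerical computation convenient.

Basic usage: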

import numpy as np

# Create a one-dimensional array
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # Output: [1 2 3 4 5]

# Create a multi-dimensional array
multi_dim_arr = np.array([[1, 2, 3], [4, 5, 6]])
print(multi_dim_arr)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Basic element-wise operations
print(arr + 1)  # Output: [2 3 4 5 6]
print(arr * 2)  # Output: [ 2  4  6  8 10]

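Advanced usage:

# Generate a 3x3 array of random numbers drawn uniformly from [0, 1)
random_arr = np.random.rand(3, 3)
print(random_arr)

# Array slicing: take the elements at indices 1 through 3
sliced_arr = arr[1:4]
print(sliced_arr)  # Output: [2 3 4]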

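# Broadcasting: shapes (2, 3) and (3,) are compatible, so row is added to every
# row of the matrix (shapes (5,) and (3,) would raise a ValueError)
row = np.array([10, 20, 30])
result = np.array([[1, 2, 3], [4, 5, 6]]) + row
print(result)
# Output:
# [[11 22 33]
#  [14 25 36]]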
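To make the broadcasting rule concrete, here is a minimal sketch. NumPy compares shapes starting from the trailing dimension, and two dimensions are compatible when they are equal or when one of them is 1. The array names are illustrative only.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,)

# (3, 1) and (3,) broadcast to (3, 3): each array is stretched along its size-1 axis
table = column + row
print(table.shape)  # (3, 3)

# np.broadcast_shapes reports the result shape without doing any arithmetic
print(np.broadcast_shapes((3, 1), (3,)))  # (3, 3)

# Shapes (5,) and (3,) are incompatible: 5 != 3 and neither dimension is 1
try:
    np.arange(5) + np.arange(3)
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

When two shapes do not line up, reshaping one operand (for example with arr[:, np.newaxis]) is the usual way to make the intended pairing explicit.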

2. Pandas

Pandas is a powerful data processing and analysis library that is especially well suited to tabular data.

Basic usage:

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
# Output:
#       Name  Age
# 0    Alice   25
# 1      Bob   30
# 2  Charlie   35

# Select a column
ages = df['Age']
print(ages)
# Output:
# 0    25
# 1    30
# 2    35
# Name: Age, dtype: int64

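Advanced usage:

# Read a CSV file (data.csv is a placeholder, assumed to contain Name and Age columns)
df = pd.read_csv('data.csv')
print(df.head())  # Show the first 5 rows

# Filter rows
filtered_df = df[df['Age'] > 30]
print(filtered_df)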

# Aggregate: mean age per name (selecting Age avoids averaging non-numeric columns)
grouped_df = df.groupby('Name')['Age'].mean()
print(grouped_df)
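Because data.csv is not included with this article, the following sketch builds a small DataFrame in memory to show the same filtering and aggregation steps. The City column and all of the values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of data.csv
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'City': ['Paris', 'Paris', 'Berlin', 'Berlin'],
    'Age': [25, 30, 35, 40],
})

# Boolean-mask filtering: keep the rows where Age is greater than 30
older = df[df['Age'] > 30]
print(older['Name'].tolist())  # ['Charlie', 'Diana']

# Aggregate a single numeric column per group
mean_age = df.groupby('City')['Age'].mean()
print(mean_age['Berlin'])  # 37.5
print(mean_age['Paris'])   # 27.5
```

Selecting the column before calling mean() keeps the aggregation limited to numeric data, which matters once a table also holds text columns such as Name.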

3. Matplotlib

Matplotlib is a plotting library that can produce static, animated, and interactive charts.

Basic usage:

import matplotlib.pyplot as plt

# Draw a simple line plot
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
plt.show()

Advanced usage:

# Draw multiple subplots side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

ax1.plot(x, y, 'r')  # Red line
ax1.set_title('Subplot 1')

ax2.scatter(x, y, color='b')  # Blue scatter plot
ax2.set_title('Subplot 2')

plt.show()
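When the code runs without a display (for example in a script on a server), a figure can be written to a file instead of shown in a window. A minimal sketch, assuming the non-interactive Agg backend and a hypothetical output filename squares.png:

```python
import os
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# The object-oriented interface: work with the Figure and Axes directly
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker='o', label='y = x squared')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.legend()

fig.savefig('squares.png', dpi=150)  # hypothetical output filename
plt.close(fig)  # release the memory held by the figure

print(os.path.exists('squares.png'))
```

The file extension passed to savefig selects the output format, so the same call can produce a PDF or SVG by changing the filename.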

4. SciPy

SciPy is a library for scientific and engineering computing that provides many advanced mathematical functions and algorithms.

Basic usage:

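import numpy as np
from scipy import stats

# Compute the mean and standard deviation (these two calls come from NumPy;
# np.std returns the population standard deviation, ddof=0, by default)
data = [1, 2, 3, 4, 5]
mean = np.mean(data)
std_dev = np.std(data)
print(f'Mean: {mean}, Standard Deviation: {std_dev}')
# Output: Mean: 3.0, Standard Deviation: 1.4142135623730951

# Probability distribution: the standard normal
dist = stats.norm(loc=0, scale=1)
print(dist.pdf(0))  # Output: 0.3989422804014327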
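A frozen distribution object offers more than the density. The sketch below, which is an illustration rather than part of the original example, evaluates the cumulative distribution function, the inverse CDF (percent-point function), and draws random samples with a fixed seed.

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)  # standard normal distribution

# cdf: probability that a draw is less than or equal to the given value
print(dist.cdf(0))  # 0.5

# ppf: the inverse of cdf, here the 97.5th percentile
print(f'{dist.ppf(0.975):.2f}')  # 1.96

# rvs: random samples; random_state fixes the seed for repeatability
samples = dist.rvs(size=1000, random_state=0)
print(samples.shape)  # (1000,)
```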

Advanced usage:

# Least-squares linear fit
x = np.linspace(0, 10, 100)
y = 3 * x + 5 + np.random.normal(0, 1, 100)

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(f'Slope: {slope}, Intercept: {intercept}')
# Example output (varies between runs because the noise is random):
# Slope: 2.995805608425055, Intercept: 5.046887465309874
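Since the noise is random, the fitted values change from run to run. A repeatable variant, sketched here under the assumption that a fixed seed of 0 is acceptable, uses a seeded generator and cross-checks the result against NumPy's degree-1 polynomial fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
x = np.linspace(0, 10, 100)
y = 3 * x + 5 + rng.normal(0, 1, 100)   # true slope 3, true intercept 5

# linregress returns a result object whose fields can be read by name
result = stats.linregress(x, y)
print(f'slope={result.slope:.3f}, intercept={result.intercept:.3f}')
print(f'R^2={result.rvalue ** 2:.3f}')

# Cross-check: np.polyfit with degree 1 solves the same least-squares problem
poly_slope, poly_intercept = np.polyfit(x, y, 1)
print(np.isclose(result.slope, poly_slope))
```

Reading the fields by name (result.slope, result.rvalue) is less error-prone than unpacking five values positionally.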

5. Scikit-learn

Scikit-learn is a machine learning library that provides a large collection of algorithms and tools.

Basic usage:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model (max_iter is raised so the solver has room to converge)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)
print(predictions)

Advanced usage:

# Cross-validation
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5)
print(f'Cross-validation scores: {scores}')
print(f'Mean score: {np.mean(scores)}')
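Cross-validation is often combined with preprocessing. The sketch below is one possible setup, not the only one: it wraps a StandardScaler and the classifier in a pipeline so that the scaling statistics are computed only from the training portion of each fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler is re-fit on the training part of every fold,
# so no statistics from the held-out part leak into training
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

scores = cross_val_score(pipeline, X, y, cv=5)
print(f'Fold accuracies: {scores}')
print(f'Mean accuracy: {scores.mean():.3f}')
```

Passing the whole pipeline to cross_val_score, rather than scaling X once up front, keeps each fold's evaluation independent of its test samples.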

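Case study: stock price prediction

Suppose we want to predict the price of a stock. We can use Pandas to handle the data, NumPy for numerical computation, and Scikit-learn to build the model. The example below estimates each day's closing price from that same day's open, high, low, and volume, so it demonstrates the workflow rather than a forecast of future prices.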

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

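# Read the stock data (stock_prices.csv is assumed to have Date, Open, High, Low, Close, and Volume columns)
df = pd.read_csv('stock_prices.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Select the features and the target variable
X = df[['Open', 'High', 'Low', 'Volume']].values
y = df['Close'].values

# Split into training and test sets; shuffle=False keeps the rows in their
# original order (assumed chronological) so the plot can use time on the x-axis
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Visualize the results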
plt.figure(figsize=(10, 5))
plt.plot(y_test, label='Actual Prices')
plt.plot(predictions, label='Predicted Prices')
plt.xlabel('Time')
plt.ylabel('Price')
plt.title('Stock Price Prediction')
plt.legend()
plt.show()
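The case study depends on a stock_prices.csv file that is not provided here. As a self-contained variation, the sketch below generates a synthetic random-walk price series (not real market data) and predicts each day's close from earlier days only. The column names PrevClose and PrevMean5, the seed, and the 80/20 split are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic random-walk prices standing in for stock_prices.csv (not real data)
rng = np.random.default_rng(0)
dates = pd.date_range('2023-01-02', periods=300, freq='B')
close = 100 + np.cumsum(rng.normal(0, 1, len(dates)))
df = pd.DataFrame({'Close': close}, index=dates)

# Features use only earlier days: the previous close and the mean of the
# five closes before the current day
df['PrevClose'] = df['Close'].shift(1)
df['PrevMean5'] = df['Close'].shift(1).rolling(5).mean()
df = df.dropna()

X = df[['PrevClose', 'PrevMean5']].values
y = df['Close'].values

# Time-ordered split: the first 80% of rows train, the last 20% test
split = int(len(df) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
print(f'Test rows: {len(y_test)}, MAE: {mae:.2f}')
```

Shifting the features by one day and splitting by position means the model never sees information from the day it is asked to predict, which is the main difference from a shuffled split on same-day columns.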

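Summary

This article introduced five libraries commonly used for scientific computing in Python: NumPy, Pandas, Matplotlib, SciPy, and Scikit-learn. Moving from basic to advanced usage, it showed the core features and typical use cases of each library, and the case study showed how they can be combined in a single workflow.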