Finally Making Sense of Neural Network Algorithms! - 51CTO.COM

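Hi everyone, I'm Xiaohan.

Today I'd like to share a powerful algorithmic model: the neural network.

A neural network is a class of computational models designed to mimic the structure and function of the human brain.

It consists of a set of interconnected nodes (called "neurons") organized in layers, typically an input layer, one or more hidden layers, and an output layer.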

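Basic structure

Input layer
The input layer receives data from outside the network; each node corresponds to one input feature.
Hidden layers
Hidden layers sit between the input layer and the output layer. A network's complexity usually comes from the number of hidden layers and the number of neurons in each of them.
Each node in a hidden layer receives signals from the previous layer through weighted connections and applies a non-linear transformation via an activation function.
Output layer
The nodes of the output layer produce the final result, such as class labels or regression values.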

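How a neuron works

To better understand how a neural network works, let us first zoom in on a single node (neuron).

Each neuron receives inputs from the previous layer and performs the following steps: it computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function.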
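
The computation of a single neuron can be sketched in a few lines of numpy. This is only an illustration: the input values, weights, and bias below are made up, and sigmoid is assumed as the activation function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b):
    """Weighted sum of the inputs plus a bias, followed by a sigmoid activation."""
    z = np.dot(w, x) + b   # weighted sum of the inputs, then add the bias
    return sigmoid(z)      # non-linear activation

# Hypothetical values chosen only for illustration
x = np.array([0.5, -1.0, 2.0])   # inputs coming from the previous layer
w = np.array([0.4, 0.3, -0.2])   # one weight per input
b = 0.1                          # bias

a = neuron_forward(x, w, b)
print(a)
```

A full layer repeats this computation for every neuron, which is why the weights of a layer are usually stored as a matrix.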

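Worked example

Below is sample code that builds a neural network from scratch with numpy. It trains a small regression network to predict house prices from size and age.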

import numpy as np
import matplotlib.pyplot as plt

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, expressed in terms of its output:
# the argument must already be sigmoid(z), not the raw input z
def sigmoid_derivative(x):
    return x * (1 - x)


class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Weights and biases
        self.W1 = np.random.randn(self.input_size, self.hidden_size)  # Weights between input and hidden layer
        self.b1 = np.ones((1, self.hidden_size))  # Biases for the hidden layer
        self.W2 = np.random.randn(self.hidden_size, self.output_size)  # Weights between hidden and output layer
        self.b2 = np.ones((1, self.output_size))  # Biases for the output layer

    # Forward Pass
    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1  # Linear combination for hidden layer
        self.a1 = sigmoid(self.z1)  # Apply activation function to hidden layer
        self.z2 = np.dot(self.a1, self.W2) + self.b2  # Linear combination for output layer
        self.a2 = self.z2  # Output layer (no activation for regression)
        return self.a2
      
    def backward(self, X, y, output, learning_rate):
        m = X.shape[0]  # Number of training examples

        # Error and delta calculations
        self.error = y - output  # Error at the output layer
        self.delta_output = self.error  # Delta for the output layer
        self.error_hidden = np.dot(self.delta_output, self.W2.T)  # Error at the hidden layer
        self.delta_hidden = self.error_hidden * sigmoid_derivative(self.a1)  # Delta for the hidden layer

        # Gradient calculations
        self.W2_grad = np.dot(self.a1.T, self.delta_output) / m 
        self.b2_grad = np.sum(self.delta_output, axis=0, keepdims=True) / m
        self.W1_grad = np.dot(X.T, self.delta_hidden) / m
        self.b1_grad = np.sum(self.delta_hidden, axis=0, keepdims=True) / m

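        # Update weights and biases. Because error = y - output, the *_grad
        # terms already point downhill, so they are added rather than subtracted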
        self.W2 += learning_rate * self.W2_grad
        self.b2 += learning_rate * self.b2_grad
        self.W1 += learning_rate * self.W1_grad
        self.b1 += learning_rate * self.b1_grad
 
# Create a network object
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)

# Define the training data
# Features: [size in m², age in years]
X = np.array([
    [100, 5], [120, 10], [80, 15], [150, 2], [90, 20],
    [110, 7], [95, 12], [130, 8], [140, 5], [75, 18],
    [85, 14], [125, 6], [100, 10], [135, 4], [105, 9],
    [115, 11], [140, 3], [80, 20], [90, 22], [120, 14]
])

# Price in thousand euros
y = np.array([
    [200], [220], [170], [280], [160],
    [210], [175], [225], [270], [155],
    [185], [230], [195], [265], [175],
    [215], [275], [165], [185], [225]
])

# Standardize features and targets (zero mean, unit variance)
X_mean, X_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(), y.std()

X_normalized = (X - X_mean) / X_std
y_normalized = (y - y_mean) / y_std

# Training loop
epochs = 2000
learning_rate = 0.01
losses = []

for epoch in range(epochs):
    # Forward pass
    output = nn.forward(X_normalized)
    
    # Backward pass
    nn.backward(X_normalized, y_normalized, output, learning_rate)
    
    # Calculate and print loss
    mse = np.mean(np.square(y_normalized - output))
    losses.append(mse)
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {mse}")

# Prediction function
def predict(size, age):
    """
    Predicts the house price based on size and age.

    Args:
        size: Size of the house.
        age: Age of the house.

    Returns:
        The predicted price in thousand euros.
    """
    input_normalized = (np.array([[size, age]]) - X_mean) / X_std
    output_normalized = nn.forward(input_normalized)
    return output_normalized * y_std + y_mean

plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss while training")
plt.show()

# Test Prediction
print("\nPredictions:")
print(f"House with 110 m² and 7 years: {predict(110, 7)[0][0]:.2f} t€")
print(f"House with 85 m² and 12 years: {predict(85, 12)[0][0]:.2f} t€")
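
One detail of the code above is easy to miss: `sigmoid_derivative` is applied to the activation `a1`, not to the pre-activation `z1`. The following sketch checks that convention numerically with a central finite difference. The test points and the step size `eps` are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects a = sigmoid(z), i.e. the activation, not z itself
    return a * (1 - a)

z = np.linspace(-3, 3, 7)   # a few pre-activation values
eps = 1e-6                  # finite-difference step (arbitrary small value)

# Numerical derivative of sigmoid with respect to z
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# Analytic derivative, computed from the activation as in the network above
analytic = sigmoid_derivative(sigmoid(z))

max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)
```

If the two results agree up to a very small difference, the helper is being used consistently with the backward pass.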

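Types of neural networks

1. Feedforward neural network (FNN)

A feedforward neural network (FNN) is the simplest kind of artificial neural network: information moves in one direction only, forward, from the input nodes, through the hidden nodes (if any), to the output nodes.

The network contains no cycles or loops, which keeps the architecture simple.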

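How it works

Input layer
The input features (for example, the pixel values of an image) are fed into the network.
Hidden layers
Each hidden layer consists of neurons that process the inputs coming from the previous layer.
Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result to an activation function (for example ReLU or sigmoid).
Output layer
The final layer provides the network's output (for example, class probabilities for classification or a continuous value for regression).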

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Build a simple Feedforward Neural Network
model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),  # Hidden layer with 64 neurons
    Dense(64, activation='relu'),                     # Another hidden layer
    Dense(1, activation='sigmoid')                    # Output layer for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Summary of the model
model.summary()
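
To make the layer-by-layer computation explicit, here is a minimal numpy sketch of the forward pass for a network of the same shape (10 inputs, two hidden layers of 64 ReLU units, one sigmoid output). The weights are random and untrained, and the helper names are hypothetical; this illustrates the idea and is not how Keras implements it internally.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes mirror the Keras model above: 10 -> 64 -> 64 -> 1
sizes = [10, 64, 64, 1]

# Randomly initialised, untrained parameters (illustrative only)
weights = [rng.normal(0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((1, n)) for n in sizes[1:]]

def forward(X):
    a = X
    # Hidden layers: weighted sum, bias, ReLU
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # Output layer: weighted sum, bias, sigmoid
    return sigmoid(a @ weights[-1] + biases[-1])

X = rng.normal(size=(5, 10))   # a batch of 5 samples with 10 features each
probs = forward(X)
print(probs.shape)
```

Information flows strictly from one layer to the next, with no value fed back to an earlier layer.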

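2. Convolutional neural network (CNN)

A convolutional neural network (CNN) is a class of deep neural networks designed for data with a grid-like structure, such as images.

CNNs use convolutional layers to learn spatial hierarchies of features from the input data automatically and adaptively.

Convolutional layer
Applies a set of filters (kernels) to the input; the filters slide over the input data to produce feature maps.
Pooling layer
Reduces the spatial size of the feature maps, which makes the network cheaper to compute and more robust to small translations of the input.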

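How it works

Input layer
The input to a CNN is usually an image represented as a matrix of pixel values.
For a color image this is typically a 3D array (height × width × channels).
Convolutional layer
The core idea of a CNN is the convolution operation: a small matrix called a filter or kernel slides over the input image, and at each position the dot product between the filter and the image patch it covers is computed.
This operation produces a feature map.
Pooling layer
A pooling layer reduces the spatial dimensions (height and width) of the feature map, which keeps the computation manageable and lets the network focus on the most important features.
The most common type is max pooling, which takes the maximum value from each patch of the feature map.
Fully connected layer
After several convolutional and pooling layers, the high-level reasoning in the network is done by fully connected layers.
Output layer
The output layer uses a task-specific activation function (usually softmax for classification) to produce the final prediction. The output is a probability distribution over all possible classes.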

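from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical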

# Load dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build the CNN model
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)

# Evaluate the model
score = model.evaluate(X_test, y_test)
print(f'Test accuracy: {score[1]*100:.2f}%')
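
The convolution and max-pooling operations described in this section can also be written out by hand. The sketch below is a deliberately simple, loop-based version for a single-channel image with no padding and a stride of 1; the function names and the toy image and kernel are made up for illustration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take dot products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the patch it currently covers
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" and a 2x2 kernel, chosen only for illustration
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)   # 6x6 input, 2x2 kernel -> 5x5 map
pooled = max_pool2d(feature_map, size=2)    # 5x5 map -> 2x2 after pooling
print(feature_map.shape, pooled.shape)
```

Real layers such as `Conv2D` apply many kernels at once across all input channels and learn the kernel values during training, but the sliding-window idea is the same.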

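3. Recurrent neural network (RNN)

A recurrent neural network (RNN) is a class of neural networks for processing sequential data. Unlike standard feedforward networks, an RNN contains loops that let it retain a "memory" of previous inputs, which makes it well suited to tasks involving sequences.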
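
The loop that gives an RNN its memory can be sketched as a hidden state that is fed back in at every time step. The code below shows a plain ("vanilla") RNN cell with a tanh activation, which is an assumption made for this illustration; the sizes, random weights, and names such as `W_xh` and `W_hh` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 3, 4

# Untrained, randomly initialised parameters (illustrative only)
W_xh = rng.normal(0, 0.5, size=(input_size, hidden_size))   # input -> hidden
W_hh = rng.normal(0, 0.5, size=(hidden_size, hidden_size))  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_size)

def rnn_forward(sequence):
    """Run a vanilla RNN over a sequence and return the hidden state at each step."""
    h = np.zeros(hidden_size)   # initial "memory" is empty
    states = []
    for x_t in sequence:
        # The new state depends on the current input AND the previous state
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

sequence = rng.normal(size=(5, input_size))   # 5 time steps, 3 features each
states = rnn_forward(sequence)
print(states.shape)
```

Because each state is computed from the previous one, information from early inputs can influence later outputs.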

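LSTM (long short-term memory)
A type of RNN that can learn long-term dependencies by maintaining a memory cell that is updated with each input. LSTMs mitigate the vanishing-gradient problem of standard RNNs.
GRU (gated recurrent unit)
A simplified variant of the LSTM that merges the forget gate and the input gate into a single update gate.
GRUs are computationally efficient and often perform about as well as LSTMs.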

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build an LSTM for time series prediction
model = Sequential([
    LSTM(50, activation='relu', input_shape=(10, 1)),  # LSTM layer
    Dense(1)                                           # Output layer
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Summary of the model
model.summary()
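
The model above expects inputs of shape (samples, 10, 1), that is, windows of 10 time steps with one feature each. One common way to get there from a one-dimensional series is a sliding window, sketched below. The helper `make_windows` and the synthetic sine series are assumptions made for this illustration and are not part of the original example.

```python
import numpy as np

def make_windows(series, window=10):
    """Split a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype="float32")
    # Each sample is `window` consecutive values; the target is the value that follows
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

# Synthetic data, used only to demonstrate the shapes
series = np.sin(np.linspace(0, 20, 200))
X_seq, y_seq = make_windows(series, window=10)
print(X_seq.shape, y_seq.shape)
```

Arrays shaped like this could then be passed to `model.fit(X_seq, y_seq, ...)` to train the LSTM on next-step prediction.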