Implementing Backpropagation in a Neural Network
Building a neural network involves several steps, and two of the most important are implementing forward propagation and backward propagation. These terms sound heavy and often intimidate beginners, but if you break the techniques down into their individual steps, they become quite approachable. In this article we will focus on backpropagation and build an intuition for each of its steps.
What is backpropagation?
It is simply the technique, used when implementing a neural network, that lets us compute the gradients of the parameters so that we can run gradient descent and minimize the cost function. Backpropagation is often described as the most mathematically dense part of a neural network. Relax, though: in this article we will demystify every part of it.
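To make the phrase "compute the gradients, then take a gradient descent step" concrete, here is a minimal, self-contained sketch on a one-dimensional toy cost function (the function, learning rate and number of iterations are illustrative choices, not part of the original article):
#Toy cost function f(w) = (w - 3)**2, whose gradient is df/dw = 2 * (w - 3)
def toy_gradient(w):
    return 2 * (w - 3)

w = 0.0                #arbitrary starting point
learning_rate = 0.1
for _ in range(50):
    w = w - learning_rate * toy_gradient(w)   #one gradient descent step
print(w)               #converges towards 3, the minimizer of the toy cost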
Implementing backpropagation
Assume a simple two-layer neural network: one hidden layer and one output layer. We can perform backpropagation as follows. Initialize the weights and biases to be used by the network: this means randomly initializing the network's weights and biases. The gradients of these parameters will be obtained from backpropagation and then used in gradient descent to update the parameters.
#Import Numpy library
import numpy as np

#Set seed for reproducibility
np.random.seed(100)

#We will first initialize the weights and biases needed and store them in a dictionary called W_B
def initialize(num_f, num_h, num_out):
    '''
    Description: This function randomly initializes the weights and biases of each layer of the neural network
    Input Arguments:
    num_f - number of training features
    num_h - the number of nodes in the hidden layer
    num_out - the number of nodes in the output layer
    Output:
    W_B - A dictionary of the initialized parameters.
    '''
    #Randomly initialize the weights, initialize the biases to zero, and store everything in a dictionary
    W_B = {
        'W1': np.random.randn(num_h, num_f),
        'b1': np.zeros((num_h, 1)),
        'W2': np.random.randn(num_out, num_h),
        'b2': np.zeros((num_out, 1))
    }
    return W_B
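As a quick usage check (the layer sizes below are arbitrary values chosen only for illustration):
W_B = initialize(num_f=4, num_h=5, num_out=1)
print(W_B['W1'].shape, W_B['b1'].shape)   #(5, 4) (5, 1)
print(W_B['W2'].shape, W_B['b2'].shape)   #(1, 5) (1, 1)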
Perform forward propagation: this involves computing the linear outputs and the activation outputs of the hidden layer and the output layer.
For the hidden layer, we will use the relu activation function, defined as shown below:
#We will now proceed to create functions for each of our activation functions
def relu(Z):
    '''
    Description: This function performs the relu activation function on a given number or matrix.
    Input Arguments:
    Z - matrix or integer
    Output:
    relu_Z - matrix or integer with relu performed on it
    '''
    relu_Z = np.maximum(Z, 0)
    return relu_Z
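For example, applied element-wise to a small (arbitrary) array:
print(relu(np.array([-2.0, 0.0, 3.0])))   #[0. 0. 3.]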
For the output layer, we will use the sigmoid activation function, shown below:
def sigmoid(Z):
    '''
    Description: This function performs the sigmoid activation function on a given number or matrix.
    Input Arguments:
    Z - matrix or integer
    Output:
    sigmoid_Z - matrix or integer with sigmoid performed on it
    '''
    sigmoid_Z = 1 / (1 + np.exp(-Z))
    return sigmoid_Z
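Again, applied element-wise to a small (arbitrary) array:
print(sigmoid(np.array([-1.0, 0.0, 1.0])))   #approximately [0.269 0.5 0.731]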
Now perform the forward propagation:
#We will now proceed to perform forward propagation
def forward_propagation(X, W_B):
    '''
    Description: This function performs the forward propagation in a vectorized form
    Input Arguments:
    X - input training examples, of shape (num_f, no_examples)
    W_B - initialized weights and biases
    Output:
    forward_results - A dictionary containing the linear and activation outputs
    '''
    #Calculate the linear Z for the hidden layer
    Z1 = np.dot(W_B['W1'], X) + W_B['b1']
    #Calculate the activation output for the hidden layer
    A = relu(Z1)
    #Calculate the linear Z for the output layer
    Z2 = np.dot(W_B['W2'], A) + W_B['b2']
    #Calculate the activation output for the output layer
    Y_pred = sigmoid(Z2)
    #Save everything in a dictionary
    forward_results = {"Z1": Z1,
                       "A": A,
                       "Z2": Z2,
                       "Y_pred": Y_pred}
    return forward_results
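A quick shape check, assuming (as the rest of the code does) that X stores one training example per column, i.e. has shape (num_f, no_examples); the toy data below is randomly generated purely for illustration:
X_demo = np.random.randn(4, 10)                 #4 features, 10 examples
W_B_demo = initialize(num_f=4, num_h=5, num_out=1)
results = forward_propagation(X_demo, W_B_demo)
print(results['A'].shape, results['Y_pred'].shape)   #(5, 10) (1, 10)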
Perform backward propagation: compute the gradients of the cost with respect to the parameters involved in gradient descent. In this case these are dLdZ2, dLdW2, dLdb2, dLdZ1, dLdW1 and dLdb1. These gradients, combined with a learning rate, are later used to perform gradient descent. We will implement a vectorized version of backpropagation over many training examples (no_examples).
The step-by-step guide is as follows:
- Obtain the results from the forward propagation, as shown below:
forward_results = forward_propagation(X, W_B)
Z1 = forward_results['Z1']
A = forward_results['A']
Z2 = forward_results['Z2']
Y_pred = forward_results['Y_pred']
- Obtain the number of training examples, as shown below:
no_examples = X.shape[1]
- Compute the loss (the binary cross-entropy averaged over the examples):
L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))
- Compute the gradient of each parameter, as shown below:
dLdZ2 = Y_pred - Y_true
dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
#Backpropagate through the relu: its derivative is 1 where Z1 > 0 and 0 elsewhere
dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), (Z1 > 0).astype(float))
dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)
- Store the computed gradients needed for gradient descent in a dictionary:
gradients = {"dLdW1": dLdW1,
"dLdb1": dLdb1,
"dLdW2": dLdW2,
"dLdb2": dLdb2}
- Return the loss and the stored gradients:
return gradients, L
Here is the complete backward propagation function:
def backward_propagation(X, W_B, Y_true):
    '''
    Description: This function performs the backward propagation in a vectorized form
    Input Arguments:
    X - input training examples
    W_B - initialized weights and biases
    Y_true - the true target values of the training examples
    Output:
    gradients - the calculated gradients of each parameter
    L - the loss
    '''
    #Obtain the forward results from the forward propagation
    forward_results = forward_propagation(X, W_B)
    Z1 = forward_results['Z1']
    A = forward_results['A']
    Z2 = forward_results['Z2']
    Y_pred = forward_results['Y_pred']
    #Obtain the number of training examples
    no_examples = X.shape[1]
    #Calculate the loss (binary cross-entropy averaged over the examples)
    L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))
    #Calculate the gradients of each parameter needed for gradient descent
    dLdZ2 = Y_pred - Y_true
    dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
    dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
    #Backpropagate through the relu: its derivative is 1 where Z1 > 0 and 0 elsewhere
    dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), (Z1 > 0).astype(float))
    dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
    dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)
    #Store the gradients for gradient descent in a dictionary
    gradients = {"dLdW1": dLdW1,
                 "dLdb1": dLdb1,
                 "dLdW2": dLdW2,
                 "dLdb2": dLdb2}
    return gradients, L
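The article stops at computing the gradients. To show how they would actually be consumed, here is a minimal sketch of a gradient descent update and a toy training loop; the update_parameters function, the learning rate, the number of iterations and the randomly generated data are all illustrative assumptions, not part of the original code:
def update_parameters(W_B, gradients, learning_rate):
    #Move each parameter a small step against its gradient (illustrative helper)
    W_B['W1'] = W_B['W1'] - learning_rate * gradients['dLdW1']
    W_B['b1'] = W_B['b1'] - learning_rate * gradients['dLdb1']
    W_B['W2'] = W_B['W2'] - learning_rate * gradients['dLdW2']
    W_B['b2'] = W_B['b2'] - learning_rate * gradients['dLdb2']
    return W_B

#Toy training loop on randomly generated, linearly separable data
X_train = np.random.randn(4, 200)                              #4 features, 200 examples
Y_train = (np.sum(X_train, axis=0, keepdims=True) > 0) * 1.0   #targets of shape (1, 200)
W_B = initialize(num_f=4, num_h=5, num_out=1)
for i in range(500):
    gradients, L = backward_propagation(X_train, W_B, Y_train)
    W_B = update_parameters(W_B, gradients, learning_rate=0.05)
print(L)   #the loss should decrease steadily as training proceeds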
Many people assume backpropagation is hard, but as this article shows, that is not really the case. You need to understand each individual step in order to master the technique as a whole. It also helps to be comfortable with the underlying mathematics, namely linear algebra and calculus, so that you can see how each gradient is derived. With these tools, backpropagation should be a piece of cake! In practice, backpropagation is usually handled for you by the deep learning framework you use, but it is still worth understanding how it works under the hood, since that can sometimes help explain why a network is not training well.
This article is reposted from 小白遇见AI; author: 小烦.