Reducing Overfitting in Deep Neural Networks with Weight Constraints in Keras


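Weight constraints provide a way to reduce the overfitting of a deep learning neural network model on the training data and to improve the model's performance on new data, such as a holdout test set. There are several types of weight constraint, such as maximum and unit vector norms, and some require a hyperparameter that must be configured.

In this tutorial, you will discover the Keras API for adding weight constraints to deep learning neural network models to reduce overfitting.

After completing this tutorial, you will know:

How to create vector norm constraints using the Keras API.
How to add weight constraints to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding a weight constraint to an existing model.

Tutorial Overview

This tutorial is divided into three parts; they are:

Weight Constraints in Keras
Weight Constraints on Layers
Weight Constraint Case Study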

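Weight Constraints in Keras

The Keras API supports weight constraints. A constraint is specified per layer, but it is applied and enforced per node within the layer.

Using a constraint generally involves setting the kernel_constraint argument on the layer for the input weights and the bias_constraint argument for the bias weights.

Generally, weight constraints are not used on the bias weights. A set of different vector norms can be used as constraints; they are provided as classes in the keras.constraints module. They are:

Maximum norm (max_norm), to force the norm of the weights to be at or below a given limit.
Non-negativity (non_neg), to force the weights to be non-negative.
Unit norm (unit_norm), to force the norm of the weights to be 1.0.
Min-max norm (min_max_norm), to force the norm of the weights to lie within a range.

For example, a constraint can be imported and instantiated as follows: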

# import norm  
from keras.constraints import max_norm  
# instantiate norm  
norm = max_norm(3.0)  
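
To make the four constraints more concrete, the following NumPy sketch mimics what each one does to a weight matrix. It is an illustration under stated assumptions rather than the Keras source: the helper names are hypothetical, the norm is taken over axis 0 (the incoming weights of each node in a Dense kernel), the constant EPS is an assumed guard against division by zero, and the min-max version always rescales fully.

```python
import numpy as np

EPS = 1e-7  # assumed small constant guarding against division by zero

def column_norms(w):
    # L2 norm of each column, i.e. of the incoming weights of each node
    return np.sqrt(np.sum(np.square(w), axis=0, keepdims=True))

def apply_max_norm(w, max_value):
    # shrink only the columns whose norm exceeds max_value
    norms = column_norms(w)
    return w * (np.clip(norms, 0.0, max_value) / (EPS + norms))

def apply_non_neg(w):
    # replace negative weights with zero
    return np.maximum(w, 0.0)

def apply_unit_norm(w):
    # rescale every column to norm 1.0
    return w / (EPS + column_norms(w))

def apply_min_max_norm(w, min_value, max_value):
    # rescale columns so that each norm falls inside [min_value, max_value]
    norms = column_norms(w)
    return w * (np.clip(norms, min_value, max_value) / (EPS + norms))

# toy kernel with 2 inputs and 3 nodes; column norms are 5.0, 0.5 and about 2.24
w = np.array([[3.0, 0.3, -1.0],
              [4.0, 0.4, 2.0]])

print(column_norms(apply_max_norm(w, 3.0)))
print(apply_non_neg(w))
print(column_norms(apply_unit_norm(w)))
print(column_norms(apply_min_max_norm(w, 1.0, 2.0)))
```

Only the first column is changed by the max norm of 3.0, whereas the unit norm rescales every column, including the one that was already small.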

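Weight Constraints on Layers

Weight norms can be used with most layers in Keras. In this section, we will look at some common examples.

MLP Weight Constraint

The example below sets a maximum norm weight constraint on a Dense fully connected layer.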

# example of max norm on a dense layer  
from keras.layers import Dense  
from keras.constraints import max_norm  
...  
model.add(Dense(32, kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
...  
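
Conceptually, a kernel constraint acts as a projection that is applied to the weights after each optimizer update. The sketch below imitates this for a Dense-style kernel in plain NumPy. The function names, the learning rate, and the toy gradient are all assumptions made for illustration; this is not how Keras is implemented internally.

```python
import numpy as np

def max_norm_projection(w, max_value):
    # rescale each column (one node's incoming weights) whose L2 norm exceeds max_value
    norms = np.sqrt(np.sum(w ** 2, axis=0, keepdims=True))
    return w * (np.clip(norms, 0.0, max_value) / (1e-7 + norms))

def constrained_sgd_step(w, grad, learning_rate, max_value):
    # 1) ordinary gradient step, 2) project the result back into the allowed region
    w = w - learning_rate * grad
    return max_norm_projection(w, max_value)

rng = np.random.default_rng(1)
w = rng.normal(size=(2, 4))  # 2 inputs, 4 nodes

for _ in range(100):
    grad = -w  # toy gradient that would make the weights grow without bound
    w = constrained_sgd_step(w, grad, learning_rate=0.1, max_value=3.0)

final_norms = np.sqrt(np.sum(w ** 2, axis=0))
print(final_norms)  # one norm per node, none above the limit of 3.0
```

Without the projection the weights in this toy loop would grow by 10 percent per step; with it, each node's incoming weight vector is held at the limit.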

CNN Weight Constraint

The example below sets a maximum norm weight constraint on a convolutional layer.

# example of max norm on a cnn layer  
from keras.layers import Conv2D  
from keras.constraints import max_norm 
...  
model.add(Conv2D(32, (3,3), kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
... 
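
For a convolutional kernel, the axis over which the norm is computed matters. To the best of our knowledge, the Keras constraint classes compute the norm over axis 0 by default, and the Keras documentation suggests axis=[0, 1, 2] for a channels-last Conv2D kernel when each filter should be constrained as a whole. The NumPy sketch below contrasts the two choices on a hypothetical random kernel; the helper name and the kernel shape are assumptions for illustration.

```python
import numpy as np

def max_norm(w, max_value, axis):
    # rescale slices along `axis` whose L2 norm exceeds max_value
    norms = np.sqrt(np.sum(np.square(w), axis=axis, keepdims=True))
    return w * (np.clip(norms, 0.0, max_value) / (1e-7 + norms))

def filter_norms(kernel):
    # L2 norm of each complete filter (rows x cols x input channels)
    return np.sqrt(np.sum(kernel ** 2, axis=(0, 1, 2)))

# hypothetical Conv2D kernel: 3x3 window, 8 input channels, 32 filters
rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 8, 32))

per_filter = max_norm(kernel, 3.0, axis=(0, 1, 2))
per_row_axis = max_norm(kernel, 3.0, axis=0)

print(filter_norms(per_filter).max())    # bounded by the limit
print(filter_norms(per_row_axis).max())  # whole-filter norm can still be larger
```

With axis 0 only, each short slice along the first kernel dimension is limited, so the norm of a complete filter can remain well above the chosen value.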

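RNN Weight Constraint

Unlike other layer types, recurrent neural networks allow you to set a weight constraint on the recurrent weights as well as on the input weights and the bias. The constraint for the recurrent weights is set via the recurrent_constraint argument of the layer. The example below sets a maximum norm weight constraint on an LSTM layer.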

# example of max norm on an lstm layer  
from keras.layers import LSTM  
from keras.constraints import max_norm  
...  
model.add(LSTM(32, kernel_constraint=max_norm(3), recurrent_constraint=max_norm(3), bias_constraint=max_norm(3)))
...  

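Now that we know how to use the weight constraint API, let's look at a worked example.

Weight Constraint Case Study

In this section, we will demonstrate how to use weight constraints to reduce the overfitting of an MLP on a simple binary classification problem. This example provides a template for applying weight constraints to your own neural networks for classification and regression problems.

Binary Classification Problem

We will use a standard binary classification problem that defines two semicircles of observations, one semicircle for each class. Each observation has two input variables with the same scale and a class output value of either 0 or 1. This dataset is called the "moons" dataset because of the shape of the observations in each class when plotted. We can use the make_moons() function to generate observations from this problem. We will add noise to the data and seed the random number generator so that the same samples are generated each time the code is run.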

# generate 2d classification dataset
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)

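We can plot the dataset with the two variables as x and y coordinates on a graph and the class value as the color of each observation. The complete example of generating and plotting the dataset is listed below.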

# generate two moons dataset  
from sklearn.datasets import make_moons  
from matplotlib import pyplot  
from pandas import DataFrame  
# generate 2d classification dataset  
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)  
# scatter plot, dots colored by class value  
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))  
colors = {0:'red', 1:'blue'}  
fig, ax = pyplot.subplots() 
grouped = df.groupby('label')  
for key, group in grouped:  
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()  

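Running the example creates a scatter plot showing the semicircle or moon shape of the observations in each class. We can see that the noise in the dispersal of the points makes the moons less obvious.

This is a good test problem because the classes cannot be separated by a straight line, that is, they are not linearly separable, and so require a nonlinear method such as a neural network. We have generated only 100 samples, which is small for a neural network, providing the opportunity to overfit the training dataset and have higher error on the test dataset: a good case for using regularization. Further, the samples have noise, giving the model an opportunity to learn aspects of the samples that do not generalize.

Overfit Multilayer Perceptron

We can develop an MLP model to address this binary classification problem. The model will have one hidden layer with more nodes than may be required to solve the problem, providing an opportunity to overfit. We will also train the model for longer than is required, to ensure that it overfits. Before we define the model, we will split the dataset into train and test sets, using 30 examples to train the model and 70 to evaluate the fit model's performance.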

X, y = make_moons(n_samples=100, noise=0.2, random_state=1)  
# split into train and test  
n_train = 30  
trainX, testX = X[:n_train, :], X[n_train:, :]  
trainy, testy = y[:n_train], y[n_train:] 

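Next, we can define the model. The hidden layer uses 500 nodes and the rectified linear (ReLU) activation function. A sigmoid activation function is used in the output layer in order to predict class values of 0 or 1. The model is optimized using the binary cross-entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.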

# define model  
model = Sequential()  
model.add(Dense(500, input_dim=2, activation='relu'))  
model.add(Dense(1, activation='sigmoid'))  
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
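
To see why this model has room to overfit, it can help to count its trainable parameters and compare the total with the 30 training examples. The short calculation below is a sketch; the helper name is hypothetical, and it assumes each fully connected layer has one weight per input-node pair plus one bias per node.

```python
def dense_param_count(n_inputs, n_units):
    # one weight per input-unit pair plus one bias per unit
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(2, 500)   # hidden layer: 2 inputs, 500 nodes
output_params = dense_param_count(500, 1)   # output layer: 500 inputs, 1 node
total_params = hidden_params + output_params

n_train = 30
print(hidden_params, output_params, total_params)  # 1500 501 2001
print('parameters per training example: %.1f' % (total_params / n_train))
```

With roughly 67 parameters for every training example, the network has far more capacity than the data can pin down.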

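The defined model is then fit on the training data for 4,000 epochs with the default batch size of 32. We will also use the test dataset as a validation dataset.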

# fit model  
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0) 
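
Because the training set holds only 30 examples and the default batch size is 32, each epoch consists of a single batch. The small calculation below makes the resulting number of weight updates explicit; the variable names are chosen here for illustration.

```python
import math

n_train = 30
batch_size = 32   # default used by fit() when no batch_size is given
epochs = 4000

batches_per_epoch = math.ceil(n_train / batch_size)
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, total_updates)  # 1 4000
```

In other words, the model sees the whole training set in every update and receives 4,000 updates in total.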

We can evaluate the performance of the model on the train and test datasets and report the result.

# evaluate the model  
_, train_acc = model.evaluate(trainX, trainy, verbose=0)  
_, test_acc = model.evaluate(testX, testy, verbose=0)  
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc)) 
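
The two values returned by evaluate() here are the loss and the accuracy. As a rough sketch of what these numbers mean for a sigmoid output, the NumPy functions below compute binary cross-entropy and accuracy from predicted probabilities. The function names, the 0.5 threshold, the clipping constant, and the four example predictions are assumptions made for illustration.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # a probability above the threshold counts as class 1
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # clip probabilities so that log(0) never occurs
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)))

# hypothetical labels and predicted probabilities
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.4, 0.2]

print(binary_accuracy(y_true, y_prob))  # 0.75
print(round(binary_crossentropy(y_true, y_prob), 3))
```

The third example is predicted with probability 0.4 for class 1, so it counts as an error for accuracy and also contributes the largest term to the loss.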

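Finally, we will plot the performance of the model on both the train and test sets for each epoch. If the model does indeed overfit the training dataset, we would expect the line plot of accuracy on the training set to continue to increase and the accuracy on the test set to rise and then fall again as the model learns statistical noise in the training dataset.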

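# plot history
# note: Keras 2.3+ stores these under 'accuracy'/'val_accuracy'; older releases use 'acc'/'val_acc'
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')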
pyplot.legend()  
pyplot.show() 
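
Besides inspecting the plot by eye, the rise-then-fall pattern can be summarized numerically. The sketch below takes two lists of per-epoch accuracies, such as those stored in the training history, and reports where the test accuracy peaked and how large the final gap is. The function name and the short example curves are hypothetical and are not results from the model above.

```python
def summarize_curves(train_acc, val_acc):
    # epoch (0-based) at which validation accuracy first reached its maximum
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        'best_epoch': best_epoch,
        'best_val_acc': val_acc[best_epoch],
        'final_val_acc': val_acc[-1],
        'final_gap': train_acc[-1] - val_acc[-1],
    }

# made-up curves showing the overfitting shape: validation peaks, then declines
train_curve = [0.70, 0.85, 0.95, 1.00, 1.00]
val_curve = [0.68, 0.80, 0.93, 0.92, 0.91]

summary = summarize_curves(train_curve, val_curve)
print(summary)
```

A best epoch well before the end of training, combined with a positive final gap, is the numerical counterpart of the overfitting shape described above.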

We can tie all of these pieces together; the complete example is listed below.

# mlp overfit on the moons dataset  
from sklearn.datasets import make_moons  
from keras.layers import Dense  
from keras.models import Sequential  
from matplotlib import pyplot  
# generate 2d classification dataset  
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)  
# split into train and test  
n_train = 30  
trainX, testX = X[:n_train, :], X[n_train:, :] 
trainy, testy = y[:n_train], y[n_train:]  
# define model  
model = Sequential()  
model.add(Dense(500, input_dim=2, activation='relu'))  
model.add(Dense(1, activation='sigmoid'))  
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  
# fit model  
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)  
# evaluate the model  
_, train_acc = model.evaluate(trainX, trainy, verbose=0)  
_, test_acc = model.evaluate(testX, testy, verbose=0)  
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))  
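# plot history
# note: Keras 2.3+ stores these under 'accuracy'/'val_accuracy'; older releases use 'acc'/'val_acc'
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')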
pyplot.legend()  
pyplot.show() 

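Running the example reports the model performance on the train and test datasets. We can see that the model performs better on the training dataset than on the test dataset, one possible sign of overfitting. Given the stochastic nature of the neural network and the training algorithm, your specific results may vary. Because the model is overfit, we generally would not expect much, if any, difference in accuracy across repeated runs of the model on the same dataset.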

Train: 1.000, Test: 0.914 

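A figure is created showing line plots of the model accuracy on the train and test sets. We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.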

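Overfit MLP With Weight Constraint

We can update the example to use a weight constraint. There are a few different weight constraints to choose from. A good simple constraint for this model is to normalize the weights so that the norm is equal to 1.0. This constraint has the effect of forcing all incoming weights to be small. We can do this by using unit_norm in Keras. The constraint can be added to the first hidden layer as follows: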

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm())) 

We can also achieve the same result by using min_max_norm and setting both the minimum and maximum to 1.0, for example:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=min_max_norm(min_value=1.0, max_value=1.0))) 

We cannot achieve the same result with the maximum norm constraint, as it allows norms at or below the specified limit; for example:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=max_norm(1.0))) 
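
The difference between these three options can be checked on a small matrix. The NumPy sketch below re-implements the three rescaling rules under the same assumptions as the rest of this tutorial's illustrations (hypothetical helper names, norms over axis 0, full rescaling for the min-max case); it does not call Keras.

```python
import numpy as np

def norms(w):
    # L2 norm of each column
    return np.sqrt(np.sum(w ** 2, axis=0, keepdims=True))

def unit_norm(w):
    return w / (1e-7 + norms(w))

def min_max_norm(w, min_value, max_value):
    n = norms(w)
    return w * (np.clip(n, min_value, max_value) / (1e-7 + n))

def max_norm(w, max_value):
    n = norms(w)
    return w * (np.clip(n, 0.0, max_value) / (1e-7 + n))

# two nodes: one with a small norm (0.5) and one with a large norm (5.0)
w = np.array([[0.3, 3.0],
              [0.4, 4.0]])

print(norms(unit_norm(w)))               # both columns at about 1.0
print(norms(min_max_norm(w, 1.0, 1.0)))  # both columns at about 1.0
print(norms(max_norm(w, 1.0)))           # about 0.5 and 1.0
```

The maximum norm leaves the small column untouched, which is why it is not equivalent to the unit norm even when the limit is 1.0.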

The complete updated example with the unit norm constraint is listed below:

# mlp overfit on the moons dataset with a unit norm constraint  
from sklearn.datasets import make_moons  
from keras.layers import Dense  
from keras.models import Sequential  
from keras.constraints import unit_norm  
from matplotlib import pyplot  
# generate 2d classification dataset  
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)  
# split into train and test  
n_train = 30  
trainX, testX = X[:n_train, :], X[n_train:, :]  
trainy, testy = y[:n_train], y[n_train:]  
# define model  
model = Sequential()  
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))  
model.add(Dense(1, activation='sigmoid'))  
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  
# fit model  
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)  
# evaluate the model 
_, train_acc = model.evaluate(trainX, trainy, verbose=0)  
_, test_acc = model.evaluate(testX, testy, verbose=0)  
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))  
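# plot history
# note: Keras 2.3+ stores these under 'accuracy'/'val_accuracy'; older releases use 'acc'/'val_acc'
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')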
pyplot.legend()  
pyplot.show() 

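Running the example reports the model performance on the train and test datasets. We can see that the strict constraint on the size of the weights has indeed improved the performance of the model on the test set without affecting performance on the training set.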

Train: 1.000, Test: 0.943 

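Reviewing the line plots of train and test accuracy, we can see that the model no longer appears to have overfit the training dataset. Model accuracy on both the train and test sets continues to increase to a plateau.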

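Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Report Weight Norm. Update the example to calculate the magnitude of the network weights and demonstrate that the constraint indeed made the magnitude smaller.
Constrain Output Layer. Update the example to add a constraint to the output layer of the model and compare the results.
Constrain Bias. Update the example to add a constraint to the bias weights and compare the results.
Repeated Evaluation. Update the example to fit and evaluate the model multiple times and report the mean and standard deviation of model performance.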
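
As a starting point for the repeated evaluation idea, the sketch below shows one possible harness in plain Python. The function names are hypothetical, and fake_evaluate is only a placeholder that returns random numbers; in a real experiment it would be replaced by a function that fits a fresh model and returns its test accuracy.

```python
import random
from statistics import mean, stdev

def repeated_evaluation(evaluate_once, n_repeats=30):
    # call the evaluation function several times and summarize the scores
    scores = [evaluate_once() for _ in range(n_repeats)]
    return mean(scores), stdev(scores), scores

def fake_evaluate():
    # placeholder only: stands in for "fit the model and return test accuracy"
    return random.uniform(0.90, 0.95)

random.seed(1)
avg, sd, scores = repeated_evaluation(fake_evaluate, n_repeats=10)
print('Accuracy: mean=%.3f, std=%.3f over %d runs' % (avg, sd, len(scores)))
```

Reporting the mean and standard deviation over several runs gives a fairer comparison between the constrained and unconstrained models than a single run does.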