I. Preface #
There is plenty of neural network and deep learning theory, and I can talk through most of it, but actually building a network, preparing the data, and putting it into practice stumped me. This post records the data processing and the hands-on practice. It serves as a study log, or rather a template: whenever I build a neural network later, I can refer back to this post.
The task is the MNIST handwritten digit recognition dataset; both the data and the code come mainly from Kaggle.
II. Main Content #
1. Data Processing #
First, see what the input directory contains:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/digit-recognizer/train.csv
/kaggle/input/digit-recognizer/test.csv
/kaggle/input/digit-recognizer/sample_submission.csv
Reading the data is naturally done with pandas' read_csv:
train_data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
print(train_data)
       label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  \
0          1       0       0       0       0       0       0       0       0
1          0       0       0       0       0       0       0       0       0
2          1       0       0       0       0       0       0       0       0
3          4       0       0       0       0       0       0       0       0
4          0       0       0       0       0       0       0       0       0
...      ...     ...     ...     ...     ...     ...     ...     ...     ...
41995      0       0       0       0       0       0       0       0       0
41996      1       0       0       0       0       0       0       0       0
41997      7       0       0       0       0       0       0       0       0
41998      6       0       0       0       0       0       0       0       0
41999      9       0       0       0       0       0       0       0       0

       pixel8  ...  pixel774  pixel775  pixel776  pixel777  pixel778  \
0           0  ...         0         0         0         0         0
1           0  ...         0         0         0         0         0
2           0  ...         0         0         0         0         0
3           0  ...         0         0         0         0         0
4           0  ...         0         0         0         0         0
...       ...  ...       ...       ...       ...       ...       ...
41995       0  ...         0         0         0         0         0
41996       0  ...         0         0         0         0         0
41997       0  ...         0         0         0         0         0
41998       0  ...         0         0         0         0         0
41999       0  ...         0         0         0         0         0

       pixel779  pixel780  pixel781  pixel782  pixel783
0             0         0         0         0         0
1             0         0         0         0         0
2             0         0         0         0         0
3             0         0         0         0         0
4             0         0         0         0         0
...         ...       ...       ...       ...       ...
41995         0         0         0         0         0
41996         0         0         0         0         0
41997         0         0         0         0         0
41998         0         0         0         0         0
41999         0         0         0         0         0

[42000 rows x 785 columns]
Extract the label column, tally the classes, and plot the counts with seaborn's countplot:
import seaborn as sns

Y_train = train_data['label']
sns.countplot(Y_train)
print(Y_train.value_counts())
1    4684
7    4401
3    4351
9    4188
2    4177
6    4137
0    4132
4    4072
8    4063
5    3795
Name: label, dtype: int64
To get the feature data, simply drop the label column:
X_train = train_data.drop(labels=['label'], axis=1)
print(X_train)
       pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  \
0           0       0       0       0       0       0       0       0       0
1           0       0       0       0       0       0       0       0       0
2           0       0       0       0       0       0       0       0       0
3           0       0       0       0       0       0       0       0       0
4           0       0       0       0       0       0       0       0       0
...       ...     ...     ...     ...     ...     ...     ...     ...     ...
41995       0       0       0       0       0       0       0       0       0
41996       0       0       0       0       0       0       0       0       0
41997       0       0       0       0       0       0       0       0       0
41998       0       0       0       0       0       0       0       0       0
41999       0       0       0       0       0       0       0       0       0

       pixel9  ...  pixel774  pixel775  pixel776  pixel777  pixel778  \
0           0  ...         0         0         0         0         0
1           0  ...         0         0         0         0         0
2           0  ...         0         0         0         0         0
3           0  ...         0         0         0         0         0
4           0  ...         0         0         0         0         0
...       ...  ...       ...       ...       ...       ...       ...
41995       0  ...         0         0         0         0         0
41996       0  ...         0         0         0         0         0
41997       0  ...         0         0         0         0         0
41998       0  ...         0         0         0         0         0
41999       0  ...         0         0         0         0         0

       pixel779  pixel780  pixel781  pixel782  pixel783
0             0         0         0         0         0
1             0         0         0         0         0
2             0         0         0         0         0
3             0         0         0         0         0
4             0         0         0         0         0
...         ...       ...       ...       ...       ...
41995         0         0         0         0         0
41996         0         0         0         0         0
41997         0         0         0         0         0
41998         0         0         0         0         0
41999         0         0         0         0         0

[42000 rows x 784 columns]
With the data extracted, delete the original variable to free memory:
del train_data
Next, convert the labels into the form training needs. Since the output is one of 10 digits, one-hot encode each label into a vector like [0, 1, …, 0], which maps nicely onto the output layer. Use the to_categorical function from TensorFlow's Keras:
import tensorflow as tf
print(Y_train)
Y_train = tf.keras.utils.to_categorical(Y_train, num_classes=10)
print(Y_train)
0        1
1        0
2        1
3        4
4        0
        ..
41995    0
41996    1
41997    7
41998    6
41999    9
Name: label, Length: 42000, dtype: int64
[[0. 1. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 1.]]
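For intuition, the encoding and its inverse can be sketched with plain numpy (a sketch only: indexing `np.eye(10)` behaves like `to_categorical` here, and `argmax`, used later on the predictions, undoes it):

```python
import numpy as np

labels = np.array([1, 0, 4])      # a few labels, as in the output above
one_hot = np.eye(10)[labels]      # numpy equivalent of one-hot encoding
print(one_hot[0])                 # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]

# argmax inverts the encoding, recovering the original labels
assert (np.argmax(one_hot, axis=1) == labels).all()
```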
2. Building the Neural Networks #
2.1. Fully Connected Network #
As a first step, build the simplest thing I just learned: a fully connected network. Start by initializing the TPU:
# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
A fully connected network is easy to assemble with Keras, TensorFlow's high-level API:
- Since each image has 784 pixels, build a $784 \times 300 \times 10$ three-layer fully connected network
- Use the sigmoid activation function; the output layer gets no activation
- Train with stochastic gradient descent (SGD)
- The loss is the most common one, squared error, i.e. the MSE function
- The metric, i.e. the accuracy shown during training for your own reference, is accuracy
- No learning rate is set explicitly, so Keras falls back to the optimizer's default; to change it, follow this example
lr = tf.keras.backend.get_value(model.optimizer.lr)
tf.keras.backend.set_value(model.optimizer.lr, lr * 0.1)
Build the network:
# compile under the TPU strategy
with tpu_strategy.scope():
    model = tf.keras.Sequential([
        # Adds a densely-connected layer with 784 units to the model:
        tf.keras.layers.Dense(784, activation='sigmoid', input_shape=(784,)),
        # Add another:
        tf.keras.layers.Dense(300, activation='sigmoid'),
        # Add an output layer with 10 output units:
        tf.keras.layers.Dense(10)])

    model.compile(optimizer='sgd',
                  loss='mse',
                  metrics=['accuracy'])
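As a rough sanity check on the model above, the parameter count can be tallied by hand (a sketch using the standard Dense formula, inputs × units + units; note the stack as coded is 784 inputs → Dense(784) → Dense(300) → Dense(10)):

```python
# each Dense layer holds inputs * units weights plus units biases
p1 = 784 * 784 + 784   # Dense(784) on the 784-pixel input
p2 = 784 * 300 + 300   # Dense(300)
p3 = 300 * 10 + 10     # Dense(10) output layer
print(p1 + p2 + p3)    # 853950 trainable parameters
```

This is the same number `model.summary()` would report as total params.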
Start training, for 10 epochs. batch_size is left unset, so Keras uses its default of 32; batch_size is how many samples are grouped together to compute the loss for each update. The metric above should have been accuracy, but I wrote mse, so it just mirrors the loss; even so, you can see the loss decreasing.
model.fit(X_train, Y_train, epochs=10)
Train on 42000 samples
Epoch 1/10
42000/42000 [==============================] - 19s 445us/sample - loss: 0.0662 - mse: 0.0662
Epoch 2/10
42000/42000 [==============================] - 15s 358us/sample - loss: 0.0450 - mse: 0.0450
Epoch 3/10
42000/42000 [==============================] - 15s 363us/sample - loss: 0.0389 - mse: 0.0389
Epoch 4/10
42000/42000 [==============================] - 15s 361us/sample - loss: 0.0352 - mse: 0.0352
Epoch 5/10
42000/42000 [==============================] - 15s 368us/sample - loss: 0.0327 - mse: 0.0327
Epoch 6/10
42000/42000 [==============================] - 15s 365us/sample - loss: 0.0307 - mse: 0.0307
Epoch 7/10
42000/42000 [==============================] - 15s 360us/sample - loss: 0.0290 - mse: 0.0290
Epoch 8/10
42000/42000 [==============================] - 15s 361us/sample - loss: 0.0277 - mse: 0.0277
Epoch 9/10
42000/42000 [==============================] - 16s 382us/sample - loss: 0.0265 - mse: 0.0265
Epoch 10/10
42000/42000 [==============================] - 15s 360us/sample - loss: 0.0255 - mse: 0.0255
Prediction accuracy came out to 91.085%. Very pleased: a first pass with a neural network, done.
A few parameter tweaks and their effects:
- Adding a sigmoid activation to the output layer drops accuracy to 81.285%; my guess is it constrains what the output can express
- Using relu on the output layer does a little better than sigmoid, reaching 85.014%; presumably it constrains the output in the same way
- With relu in every layer, prediction breaks outright: relu drives all the outputs to 0, so training cannot proceed
- From what I found online, the main causes are un-normalized inputs and poor weight initialization: during training the weights grow too large or too small, relu turns the activations into 0, and the weights can never adjust back to sensible values
- A hidden layer of 100 neurons performs about the same as 300, both around 91.000%
- Changing the network to $784 \times 150 \times 150 \times 10$ gives 86.257%; I have not yet tested whether more training epochs would close the gap
- With sigmoid as the activation, backpropagation struggles as depth grows: the gradient at each layer is a product of the previous layers' derivatives, and the sigmoid derivative is at most $\frac{1}{4}$, so gradients shrink with every layer. That is why fully connected sigmoid networks usually stay at three layers
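The $\frac{1}{4}$ bound on the sigmoid derivative in the last point is easy to confirm numerically (a quick sketch):

```python
import numpy as np

# sigmoid'(x) = s(x) * (1 - s(x)); evaluate it on a dense symmetric grid
x = np.linspace(-10.0, 10.0, 100001)   # grid includes x = 0
s = 1.0 / (1.0 + np.exp(-x))
ds = s * (1.0 - s)
print(ds.max())                        # 0.25, attained at x = 0
```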
2.2. Convolutional Neural Network #
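One step the listing below relies on but never shows: X_train comes out of the CSV as flat 784-value rows, while Conv2D expects (samples, height, width, channels), so it must be reshaped just as X_test is reshaped later. A minimal numpy sketch (the 5-row zero array is only a stand-in for X_train.values):

```python
import numpy as np

flat = np.zeros((5, 784))            # stand-in for X_train.values
images = flat.reshape(-1, 28, 28, 1) # (samples, height, width, channels)
print(images.shape)                  # (5, 28, 28, 1)
```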
# compile under the TPU strategy
with tpu_strategy.scope():
    # Set the CNN model
    # my CNN architecture is In -> [[Conv2D->relu]*2 -> MaxPool2D -> Dropout]*2 -> Flatten -> Dense -> Dropout -> Out
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5), padding='Same',
                                     activation='relu', input_shape=(28, 28, 1)))
    model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5), padding='Same',
                                     activation='relu'))
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))

    model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='Same',
                                     activation='relu'))
    model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='Same',
                                     activation='relu'))
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))

    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(256, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(10))

    # Define the optimizer
    optimizer = tf.keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    # Compile the model
    model.compile(optimizer=optimizer, loss='mse', metrics=['accuracy'])

    model.fit(X_train, Y_train, epochs=9, batch_size=42)
Train on 42000 samples
Epoch 1/9
42000/42000 [==============================] - 9s 225us/sample - loss: 0.6169 - accuracy: 0.8257
Epoch 2/9
42000/42000 [==============================] - 5s 126us/sample - loss: 0.0145 - accuracy: 0.9686
Epoch 3/9
42000/42000 [==============================] - 5s 131us/sample - loss: 0.0123 - accuracy: 0.9736
Epoch 4/9
42000/42000 [==============================] - 5s 126us/sample - loss: 0.0113 - accuracy: 0.9763
Epoch 5/9
42000/42000 [==============================] - 5s 125us/sample - loss: 0.0110 - accuracy: 0.9772
Epoch 6/9
42000/42000 [==============================] - 6s 132us/sample - loss: 0.0107 - accuracy: 0.9774
Epoch 7/9
42000/42000 [==============================] - 6s 135us/sample - loss: 0.0105 - accuracy: 0.9775
Epoch 8/9
42000/42000 [==============================] - 6s 132us/sample - loss: 0.0102 - accuracy: 0.9787
Epoch 9/9
42000/42000 [==============================] - 5s 125us/sample - loss: 0.0102 - accuracy: 0.9774
3. Predicting Results #
- Read the test data and run predictions:
X_test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
X_test = X_test.values.reshape(-1, 28, 28, 1)
result = model.predict(X_test)
print(result)
[[-6.0319379e-03 2.2311732e-03 9.7684860e-01 ... -5.4722652e-03
5.9211254e-04 4.1266829e-03]
[ 1.0062367e+00 -2.2030249e-03 -5.0238818e-03 ... -3.2700002e-03
-9.3276799e-04 -3.5367012e-03]
[ 2.6999190e-03 6.4259917e-03 1.0489762e-02 ... -1.1652485e-03
6.1268814e-02 9.1225845e-01]
...
[-5.9012100e-03 -9.0321898e-04 1.8455610e-03 ... 2.4292246e-03
-3.2179952e-03 -1.5886426e-03]
[-7.6884702e-03 -5.6109652e-03 -2.5104508e-03 ... -7.8360438e-03
-1.0629505e-02 1.0693249e+00]
[-2.6000291e-03 5.1201209e-03 9.4756949e-01 ... -3.5209060e-03
5.4134727e-03 6.2924922e-03]]
- These raw scores obviously cannot be submitted as-is; use argmax to turn them into the predicted digits:
tmp = np.argmax(result, axis=1)
print(tmp)
[2 0 9 ... 3 9 2]
- Write to CSV and submit:
result_data = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')
result_data['Label'] = tmp
result_data.to_csv('/kaggle/working/sample_submission.csv', index=False)
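For reference, the same file could also be built from scratch rather than by editing sample_submission.csv. A minimal sketch (the ImageId/Label column names follow this competition's submission format; the values and filename here are made up):

```python
import pandas as pd

# hypothetical three-row submission in the ImageId/Label format
sub = pd.DataFrame({'ImageId': [1, 2, 3], 'Label': [2, 0, 9]})
sub.to_csv('submission_demo.csv', index=False)  # index=False keeps the row index out
print(open('submission_demo.csv').read())
# prints:
# ImageId,Label
# 1,2
# 2,0
# 3,9
```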