MNIST Handwritten Digit Recognition: Neural Network Practice Notes

I. Preface #

Theory on neural networks and deep learning is everywhere, and I can talk through most of it, but actually building a network, preparing the data, and running it in practice stumped me. These notes record the data processing and the hands-on work, partly as a study log and partly as a template: when I build networks later, I can refer back to this post.

The task here is the MNIST handwritten digit recognition dataset; the data and code come mainly from Kaggle.

II. Main Content #

1. Data processing #

First, look at what is in the input directory:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/digit-recognizer/train.csv
/kaggle/input/digit-recognizer/test.csv
/kaggle/input/digit-recognizer/sample_submission.csv

Reading the data is naturally done with pandas' read_csv:

train_data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
print(train_data)
       label  pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  \
0          1       0       0       0       0       0       0       0       0
1          0       0       0       0       0       0       0       0       0
2          1       0       0       0       0       0       0       0       0
3          4       0       0       0       0       0       0       0       0
4          0       0       0       0       0       0       0       0       0
...      ...     ...     ...     ...     ...     ...     ...     ...     ...
41995      0       0       0       0       0       0       0       0       0
41996      1       0       0       0       0       0       0       0       0
41997      7       0       0       0       0       0       0       0       0
41998      6       0       0       0       0       0       0       0       0
41999      9       0       0       0       0       0       0       0       0

       pixel8  ...  pixel774  pixel775  pixel776  pixel777  pixel778  \
0           0  ...         0         0         0         0         0
1           0  ...         0         0         0         0         0
2           0  ...         0         0         0         0         0
3           0  ...         0         0         0         0         0
4           0  ...         0         0         0         0         0
...       ...  ...       ...       ...       ...       ...       ...
41995       0  ...         0         0         0         0         0
41996       0  ...         0         0         0         0         0
41997       0  ...         0         0         0         0         0
41998       0  ...         0         0         0         0         0
41999       0  ...         0         0         0         0         0

       pixel779  pixel780  pixel781  pixel782  pixel783
0             0         0         0         0         0
1             0         0         0         0         0
2             0         0         0         0         0
3             0         0         0         0         0
4             0         0         0         0         0
...         ...       ...       ...       ...       ...
41995         0         0         0         0         0
41996         0         0         0         0         0
41997         0         0         0         0         0
41998         0         0         0         0         0
41999         0         0         0         0         0

[42000 rows x 785 columns]

Take out the label column, tally it, and draw the class distribution with seaborn's countplot:

import seaborn as sns

Y_train = train_data['label']   # the labels live in the 'label' column
sns.countplot(Y_train)          # bar chart of class counts
print(Y_train.value_counts())
1    4684
7    4401
3    4351
9    4188
2    4177
6    4137
0    4132
4    4072
8    4063
5    3795
Name: label, dtype: int64

Extracting the features is just dropping the label column from the data:

X_train = train_data.drop(labels=['label'], axis=1)
print(X_train)
       pixel0  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  \
0           0       0       0       0       0       0       0       0       0
1           0       0       0       0       0       0       0       0       0
2           0       0       0       0       0       0       0       0       0
3           0       0       0       0       0       0       0       0       0
4           0       0       0       0       0       0       0       0       0
...       ...     ...     ...     ...     ...     ...     ...     ...     ...
41995       0       0       0       0       0       0       0       0       0
41996       0       0       0       0       0       0       0       0       0
41997       0       0       0       0       0       0       0       0       0
41998       0       0       0       0       0       0       0       0       0
41999       0       0       0       0       0       0       0       0       0

       pixel9  ...  pixel774  pixel775  pixel776  pixel777  pixel778  \
0           0  ...         0         0         0         0         0
1           0  ...         0         0         0         0         0
2           0  ...         0         0         0         0         0
3           0  ...         0         0         0         0         0
4           0  ...         0         0         0         0         0
...       ...  ...       ...       ...       ...       ...       ...
41995       0  ...         0         0         0         0         0
41996       0  ...         0         0         0         0         0
41997       0  ...         0         0         0         0         0
41998       0  ...         0         0         0         0         0
41999       0  ...         0         0         0         0         0

       pixel779  pixel780  pixel781  pixel782  pixel783
0             0         0         0         0         0
1             0         0         0         0         0
2             0         0         0         0         0
3             0         0         0         0         0
4             0         0         0         0         0
...         ...       ...       ...       ...       ...
41995         0         0         0         0         0
41996         0         0         0         0         0
41997         0         0         0         0         0
41998         0         0         0         0         0
41999         0         0         0         0         0

[42000 rows x 784 columns]
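Each row of X_train is a flattened 28×28 grayscale image (784 pixels). A quick sanity check is reshaping one row back into a square; the sketch below fabricates a flat vector just to show the reshape (in the notebook you would use something like X_train.iloc[0].values instead):

```python
import numpy as np

# Hypothetical stand-in for one row of X_train: a flat 784-pixel vector.
flat = np.zeros(784)
flat[300:320] = 255  # pretend a short run of pixels is lit

# Row-major reshape recovers the 28x28 picture.
image = flat.reshape(28, 28)
print(image.shape)  # (28, 28)
```

With matplotlib available, `plt.imshow(image, cmap='gray')` would then display the digit.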

With the features extracted, delete the original variable to free memory:

del train_data

Next, convert the labels into the form training needs. Since the output is one of 10 digits, each label becomes a one-hot vector such as [0, 1, ..., 0], which matches the network's 10 outputs nicely. The to_categorical function in TensorFlow's Keras does the conversion:

import tensorflow as tf
print(Y_train)
Y_train = tf.keras.utils.to_categorical(Y_train, num_classes = 10)
print(Y_train)
0        1
1        0
2        1
3        4
4        0
        ..
41995    0
41996    1
41997    7
41998    6
41999    9
Name: label, Length: 42000, dtype: int64
[[0. 1. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 1.]]
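For intuition, what to_categorical does can be reproduced in a few lines of plain NumPy; a minimal sketch (the function name one_hot is my own, not a library API):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Plain-NumPy equivalent of tf.keras.utils.to_categorical."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    # For row i, set column labels[i] to 1 via fancy indexing.
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(one_hot([1, 0, 9]))
# row 0 has its 1 in column 1, row 1 in column 0, row 2 in column 9
```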

2. Building the neural network #

2.1. Fully connected network #

Step one is to build the simplest network I just learned, a fully connected one. First, initialize the TPU:

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

A fully connected network is very convenient to build with Keras, TensorFlow's high-level API:

  • Since each sample has 784 pixels, build a three-layer $784 \times 300 \times 10$ fully connected network
  • The activation function is sigmoid; the output layer uses no activation
  • The training algorithm is stochastic gradient descent (SGD)
  • The loss function is the most common squared error, i.e. MSE
  • The metric, the accuracy you watch during training, is accuracy
  • No learning rate is set explicitly; Keras supplies its own default. To change it, follow the example below
lr = tf.keras.backend.get_value(model.optimizer.lr)
tf.keras.backend.set_value(model.optimizer.lr, lr * 0.1)
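Another common way to adjust the learning rate in Keras is a LearningRateScheduler callback. Below is a sketch of a hypothetical step-decay schedule; the schedule values (5 epochs, 10x decay) are my own choices, and the callback/fit lines are left as comments because model and X_train exist only in the notebook:

```python
# Hypothetical step decay: keep the initial rate for the first 5 epochs,
# then shrink it 10x (epoch is 0-based, matching what Keras passes in).
def schedule(epoch, lr):
    return lr if epoch < 5 else lr * 0.1

# In Keras this would plug in as a callback:
# lr_cb = tf.keras.callbacks.LearningRateScheduler(schedule)
# model.fit(X_train, Y_train, epochs=10, callbacks=[lr_cb])

print(schedule(0, 0.01), schedule(5, 0.01))
```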

Build the network:

# compile under the TPU strategy
with tpu_strategy.scope():
    model = tf.keras.Sequential([
    # Adds a densely-connected layer with 784 units to the model:
    tf.keras.layers.Dense(784, activation='sigmoid', input_shape=(784,)),
    # Add another:
    tf.keras.layers.Dense(300, activation='sigmoid'),
    # Add an output layer with 10 output units:
    tf.keras.layers.Dense(10)])

    model.compile(optimizer='sgd',
                  loss='mse',
                  metrics=['accuracy'])

Start training for 10 epochs. batch_size is how many samples are used together to compute the loss for each gradient update; it is not specified here, so Keras falls back to its default of 32. Also, the metric above should have been accuracy, but I wrote mse, so it just mirrors the loss; still, the loss is clearly falling.

model.fit(X_train, Y_train, epochs=10)
Train on 42000 samples
Epoch 1/10
42000/42000 [==============================] - 19s 445us/sample - loss: 0.0662 - mse: 0.0662
Epoch 2/10
42000/42000 [==============================] - 15s 358us/sample - loss: 0.0450 - mse: 0.0450
Epoch 3/10
42000/42000 [==============================] - 15s 363us/sample - loss: 0.0389 - mse: 0.0389
Epoch 4/10
42000/42000 [==============================] - 15s 361us/sample - loss: 0.0352 - mse: 0.0352
Epoch 5/10
42000/42000 [==============================] - 15s 368us/sample - loss: 0.0327 - mse: 0.0327
Epoch 6/10
42000/42000 [==============================] - 15s 365us/sample - loss: 0.0307 - mse: 0.0307
Epoch 7/10
42000/42000 [==============================] - 15s 360us/sample - loss: 0.0290 - mse: 0.0290
Epoch 8/10
42000/42000 [==============================] - 15s 361us/sample - loss: 0.0277 - mse: 0.0277
Epoch 9/10
42000/42000 [==============================] - 16s 382us/sample - loss: 0.0265 - mse: 0.0265
Epoch 10/10
42000/42000 [==============================] - 15s 360us/sample - loss: 0.0255 - mse: 0.0255

The prediction accuracy came out to 91.085%. Very pleased: a first pass with a neural network, done.

A few parameter tweaks, compared:

  • Adding sigmoid on the output layer drops accuracy to 81.285%; my guess is it constrains the outputs
  • Using relu on the output layer does a bit better than sigmoid, reaching 85.014%; presumably it constrains the outputs in a similar way
  • Using relu in every layer breaks prediction outright: relu drives all the outputs to 0, so training cannot proceed
    • From what I found online, the main causes are unnormalized inputs and bad weight initialization; weights that drift too large or too small get mapped to 0 by relu, and training can never pull them back to sensible values
  • Shrinking the hidden layer to 100 neurons barely differs from 300; both land around 91.000%
  • Changing the network to $784 \times 150 \times 150 \times 10$ reaches only 86.257%; I have not yet tested whether too few training epochs is the cause
    • With sigmoid as the activation, backpropagation struggles more the farther the gradient travels: the gradient is a product of each layer's derivatives, and sigmoid's derivative is at most $\frac{1}{4}$, so deeper networks train harder. That is why fully connected sigmoid networks usually stop at three layers
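That last point is easy to check numerically: sigmoid's derivative is $\sigma(x)(1-\sigma(x))$, which peaks at $\frac{1}{4}$ at $x=0$, so a gradient passing back through $n$ sigmoid layers is scaled by at best $(\frac{1}{4})^n$. A small sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sample the derivative of sigmoid over a wide range.
x = np.linspace(-10, 10, 10001)
d = sigmoid(x) * (1 - sigmoid(x))
print(d.max())  # ~0.25, attained at x = 0

# Best-case gradient scale after n sigmoid layers shrinks geometrically.
for n in (1, 3, 5):
    print(n, 0.25 ** n)
```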

2.2. Convolutional neural network #

# compile under the TPU strategy
with tpu_strategy.scope():
    # Set the CNN model
    # my CNN architecture is In -> [[Conv2D->relu]*2 -> MaxPool2D -> Dropout]*2 -> Flatten -> Dense -> Dropout -> Out
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(filters = 32, kernel_size = (5,5), padding = 'Same',
                     activation ='relu', input_shape = (28,28,1)))
    model.add(tf.keras.layers.Conv2D(filters = 32, kernel_size = (5,5), padding = 'Same',
                     activation ='relu'))
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
    model.add(tf.keras.layers.Dropout(0.25))


    model.add(tf.keras.layers.Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same',
                     activation ='relu'))
    model.add(tf.keras.layers.Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same',
                     activation ='relu'))
    model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2)))
    model.add(tf.keras.layers.Dropout(0.25))


    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(256, activation = "relu"))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(10))

    # Define the optimizer
    optimizer = tf.keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
    # Compile the model
    model.compile(optimizer = optimizer , loss = "mse", metrics=["accuracy"])

    # Conv2D expects image-shaped input, so reshape the flat 784 columns to 28x28x1
    X_train = X_train.values.reshape(-1, 28, 28, 1)
    model.fit(X_train, Y_train, epochs=9, batch_size=42)
Train on 42000 samples
Epoch 1/9
42000/42000 [==============================] - 9s 225us/sample - loss: 0.6169 - accuracy: 0.8257
Epoch 2/9
42000/42000 [==============================] - 5s 126us/sample - loss: 0.0145 - accuracy: 0.9686
Epoch 3/9
42000/42000 [==============================] - 5s 131us/sample - loss: 0.0123 - accuracy: 0.9736
Epoch 4/9
42000/42000 [==============================] - 5s 126us/sample - loss: 0.0113 - accuracy: 0.9763
Epoch 5/9
42000/42000 [==============================] - 5s 125us/sample - loss: 0.0110 - accuracy: 0.9772
Epoch 6/9
42000/42000 [==============================] - 6s 132us/sample - loss: 0.0107 - accuracy: 0.9774
Epoch 7/9
42000/42000 [==============================] - 6s 135us/sample - loss: 0.0105 - accuracy: 0.9775
Epoch 8/9
42000/42000 [==============================] - 6s 132us/sample - loss: 0.0102 - accuracy: 0.9787
Epoch 9/9
42000/42000 [==============================] - 5s 125us/sample - loss: 0.0102 - accuracy: 0.9774

3. Predicting the results #

  • Read in the test data and run prediction
X_test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
X_test = X_test.values.reshape(-1,28,28,1)
result = model.predict(X_test)
print(result)
[[-6.0319379e-03  2.2311732e-03  9.7684860e-01 ... -5.4722652e-03
   5.9211254e-04  4.1266829e-03]
 [ 1.0062367e+00 -2.2030249e-03 -5.0238818e-03 ... -3.2700002e-03
  -9.3276799e-04 -3.5367012e-03]
 [ 2.6999190e-03  6.4259917e-03  1.0489762e-02 ... -1.1652485e-03
   6.1268814e-02  9.1225845e-01]
 ...
 [-5.9012100e-03 -9.0321898e-04  1.8455610e-03 ...  2.4292246e-03
  -3.2179952e-03 -1.5886426e-03]
 [-7.6884702e-03 -5.6109652e-03 -2.5104508e-03 ... -7.8360438e-03
  -1.0629505e-02  1.0693249e+00]
 [-2.6000291e-03  5.1201209e-03  9.4756949e-01 ... -3.5209060e-03
   5.4134727e-03  6.2924922e-03]]
  • These numbers obviously cannot be submitted as-is; convert them with argmax into the labels we need
tmp = np.argmax(result, axis=1)
print(tmp)
[2 0 9 ... 3 9 2]
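The raw scores come out unbounded (some negative, some above 1) because the output layer has no activation, and argmax simply picks the largest per row. If probabilities were wanted, a softmax would provide them without changing the winner; a sketch with hypothetical scores standing in for the model.predict output:

```python
import numpy as np

def softmax(scores):
    # Subtract the per-row max for numerical stability, then normalize.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical raw scores for two samples (stand-ins for model.predict output).
raw = np.array([[-0.006,  0.002,  0.977,  0.004],
                [ 1.006, -0.002, -0.005, -0.003]])
probs = softmax(raw)
print(np.argmax(raw, axis=1), np.argmax(probs, axis=1))  # same winners
```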
  • Save to csv and submit
result_data = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')
result_data['Label'] = tmp
result_data.to_csv('/kaggle/working/sample_submission.csv', index = False)