I originally planned to skip this assignment, but it turns out that Homework 4 (self-attention) is closely related to it. To make sure the later assignments go smoothly and to deepen my understanding of this dataset, I decided to finish this one after all.
Below is my experiment log for this assignment.
Brief Overview
After working through some of the basic assignments from the 2020 and 2021 courses, I now have a reasonable grasp of the typical training workflow and tuning process.
Deep-learning code is generally organized into three modules: data, model, and training. Tuning therefore also starts from these three parts: feature engineering, changes to the model architecture, and adjustments to the hyperparameters and optimizer.
The data part of this assignment leaves very little room for adjustment and also requires some linguistics knowledge, so to lighten my own load I went straight to tuning the latter two modules and skipped feature engineering 😘.
If you do want to try feature engineering, the assignment slides suggest an approach: take the phonemes (frames) within a window before and after the center frame as the training features.
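As a rough illustration of that idea (my own sketch, not the assignment's code: the function name, the edge-padding strategy, and the assumption that the raw features are per-frame 39-dim vectors in a NumPy array are all mine), concatenating each frame with its neighbors could look like this:

```python
import numpy as np

def concat_context(frames: np.ndarray, k: int = 5) -> np.ndarray:
    """Concatenate each frame with its k neighbors on both sides.

    frames: (T, 39) array of per-frame features for one utterance.
    Returns: (T, 39 * (2k + 1)) array; edges are padded by repeating
    the first/last frame so every center frame has k neighbors.
    """
    T, _ = frames.shape
    padded = np.concatenate([np.repeat(frames[:1], k, axis=0),
                             frames,
                             np.repeat(frames[-1:], k, axis=0)], axis=0)
    # For each center position, stack the 2k+1 surrounding frames into one vector.
    return np.stack([padded[i:i + 2 * k + 1].reshape(-1) for i in range(T)])
```

With `k = 5` this gives 39 × 11 = 429 dims per sample, matching the input size the models below expect; a wider window is the kind of feature-engineering change the slides hint at.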
Assignment slides
Kaggle page
My homework repository
Experiment Process
Sample code results
- kaggle 0.686
- acc 0.702
- loss 0.94
The sample code uses a 3-layer fully connected network, trained for 20 epochs with the Adam optimizer; the specific parameters are in the code block below.
```python
import torch
import torch.nn as nn

# Hyperparameters
config = {
    'val_ratio': 0.2,
    'batch_size': 64,
    'device': 'cuda',
    'n_epoch': 20,
    'learning_rate': 0.0001,
    'model_path': './model.ckpt',
}

# Model architecture: three hidden linear layers with sigmoid activations
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.layer2 = nn.Linear(1024, 512)
        self.layer3 = nn.Linear(512, 128)
        self.out = nn.Linear(128, 39)
        self.act_fn = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.act_fn(x)
        x = self.layer2(x)
        x = self.act_fn(x)
        x = self.layer3(x)
        x = self.act_fn(x)
        x = self.out(x)
        return x

# Instantiate the model so the optimizer below has parameters to optimize
model = Classifier().to(config['device'])
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=config['learning_rate'])
```
Changing only the hyperparameters: little improvement
I adjusted the train/validation split: since this assignment has a very large dataset (over 1,000,000 training samples), I set aside only 0.05 of it as the validation set.
I also changed the batch size to 128 and the number of epochs to 30.
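For reference, this is roughly what the adjusted config looks like (same keys as the sample config above; only `val_ratio`, `batch_size`, and `n_epoch` are changed):

```python
config = {
    'val_ratio': 0.05,      # with over 1,000,000 samples, 5% is plenty for validation
    'batch_size': 128,
    'device': 'cuda',
    'n_epoch': 30,
    'learning_rate': 0.0001,
    'model_path': './model.ckpt',
}
```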
Adjusting the model architecture
- kaggle 0.74332
- acc 0.751
- validation loss 0.775
This time the optimization idea was to make the model deeper. Since there is plenty of data, a model with only three layers simply doesn't have the capacity for this problem, so deepening it is a very reasonable solution.
On top of the sample model I added three more layers, inserted dropout and batch normalization layers, and switched the activation function to ReLU.
I kept the number of epochs at 30; since the accuracy and loss were still improving at the end of training, the number of epochs could be increased further.
Experiment results:
```
[001/030] Train Acc: 0.474353 Loss: 1.842455 | Val Acc: 0.620973 loss: 1.260234
saving model with acc 0.621
[002/030] Train Acc: 0.577377 Loss: 1.443641 | Val Acc: 0.664699 loss: 1.103806
saving model with acc 0.665
[003/030] Train Acc: 0.606046 Loss: 1.336517 | Val Acc: 0.684521 loss: 1.033905
saving model with acc 0.685
[004/030] Train Acc: 0.623466 Loss: 1.272803 | Val Acc: 0.694814 loss: 0.989407
saving model with acc 0.695
[005/030] Train Acc: 0.635239 Loss: 1.228336 | Val Acc: 0.702099 loss: 0.957638
saving model with acc 0.702
[006/030] Train Acc: 0.645314 Loss: 1.193012 | Val Acc: 0.706262 loss: 0.937179
saving model with acc 0.706
[007/030] Train Acc: 0.652792 Loss: 1.164505 | Val Acc: 0.712051 loss: 0.915804
saving model with acc 0.712
[008/030] Train Acc: 0.659561 Loss: 1.139586 | Val Acc: 0.716767 loss: 0.899467
saving model with acc 0.717
[009/030] Train Acc: 0.664224 Loss: 1.120646 | Val Acc: 0.718734 loss: 0.882219
saving model with acc 0.719
[010/030] Train Acc: 0.668926 Loss: 1.101871 | Val Acc: 0.720588 loss: 0.873744
saving model with acc 0.721
[011/030] Train Acc: 0.674056 Loss: 1.085537 | Val Acc: 0.725824 loss: 0.858666
saving model with acc 0.726
[012/030] Train Acc: 0.678045 Loss: 1.070081 | Val Acc: 0.726491 loss: 0.855108
saving model with acc 0.726
[013/030] Train Acc: 0.681587 Loss: 1.055746 | Val Acc: 0.729564 loss: 0.843991
saving model with acc 0.730
[014/030] Train Acc: 0.685095 Loss: 1.043804 | Val Acc: 0.731223 loss: 0.835380
saving model with acc 0.731
[015/030] Train Acc: 0.688495 Loss: 1.031930 | Val Acc: 0.734735 loss: 0.828403
saving model with acc 0.735
[016/030] Train Acc: 0.690669 Loss: 1.021308 | Val Acc: 0.734703 loss: 0.823074
[017/030] Train Acc: 0.693583 Loss: 1.011261 | Val Acc: 0.737272 loss: 0.816671
saving model with acc 0.737
[018/030] Train Acc: 0.696030 Loss: 1.001330 | Val Acc: 0.739727 loss: 0.809100
saving model with acc 0.740
[019/030] Train Acc: 0.698872 Loss: 0.992060 | Val Acc: 0.740036 loss: 0.810910
saving model with acc 0.740
[020/030] Train Acc: 0.700744 Loss: 0.983307 | Val Acc: 0.742947 loss: 0.802383
saving model with acc 0.743
[021/030] Train Acc: 0.702734 Loss: 0.975952 | Val Acc: 0.742329 loss: 0.797947
[022/030] Train Acc: 0.704886 Loss: 0.968463 | Val Acc: 0.743288 loss: 0.792694
saving model with acc 0.743
[023/030] Train Acc: 0.706776 Loss: 0.961081 | Val Acc: 0.745906 loss: 0.790614
saving model with acc 0.746
[024/030] Train Acc: 0.708947 Loss: 0.953915 | Val Acc: 0.747110 loss: 0.792937
saving model with acc 0.747
[025/030] Train Acc: 0.710140 Loss: 0.949135 | Val Acc: 0.746654 loss: 0.784579
[026/030] Train Acc: 0.711947 Loss: 0.940888 | Val Acc: 0.747728 loss: 0.783092
saving model with acc 0.748
[027/030] Train Acc: 0.713619 Loss: 0.936346 | Val Acc: 0.748703 loss: 0.781725
saving model with acc 0.749
[028/030] Train Acc: 0.714991 Loss: 0.929901 | Val Acc: 0.749874 loss: 0.776371
saving model with acc 0.750
[029/030] Train Acc: 0.716764 Loss: 0.924471 | Val Acc: 0.749646 loss: 0.775422
[030/030] Train Acc: 0.718027 Loss: 0.918822 | Val Acc: 0.751240 loss: 0.775216
saving model with acc 0.751
```
The model code is as follows:
```python
class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        # Six hidden linear layers, each followed by batch norm, ReLU, and dropout
        self.layer1 = nn.Linear(429, 2048)
        self.layer2 = nn.Linear(2048, 2048)
        self.layer3 = nn.Linear(2048, 1024)
        self.layer4 = nn.Linear(1024, 1024)
        self.layer5 = nn.Linear(1024, 512)
        self.layer6 = nn.Linear(512, 128)
        self.out = nn.Linear(128, 39)

        self.act_fn = nn.ReLU()
        self.drop = nn.Dropout(0.5)
        self.bn1 = nn.BatchNorm1d(2048)
        self.bn2 = nn.BatchNorm1d(2048)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(1024)
        self.bn5 = nn.BatchNorm1d(512)
        self.bn6 = nn.BatchNorm1d(128)

    def forward(self, x):
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.act_fn(x)
        x = self.drop(x)

        x = self.layer2(x)
        x = self.bn2(x)
        x = self.act_fn(x)
        x = self.drop(x)

        x = self.layer3(x)
        x = self.bn3(x)
        x = self.act_fn(x)
        x = self.drop(x)

        x = self.layer4(x)
        x = self.bn4(x)
        x = self.act_fn(x)
        x = self.drop(x)

        x = self.layer5(x)
        x = self.bn5(x)
        x = self.act_fn(x)
        x = self.drop(x)

        x = self.layer6(x)
        x = self.bn6(x)
        x = self.act_fn(x)
        x = self.drop(x)

        # Output layer produces logits over the 39 phoneme classes
        x = self.out(x)
        return x
```
To go beyond the strong baseline, it will probably take an even deeper model, an RNN structure, or the attention mechanism covered later in the course. I also saw people on the leaderboard add an HMM (hidden Markov model) to reach the strong baseline.
I haven't studied HMMs yet, so I won't dig into them here; I'll try implementing one when I have time later.
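If I ever try the RNN direction, one possible starting point (purely my own sketch, untested here; the class name, hidden size, and the assumption that the 429 input dims are 11 consecutive frames of 39 features each are all mine) would be to reshape each 429-dim sample back into an 11-frame sequence and classify the center frame with a bidirectional LSTM:

```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    """Hypothetical sketch: treat the 429-dim input as 11 frames x 39 dims
    and classify the center frame with a bidirectional LSTM."""
    def __init__(self, frame_dim=39, n_frames=11, hidden=256, n_classes=39):
        super().__init__()
        self.frame_dim = frame_dim
        self.n_frames = n_frames
        self.lstm = nn.LSTM(frame_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(hidden * 2, n_classes)

    def forward(self, x):
        # (batch, 429) -> (batch, 11, 39), assuming frame-major concatenation
        x = x.view(-1, self.n_frames, self.frame_dim)
        h, _ = self.lstm(x)
        # Use the hidden states at the center frame for classification
        center = h[:, self.n_frames // 2, :]
        return self.out(center)
```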