10.1 Deep Neural Network Architecture [stanford-cs329p]

2022-06-27 stanford-cs329p

Deep Network Tuning

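Deep learning can be seen as an effective programming language for understanding data
- Some values are left open and filled in later from real data
- Differentiable
It comes with many design patterns, ranging from individual layers to whole network architectures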
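To make the "programming language" view concrete, here is a minimal sketch in PyTorch (the library used by the code later in this post). The toy data, the parameter names `w` and `b`, the learning rate, and the step count are assumptions made for illustration; they are not from the lecture.

```python
import torch

torch.manual_seed(0)

# Toy data generated from y = 2x + 1 (an assumption for illustration).
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = 2 * x + 1

# The "program" is y_hat = w * x + b; w and b are the values that are
# left open and filled in from data.
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for _ in range(500):
    loss = ((w * x + b - y) ** 2).mean()
    loss.backward()          # possible because the program is differentiable
    with torch.no_grad():    # plain gradient descent step
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad
        w.grad.zero_()
        b.grad.zero_()
```

After the loop, `w` and `b` should be close to the values 2 and 1 that generated the data.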

Batch Normalization

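Standardizing the input data makes the loss function smoother, especially for linear models; without it, gradients can suddenly become very large
BN also standardizes the inputs of intermediate layers, which makes training easier and smoother
Reshape the input into 2D (leave it unchanged if it is already 2D)
- $X\in R^{n\times c\times w \times h}\rightarrow X'\in R^{nwh\times c}$
Standardize each column
- $\hat{x}'_j\leftarrow (x'_j - \mathrm{mean}(x'_j))/\mathrm{std}(x'_j)$
Recover $Y'$ with $y'_j=\gamma_j\hat{x}'_j+\beta_j$
Convert $Y'$ back to $Y$ in the original shape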
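The four steps above can be sketched directly as reshape, normalize, recover, and reshape back. This is a minimal training-mode sketch, not a full BN layer: the function name `batch_norm_by_reshape` and the `eps` term (added for numerical stability) are assumptions, and the tensor is laid out as (n, c, h, w) following PyTorch's convention. The result is compared against PyTorch's own `F.batch_norm` as a sanity check.

```python
import torch
import torch.nn.functional as F

def batch_norm_by_reshape(X, gamma, beta, eps=1e-5):
    """Training-mode BN on a 4D tensor via reshape / normalize / recover."""
    n, c, h, w = X.shape
    # Reshape: (n, c, h, w) -> (n*h*w, c), one column per channel.
    X2 = X.permute(0, 2, 3, 1).reshape(-1, c)
    # Normalize each column (biased variance, eps for stability).
    mean = X2.mean(dim=0)
    var = X2.var(dim=0, unbiased=False)
    X2_hat = (X2 - mean) / torch.sqrt(var + eps)
    # Recover: per-column scale and shift.
    Y2 = gamma * X2_hat + beta
    # Convert back to the original layout.
    return Y2.reshape(n, h, w, c).permute(0, 3, 1, 2)

torch.manual_seed(0)
X = torch.randn(8, 3, 4, 5)
gamma = torch.rand(3) + 0.5
beta = torch.randn(3)

Y = batch_norm_by_reshape(X, gamma, beta)
# Reference: functional BN in training mode, without running statistics.
Y_ref = F.batch_norm(X, None, None, weight=gamma, bias=beta,
                     training=True, eps=1e-5)
```

Computing the statistics over dimensions (0, 2, 3) with `keepdim=True` gives the same numbers without the explicit reshape.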

Code

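import torch

def batch_norm(X, gamma, beta, moving_mean, moving_var, eps, momentum):
    # Autograd disabled means prediction mode: use the moving statistics.
    if not torch.is_grad_enabled():
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        assert len(X.shape) in (2, 4)
        if len(X.shape) == 2:
            # Fully connected layer: statistics per feature (column).
            mean = X.mean(dim=0)
            var = ((X - mean)**2).mean(dim=0)
        else:
            # Convolutional layer: statistics per channel, kept as
            # (1, c, 1, 1) so that they broadcast against X.
            mean = X.mean(dim=(0, 2, 3), keepdim=True)
            var = ((X - mean)**2).mean(dim=(0, 2, 3), keepdim=True)
        # Training mode: normalize with the current batch statistics.
        X_hat = (X - mean) / torch.sqrt(var + eps)
        # Update the moving averages; momentum weights the old value.
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    Y = gamma * X_hat + beta  # scale and shift (the recovery step)
    return Y, moving_mean, moving_var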
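For comparison, the sketch below shows the same train/predict split with PyTorch's built-in `nn.BatchNorm2d`. One detail to watch: in the function above `momentum` weights the old moving value, whereas in PyTorch `momentum` weights the new batch statistic, so `momentum=0.1` there corresponds roughly to 0.9 here. The tensor shapes and the shift and scale applied to the data are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=3, momentum=0.1)
X = torch.randn(16, 3, 4, 4) * 2.0 + 5.0

bn.train()
Y_train = bn(X)  # uses batch statistics and updates the running ones
batch_mean = X.mean(dim=(0, 2, 3))
# running_mean starts at 0, so one update gives 0.9 * 0 + 0.1 * batch_mean.
expected_running_mean = 0.1 * batch_mean

bn.eval()
with torch.no_grad():
    Y_eval = bn(X)  # uses the running statistics, no update
```

After a single update the running statistics are still far from the batch statistics, so `Y_eval` differs noticeably from `Y_train`; the two only agree once the moving averages have converged.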

Layer Normalization

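When BN is applied to an RNN, it has to maintain a separate mean and variance for every time step; these statistics cannot be shared across steps
- For example, if training sentences have length 10 and a sentence at prediction time has length 20, there are no stored statistics for the extra steps
LN reshapes the input the transposed way, $X\in R^{n\times p}\rightarrow X'\in R^{p\times n}$ or $X\in R^{n\times c\times w \times h}\rightarrow X'\in R^{cwh\times n}$; the rest is the same as BN
- The difference lies in the dimension over which the mean and variance are computed: LN works within each sample
- Training and inference behave identically
- Popular in Transformers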
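A minimal sketch of the transposed reshape described above, assuming no learnable scale and shift (the recovery step is omitted for brevity). The function name `layer_norm_by_transpose` and the `eps` term are assumptions; the result is checked against PyTorch's `F.layer_norm` applied over all non-batch dimensions.

```python
import torch
import torch.nn.functional as F

def layer_norm_by_transpose(X, eps=1e-5):
    """LN without scale/shift: normalize each sample over its own features."""
    n = X.shape[0]
    # Reshape: (n, ...) -> (p, n), one column per sample.
    Xt = X.reshape(n, -1).T
    mean = Xt.mean(dim=0)
    var = Xt.var(dim=0, unbiased=False)
    Xt_hat = (Xt - mean) / torch.sqrt(var + eps)
    # Convert back to the original layout.
    return Xt_hat.T.reshape(X.shape)

torch.manual_seed(0)
X = torch.randn(4, 3, 5, 5)
Y = layer_norm_by_transpose(X)
Y_ref = F.layer_norm(X, X.shape[1:], eps=1e-5)

# Each sample is normalized on its own, so a batch of one gives the
# same output for that sample as the full batch does.
Y_single = layer_norm_by_transpose(X[:1])
```

Because no statistic depends on the other samples in the batch, there is nothing to store between training and inference, which is why the two modes agree.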

More Normalization

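Modify the "reshape" step
- InstanceNorm: $n\times c\times w\times h\rightarrow wh\times cn$
- GroupNorm: $n\times c\times w\times h\rightarrow swh\times gn$ where $c=sg$
- CrossNorm: given a set of features, swap their means/variances
Modify the "normalize" step: whitening
Modify the "recovery" step: change $\gamma,\beta$
Apply normalization to weights or gradients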
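The two reshape variants in the list above can be sketched with one shared column-normalization helper. The helper and function names (`norm_columns`, `instance_norm_by_reshape`, `group_norm_by_reshape`) are assumptions for illustration, the scale and shift are omitted, and the layout is (n, c, h, w) as in PyTorch. Both are checked against `F.instance_norm` and `F.group_norm`.

```python
import torch
import torch.nn.functional as F

def norm_columns(X2, eps=1e-5):
    """Normalize every column of a 2D tensor to zero mean, unit variance."""
    mean = X2.mean(dim=0)
    var = X2.var(dim=0, unbiased=False)
    return (X2 - mean) / torch.sqrt(var + eps)

def instance_norm_by_reshape(X, eps=1e-5):
    n, c, h, w = X.shape
    # (n, c, h, w) -> (h*w, n*c): one column per (sample, channel) pair.
    X2 = X.reshape(n * c, h * w).T
    return norm_columns(X2, eps).T.reshape(n, c, h, w)

def group_norm_by_reshape(X, g, eps=1e-5):
    n, c, h, w = X.shape
    assert c % g == 0, "c must equal s * g"
    # (n, c, h, w) -> (s*h*w, n*g): one column per (sample, group) pair.
    X2 = X.reshape(n * g, (c // g) * h * w).T
    return norm_columns(X2, eps).T.reshape(n, c, h, w)

torch.manual_seed(0)
X = torch.randn(4, 6, 5, 5)
Y_in = instance_norm_by_reshape(X)
Y_gn = group_norm_by_reshape(X, g=3)
Y_in_ref = F.instance_norm(X, eps=1e-5)
Y_gn_ref = F.group_norm(X, 3, eps=1e-5)
```

With `g` equal to the number of channels, the group version reduces to the instance version; with `g=1` it normalizes each sample over all of its features, as LN does.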

Summary

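Normalization keeps the values of intermediate layers more stable, which makes training easier
A normalization technique consists of three main steps: reshape the input, normalize the data, and recover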

Article link: https://tyang816.github.io/2022/06/27/10.1 深度神经网络架构/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!

Yang Tan

Master Student @ECUST
