3.4 Stochastic Gradient Descent [stanford-cs329p]

2022-05-12 stanford-cs329p

Mini-batch Stochastic gradient descent (SGD)

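Train with mini-batch SGD (many other optimization methods exist as well)
- Model parameters $w$, batch size $b$, learning rate $\eta_t$ at time $t$
- Randomly initialize $w_1$
- Repeat for $t=1,2,\dots$ until convergence
  - Randomly sample $I_t\subset \{1,\dots,n\}$ with $|I_t|=b$
  - Update $w_{t+1}=w_t-\eta_t\nabla_{w_t}\mathcal{L}(X_{I_t},y_{I_t},w_t)$
Pros: it can optimize every objective covered here except decision trees
Cons: sensitive to the hyperparameters $b$ and $\eta_t$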
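
The loop above can be sketched in plain Python for a one-dimensional linear model. This is a minimal illustration under stated assumptions, not the course's reference code: the function name `minibatch_sgd`, the intercept name `c` (chosen to avoid clashing with the batch size $b$), the synthetic data, and the decaying schedule passed as `lr_schedule` are all hypothetical choices.

```python
import random

def minibatch_sgd(xs, ys, batch_size, lr_schedule, num_steps, seed=0):
    """Fit y ≈ w * x + c with mini-batch SGD on the squared loss."""
    rng = random.Random(seed)
    n = len(xs)
    w, c = rng.gauss(0.0, 0.01), 0.0               # random initialization (w_1)
    for t in range(1, num_steps + 1):
        batch = rng.sample(range(n), batch_size)   # I_t with |I_t| = b
        # Gradient of the batch mean of (w*x + c - y)^2 / 2
        grad_w = sum((w * xs[i] + c - ys[i]) * xs[i] for i in batch) / batch_size
        grad_c = sum((w * xs[i] + c - ys[i]) for i in batch) / batch_size
        eta = lr_schedule(t)                       # learning rate eta_t
        w -= eta * grad_w
        c -= eta * grad_c
    return w, c

# Synthetic, noise-free data drawn from y = 2x - 1 (illustrative values)
xs = [i / 50 - 1 for i in range(100)]
ys = [2 * x - 1 for x in xs]
w, c = minibatch_sgd(
    xs, ys,
    batch_size=10,
    lr_schedule=lambda t: 0.5 / (1 + 0.001 * t),
    num_steps=2000,
)
print(w, c)  # should be close to 2 and -1
```

Rerunning with a much larger constant in `lr_schedule` or a different `batch_size` is a quick way to see the sensitivity to $\eta_t$ and $b$ noted above.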

Code

Train a linear regression model with mini-batch SGD
Hyperparameters: batch size, learning rate, number of epochs

import random
import torch

# `features` has shape (n, p); `labels` has shape (n, 1)
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # visit the examples in a new random order each epoch
    for i in range(0, num_examples, batch_size):
        # the last batch may be smaller than batch_size
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

# `features`, `labels`, `p`, `batch_size`, `learning_rate` and `num_epochs`
# are assumed to be defined beforehand
w = torch.normal(0, 0.01, size=(p, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        y_hat = X @ w + b
        loss = ((y_hat - y) ** 2 / 2).mean()
        loss.backward()
        with torch.no_grad():  # update in place without tracking gradients
            for param in [w, b]:
                param -= learning_rate * param.grad
                param.grad.zero_()
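
For reference, the gradients that `loss.backward()` computes for this loss can also be written in closed form. The notation here is an assumption made for illustration: $m$ is the number of rows in the current batch, $\mathbf{1}$ is the all-ones vector of length $m$, and $b$ is the bias from the code (not the batch size used earlier).

$$
\ell(w,b)=\frac{1}{2m}\lVert Xw+b\mathbf{1}-y\rVert_2^2,\qquad
\nabla_w \ell=\frac{1}{m}X^\top\left(Xw+b\mathbf{1}-y\right),\qquad
\frac{\partial \ell}{\partial b}=\frac{1}{m}\mathbf{1}^\top\left(Xw+b\mathbf{1}-y\right)
$$

Each SGD step then subtracts the learning rate times these quantities from `w` and `b`.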

Summary

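Linear methods take a linear (weighted) combination of the inputs to obtain predictions
Linear regression uses MSE as the loss function
Softmax regression is used for multiclass classification
- It turns predictions into probabilities with softmax and compares them with the true labels using the cross-entropy loss
Mini-batch SGD is used to train many networks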
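
To make the softmax regression point in the summary concrete, here is a small sketch of turning raw scores into probabilities and scoring them with cross-entropy. The helper names `softmax` and `cross_entropy` and the example scores are illustrative assumptions, not code from the course.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

logits = [2.0, 1.0, 0.1]                      # illustrative scores for 3 classes
probs = softmax(logits)
loss = cross_entropy(probs, label=0)          # the true class is class 0 in this example
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what makes it a suitable training objective for mini-batch SGD.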

Permalink: https://tyang816.github.io/2022/05/12/3.4 随机梯度下降/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!

Yang Tan

Master Student @ECUST
