PRELOADER

当前文章 : 《梯度下降求解逻辑回归2(代码编写以及三种梯度下降对比)》

5/10/2019 —— 

梯度下降求解逻辑回归2(代码编写以及三种梯度下降对比)

上一篇是理论知识、背景介绍以及大体的实现方向,这一篇是具体代码实现

功能模块准备

  1. 写出sigmoid函数,返回被录取的概率,即映射到概率

  1. 写出model函数,返回预测结果值,即X(样本值)与theta的矩阵相乘结果

    1. def sigmoid(z):
    2. return 1 / (1 + np.exp(-z))
    3. nums = np.arange(-10, 10 ,step=1)
    4. fig, ax = plt.subplots(figsize=(12,4))
    5. ax.plot(nums, sigmoid(nums), ‘r’)

写出model函数,返回预测结果值,即X(样本值)与theta的矩阵相乘结果

1. def model(X, theta):
1. 
1. return sigmoid(np. dot(X, theta.T))
1. pdData.insert(0, 'Ones', 1)
1. 
1. orig_data = pdData.as_matrix()
1. cols = orig_data.shape[1]
1. x = orig_data[:,0:cols-1]
1. y = orig_data[:,cols-1:cols]
1. 
1. theta = np.zeros([1, 3])

写出cost函数,实现计算损失函数,目的是度量预测值与真实值的拟合程度

将对数似然函数去负号

1. def cost(x, y, theta):
1. left = np.multiply(-y, np.log(model(x, theta)))
1. right = np.multiply(1 - y, np.log(1 - model(x,theta)))
1. return np.sum(left - right) / (len(x))
1. cost(x, y, theta)

写出gradient函数,来计算每个样本的梯度方向,即学习方向,梯度下降的方向

1. def gradient(x, y, theta):
1. grad = np.zeros(theta.shape)
1. error = (model(x, theta) - y).ravel()
1. for j in range(len(theta.ravel())):
1. term = np.multiply(error, x[:,j])
1. grad[0, j] = np.sum(term) / len(x)
1. return grad

设置停止策略模式

descent函数,实现参数 θ 的更新

accuracy函数,计算精度

Gradient descent

比较3种不同梯度下降方法

1. STOP_ITER = 0
1. STOP_COST = 1
1. STOP_GRAD = 2
1. 
1. def stopCriterion(type, value, threshold):
1. 
1. if type == STOP_ITER: return value > threshold
1. elif type == STOP_COST: return abs(value[-1]-value[-2]) < threshold
1. elif type == STOP_GRAD: return np.linalg.norm(value) < threshold

1. import numpy.random
1. 
1. def shuffleData(data):
1. np.random.shuffle(data)
1. cols = data.shape[1]
1. x = data[:, 0:cols-1]
1. y = data[:, cols-1:]
1. return x, y

1. import time
1. 
1. def descent(data, theta, batchSize, stopType, thresh, alpha):
1. #梯度下降求解
1. 
1. init_time = time.time()
1. i = 0 # 迭代次数
1. k = 0 # batch
1. x, y = shuffleData(data)
1. grad = np.zeros(theta.shape) # 计算的梯度
1. costs = [cost(x, y, theta)] # 损失值
1. 
1. 
1. while True:
1. grad = gradient(x[k:k+batchSize], y[k:k+batchSize], theta)
1. k += batchSize #取batch数量个数据
1. if k >= n: 
1. k = 0 
1. x, y = shuffleData(data) #重新洗牌
1. theta = theta - alpha*grad # 参数更新
1. costs.append(cost(x, y, theta)) # 计算新的损失
1. i += 1 
1. 
1. if stopType == STOP_ITER:   value = i
1. elif stopType == STOP_COST: value = costs
1. elif stopType == STOP_GRAD: value = grad
1. if stopCriterion(stopType, value, thresh): break
1. 
1. return theta, i-1, costs, grad, time.time() - init_time

1. def runExpe(data, theta, batchSize, stopType, thresh, alpha):
1. #import pdb; pdb.set_trace();
1. theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
1. name = "Original" if (data[:,1]>2).sum() 1 else "Scaled"
1. name += " data - learning rate: {} - ".format(alpha)
1. if batchSize==n: strDescType = "Gradient"
1. elif batchSize==1:  strDescType = "Stochastic"
1. else: strDescType = "Mini-batch ({})".format(batchSize)
1. name += strDescType + " descent - Stop: "
1. if stopType == STOP_ITER: strStop = "{} iterations".format(thresh)
1. elif stopType == STOP_COST: strStop = "costs change < {}".format(thresh)
1. else: strStop = "gradient norm < {}".format(thresh)
1. name += strStop
1. print ("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
1. name, theta, iter, costs[-1], dur))
1. fig, ax = plt.subplots(figsize=(12,4))
1. ax.plot(np.arange(len(costs)), costs, 'r')
1. ax.set_xlabel('Iterations')
1. ax.set_ylabel('Cost')
1. ax.set_title(name.upper() + ' - Error vs. Iteration')
1. return theta

不同的停止策略

设定迭代次数

n = 100
runExpe(orig_data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)

***Original data - learning rate: 1e-06 - Gradient descent - Stop: 5000 iterations

Theta: [[-0.00027127 0.00705232 0.00376711]] - Iter: 5000 - Last cost: 0.63 - Duration: 0.89s

array([[-0.00027127, 0.00705232, 0.00376711]])

根据损失停止

设置阈值1E-6,差不多需要110000次迭代

runExpe(orig_data,theta, n, STOP_COST, thresh=0.000001, alpha=0.001)

***Original data - learning rate: 0.001 - Gradient descent - Stop: costs change < 1e-06

Theta: [[-5.13364014 0.04771429 0.04072397]] - Iter: 109901 - Last cost: 0.38 - Duration: 19.71s

array([[-5.13364014, 0.04771429, 0.04072397]])

根据梯度变化停止

设置阈值0.05,差不多需要40000次迭代

runExpe(orig_data,theta, n, STOP_GRAD, thresh=0.05, alpha=0.001)

***Original data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.05

Theta: [[-2.37033409 0.02721692 0.01899456]] - Iter: 40045 - Last cost: 0.49 - Duration: 7.64s

array([[-2.37033409, 0.02721692, 0.01899456]])

对比不同的梯度下降方法

Stochastic descent

runExpe(orig_data,theta, 1, STOP_ITER, thresh=5000, alpha=0.001)

***Original data - learning rate: 0.001 - Stochastic descent - Stop: 5000 iterations

Theta: [[-0.38738789 0.01884999 -0.02619728]] - Iter: 5000 - Last cost: 0.93 - Duration: 0.27s

array([[-0.38738789, 0.01884999, -0.02619728]])

有点爆炸。。。很不稳定,再来试试把学习率调小一些

runExpe(orig_data,theta, 1, STOP_ITER, thresh=15000, alpha=0.000002)

***Original data - learning rate: 2e-06 - Stochastic descent - Stop: 15000 iterations

Theta: [[-0.00202344 0.00988736 0.00084222]] - Iter: 15000 - Last cost: 0.63 - Duration: 0.77s

array([[-0.00202344, 0.00988736, 0.00084222]])

速度快,但稳定性差,需要很小的学习率

runExpe(orig_data,theta, 16, STOP_ITER, thresh=15000, alpha=0.001)

***Original data - learning rate: 0.001 - Mini-batch (16) descent - Stop: 15000 iterations

Theta: [[-1.03209164 0.01062196 0.00754394]] - Iter: 15000 - Last cost: 0.59 - Duration: 1.06s

array([[-1.03209164, 0.01062196, 0.00754394]])

from sklearn import preprocessing as pp
scaled_data = orig_data.copy()

scaled_data[:, 1:3] = pp.scale(orig_data[:, 1:3])

runExpe(scaled_data, theta, n, STOP_ITER, thresh=5000, alpha=0.001)

***Scaled data - learning rate: 0.001 - Gradient descent - Stop: 5000 iterations

Theta: [[0.3080807 0.86494967 0.77367651]] - Iter: 5000 - Last cost: 0.38 - Duration: 0.96s

array([[0.3080807 , 0.86494967, 0.77367651]])

它好多了!原始数据,只能达到达到0.61,而我们得到了0.38个在这里!

所以对数据做预处理是非常重要的

runExpe(scaled_data, theta, n, STOP_GRAD, thresh=0.02, alpha=0.001)

***Scaled data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.02

Theta: [[1.0707921 2.63030842 2.41079787]] - Iter: 59422 - Last cost: 0.22 - Duration: 12.57s

array([[1.0707921 , 2.63030842, 2.41079787]])

更多的迭代次数会使得损失下降的更多!

theta = runExpe(scaled_data, theta, 1, STOP_GRAD, thresh=0.002/5, alpha=0.001)

***Scaled data - learning rate: 0.001 - Stochastic descent - Stop: gradient norm < 0.0004

Theta: [[1.14910956 2.78978546 2.56898948]] - Iter: 72633 - Last cost: 0.22 - Duration: 5.24s

runExpe(scaled_data, theta, 16, STOP_GRAD, thresh=0.002*2, alpha=0.001)

***Scaled data - learning rate: 0.001 - Mini-batch (16) descent - Stop: gradient norm < 0.004

Theta: [[1.15788009 2.80684471 2.58152883]] - Iter: 1392 - Last cost: 0.22 - Duration: 0.17s

array([[1.15788009, 2.80684471, 2.58152883]])

精度

设定阈值

1. def predict(x, theta):
1. 
1. return [1 if x>=0.5 else 0 for x in model(x, theta)]
1. 
1. scaled_X = scaled_data[:, :3]
1. 
1. y = scaled_data[:, 3]
1. 
1. predictions = predict(scaled_X, theta)
1. 
1. correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
1. 
1. accuracy = (sum(map(int, correct)) % len(correct))
1. 
1. print ('accuracy = {0}%'.format(accuracy))

accuracy = 89%