Andrew Ng's Machine Learning, Programming Exercise 1: Linear Regression

2019-07-11

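Machine Learning

Lesson 45, Programming Exercise: Linear Regression

The official handout for this exercise uses Octave/MATLAB; here linear regression is implemented in Python.

Materials: notes on linear regression with one variable, notes on linear regression with multiple variables

Tool: PyCharm 2018.3.3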

Section 1: Introduction

Section 2: Linear regression with one variable

Full code 1
Full code 2
Reference 1

2.0 Importing packages

import matplotlib
matplotlib.use('TkAgg')   # select the backend before pyplot is imported
import matplotlib.pyplot as plt
import numpy as np

2.1 Plotting the Data

def get_data():   # read the data and return two vectors
	population_profit = np.genfromtxt("exercise/ex1/ex1data1.txt", delimiter=',', dtype=float)
	# print(population_profit.shape)
	population = np.array(population_profit[:, 0])   # first column as a vector
	profit = np.array(population_profit[:, 1])  # second column as a vector
	return population,profit    # x-axis data and y-axis data, both as vectors
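
To see what get_data returns without the course data file at hand, here is a minimal sketch that feeds np.genfromtxt an in-memory file instead. The three rows are made-up values in the same "population,profit" layout, not rows from ex1data1.txt.

```python
import io
import numpy as np

# Made-up rows in the "population,profit" layout (illustrative values only).
sample = io.StringIO("5.0,10.0\n6.5,12.5\n8.0,15.0\n")

data = np.genfromtxt(sample, delimiter=',', dtype=float)
population = data[:, 0]   # first column as a 1-D array
profit = data[:, 1]       # second column as a 1-D array

print(data.shape, population.shape, profit.shape)   # (3, 2) (3,) (3,)
```

Both columns come back as 1-D arrays of shape (m,), which is why they are reshaped to (m, 1) before any matrix arithmetic.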

def draw_picture(x,y):  # takes the x-axis data and the matching y-axis data
	population,profit=x,y
	plt.xlim(4,24)    # show the x axis over [4, 24]; tick spacing is set with plt.xticks, not plt.xlim
	plt.ylim(-5,25)
	plt.plot(population,profit,'xr')  # 'x' is the marker style, 'r' the color; adjust to taste
	plt.xlabel('Population of City in 10,000s')  # x-axis label
	plt.ylabel('Profit in $10,000s')
	plt.title('one variable for the profits',fontsize=20)   # figure title
	plt.show()

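2.2 Gradient Descent

Concept: linear regression with one variable
Note that all components of θ must be updated simultaneously; this is the batch gradient descent algorithm.
The core of the work is computing the cost function and theta.
When implementing this in Python, it is best to convert every operand to the same type and a compatible shape.
For more on how gradient descent updates all of θ at once, see the figure 梯度下降.png (gradient descent). It explains why the code multiplies each training example x by the error (hypothesis - y) and then sums over the examples.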
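
The notes above can be checked numerically. This sketch computes the gradient of J twice, once by looping over the training examples and once with a single matrix product, and confirms the two agree. The function names and the tiny data set are made up for illustration.

```python
import numpy as np

def gradient_loop(x, y, theta):
    """Gradient of J, accumulated one training example at a time."""
    m, n = x.shape
    grad = np.zeros((n, 1))
    for i in range(m):
        xi = x[i].reshape(n, 1)                     # i-th example as a column vector
        error = (xi.T.dot(theta) - y[i]).item()     # h(x_i) - y_i
        grad += error * xi                          # each feature scaled by the error
    return grad / m

def gradient_vectorized(x, y, theta):
    """The same gradient as one matrix product."""
    m = x.shape[0]
    return x.T.dot(x.dot(theta) - y) / m

# Made-up data: first column is x0 = 1.
x = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[2.0], [4.0], [6.0]])
theta = np.array([[0.5], [0.5]])

print(gradient_loop(x, y, theta).ravel())        # [-2.5 -6. ]
print(gradient_vectorized(x, y, theta).ravel())  # [-2.5 -6. ]

# Simultaneous update: both components are computed from the old theta.
alpha = 0.1
theta_new = theta - alpha * gradient_vectorized(x, y, theta)
```

Because the whole gradient is computed before theta is overwritten, the vectorized form updates every component simultaneously by construction.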

def computer_cost(x,y,theta,m):   # squared-error cost function; x holds the inputs, y the targets, m is the number of training examples
	# print(x.shape,theta.shape,y.shape)
	hypothesis = np.dot(x, theta)   # hypothesis function
	# print(hypothesis.shape,y.shape)
	# print((hypothesis-y).shape)
	return  np.sum(((hypothesis-y)**2))/(2.0*m)   # cost function J

def Gradient_Descent(x,y,m,theta,alpha,iterations):
	cost=np.zeros(iterations+1)  # the initial value plus one entry per iteration
	cost[0]=computer_cost(x,y,theta,m)  # initial value of the cost function
	each_theta=np.zeros((2,iterations+1))
	each_theta[0][0],each_theta[1][0]=theta[0][0],theta[1][0]
	for iteration in range(1,iterations+1):
		theta=theta-(alpha/m)*(x.T.dot((x.dot(theta)-y)))
		cost[iteration]=computer_cost(x,y,theta,m)  # record every value so a 3D plot or contour plot can be drawn later
		each_theta[0][iteration], each_theta[1][iteration] = theta[0][0], theta[1][0]
		# print('Iteration %d,' %iteration,'cost:',cost[iteration],'theta:',theta.ravel())
	return cost,theta,each_theta   # all cost values, the final theta, and every intermediate theta

Once the initial θ, α, and the iteration count are chosen, call it through the following interface:

def One_Variable_Linear_Regression(theta,alpha,iterations):
	x, y = get_data()  # load the data
	m = x.size   # number of training examples
	temp = np.ones((m, 1))  # a column of ones to serve as X0
	x = x.reshape(m, 1)  # turn the vector into a matrix
	y = y.reshape(m, 1)  # to avoid errors, keep the type and shape of every operand consistent
	x = np.hstack((temp, x))  # prepend X0 = 1
	# print('Initial cost:',computer_cost(x,y,theta,m))
	cost,final_theta,each_theta=Gradient_Descent(x,y,m,theta,alpha,iterations)
	predict_profit(final_theta)   # predict profits from the final theta (code below)
	Adapting(x,y,final_theta)   # plot the fit (code below)
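
As a sanity check that does not need the data file, the sketch below runs the same update rule on synthetic, noise-free data generated from a known line (y = 1 + 2x), so the result can be compared with the true parameters. The function name gradient_descent, the data, alpha = 0.02, and 5000 iterations are all assumptions chosen for this illustration, not values from the exercise.

```python
import numpy as np

def gradient_descent(x, y, theta, alpha, iterations):
    """Batch gradient descent; returns the cost history and the final theta."""
    m = y.shape[0]
    cost = np.zeros(iterations + 1)
    cost[0] = np.sum((x.dot(theta) - y) ** 2) / (2.0 * m)
    for it in range(1, iterations + 1):
        theta = theta - (alpha / m) * x.T.dot(x.dot(theta) - y)
        cost[it] = np.sum((x.dot(theta) - y) ** 2) / (2.0 * m)
    return cost, theta

# Synthetic data from the line y = 1 + 2x (made up for this check).
feature = np.arange(10, dtype=float).reshape(-1, 1)
x = np.hstack((np.ones((10, 1)), feature))   # prepend the column of ones
y = 1.0 + 2.0 * feature

cost, theta = gradient_descent(x, y, np.zeros((2, 1)), alpha=0.02, iterations=5000)
print(theta.ravel())   # close to [1. 2.]
```

If theta does not approach the known parameters on data like this, the bug is in the update rule or the shapes rather than in the choice of alpha.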

With the final theta in hand, we can make predictions and plot the fit.

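def predict_profit(theta):
	# plotting the cost function and finding the theta at its center gives the parameters we are after
	# the hypothesis computed with those parameters is the prediction; plotting it shows whether it fits the source data
	print(theta)
	predict1=np.array([1,3.5]).dot(theta).item()   # dot product; element-wise * would broadcast against the (2,1) theta
	predict2=np.array([1,7]).dot(theta).item()
	print('For a population of 35,000, the predicted profit is:',predict1*10000)
	print('For a population of 70,000, the predicted profit is:',predict2*10000)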
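
A shape detail matters for predictions like these. When theta is a (2, 1) column vector, as Gradient_Descent returns it, element-wise multiplication with a 1-D feature array broadcasts to a 2x2 matrix, and summing that matrix does not give the prediction. The theta values in this sketch are made up for illustration.

```python
import numpy as np

theta = np.array([[2.0], [3.0]])      # made-up column vector, shape (2, 1)
features = np.array([1.0, 3.5])       # [x0, population], shape (2,)

elementwise = features * theta        # broadcasts to shape (2, 2)
print(elementwise.shape)              # (2, 2)
print(elementwise.sum())              # 22.5, i.e. (2 + 3) * (1 + 3.5)

prediction = features.dot(theta).item()   # 2*1 + 3*3.5
print(prediction)                         # 12.5
```

Using dot (or flattening theta with ravel() first) keeps the computation a true inner product.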

def Adapting(x,y,theta): # plot the fitted line over the training data
	hypothesis=x.dot(theta)    # predicted values
	plt.xticks(np.arange(4,24,2))  # x-axis tick positions
	plt.yticks(np.arange(-5,25,5))
	plt.xlabel('Population of City in 10,000s',fontsize=7)   # x-axis label
	plt.ylabel('Profit in $10,000s',fontsize=7)
	plt.title('Training data with linear regression fit',fontsize=15)
	plt.plot((x.T)[1],y,'xr',label='Training data')
	plt.plot((x.T)[1],hypothesis,label='Linear regression')
	plt.legend(loc='lower right',fontsize=8)  # legend explaining each series
	plt.show()

2.4 Plotting a contour plot and a 3D surface of J(θ)

I don't know how to do this yet.
Placeholder, to be filled in later.
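
One possible way to fill this in, offered as a sketch rather than the exercise's reference solution: evaluate J on a grid of (θ0, θ1) values, then hand the grid to matplotlib. The helper names cost_grid and plot_cost, the grid ranges, and the synthetic data (the line y = 1 + 2x) are assumptions made for this example.

```python
import numpy as np

def cost_grid(x, y, theta0_vals, theta1_vals):
    """Evaluate J on a grid. x is (m, 2) with a leading column of ones, y is (m, 1)."""
    m = y.shape[0]
    J = np.zeros((theta0_vals.size, theta1_vals.size))
    for i, t0 in enumerate(theta0_vals):
        for j, t1 in enumerate(theta1_vals):
            theta = np.array([[t0], [t1]])
            J[i, j] = np.sum((x.dot(theta) - y) ** 2) / (2.0 * m)
    return J

def plot_cost(theta0_vals, theta1_vals, J):
    """Draw J as a 3D surface and as contour lines."""
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers '3d' on older matplotlib
    T0, T1 = np.meshgrid(theta0_vals, theta1_vals)
    fig = plt.figure(figsize=(10, 4))
    ax1 = fig.add_subplot(1, 2, 1, projection='3d')
    ax1.plot_surface(T0, T1, J.T, cmap='viridis')   # J.T because meshgrid puts theta1 on rows
    ax1.set_xlabel('theta0')
    ax1.set_ylabel('theta1')
    ax2 = fig.add_subplot(1, 2, 2)
    ax2.contour(T0, T1, J.T, levels=np.logspace(-2, 3, 20))
    ax2.set_xlabel('theta0')
    ax2.set_ylabel('theta1')
    plt.show()

# Synthetic data from y = 1 + 2x, so the minimum of J is known in advance.
feature = np.arange(10, dtype=float).reshape(-1, 1)
x = np.hstack((np.ones((10, 1)), feature))
y = 1.0 + 2.0 * feature

theta0_vals = np.linspace(-10, 10, 101)
theta1_vals = np.linspace(-1, 4, 101)
J = cost_grid(x, y, theta0_vals, theta1_vals)
i, j = np.unravel_index(np.argmin(J), J.shape)
print(theta0_vals[i], theta1_vals[j])   # grid point nearest the minimum
```

Calling plot_cost(theta0_vals, theta1_vals, J) then draws both views. The each_theta history returned by Gradient_Descent could be overlaid on the contour axes to show the descent path.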

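My understanding of linear regression with one variable

What it is: just a model whose purpose is prediction. "One variable" means there is a single input feature.
In pictures, the goal is for the values the model predicts to fit the source data as closely as possible, as in the figure above.
To that end a cost function J (also called a loss function) is introduced, together with a formula for it. It measures how far the model deviates from the source data, so the smaller the better.
The update rule for θ comes from gradient descent: each step moves θ according to the partial derivatives of J. There is a fairly accessible article on gradient descent (Reference) that explains why gradient descent drives θ toward the minimum of the cost function.
The variant used here is batch gradient descent. There are others, such as stochastic gradient descent and more that I have forgotten. One drawback, apparently, is that gradient descent may only reach a local optimum. That is easy to picture: there may be a dip where the partial derivatives happen to be exactly 0, and then it cannot move any further. (For linear regression's squared-error cost this does not arise, because J is convex and has a single global minimum.)
On the learning rate: my earlier notes described it as the step size of each gradient descent step, where one step means one iteration. More precisely, each step moves θ by α times the partial derivative. Normally the steps go downhill, and the reference above explains why: to the right of the minimum the derivative is positive, so subtracting α times it decreases θ; to the left of the minimum the derivative is negative, so subtracting it increases θ. Either way θ moves toward the valley, whatever the initial θ is (0 here).
Why draw the plots? Contour plots and 3D surfaces show the gradient descent process directly, so plotting skills matter, and mine are, emmm, lacking.
On tooling: you really do need to know the language's features, otherwise you run into a great many problems.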
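
The point about direction and step size can be seen on a one-dimensional toy cost. This sketch assumes J(θ) = (θ - 3)², which is made up for illustration and is not the exercise's cost function; its derivative is 2(θ - 3) and its minimum is at θ = 3.

```python
def descend(theta, alpha, steps):
    """Gradient descent on J(theta) = (theta - 3)**2, whose derivative is 2*(theta - 3)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * (theta - 3)
    return theta

# Start left of the minimum: the derivative is negative, so theta increases.
from_left = descend(theta=0.0, alpha=0.1, steps=100)

# Start right of the minimum: the derivative is positive, so theta decreases.
from_right = descend(theta=10.0, alpha=0.1, steps=100)

# A learning rate that is too large overshoots further on every step and diverges.
too_large = descend(theta=0.0, alpha=1.1, steps=20)

print(from_left, from_right)   # both very close to 3
print(too_large)               # far away from 3
```

Both starting points end up at the minimum because the sign of the derivative always points the update toward it; only the size of α decides whether the steps settle or overshoot.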

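Section 3: Linear regression with multiple variables

Full code 1

3.1 Feature scaling: standardization (normalization)

On feature scaling: Feature scaling
The sklearn library has a quick way to standardize features, but... I don't know that library, so I do it the manual way.

There are many ways to scale features; here we subtract the mean and then divide by the standard deviation.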

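def Feature_Normalization(x,m):  # feature scaling: subtract the mean, then divide by the standard deviation
	mean_value=x.sum(axis=0)/m  # column sums divided by m give the per-feature mean
	std=np.sqrt(((x-mean_value)**2).sum(axis=0)/m)  # per-feature standard deviation (one value per column)
	x=(x-mean_value)/std   # scale each feature by its own mean and standard deviation
	return x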
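
A small check of what standardization should produce: after scaling, every column has mean 0 and standard deviation 1. The sketch also returns the mean and standard deviation, since a new input has to be scaled with the same values before it is passed to the model. The function name feature_normalize and the data are made up for illustration.

```python
import numpy as np

def feature_normalize(x):
    """Standardize each column; also return the statistics for reuse on new inputs."""
    mu = x.mean(axis=0)        # per-feature mean
    sigma = x.std(axis=0)      # per-feature population standard deviation (divides by m)
    return (x - mu) / sigma, mu, sigma

# Made-up data: two features on very different scales.
x = np.array([[1000.0, 2.0], [2000.0, 3.0], [3000.0, 4.0]])
x_norm, mu, sigma = feature_normalize(x)

print(x_norm.mean(axis=0))   # approximately [0. 0.]
print(x_norm.std(axis=0))    # [1. 1.]

# A new example is scaled with the training mean and standard deviation.
new_example = (np.array([2500.0, 3.0]) - mu) / sigma
print(new_example)
```

Note that np.std divides by m by default, which matches the formula used in this section; Octave's std divides by m - 1, so values can differ slightly from the official handout.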

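3.2 Gradient descent

This is the same as the one-variable case; only the number of dimensions grows, and the operations are identical.

def Computer_Cost(x,y,theta,m):
	hypothesis=x.dot(theta) # hypothesis function
	return np.sum((hypothesis-y)**2)/2/m   # cost function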

def Gradient_Descent(x,y,theta,m,iterations,alpha):
	cost=np.zeros(iterations+1)  # the initial value plus one entry per iteration
	cost[0]=Computer_Cost(x,y,theta,m)
	for iteration in range(1,iterations+1):
		theta=theta-np.dot(x.T,(alpha/m)*(x.dot(theta)-y))
		cost[iteration]=Computer_Cost(x,y,theta,m)
		# print('Iteration %d,'%iteration,'cost:',cost[iteration],'theta:',theta.ravel())
	return cost,theta

Then just call this interface from a wrapper function:

def Draw_Iterations_Cost(iterations,y):  # plot how the cost changes with the iteration number for the chosen learning rate
	x=np.arange(iterations)  # integer indices 0..iterations-1, one per recorded cost value
	plt.xlim(0,50)
	plt.ylim(0,7e10)
	plt.plot(x,y,'y')  # 'y' draws the curve in yellow
	plt.xlabel('Number of iterations',fontsize=7)
	plt.ylabel('Cost J',fontsize=7)
	plt.title('Convergence of gradient descent',fontsize=13)
	plt.show()

def Multiple_Variables_Linear_Regression(x,y,theta,m,iterations,alpha):
	cost,final_theta=Gradient_Descent(x,y,theta,m,iterations,alpha)
	Draw_Iterations_Cost(iterations+1,cost)  # plot the convergence curve; cost holds iterations+1 values

Loading the data and initializing

def Get_Data():
	temp_data=np.genfromtxt('exercise/ex1/ex1data2.txt',delimiter=',',dtype=float)
	X=temp_data[:,0:2].reshape(temp_data.shape[0],temp_data.shape[1]-1)  # multiple features
	Y=temp_data[:,2].reshape(temp_data.shape[0],1)   # the output is a single column
	# print(X.shape,Y.shape)
	return X,Y,X.shape[0]      # X holds the features, Y the outputs, X.shape[0] is the number of training examples

x,y,m=Get_Data()
x=Feature_Normalization(x,m)  # feature scaling
x=np.hstack((np.ones((m,1)),x))  # prepend X0 = 1
theta=np.zeros((x.shape[1],1))   # initial theta
iterations=100
alpha=0.1  # learning rate; try 0.3, 0.1, 0.03, 0.01 and compare the resulting convergence curves
Multiple_Variables_Linear_Regression(x,y,theta,m,iterations,alpha)
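
To compare learning rates without the data file, this sketch runs the same update for 100 iterations on synthetic standardized features and reports the final cost for each α. The data, the generating theta, and the helper name final_cost are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50

# Synthetic standardized features plus the column of ones.
features = rng.normal(size=(m, 2))
features = (features - features.mean(axis=0)) / features.std(axis=0)
x = np.hstack((np.ones((m, 1)), features))

# Noise-free targets generated from a made-up theta.
y = x.dot(np.array([[4.0], [2.0], [-1.0]]))

def final_cost(alpha, iterations=100):
    """Run batch gradient descent from theta = 0 and return the last cost value."""
    theta = np.zeros((3, 1))
    for _ in range(iterations):
        theta = theta - (alpha / m) * x.T.dot(x.dot(theta) - y)
    return np.sum((x.dot(theta) - y) ** 2) / (2.0 * m)

costs = {alpha: final_cost(alpha) for alpha in (0.3, 0.1, 0.03, 0.01)}
for alpha, cost in costs.items():
    print(alpha, cost)
```

On data like this, the larger of these learning rates reach a lower cost within the same number of iterations, which is what the different convergence curves show. A rate that is too large would instead make the cost grow.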

With α = 0.1, the resulting convergence curve is shown below:

TODO: plot the data as a 3D figure and as contour lines