scikit-learn: machine learning in Python

2019-07-24

Machine learning


Tools: PyCharm 2019.1, Python 3.5

References: scikit-learn: machine learning in Python; scikit-learn Tutorials

Preparation:

Introduction: problem settings

Data in scikit-learn

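The data matrix. Machine learning algorithms implemented in scikit-learn expect the data to be stored as a two-dimensional array or matrix of shape [n_samples, n_features]: each row is one sample and each column is one feature.
A sample can be a document, an image, an audio clip, and so on.
Taking a linear regression model as an example, the features are the inputs x₁, x₂, …, x_n; their count is the number of columns in the matrix, and each column holds one feature.
Note that the number of features must be fixed in advance.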
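
To make the [n_samples, n_features] layout concrete, here is a minimal sketch with NumPy. The values are made-up toy data, and the names X, x and X_single are only illustrative.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), each described by 2 features (columns).
X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])
n_samples, n_features = X.shape
print(n_samples, n_features)

# A 1-D array holding a single feature is not a valid data matrix yet;
# reshape it into one column so that it has shape [n_samples, 1].
x = np.array([1.0, 3.0, 5.0])
X_single = x.reshape(-1, 1)
print(X_single.shape)
```

The reshape step matters in practice because most scikit-learn estimators reject a one-dimensional X.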

A Simple Example: the Iris Dataset

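The task is to predict the species of an iris flower.
Each flower has 4 features: sepal length (cm), sepal width (cm), petal length (cm), petal width (cm).
Prerequisite: import the package; if it is missing, install scikit-learn in the PyCharm interpreter settings.
Note: scikit-learn is imported under the name sklearn. It ships a copy of the iris CSV file together with a helper function that loads it into NumPy arrays.
Load the data: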

from sklearn.datasets import load_iris
iris = load_iris()
print(iris.data.shape, type(iris.data))
print(iris.data[0])
- - - - - - - - - - - - - - - - - 
(150, 4) <class 'numpy.ndarray'>
[5.1 3.5 1.4 0.2]
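
Besides the data matrix, the object returned by load_iris also carries the labels and their names. The following short sketch inspects them; the variable name counts is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the 4 feature columns in iris.data
print(iris.feature_names)

# One integer label (0, 1 or 2) per sample, and the species name for each label
print(iris.target.shape)
print(iris.target_names)

# Number of samples per species
counts = np.bincount(iris.target)
print(counts)
```

Each of the three species (setosa, versicolor, virginica) has 50 samples, so the dataset is balanced.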

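There are 4 features here, which cannot all be shown in one 2D scatter plot, so only the first two are used to plot the data.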

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

def plot_dataset(iris): # choose first two features
    # this formatter will label the colorbar with the correct target names
    formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])
    # x axis: feature 0, y axis: feature 1, color: species label
    plt.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
    plt.colorbar(ticks=[0, 1, 2], format=formatter)
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])
    plt.tight_layout() # Automatically adjust subplot parameters to give specified padding
    plt.show()

plot_dataset(load_iris())

Basic principles of machine learning with scikit-learn

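Every algorithm in scikit-learn is exposed through an "estimator" object; that is, each algorithm is used via an object that follows a common interface.
I did not fully follow the example in the tutorial.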
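
As a rough illustration of the estimator idea, here is a minimal sketch using LinearRegression. It is not the tutorial's own example, and the toy data (points on the line y = 2x + 1) is made up for this note.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0]])  # shape [n_samples, n_features]
y = np.array([1.0, 3.0, 5.0])

# 1. Create the estimator; hyperparameters are passed to the constructor.
model = LinearRegression()

# 2. Fit it to data; learned parameters get a trailing underscore.
model.fit(X, y)
print(model.coef_, model.intercept_)

# 3. Use the fitted estimator on new data.
pred = model.predict(np.array([[3.0]]))
print(pred)
```

Other estimators follow the same pattern: construct, call fit, then call predict (or transform), so swapping one algorithm for another usually changes only the constructor line.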

Supervised Learning: Classification and regression

Supervised learning: classification and regression.
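
For classification, a minimal sketch on the iris data could look like the following. The choice of KNeighborsClassifier, the 70/30 split and random_state=0 are assumptions made for this note, not something prescribed by the tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out 30% of the samples so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Classification: the target is a discrete label (the species).
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Fraction of held-out samples whose species is predicted correctly
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

Regression follows the same fit and predict pattern; the difference is that the target is a continuous value instead of a class label.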