3.3.2 Generating Random Data for Classification Models

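The main parameters of the make_classification function are as follows.

· n_samples: number of samples; default 100.

· n_features: total number of features; default 20. It must be at least n_informative + n_redundant + n_repeated, and any remaining features are filled with random noise.

· n_informative: number of informative features; default 2.

· n_redundant: number of redundant features, generated as random linear combinations of the informative features; default 2.

· n_repeated: number of duplicated features, drawn at random from the informative and redundant features; default 0.

· n_classes: number of classes; default 2.

· n_clusters_per_class: number of clusters that make up each class; default 2. The product n_classes × n_clusters_per_class must not exceed 2 to the power of n_informative.

· weights: proportion of samples assigned to each class; default None, which means balanced classes.

· shuffle: whether to shuffle the samples and the features; default True.

· random_state: random seed; default None, in which case every call generates different data.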
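The interplay between these parameters is easier to see on a small dataset. The following is a minimal sketch, not part of the book's example: the sizes (3 informative, 2 redundant, 1 repeated, 8 features in total) and the variable names are chosen here purely for illustration. It sets shuffle=False so that the columns stay in generation order, and it relies on flip_y, a make_classification parameter not listed above that controls random label noise.

```python
import numpy as np
from sklearn.datasets import make_classification

# Illustrative sizes (an assumption of this sketch):
# 3 informative + 2 redundant + 1 repeated + 2 noise = 8 features.
n_informative, n_redundant, n_repeated, n_features = 3, 2, 1, 8

# shuffle=False keeps the columns in generation order:
# informative, redundant, repeated, then noise.
params = dict(
    n_samples=200,
    n_features=n_features,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    shuffle=False,
    random_state=0,
)
X, y = make_classification(**params)

n_source = n_informative + n_redundant
informative = X[:, :n_informative]
redundant = X[:, n_informative:n_source]
repeated = X[:, n_source:n_source + n_repeated]

# Redundant columns should be recoverable from the informative ones
# through a linear map, so a least-squares fit should reproduce them.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
redundant_is_linear = np.allclose(informative @ coef, redundant)

# Each repeated column should match one informative or redundant column.
sources = X[:, :n_source]
repeated_is_copy = all(
    any(np.allclose(repeated[:, j], sources[:, k]) for k in range(n_source))
    for j in range(n_repeated)
)

# weights sets the class proportions; flip_y=0.0 switches off random
# label noise so the class counts follow the weights closely.
_, y_imbalanced = make_classification(
    n_samples=1000, weights=[0.9, 0.1], flip_y=0.0, random_state=0
)
class_counts = np.bincount(y_imbalanced)

# Calling again with the same random_state should return identical data.
X_again, y_again = make_classification(**params)
is_reproducible = np.array_equal(X, X_again) and np.array_equal(y, y_again)

print("shape:", X.shape)
print("redundant columns are linear combinations:", redundant_is_linear)
print("repeated columns are copies:", repeated_is_copy)
print("class counts with weights=[0.9, 0.1]:", class_counts)
print("same seed, same data:", is_reproducible)
```

With the default shuffle=True the same columns are still present, but their order is permuted, so the slicing used here would no longer line up with the feature types.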

The following example uses make_classification in a Jupyter Notebook to generate random data for a classification model:

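import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# X holds the sample features and Y the class labels: 1000 samples with
# 5 features each, 2 classes, no redundant features, one cluster per class.
# With n_informative=1, the other 4 features are random noise.
X, Y = make_classification(n_samples=1000, n_features=5, n_redundant=0,
                           n_informative=1, n_clusters_per_class=1,
                           n_classes=2, random_state=20)
# Plot the first two feature columns, colored by class label.
plt.scatter(X[:, 0], X[:, 1], marker='o', c=Y)
plt.show()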

The output is shown in Figure 3-3.

Figure 3-3 Scatter plot of the random data for the classification model
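Because shuffle defaults to True, the feature columns are permuted, so the single informative feature in the example above is not guaranteed to be column 0 or column 1. As a rough check, and only as a sketch, the code below repeats the same call and compares the class means in each column; the names mean_gap and informative_col are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the example above.
X, Y = make_classification(n_samples=1000, n_features=5, n_redundant=0,
                           n_informative=1, n_clusters_per_class=1,
                           n_classes=2, random_state=20)

# Absolute difference between the two class means, per feature column.
# The informative column should show a much larger gap than the noise columns.
mean_gap = np.abs(X[Y == 0].mean(axis=0) - X[Y == 1].mean(axis=0))
informative_col = int(np.argmax(mean_gap))

print("class-mean gap per column:", np.round(mean_gap, 2))
print("column that best separates the classes:", informative_col)
```

Plotting informative_col on one axis of the scatter plot should show the two classes separated along that axis, whereas a pair of noise columns would show them mixed together.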