Machine Learning for Mobile

Naming the dataset

We will be using the Breast Cancer dataset. Each record in the dataset contains the following attributes:

  • ID number 
  • Diagnosis (M = malignant, and B = benign) 
  • 10 real-valued features are computed for each cell nucleus:
    • Radius (mean of the distances from the center to points on the perimeter) 
    • Texture (standard deviation of gray scale values) 
    • Perimeter 
    • Area 
    • Smoothness (local variation in radius lengths) 
    • Compactness (perimeter^2/area - 1.0)
    • Concavity (severity of concave portions of the contour) 
    • Concave points (number of concave portions of the contour) 
    • Symmetry 
    • Fractal dimension (coastline approximation - 1)
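As an illustration only, the attribute list above can be mirrored in a small record type. This is a minimal sketch under stated assumptions: the class name `NucleusRecord`, the snake_case field names, and the column order `[id, diagnosis, ten feature values]` are all hypothetical, and the actual dataset file may be laid out differently. The `compactness` helper simply encodes the formula given in the list (perimeter^2/area - 1.0).

```python
from dataclasses import dataclass

# Hypothetical field names mirroring the attribute list (one value per feature assumed).
FEATURE_NAMES = (
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
)


@dataclass
class NucleusRecord:
    id_number: int
    diagnosis: str   # "M" = malignant, "B" = benign
    features: dict   # feature name -> real value

    @classmethod
    def from_row(cls, row):
        """Build a record from [id, diagnosis, f1, ..., f10] (assumed column order)."""
        if len(row) != 2 + len(FEATURE_NAMES):
            raise ValueError("expected ID, diagnosis and 10 feature values")
        if row[1] not in ("M", "B"):
            raise ValueError("diagnosis must be 'M' or 'B'")
        values = [float(v) for v in row[2:]]
        return cls(int(row[0]), row[1], dict(zip(FEATURE_NAMES, values)))

    @property
    def is_malignant(self):
        return self.diagnosis == "M"


def compactness(perimeter, area):
    """Compactness as defined in the attribute list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0
```

Keeping the diagnosis as the original `M`/`B` codes, rather than converting it to 0/1 at load time, keeps the record readable next to the dataset description.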

To understand random forest in detail, we will work through it in Excel using the breast cancer dataset. For the purposes of this analysis, we will consider only the data elements from the dataset's 569 samples.
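The Excel walkthrough is not reproduced here. As a hedged complement, the following plain-Python sketch shows the core random forest ideas: each tree is trained on a bootstrap sample (rows drawn with replacement), each tree considers only a random subset of features, and the forest predicts by majority vote. To keep it short, every tree is a depth-1 decision tree (a "stump") split on Gini impurity. That is a deliberate simplification, not the book's method. Because a stump has a single split, choosing the feature subset per tree here is equivalent to choosing it per split. The function names, the square-root default for `n_features`, and the toy data below are assumptions for illustration, not values from the dataset.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(rows, labels, feature_ids):
    """Return (feature, threshold, left_class, right_class) with the lowest
    weighted Gini impurity, searching only the given feature_ids."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No valid split (e.g. identical rows): predict the majority class everywhere.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=None, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_total = len(rows[0])
    if n_features is None:
        n_features = max(1, int(n_total ** 0.5))  # common square-root heuristic (assumed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: sample with replacement
        feats = rng.sample(range(n_total), n_features)  # random feature subset
        forest.append(best_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = [lc if row[f] <= t else rc for f, t, lc, rc in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data with two features (e.g. radius and texture); not real samples.
rows = [
    [12.0, 15.0], [11.5, 14.0], [13.0, 16.5], [12.5, 17.0],   # labelled "B"
    [20.0, 25.0], [21.5, 27.0], [19.0, 24.0], [22.0, 26.5],   # labelled "M"
]
labels = ["B"] * 4 + ["M"] * 4
forest = fit_forest(rows, labels, n_trees=15, seed=42)
print(predict(forest, [21.0, 26.0]), predict(forest, [11.0, 13.0]))
```

On the real data you would pass one row of feature values per sample and the `M`/`B` diagnoses as labels. Replacing the stumps with deeper trees, and drawing a fresh feature subset at every split, brings the sketch closer to a full random forest.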