In supervised machine learning, computers learn a model from labeled examples and then use that model to make predictions. Today, you'll see how decision trees can identify cancer cells.

Understanding the data

Physicians diagnose cancer by analyzing suspect cells after a biopsy. Researchers at the University of Wisconsin quantified biopsy images so that computers could analyze them too.

What humans see

Under a microscope, cancer looks "primitive and aggressive," a chaotic agglomeration of cells with irregularly shaped, sized, and patterned nuclei.

What computers "see"

Researchers quantified ten characteristics of cell nuclei in breast-cancer biopsy images.

[Figure: the ten nucleus characteristics, including concave points and fractal dimension]

Identifying Features

For each biopsy, the researchers calculated the average, standard error, and largest values of every attribute. Ten attributes times three statistics gives the data set 30 features (also called predictors or variables).
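Here is a minimal Python sketch of that bookkeeping for one attribute of one biopsy. The radius values are invented, `summarize` is a hypothetical helper, and it assumes the data set's documented convention that the "largest" (or "worst") value is the mean of the three largest measurements rather than the single maximum.

```python
import math
import statistics

# The ten nucleus attributes in the Wisconsin Diagnostic Breast Cancer data.
ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal dimension",
]
STATISTICS = ["mean", "standard error", "largest"]


def summarize(values):
    """Reduce one attribute's per-nucleus measurements to (mean, SE, largest)."""
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(len(values))
    # "Largest" (a.k.a. "worst"): assumed to be the mean of the three largest values.
    largest = statistics.mean(sorted(values, reverse=True)[:3])
    return mean, se, largest


# Hypothetical radii of five nuclei from one biopsy (not real data).
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
print(summarize(radii))

# Ten attributes x three statistics = 30 features.
feature_names = [f"{stat} {attr}" for stat in STATISTICS for attr in ATTRIBUTES]
print(len(feature_names))
```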

Assessing their value add

Computers prioritize the features that contribute the most information to the model. Decision trees find these features by comparing how each class of observations (here, benign and malignant biopsies) is distributed across a feature's values.

An example

Of the 569 biopsies in the data set, the largest radius ranges from approximately 8 to 32 μm.

The cells in one class of data, benign, tend to be smaller...

…whereas those in the other class, malignant, tend to be larger.

These two classes of data have different distributions, which means the largest radius could be useful for cancer diagnosis.
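A rough text version of these per-class histograms takes only a few lines of Python. The values, the 4-unit bin width, the starting edge of 8, and the `histogram` helper are all illustrative assumptions, not the article's data.

```python
from collections import Counter


def histogram(values, bin_width=4.0, start=8.0):
    """Count values per bin; each key is the bin's lower edge."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical "largest radius" values; NOT taken from the real data set.
benign = [9.5, 11.2, 12.8, 13.1, 14.0, 15.6]
malignant = [14.8, 16.4, 18.9, 20.3, 22.7, 25.1]

for name, values in (("benign", benign), ("malignant", malignant)):
    for edge, count in histogram(values).items():
        # One '#' per biopsy in the bin starting at `edge`.
        print(f"{name:>9} {edge:5.1f}+ {'#' * count}")
```

Even on this toy data, most benign values land in the lower bins and most malignant values in the higher ones, with only the 12 to 16 bin shared.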

Seeing all predictors

A computer will analyze all 30 features in this way.

Features whose class distributions overlap less provide the model with more information.
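One way to put a number on overlap is histogram intersection: normalize each class's histogram and add up the smaller share in every bin, so 0 means the classes never share a bin and 1 means their distributions match. This Python sketch is an illustrative heuristic, not what decision-tree libraries compute (they score candidate splits directly), and the feature values are made up.

```python
from collections import Counter


def normalized_histogram(values, bin_width, start=0.0):
    """Share of a class's values that falls in each bin (bin index -> fraction)."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    return {k: c / len(values) for k, c in counts.items()}


def overlap(class_a, class_b, bin_width, start=0.0):
    """Histogram intersection: 0.0 = no shared bins, 1.0 = identical distributions."""
    ha = normalized_histogram(class_a, bin_width, start)
    hb = normalized_histogram(class_b, bin_width, start)
    return sum(min(ha.get(k, 0.0), hb.get(k, 0.0)) for k in ha.keys() | hb.keys())


# Hypothetical (benign values, malignant values, bin width) per feature.
features = {
    "largest radius": ([10.0, 11.0, 12.0, 13.0], [14.0, 17.0, 19.0, 21.0], 4.0),
    "mean texture": ([15.0, 18.0, 21.0, 24.0], [16.0, 19.0, 22.0, 25.0], 4.0),
}

# Least overlap first: these features should separate the classes best.
ranking = sorted(features, key=lambda name: overlap(*features[name]))
for name in ranking:
    print(f"{name}: overlap {overlap(*features[name]):.2f}")
```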

Finding forks

While building a decision tree, computers divide the data points into groups that are as homogeneous (dominated by a single class) as possible.

Picking the splits

The computer must find forks ("if-then" statements) that split the data into branches.

In each branch, a majority vote of the biopsies that land there determines the classification.
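A fork and its majority vote can be sketched as a one-feature "stump" in Python. The threshold is picked by hand here, and the radius values, labels, and the `fit_stump` name are hypothetical.

```python
from collections import Counter


def fit_stump(values, labels, threshold):
    """Build a one-fork tree on one feature; each branch predicts its majority class."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        raise ValueError("threshold must leave samples on both sides")
    left_label = Counter(left).most_common(1)[0][0]
    right_label = Counter(right).most_common(1)[0][0]

    def predict(x):
        # The fork itself: an if-then statement on one feature.
        return left_label if x <= threshold else right_label

    return predict


# Hypothetical largest-radius values and labels (B = benign, M = malignant).
radius = [10, 12, 13, 15, 17, 19, 22]
diagnosis = ["B", "B", "B", "M", "B", "M", "M"]

stump = fit_stump(radius, diagnosis, threshold=16)
print(stump(11), stump(20))
```

Note that the stump still misclassifies two of its own training biopsies (the malignant 15 and the benign 17), which is why the choice of split point matters.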

Finding the best split point requires making trade-offs.

A split point that captures every malignant sample produces many false positives (benign biopsies classified as malignant).

However, a split point that avoids all false positives produces many false negatives (malignant biopsies classified as benign).
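Counting both kinds of mistake for a few candidate thresholds shows the trade-off. This sketch assumes a rule of "malignant if the value is above the threshold" on invented data; `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(values, labels, threshold):
    """Predict malignant ("M") when value > threshold; count each error type."""
    false_pos = sum(1 for x, y in zip(values, labels) if x > threshold and y == "B")
    false_neg = sum(1 for x, y in zip(values, labels) if x <= threshold and y == "M")
    return false_pos, false_neg, (false_pos + false_neg) / len(values)


# Hypothetical largest-radius values: seven benign ("B"), then five malignant ("M").
values = [9, 10, 11, 12, 14, 16, 18, 13, 17, 20, 23, 25]
labels = ["B"] * 7 + ["M"] * 5

for threshold in (12.5, 16.5, 18.5):
    fp, fn, total = confusion_counts(values, labels, threshold)
    print(f"split at {threshold}: {fp} false positives, {fn} false negatives, "
          f"total error {total:.0%}")
```

On this toy data, the threshold that catches every malignant sample (12.5) makes three false positives, while the one that avoids false positives (18.5) misses two malignant samples.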

At the best split, both branches are as homogeneous as possible. Computers find this split with an impurity measure such as Gini impurity.
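Gini impurity for a group is 1 minus the sum of its squared class proportions, so a pure branch scores 0 and a 50/50 branch scores 0.5. The Python sketch below tries a threshold halfway between each pair of neighboring values and keeps the one whose branches have the lowest size-weighted Gini impurity. It is a simplified, single-feature version of a greedy CART-style search; the data and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 = pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single-feature split."""
    xs = sorted(set(values))
    best_t, best_score = None, float("inf")
    # Candidate thresholds sit halfway between neighboring distinct values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weight each branch's impurity by its share of the samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical largest-radius values: seven benign ("B"), then five malignant ("M").
values = [9, 10, 11, 12, 14, 16, 18, 13, 17, 20, 23, 25]
labels = ["B"] * 7 + ["M"] * 5
print(best_split(values, labels))
```

On this toy data the search settles on a threshold of 19: the left branch holds 7 benign and 2 malignant samples, and the right branch is purely malignant.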

Combining forks

Adding more forks can improve a tree's prediction accuracy. A tree with one fork is called a stump; one with many is called a bushy tree.

Tree depth    Total error
1             7.7%
2             5.2%
3             2.4%
4             1.1%
5             0.6%
6             0%
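To see how depth drives the error down, here is a compact, greedy tree builder in Python that stops at a chosen maximum depth. It is a simplified, CART-style sketch (scikit-learn's `DecisionTreeClassifier`, for example, exposes the same idea through its `max_depth` parameter), run on a dozen invented two-feature biopsies; `build_tree` and `predict` are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, max_depth, depth=0):
    """Greedy tree: a leaf is a class label, a fork is a dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    best = None  # (weighted impurity, feature index, threshold, left idx, right idx)
    for f in range(len(rows[0])):
        xs = sorted({row[f] for row in rows})
        for lo, hi in zip(xs, xs[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(rows) if row[f] <= t]
            right = [i for i, row in enumerate(rows) if row[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every row is identical, so no fork is possible
        return majority
    _, f, t, left, right = best

    def subtree(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          max_depth, depth + 1)

    return {"feature": f, "threshold": t, "left": subtree(left), "right": subtree(right)}


def predict(tree, row):
    # Follow forks until reaching a leaf label.
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical (largest radius, largest concave points) pairs; not real data.
rows = [
    (9, 0.02), (10, 0.03), (11, 0.01), (12, 0.04), (14, 0.05), (16, 0.10), (18, 0.06),
    (13, 0.09), (17, 0.11), (20, 0.08), (23, 0.12), (25, 0.15),
]
labels = ["B"] * 7 + ["M"] * 5

errors = []
for max_depth in range(1, 5):
    tree = build_tree(rows, labels, max_depth)
    wrong = sum(predict(tree, row) != y for row, y in zip(rows, labels))
    errors.append(wrong / len(rows))
    print(f"depth {max_depth}: error {errors[-1]:.1%}")
```

On these made-up points the error reaches 0% by depth 3, but it is measured on the same points the tree was built from.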

Too perfect?

A 0% error rate is too good to be true. In our next installment, you'll learn about training and test errors, the trouble with trees, and great alternatives.
