Tutorial 5 - Classification

In this tutorial we will perform classification, which is the prediction of one or more discrete variables given what we know about other variables.

The following concepts will be covered:

NOTE

Bayes Server must be installed before starting this tutorial. An evaluation version can be downloaded from the Downloads page.

Open the model

We will use the Bayesian network built in Tutorial 1 - A simple network shown below.

Identification network

NOTE

If the Start page is not set to display on startup, or has been closed, click the Start page button in the General group on the View tab.

Batch queries

We have 100 test cases defined in the Data section. We will use the Hair Length and Height columns to predict Gender. Because Gender is a discrete variable, this task is known as classification. The data set also includes the actual Gender so that we can determine how well the model performs; however, this column is not used to make the predictions.

NOTE

The data contains some missing values. This is not a problem for a Bayesian network, which can still make a prediction using whatever information is available.
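
To see why missing values are not a problem, the calculation can be sketched in plain Python. This is only an illustration, not the Bayes Server API. It assumes the Tutorial 1 network has Gender as the parent of both Hair Length and Height, and every probability and Gaussian parameter below is a made-up value, not a parameter of the tutorial model. A missing value simply contributes no term to the calculation.

```python
import math

# Hypothetical parameters, for illustration only (not taken from the tutorial model).
PRIOR = {"Female": 0.5, "Male": 0.5}
HAIR_GIVEN_GENDER = {
    "Female": {"Short": 0.1, "Medium": 0.4, "Long": 0.5},
    "Male": {"Short": 0.8, "Medium": 0.15, "Long": 0.05},
}
HEIGHT_GIVEN_GENDER = {"Female": (162.0, 7.0), "Male": (177.0, 7.0)}  # (mean, std dev)


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def predict_gender(hair_length=None, height=None):
    """Return P(Gender | evidence); None means the value is missing."""
    scores = {}
    for gender, prior in PRIOR.items():
        score = prior
        if hair_length is not None:
            score *= HAIR_GIVEN_GENDER[gender][hair_length]
        if height is not None:
            mean, std = HEIGHT_GIVEN_GENDER[gender]
            score *= gaussian_pdf(height, mean, std)
        scores[gender] = score
    total = sum(scores.values())
    return {gender: score / total for gender, score in scores.items()}


print(predict_gender("Short", 178.5))  # both values available
print(predict_gender("Long", None))    # Height missing
print(predict_gender(None, None))      # no evidence: falls back to the prior
```

With no evidence at all, the sketch returns the prior distribution over Gender; each value that is present sharpens the prediction.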

For convenience, we will use Microsoft Excel as the data source; however, a database can be substituted.

Adding a data connection

NOTE

You can skip this step and instead use the pre-installed Tutorial data connection (called Walkthrough Data in earlier versions).

Batch query

NOTE

Because we are predicting Gender, we do not want the Gender variable to be mapped.

NOTE

In order to test how well our model can predict Gender, we want to have access to the Gender data column, but we do not want to map it to the variable we are predicting.
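
The idea in the two notes above can be sketched in plain Python: the Gender column is read along with the other columns, but only Hair Length and Height are passed on as evidence. The dictionary row layout and the `split_row` helper are hypothetical and are not part of Bayes Server; the two rows are taken from the Data section.

```python
def split_row(row):
    """Separate the evidence used for prediction from the held-out actual label."""
    evidence = {
        name: value
        for name, value in row.items()
        if name != "Gender" and value is not None  # skip the target and missing values
    }
    actual = row.get("Gender")
    return evidence, actual


rows = [
    {"Gender": "Female", "Hair Length": "Medium", "Height": 159.64532},
    {"Gender": "Female", "Hair Length": "Long", "Height": None},  # Height missing
]

for row in rows:
    evidence, actual = split_row(row)
    print(evidence, "-> actual:", actual)
```

Keeping the actual label next to each prediction is what later makes it possible to compare the two.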

NOTE

Instead of outputting to the window, you can also output the predictions to a database. This is useful if you are working with large data sets.

The window should look like this:

Classification batch query window

Confusion matrix

In order to determine how well our model performed, we can use a confusion matrix.

Confusion matrix
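
As a rough illustration of what a confusion matrix contains, the following plain-Python sketch counts how often each actual Gender was predicted as each Gender, and derives the overall accuracy from the diagonal. The `actual` and `predicted` lists are made-up example values, not results from the tutorial model, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of cases on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up example values, for illustration only.
actual = ["Female", "Male", "Female", "Female", "Male", "Male"]
predicted = ["Female", "Male", "Male", "Female", "Male", "Female"]
labels = ["Female", "Male"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(matrix))
```

A row whose actual Gender is missing has nothing to compare against, so in a sketch like this it would need to be left out before counting.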

Data

The 100 test cases are listed below as comma-separated values. An empty field is a missing value.

Gender,Hair Length,Height
Female,Medium,159.64532
Male,Short,178.50209
Female,Short,170.2725
Female,Medium,160.31395
Female,Long,156.32858
Female,Long,165.43799
Male,Short,177.59889
Female,Medium,161.11003
Male,Short,166.09811
Female,Long,173.34889
Male,Short,169.16522
Male,Medium,179.45741
Female,Long,
Female,Medium,158.67832
Female,Long,171.75507
Female,Short,165.4013
Male,Short,188.6639
Male,Short,
Female,Long,165.88785
Female,Medium,168.43815
Male,Short,178.84286
Female,Short,164.10128
Female,Medium,173.39975
Female,Medium,160.2925
Female,Medium,166.0434
Female,Long,159.51891
Female,Medium,167.27399
Female,Medium,162.01801
Male,Short,159.67172
Female,Long,149.85316
Male,Short,178.85521
Female,Medium,159.10519
Male,Short,176.89731
Male,Medium,160.80553
Male,Short,176.67044
Female,Medium,151.4692
Female,Medium,159.47791
,Medium,178.30403
Male,Long,177.37518
Male,Short,175.68627
Male,Medium,182.13118
Female,Long,168.80542
Male,Short,173.47985
Male,,174.67784
Female,Long,167.92433
Female,Long,170.78801
,Short,173.21558
Male,Short,185.71675
Male,Medium,192.61151
Female,Long,165.47273
Male,Short,179.94032
Male,,185.23601
Male,Short,180.676
Female,Long,167.14232
Male,Short,166.71996
Female,Long,147.9807
Female,Long,
Male,Short,178.66922
Male,Short,179.55905
Male,Short,189.99837
Male,Short,172.49842
Male,Short,186.58113
Female,Short,169.12165
,Long,165.95135
Female,Long,168.34383
,Long,174.84138
Male,Short,173.94395
Female,Short,155.70222
Female,Long,177.06825
Male,Short,173.52714
Female,Short,170.73774
Female,Medium,158.87229
Female,Long,147.5172
Male,Medium,170.96061
,Short,191.28145
Male,Medium,170.87405
Male,Short,179.53121
,Long,160.09839
Female,Long,153.82008
Female,Long,167.66346
Male,Medium,
Male,Short,176.23203
Female,Medium,160.16516
Female,Medium,153.82284
Male,Medium,169.74507
Male,Short,179.47557
Female,Long,162.2582
Female,Long,154.11746
Male,Short,168.06671
Male,Short,191.50926
Male,Medium,185.57492
Female,Long,161.82199
Female,Medium,158.64344
Female,Short,175.84038
Female,Medium,162.36804
Male,Short,169.27324
Female,Medium,169.56408
Male,Short,174.71516
Male,Short,181.95237
Male,Short,187.56014