Tutorial 5 - Classification
In this tutorial we will perform classification, which is the prediction of one or more discrete variables given what we know about other variables.
The following concepts will be covered:
- Creating a Data Connection
- Performing classification using batch queries
- Queries with missing data
- Confusion matrix
- Charting predictions
- Predicting the log likelihood
NOTE
Bayes Server must be installed before starting this tutorial. An evaluation version can be downloaded from the Downloads page.
Companion video (No Audio)
Open the model
We will use the Bayesian network built in Tutorial 1 - A simple network, shown below.
- Launch Bayes Server, and on the Start page click the network entitled 'Tutorial 1 - A simple network' in the Sample networks pane.
NOTE
If the Start page is not set to display on startup, or has been closed, click the Start page button on the View tab, in the General group.
Batch queries
We have 100 test cases defined in the data section. We are going to use the columns Hair Length and Height to predict Gender. Since we are predicting a discrete variable, the task is known as classification. The data set includes the actual Gender so that we can determine how well our model performs; however, this column is not used to make the predictions.
NOTE
The data has some missing values. This is not a problem for a Bayesian network, which can still perform the prediction using whatever information is available.
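To make this concrete, the following Python sketch shows how a network like the one from Tutorial 1 could compute analogues of Predict(Gender) and PredictProbability(Gender) for a single case. It assumes Gender is the parent of both Hair Length and Height, with Height modelled as a Gaussian for each Gender; the probabilities, means and standard deviations are made-up placeholders, not the parameters of the sample network, and the helper names `posterior` and `predict` are hypothetical. A missing value (`None`) contributes no factor, because summing or integrating a variable's distribution over all of its values gives 1. This is why missing data does not prevent a prediction.

```python
import math
from typing import Optional

# Hypothetical parameters for illustration only; NOT the Tutorial 1 values.
# Assumed structure: Gender -> Hair Length, Gender -> Height (Gaussian).
PRIOR = {"Female": 0.5, "Male": 0.5}
HAIR = {
    "Female": {"Long": 0.5, "Medium": 0.35, "Short": 0.15},
    "Male": {"Long": 0.05, "Medium": 0.25, "Short": 0.7},
}
HEIGHT = {"Female": (163.0, 7.0), "Male": (177.0, 7.5)}  # (mean, std dev)


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Normal density N(x; mean, sd^2)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior(hair: Optional[str], height: Optional[float]) -> dict:
    """P(Gender | evidence). A missing value (None) adds no factor,
    which is equivalent to marginalizing that variable out."""
    joint = {}
    for g, p in PRIOR.items():
        if hair is not None:
            p *= HAIR[g][hair]
        if height is not None:
            p *= gaussian_pdf(height, *HEIGHT[g])
        joint[g] = p
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


def predict(hair: Optional[str], height: Optional[float]):
    """Return (most probable Gender, its posterior probability)."""
    post = posterior(hair, height)
    best = max(post, key=post.get)
    return best, post[best]


print(predict("Short", 178.50209))  # both inputs observed
print(predict("Long", None))        # Height missing
print(predict(None, None))          # nothing observed: posterior equals the prior
```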
For convenience, we will use Microsoft Excel as the data source, although another database can be substituted.
Adding a data connection
NOTE
You can skip this step and instead use the pre-installed Tutorial data connection (Walkthrough Data in earlier versions).
- Select the data (including the header) in the data section and copy it to the clipboard (Ctrl+C).
- Open Microsoft Excel and paste the data into a new Microsoft Excel spreadsheet (Ctrl+V).
- Save the new spreadsheet.
- In Bayes Server, click the Data Connections button on the Data tab, Data Sources group. This will launch the Data connection manager.
- Click the New button on the toolbar. This will launch the Data connection editor.
- In the list of data providers, select the appropriate Excel Driver for the version of Microsoft Excel you are using.
- Next to the File Name text box, click the Ellipsis (...) button, and select the Microsoft Excel spreadsheet created in an earlier step.
- Click the Test Connection button to ensure the new data connection is working.
- Click OK to add the new Data Connection.
Batch query
- Click the Batch query button on the Data tab. This will launch the Data tables window.
- In the Data Connection drop down, select the new Data Connection created in an earlier step, or the Tutorial data connection if you skipped that step. This should enable the Data drop down.
- In the Data drop down, select the worksheet that contains the data. (If the data is on the first worksheet, select Sheet1$). If you are using the pre-installed Tutorial data connection, select Tutorial 5 - Classification.
- Click the OK button. This will launch the Data map window.
- In the Data map window, ensure that variable Hair length has automatically been mapped to column Hair length, and variable Height has automatically been mapped to column Height.
NOTE
Because we are predicting Gender, we do not want the Gender variable to be mapped.
- Click the Un-map column button at the end of the Gender row.
NOTE
In order to test how well our model can predict Gender, we want to have access to the Gender data column, but we do not want to map it to the variable we are predicting.
Click on the Information tab, and click the check box next to Gender.
The window tabs should look like this:
NOTE
Another way of performing the same prediction would be to leave the default mappings (including Gender) and use the Retract evidence feature, which treats the variable you are predicting as missing, even if it is mapped to non-missing data.
Click the OK button. This will launch the Batch query window.
In the query pane on the left-hand side, ensure the following queries/information columns are checked.
- LogLikelihood
- Predict(Gender)
- PredictProbability(Gender)
- Gender
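The LogLikelihood column reports, roughly, the log of the probability (density) of the evidence in each row, with the unmapped Gender variable summed out. A minimal Python sketch, again assuming the structure Gender -> Hair Length and Gender -> Height with hypothetical parameters (not those of the sample network) and a hypothetical helper name `log_likelihood`. Very low values can flag cases the model finds unusual.

```python
import math
from typing import Optional

# Hypothetical parameters (illustration only, not the Tutorial 1 values),
# assuming Gender -> Hair Length and Gender -> Height (Gaussian).
PRIOR = {"Female": 0.5, "Male": 0.5}
HAIR = {"Female": {"Long": 0.5, "Medium": 0.35, "Short": 0.15},
        "Male": {"Long": 0.05, "Medium": 0.25, "Short": 0.7}}
HEIGHT = {"Female": (163.0, 7.0), "Male": (177.0, 7.5)}  # (mean, std dev)


def log_likelihood(hair: Optional[str], height: Optional[float]) -> float:
    """log p(evidence): sum the joint over Gender; missing values add no factor."""
    total = 0.0
    for g, p in PRIOR.items():
        if hair is not None:
            p *= HAIR[g][hair]
        if height is not None:
            mean, sd = HEIGHT[g]
            p *= math.exp(-0.5 * ((height - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        total += p
    return math.log(total)


print(log_likelihood("Short", 177.0))  # a typical-looking case
print(log_likelihood("Long", 200.0))   # an unusual case scores much lower
```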
Click the Start button on the Batch Query tab, Batch Query group. This outputs the predictions to the window.
NOTE
Instead of outputting to the window, you can also output the predictions to a database. This is useful if you are working with large datasets.
The window should look like this:
Confusion matrix
In order to determine how well our model performed, we can use a confusion matrix.
Switch to the Statistics tab on the Batch query window, and click the Confusion Matrix button in the Classification group. This will launch the Confusion matrix options window.
Ensure that Gender is selected in the Actual drop down, and Predict(Gender) is selected in the Predicted drop down.
The window should look like this.
Click the OK button, which will calculate and display the confusion matrix. The Confusion matrix window should look like this.
Diagonal elements in the confusion matrix are predictions that correctly classify Gender; off-diagonal elements are incorrect classifications.
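A confusion matrix is just a count of (actual, predicted) pairs. The Python sketch below builds one from two lists of states and computes accuracy as the share of counts on the diagonal. The function names and example values are hypothetical, and skipping rows whose actual Gender is missing is an assumption of this sketch rather than documented Bayes Server behaviour.

```python
from collections import Counter
from typing import List, Optional, Sequence


def confusion_matrix(actual: Sequence[Optional[str]],
                     predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual states, columns are predicted states.
    Rows whose actual value is missing (None) are skipped (an assumption)."""
    counts = Counter((a, p) for a, p in zip(actual, predicted) if a is not None)
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of cases on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["Female", "Male"]
# Made-up actual/predicted values, for illustration only.
actual = ["Female", "Male", "Female", None, "Male"]
predicted = ["Female", "Male", "Male", "Female", "Male"]
m = confusion_matrix(actual, predicted, labels)
print(m)
print(accuracy(m))
```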
Data
Gender | Hair Length | Height |
---|---|---|
Female | Medium | 159.64532 |
Male | Short | 178.50209 |
Female | Short | 170.2725 |
Female | Medium | 160.31395 |
Female | Long | 156.32858 |
Female | Long | 165.43799 |
Male | Short | 177.59889 |
Female | Medium | 161.11003 |
Male | Short | 166.09811 |
Female | Long | 173.34889 |
Male | Short | 169.16522 |
Male | Medium | 179.45741 |
Female | Long | |
Female | Medium | 158.67832 |
Female | Long | 171.75507 |
Female | Short | 165.4013 |
Male | Short | 188.6639 |
Male | Short | |
Female | Long | 165.88785 |
Female | Medium | 168.43815 |
Male | Short | 178.84286 |
Female | Short | 164.10128 |
Female | Medium | 173.39975 |
Female | Medium | 160.2925 |
Female | Medium | 166.0434 |
Female | Long | 159.51891 |
Female | Medium | 167.27399 |
Female | Medium | 162.01801 |
Male | Short | 159.67172 |
Female | Long | 149.85316 |
Male | Short | 178.85521 |
Female | Medium | 159.10519 |
Male | Short | 176.89731 |
Male | Medium | 160.80553 |
Male | Short | 176.67044 |
Female | Medium | 151.4692 |
Female | Medium | 159.47791 |
| | Medium | 178.30403 |
Male | Long | 177.37518 |
Male | Short | 175.68627 |
Male | Medium | 182.13118 |
Female | Long | 168.80542 |
Male | Short | 173.47985 |
Male | | 174.67784 |
Female | Long | 167.92433 |
Female | Long | 170.78801 |
| | Short | 173.21558 |
Male | Short | 185.71675 |
Male | Medium | 192.61151 |
Female | Long | 165.47273 |
Male | Short | 179.94032 |
Male | | 185.23601 |
Male | Short | 180.676 |
Female | Long | 167.14232 |
Male | Short | 166.71996 |
Female | Long | 147.9807 |
Female | Long | |
Male | Short | 178.66922 |
Male | Short | 179.55905 |
Male | Short | 189.99837 |
Male | Short | 172.49842 |
Male | Short | 186.58113 |
Female | Short | 169.12165 |
| | Long | 165.95135 |
Female | Long | 168.34383 |
| | Long | 174.84138 |
Male | Short | 173.94395 |
Female | Short | 155.70222 |
Female | Long | 177.06825 |
Male | Short | 173.52714 |
Female | Short | 170.73774 |
Female | Medium | 158.87229 |
Female | Long | 147.5172 |
Male | Medium | 170.96061 |
| | Short | 191.28145 |
Male | Medium | 170.87405 |
Male | Short | 179.53121 |
| | Long | 160.09839 |
Female | Long | 153.82008 |
Female | Long | 167.66346 |
Male | Medium | |
Male | Short | 176.23203 |
Female | Medium | 160.16516 |
Female | Medium | 153.82284 |
Male | Medium | 169.74507 |
Male | Short | 179.47557 |
Female | Long | 162.2582 |
Female | Long | 154.11746 |
Male | Short | 168.06671 |
Male | Short | 191.50926 |
Male | Medium | 185.57492 |
Female | Long | 161.82199 |
Female | Medium | 158.64344 |
Female | Short | 175.84038 |
Female | Medium | 162.36804 |
Male | Short | 169.27324 |
Female | Medium | 169.56408 |
Male | Short | 174.71516 |
Male | Short | 181.95237 |
Male | Short | 187.56014 |
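If you want to inspect the data outside Bayes Server, the Python sketch below parses a few of the rows above as they might look after being saved from Excel as CSV (an assumption; the tutorial itself reads the spreadsheet through a data connection) and counts the missing values per column. Empty cells become `None`, mirroring the way the Bayesian network treats them as missing. The helper names `load_cases` and `count_missing` are hypothetical.

```python
import csv
import io

# A few rows copied from the data section, written as CSV (empty field = missing).
SAMPLE_CSV = """Gender,Hair Length,Height
Female,Medium,159.64532
Female,Long,
,Medium,178.30403
Male,,174.67784
Male,Short,178.50209
"""


def load_cases(text: str) -> list:
    """Parse rows into dicts, mapping empty cells to None and Height to float."""
    cases = []
    for row in csv.DictReader(io.StringIO(text)):
        case = {k: (v if v != "" else None) for k, v in row.items()}
        if case["Height"] is not None:
            case["Height"] = float(case["Height"])
        cases.append(case)
    return cases


def count_missing(cases: list) -> dict:
    """Number of missing (None) values in each column."""
    return {col: sum(c[col] is None for c in cases) for col in cases[0]}


cases = load_cases(SAMPLE_CSV)
print(count_missing(cases))
```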