Classification is the process of using a model such as a Bayesian network to predict unknown values (output variables), using a number of known values (input variables).

For example we could use a classification model to diagnose a disease given a number of symptoms, or we could detect a fault in a mechanical system, based on readings from sensors.

Tip

Typically, the term classification refers to models that predict discrete variables.

Regression is the term used for models predicting continuous variables.

Classification using Bayesian networks requires that we model the relationship between the input variables and the output variables we are predicting.

If we have a dataset containing both input and outputs, of sufficient size, we can train a Bayesian network which can then be used on data containing only inputs, to predict outputs.

Tip

Expert opinion (i.e. manually specifying the model) could also be used to build or enhance a model

Tip

Classification is a type of supervised learning, because a model is trained specifically for the purpose of predicting the output variable.

Key features supported by Bayesian network classifiers

  • Multiple inputs (multiple factors)

  • Multiple outputs (e.g. different faults)

  • Links can be from input->output or output->input

  • Intermediate nodes (e.g. logical nodes, divorcing)

  • Links between inputs

  • Links between output

  • Pre-input nodes (e.g. mixtures)

  • Post-output nodes (e.g. aggregate information)

The Bayesian network classifier shown below, demonstrates different design methods.

Classification Example