Data Sampling Tutorial

In this tutorial we will use the Waste sample network, which is included with Bayes Server, to demonstrate how to use Data Sampling to generate sample data from a network.

Open the Waste sample network included with Bayes Server, either from the Start Page or from the File menu, click Open.

The Waste network with no evidence set should look like this...

Waste network | No evidence set

info

When we sample data from a network, we can either have no evidence set, or we can consider existing evidence, such that each sample will have exactly that evidence subset set.

From the Evidence menu, click Sampling.

We will initially leave the options in the default states, which should look as follows:

Data Sampling | Waste | Default Options

Click Run.

Since we haven't set a Seed (one of the Data Sampling options), the results will look different each time, however will look something like the following:

Data Sampling | Waste | Default results

info

At this point we have the option to export the samples to a file, or add them as a temporary data source to use in subsequent analyses. The latter is extremely useful for ad-hoc experimentation, however for reproducibility you may prefer to save to a file and load that as a data source which will persist between sessions.

Existing evidence

We will now see how sampling works with existing evidence.

Close the Data Sampling dialog.
Set the following evidence by clicking on the node chart:
- Filter state = Defect
- Dust emission = 4.4

The network should look as follows:

Data Sampling | Waste | Evidence

From the Evidence menu, click Sampling.
Turn Use Network Evidence to True
Turn Log-Likelihood to True
Turn Log-Weight to True

Data Sampling | Waste | Options with evidence

Click Run

Since we haven't set a Seed (one of the Data Sampling options), the results will look different each time, however will look something like the following:

Data Sampling | Waste | Results with evidence

Verify that each sample generated has the following evidence set:
- Filter state = Defect
- Dust emission = 4.4

info

When we have existing evidence set, it is often useful to output the Log-Likelihood of the row, as that tells us how likely (anomalous) that case is.

info

When evidence is fixed, each sample has an associated weight, which is the likelihood of the fixed evidence for the current sample. This indicates how likely it is that the fixed evidence could have occurred with this sample. We decided to output the log-weight as this is more robust to underflow.

Missing data

We will now see how sampling works when we want to generate samples with missing data.

Note that the missing data mechanism used is Missing Completely At Random (MCAR).

info

Bayes Server has extensive support for inference, parameter learning and structural learning with missing data.

Close the Data Sampling dialog.
From the Evidence menu, click Clear
From the Evidence menu, click Sampling.
Click the Show more (x) items button
Set the Missing Data Probability to 0.1 (which is 5%)

The options page should look as follows:

Data Sampling | Missing Data options

info

You can use the Missing Data Exclusions options to limit missing data to certain columns.

Click Run

Data Sampling | Missing Data results

Verify that some data is missing/null.

info

Data with missing values is important to study how different algorithms handle missing data robustly, which Bayes Server does.

info

At the bottom of the dialog you can see what seed was used for generation (if you specified a seed, this will be the same). you can also see how many samples were generated. This is typically the number of samples you requested, but may be fewer if there is missing data and inconsistent evidence occurs for some samples.