Class Clustering

  • All Implemented Interfaces:
    Discretize

    public final class Clustering
    extends Object
    implements Discretize
    Discretizes continuous data into bins using a probabilistic clustering algorithm.
    • Constructor Detail

      • Clustering

        public Clustering()
    • Method Detail

      • discretize

        public List<Interval<Double>> discretize(Iterable<Double> unsortedData,
                                                 DiscretizationOptions options,
                                                 String dataColumn)
        Discretizes unsorted continuous data that may contain missing (null) values.
        Specified by:
        discretize in interface Discretize
        Parameters:
        unsortedData - The data to discretize.
        options - Options that affect how discretization is performed, such as the algorithm to use.
        dataColumn - The name of the source column. This is only used for error reporting.
        Returns:
        A list of bins, each identified by an interval.
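        The returned bins can be used to map a raw value back to the bin that contains it. The following is a minimal sketch of that lookup, not the library's own implementation. Bin is a hypothetical stand-in for Interval<Double>, whose accessors are not documented in this section, and the half-open [lower, upper) convention is an assumption of the sketch. A null value is treated as missing and is not assigned to any bin.

```java
import java.util.ArrayList;
import java.util.List;

public class BinLookupSketch {

    /** Hypothetical stand-in for an interval-backed bin covering [lower, upper). */
    static final class Bin {
        final double lower; // inclusive (an assumption of this sketch)
        final double upper; // exclusive (an assumption of this sketch)

        Bin(double lower, double upper) {
            this.lower = lower;
            this.upper = upper;
        }

        boolean contains(double value) {
            return value >= lower && value < upper;
        }
    }

    /**
     * Returns the index of the first bin containing the value,
     * or -1 when the value is null (missing) or lies outside every bin.
     */
    static int findBin(List<Bin> bins, Double value) {
        if (value == null) {
            return -1;
        }
        for (int i = 0; i < bins.size(); i++) {
            if (bins.get(i).contains(value)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three illustrative bins; real bins would come from discretize(...).
        List<Bin> bins = new ArrayList<>();
        bins.add(new Bin(Double.NEGATIVE_INFINITY, 10.0));
        bins.add(new Bin(10.0, 20.0));
        bins.add(new Bin(20.0, Double.POSITIVE_INFINITY));

        System.out.println(findBin(bins, 3.5));  // prints 0
        System.out.println(findBin(bins, 10.0)); // prints 1
        System.out.println(findBin(bins, null)); // prints -1
    }
}
```

        A linear scan keeps the sketch short. For a large number of bins, a binary search over the sorted lower bounds would be the usual alternative.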
      • discretizeWeighted

        public List<Interval<Double>> discretizeWeighted(Iterable<WeightedValue> unsortedData,
                                                         DiscretizationOptions options,
                                                         String dataColumn)
        Discretizes unsorted weighted continuous data that may contain missing (null) values.
        Specified by:
        discretizeWeighted in interface Discretize
        Parameters:
        unsortedData - The weighted data to discretize.
        options - Options that affect how discretization is performed, such as the algorithm to use.
        dataColumn - The name of the source column. This is only used for error reporting.
        Returns:
        A list of bins, each identified by an interval.
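        One common reading of weighted data is that each value contributes its weight, rather than a count of one, to whichever bin it falls into. The sketch below illustrates that idea only. WeightedPoint is a hypothetical stand-in for WeightedValue, whose members are not documented in this section, and the bin edges are supplied by hand rather than produced by the clustering algorithm. Missing (null) values are skipped.

```java
import java.util.Arrays;
import java.util.List;

public class WeightedBinSketch {

    /** Hypothetical stand-in for a weighted observation; value may be null (missing). */
    static final class WeightedPoint {
        final Double value;
        final double weight;

        WeightedPoint(Double value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /**
     * Sums the weight that falls into each bin.
     * Bin i covers [edges[i], edges[i + 1]) (an assumption of this sketch).
     * Points with a null value, or outside all bins, contribute nothing.
     */
    static double[] weightPerBin(List<WeightedPoint> data, double[] edges) {
        double[] totals = new double[edges.length - 1];
        for (WeightedPoint p : data) {
            if (p.value == null) {
                continue; // missing value: skip
            }
            for (int i = 0; i < totals.length; i++) {
                if (p.value >= edges[i] && p.value < edges[i + 1]) {
                    totals[i] += p.weight;
                    break;
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<WeightedPoint> data = Arrays.asList(
                new WeightedPoint(1.0, 2.0),
                new WeightedPoint(null, 5.0), // missing, ignored
                new WeightedPoint(12.0, 0.5),
                new WeightedPoint(3.0, 1.0));
        double[] edges = {0.0, 10.0, 20.0};

        System.out.println(Arrays.toString(weightPerBin(data, edges))); // prints [3.0, 0.5]
    }
}
```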
      • discretize

        public List<DiscretizationInfo> discretize(DataReaderCommand dataReaderCommand,
                                                   List<DiscretizationColumn> dataColumns,
                                                   DiscretizationAlgoOptions options)
        Discretizes one or more data columns that may contain missing (null) values.
        Specified by:
        discretize in interface Discretize
        Parameters:
        dataReaderCommand - The data reader command used to iterate over the data.
        dataColumns - The data columns to discretize, together with the options for each column.
        options - Options governing the overall discretization algorithm. Each data column also has options.
        Returns:
        For each data column, a number of bins, each identified by an interval.