Project Group: nz.ac.waikato.cms.weka

tiny-weka

nz.ac.waikato.cms.weka : tiny-weka

The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This artifact represents the bare API of the developer version, with no package manager, PMML, XML or user interface. It is aimed at commercial applications that license some of WEKA's algorithms.

Last Version: 3.9.15955

Release Date:

weka-dev

nz.ac.waikato.cms.weka : weka-dev

The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This artifact represents the developer version, the "bleeding edge" of development: new functionality is added to this version first.

Last Version: 3.9.6

Release Date:

weka-stable

nz.ac.waikato.cms.weka : weka-stable

The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This is the stable version, which receives only bug fixes and no new functionality or breaking changes.

Last Version: 3.8.6

Release Date:
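
The weka-stable and weka-dev artifacts expose the same core learning API. A minimal sketch of loading a dataset, cross-validating a classifier, and building a final model; the dataset path and the choice of J48 are placeholders.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaStableDemo {
    public static void main(String[] args) throws Exception {
        // Load an ARFF (or CSV) file; the last attribute is the class.
        Instances data = DataSource.read("iris.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Estimate accuracy with 10-fold cross-validation.
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());

        // Build the final model on all of the data.
        tree.buildClassifier(data);
        System.out.println(tree);
    }
}
```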

hotSpot

nz.ac.waikato.cms.weka : hotSpot

HotSpot learns a set of rules (displayed in a tree-like structure) that maximize/minimize a target variable/value of interest. With a nominal target, one might want to look for segments of the data where there is a high probability of a minority value occurring (given the constraint of a minimum support). For a numeric target, one might be interested in finding segments where the target is higher on average than in the whole data set. For example, in a health insurance scenario, find which health insurance groups are at the highest risk (have the highest claim ratio), or which groups have the highest average insurance payout.

Last Version: 1.0.14

Release Date:
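
A minimal sketch of running HotSpot programmatically, assuming the rule learner is exposed as weka.associations.HotSpot with the standard associator API and that its default target is the last attribute; the dataset path is a placeholder.

```java
import weka.associations.HotSpot;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HotSpotDemo {
    public static void main(String[] args) throws Exception {
        // Load the data; by default HotSpot treats the last attribute as the target.
        Instances data = DataSource.read("insurance.arff");   // placeholder path

        // Mine rules that maximise/minimise the target using default settings.
        HotSpot hotSpot = new HotSpot();
        hotSpot.buildAssociations(data);
        System.out.println(hotSpot);
    }
}
```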

WekaExcel

nz.ac.waikato.cms.weka : WekaExcel

WekaExcel adds support for reading from and writing to spreadsheets in Microsoft Excel 97-2007 format. It uses Apache POI (http://poi.apache.org/), specifically POI-HSSF and POI-XSSF (http://poi.apache.org/spreadsheet/), to read and write Excel spreadsheets.

Last Version: 1.0.8

Release Date:
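
A round-trip sketch, assuming the package exposes the usual Weka converter classes as weka.core.converters.ExcelLoader and weka.core.converters.ExcelSaver; the file names are placeholders.

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ExcelLoader;
import weka.core.converters.ExcelSaver;

public class ExcelRoundTrip {
    public static void main(String[] args) throws Exception {
        // Read an Excel workbook into Instances.
        ExcelLoader loader = new ExcelLoader();
        loader.setSource(new File("input.xlsx"));    // placeholder path
        Instances data = loader.getDataSet();

        // Write the data back out as an Excel workbook.
        ExcelSaver saver = new ExcelSaver();
        saver.setInstances(data);
        saver.setFile(new File("output.xlsx"));      // placeholder path
        saver.writeBatch();
    }
}
```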

wekaPython

nz.ac.waikato.cms.weka : wekaPython

Integration with CPython for Weka. Python version 2.7.x or higher is required. Also requires the following packages to be installed in Python: numpy, pandas, matplotlib and scikit-learn. This package provides a wrapper classifier and clusterer that, between them, cover 60+ scikit-learn algorithms. It also provides a general scripting step for the Knowledge Flow along with scripting plugin environments for the Explorer and Knowledge Flow.

Last Version: 1.0.13

Release Date:

optics_dbScan

nz.ac.waikato.cms.weka : optics_dbScan

The OPTICS and DBScan clustering algorithms. Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Second International Conference on Knowledge Discovery and Data Mining, 226-231, 1996; Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander: OPTICS: Ordering Points To Identify the Clustering Structure. In: ACM SIGMOD International Conference on Management of Data, 49-60, 1999.

Last Version: 1.0.6

Release Date:
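
A minimal sketch, assuming the density-based clusterer is exposed as weka.clusterers.DBSCAN (older releases used DBScan) and accepts -E (epsilon) and -M (minPoints) options; the dataset path and parameter values are placeholders.

```java
import weka.clusterers.DBSCAN;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DBSCANDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path

        // Density-based clustering: -E is epsilon, -M is minPoints.
        DBSCAN dbscan = new DBSCAN();
        dbscan.setOptions(new String[]{"-E", "0.9", "-M", "6"});
        dbscan.buildClusterer(data);
        System.out.println(dbscan);
    }
}
```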

elasticNet

nz.ac.waikato.cms.weka : elasticNet

An implementation of the elastic net method for linear regression.

Last Version: 1.0.1

Release Date:

javaFXScatter3D

nz.ac.waikato.cms.weka : javaFXScatter3D

A visualization component for displaying a 3D scatter plot of the data using JavaFX 3D. This version adds built-in sampling controls to the GUI. The default sampling percentage is set so that a maximum of 5000 instances are plotted. The user can adjust this higher or lower to suit their available processing speed and memory.

Last Version: 1.0.0

Release Date:

timeseriesForecasting

nz.ac.waikato.cms.weka : timeseriesForecasting

Provides a time series forecasting environment for Weka. Includes a wrapper for Weka regression schemes that automates the process of creating lagged variables and date-derived periodic variables and provides the ability to do closed-loop forecasting. New evaluation routines are provided by a special evaluation module, and graphing of predictions/forecasts is provided via the JFreeChart library. Includes both command-line and GUI user interfaces. Sample time series data can be found in ${WEKA_HOME}/packages/timeseriesForecasting/sample-data.

Last Version: 1.1.27

Release Date:
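
A sketch along the lines of the package's forecasting API (WekaForecaster plus its lag maker), using the sample wine data mentioned above; the dataset path, field names and lag settings are assumptions.

```java
import java.util.List;

import weka.classifiers.evaluation.NumericPrediction;
import weka.classifiers.functions.GaussianProcesses;
import weka.classifiers.timeseries.WekaForecaster;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ForecastDemo {
    public static void main(String[] args) throws Exception {
        // Monthly wine sales sample data (path is a placeholder).
        Instances wine = DataSource.read("wine.arff");

        WekaForecaster forecaster = new WekaForecaster();
        forecaster.setFieldsToForecast("Fortified,Dry-white");   // assumed field names
        forecaster.setBaseForecaster(new GaussianProcesses());

        // Lagged variables and date-derived periodic variables are configured on the lag maker.
        forecaster.getTSLagMaker().setTimeStampField("Date");
        forecaster.getTSLagMaker().setMinLag(1);
        forecaster.getTSLagMaker().setMaxLag(12);
        forecaster.getTSLagMaker().setAddMonthOfYear(true);

        forecaster.buildForecaster(wine, System.out);
        forecaster.primeForecaster(wine);

        // Closed-loop forecast for the next 6 steps.
        List<List<NumericPrediction>> forecast = forecaster.forecast(6, System.out);
        for (List<NumericPrediction> step : forecast) {
            System.out.println(step.get(0).predicted() + " " + step.get(1).predicted());
        }
    }
}
```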

scatterPlot3D

nz.ac.waikato.cms.weka : scatterPlot3D

A visualization component for displaying a 3D scatter plot of the data using Java 3D. Requires Java 3D to be installed. This version adds built-in sampling controls to the GUI. The default sampling percentage is set so that a maximum of 5000 instances are plotted. The user can adjust this higher or lower to suit their available processing speed and memory.

Last Version: 1.0.7

Release Date:

multiInstanceFilters

nz.ac.waikato.cms.weka : multiInstanceFilters

A collection of filters for manipulating multi-instance data. Includes PropositionalToMultiInstance, MultiInstanceToPropositional, MILESFilter and RELAGGS. For more information see: M.-A. Krogel, S. Wrobel: Facets of Aggregation Approaches to Propositionalization. In: Work-in-Progress Track at the Thirteenth International Conference on Inductive Logic Programming (ILP), 2003. Y. Chen, J. Bi, J.Z. Wang (2006). MILES: Multiple-instance learning via embedded instance selection. IEEE PAMI. 28(12):1931-1947. James Foulds, Eibe Frank: Revisiting multiple-instance learning via embedded instance selection. In: 21st Australasian Joint Conference on Artificial Intelligence, 300-310, 2008.

Last Version: 1.0.10

Release Date:

stackingC

nz.ac.waikato.cms.weka : stackingC

Implements StackingC (a more efficient version of stacking). For more information, see A.K. Seewald: How to Make Stacking Better and Faster While Also Taking Care of an Unknown Weakness. In: Nineteenth International Conference on Machine Learning, 554-561, 2002. Note: requires the meta classifier to be a numeric prediction scheme.

Last Version: 1.0.4

Release Date:

supervisedAttributeScaling

nz.ac.waikato.cms.weka : supervisedAttributeScaling

Package containing a class that rescales the attributes in a classification problem based on their discriminative power. This is useful as a pre-processing step for learning algorithms such as the k-nearest-neighbour method, to replace simple normalization. Each attribute is rescaled by multiplying it with a learned weight. All attributes excluding the class are assumed to be numeric and missing values are not permitted. To achieve the rescaling, this package also contains an implementation of non-negative logistic regression, which produces a logistic regression model with non-negative weights.

Last Version: 1.0.2

Release Date:

gridSearch

nz.ac.waikato.cms.weka : gridSearch

Performs a grid search of parameter pairs for a classifier (Y-axis, default is LinearRegression with the "Ridge" parameter) and the PLSFilter (X-axis, "# of Components") and chooses the best pair found for the actual prediction. The initial grid is worked on with 2-fold CV to determine the values of the parameter pairs for the selected type of evaluation (e.g., accuracy). The best point in the grid is then taken and a 10-fold CV is performed with the adjacent parameter pairs. If a better pair is found, then this pair acts as the new center and another 10-fold CV is performed (a kind of hill-climbing). This process is repeated until no better pair is found or the best pair is on the border of the grid. In case the best pair is on the border, one can let GridSearch automatically extend the grid and continue the search. Check out the properties 'gridIsExtendable' (option '-extend-grid') and 'maxGridExtensions' (option '-max-grid-extensions <num>'). GridSearch can handle doubles, integers (values are just cast to int) and booleans (0 is false, otherwise true). Float, char and long are supported as well. The best filter/classifier setup can be accessed after the buildClassifier call via the getBestFilter/getBestClassifier methods. Note on the implementation: after the data has been passed through the filter, a default NumericCleaner filter is applied to the data in order to avoid numbers that get too small and might produce NaNs in other schemes.

Last Version: 1.0.12

Release Date:
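
A sketch of the workflow described above with the default grid (LinearRegression ridge vs. PLSFilter components), using the getBestFilter/getBestClassifier accessors mentioned in the description; the dataset path is a placeholder and a numeric class is assumed.

```java
import weka.classifiers.meta.GridSearch;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GridSearchDemo {
    public static void main(String[] args) throws Exception {
        // Numeric-class dataset (placeholder path), class is the last attribute.
        Instances data = DataSource.read("numeric.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Search the default grid and train on the best parameter pair.
        GridSearch grid = new GridSearch();
        grid.buildClassifier(data);

        // Best filter/classifier setup found during the search.
        System.out.println("Best filter:     " + grid.getBestFilter());
        System.out.println("Best classifier: " + grid.getBestClassifier());
    }
}
```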

discriminantAnalysis

nz.ac.waikato.cms.weka : discriminantAnalysis

Currently only contains Fisher's Linear Discriminant Analysis.

Last Version: 1.0.3

Release Date:

dualPerturbAndCombine

nz.ac.waikato.cms.weka : dualPerturbAndCombine

Class for building and using classification and regression trees based on the closed-form dual perturb and combine algorithm described in Pierre Geurts, Louis Wehenkel: Closed-form dual perturb and combine for tree-based models. In: Proceedings of the 22nd International Conference on Machine Learning, 233-240, 2005.

Last Version: 1.0.0

Release Date:

XMeans

nz.ac.waikato.cms.weka : XMeans

Cluster data using the X-means algorithm. X-Means is K-Means extended by an Improve-Structure step: in this step, each center is considered for splitting within its region, and the decision between the children of a center and the center itself is made by comparing the BIC values of the two structures. For more information see: Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000.

Last Version: 1.0.6

Release Date:
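
A minimal sketch, assuming the minimum/maximum number of clusters is exposed via setMinNumClusters/setMaxNumClusters bean setters; the dataset path and the k range are placeholders.

```java
import weka.clusterers.XMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class XMeansDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path

        // Let X-means choose k between 2 and 10 using the BIC.
        XMeans xmeans = new XMeans();
        xmeans.setMinNumClusters(2);
        xmeans.setMaxNumClusters(10);
        xmeans.buildClusterer(data);

        System.out.println("Chosen k: " + xmeans.numberOfClusters());
        System.out.println(xmeans);
    }
}
```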

distributedWekaBase

nz.ac.waikato.cms.weka : distributedWekaBase

This package provides a generic configuration class and distributed map/reduce style tasks for Weka.

Last Version: 1.0.17

Release Date:

partialLeastSquares

nz.ac.waikato.cms.weka : partialLeastSquares

This package contains a filter for computing partial least squares and transforming the input data into the PLS space. It also contains a classifier for performing PLS regression.

Last Version: 1.0.5

Release Date:
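
A sketch of transforming data into the PLS space, assuming the filter follows the standard Weka filter API at weka.filters.supervised.attribute.PLSFilter and exposes the number of components via -C; the dataset path and component count are placeholders.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.PLSFilter;

public class PLSDemo {
    public static void main(String[] args) throws Exception {
        // Numeric-class dataset (placeholder path), class is the last attribute.
        Instances data = DataSource.read("regression.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Project the predictors onto 5 PLS components.
        PLSFilter pls = new PLSFilter();
        pls.setOptions(new String[]{"-C", "5"});
        pls.setInputFormat(data);
        Instances transformed = Filter.useFilter(data, pls);

        System.out.println(transformed.numAttributes() + " attributes after PLS");
    }
}
```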

ordinalClassClassifier

nz.ac.waikato.cms.weka : ordinalClassClassifier

Meta classifier that allows standard classification algorithms to be applied to ordinal class problems. For more information see: Eibe Frank, Mark Hall: A Simple Approach to Ordinal Classification. In: 12th European Conference on Machine Learning, 145-156, 2001. Robert E. Schapire, Peter Stone, David A. McAllester, Michael L. Littman, Janos A. Csirik: Modeling Auction Price Uncertainty Using Boosting-based Conditional Density Estimation. In: Machine Learning, Proceedings of the Nineteenth International Conference (ICML 2002), 546-553, 2002.

Last Version: 1.0.5

Release Date:

extraTrees

nz.ac.waikato.cms.weka : extraTrees

Package for generating a single Extra-Tree. Use with the RandomCommittee meta classifier to generate an Extra-Trees forest for classification or regression. This classifier requires all predictors to be numeric. Missing values are not allowed. Instance weights are taken into account. For more information, see Pierre Geurts, Damien Ernst, Louis Wehenkel (2006). Extremely randomized trees. Machine Learning. 63(1):3-42.

Last Version: 1.0.2

Release Date:
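
A sketch of the forest setup the description recommends (RandomCommittee over randomised Extra-Tree members), assuming the tree is exposed as weka.classifiers.trees.ExtraTree; the dataset path and ensemble size are placeholders.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.RandomCommittee;
import weka.classifiers.trees.ExtraTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ExtraTreesDemo {
    public static void main(String[] args) throws Exception {
        // All predictors must be numeric and missing values are not allowed.
        Instances data = DataSource.read("numeric-predictors.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // An Extra-Trees forest: 100 randomised ExtraTree members.
        RandomCommittee forest = new RandomCommittee();
        forest.setClassifier(new ExtraTree());
        forest.setNumIterations(100);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(forest, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```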

probabilityCalibrationTrees

nz.ac.waikato.cms.weka : probabilityCalibrationTrees

Provides probability calibration trees (PCTs) for local calibration of class probability estimates. To achieve calibration of a base learner, the PCT class must be used as the meta learner in the CascadeGeneralization class, which is also included in this package. The classifier to be calibrated must be used as the base learner in the CascadeGeneralization class. The CascadeGeneralization class can also be used independently to perform CascadeGeneralization for ensemble learning. The code for PCTs is largely the same as the LMT code for growing logistic model trees. For more details, see the ACML paper on probability calibration trees.

Last Version: 1.0.0

Release Date:

classificationViaClustering

nz.ac.waikato.cms.weka : classificationViaClustering

A simple meta-classifier that uses a clusterer for classification. For clustering algorithms that use a fixed number of clusters, like SimpleKMeans, the user has to make sure that the number of clusters to generate matches the number of class labels in the dataset in order to obtain a useful model. Note: at prediction time, a missing value is returned if no cluster is found for the instance. The code is based on the 'clusters to classes' functionality of the weka.clusterers.ClusterEvaluation class by Mark Hall.

Last Version: 1.0.7

Release Date:
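
A sketch of the cluster-count rule from the description, assuming the meta classifier lives at weka.classifiers.meta.ClassificationViaClustering; the dataset path is a placeholder.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.ClassificationViaClustering;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassificationViaClusteringDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("nominal-class.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Match the number of clusters to the number of class labels.
        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(data.numClasses());

        ClassificationViaClustering cvc = new ClassificationViaClustering();
        cvc.setClusterer(kMeans);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(cvc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```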

functionalTrees

nz.ac.waikato.cms.weka : functionalTrees

Functional trees (decision trees with oblique splits and functions at the leaves).

Last Version: 1.0.4

Release Date:

latentSemanticAnalysis

nz.ac.waikato.cms.weka : latentSemanticAnalysis

Performs latent semantic analysis and transformation of the data. Use in conjunction with a Ranker search. A low-rank approximation of the full data is found by specifying the number of singular values to use. The dataset may be transformed to give the relation of either the attributes or the instances (default) to the concept space created by the transformation.

Last Version: 1.0.5

Release Date:
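
A sketch of transforming data into the LSA concept space with a Ranker search, assuming the evaluator is exposed as weka.attributeSelection.LatentSemanticAnalysis and used through the standard attribute-selection filter; the dataset path is a placeholder.

```java
import weka.attributeSelection.LatentSemanticAnalysis;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public class LsaDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("text-vectors.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Transform the instances into the LSA concept space via a Ranker search.
        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new LatentSemanticAnalysis());
        filter.setSearch(new Ranker());
        filter.setInputFormat(data);

        Instances conceptSpace = Filter.useFilter(data, filter);
        System.out.println(conceptSpace.numAttributes() + " attributes in the concept space");
    }
}
```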

multiInstanceLearning

nz.ac.waikato.cms.weka : multiInstanceLearning

A collection of multi-instance learning classifiers. Includes the Citation KNN method, several variants of the diverse density method, support vector machines for multi-instance learning, simple wrappers for applying standard propositional learners to multi-instance data, decision tree and rule learners, and some other methods.

Last Version: 1.0.9

Release Date:

ensemblesOfNestedDichotomies

nz.ac.waikato.cms.weka : ensemblesOfNestedDichotomies

A meta classifier for handling multi-class datasets with 2-class classifiers by building an ensemble of nested dichotomies. For more information, see Lin Dong, Eibe Frank, Stefan Kramer: Ensembles of Balanced Nested Dichotomies for Multi-class Problems. In: PKDD, 84-95, 2005. Eibe Frank, Stefan Kramer: Ensembles of nested dichotomies for multi-class problems. In: Twenty-first International Conference on Machine Learning, 2004.

Last Version: 1.0.6

Release Date:

logarithmicErrorMetrics

nz.ac.waikato.cms.weka : logarithmicErrorMetrics

Provides root mean square logarithmic error and mean absolute logarithmic error for evaluating regression schemes.

Last Version: 1.0.0

Release Date:

percentageErrorMetrics

nz.ac.waikato.cms.weka : percentageErrorMetrics

Provides root mean square percentage error and mean absolute percentage error for evaluating regression schemes.

Last Version: 1.0.1

Release Date:

LibSVM

nz.ac.waikato.cms.weka : LibSVM

A wrapper class for the libsvm tools (the libsvm classes, typically the jar file, need to be on the classpath to use this classifier). LibSVM runs faster than SMO since it uses libsvm to build the SVM classifier. It allows users to experiment with one-class SVM, regression SVM (epsilon-SVR), and nu-SVM, as supported by the libsvm tool, and it reports many useful statistics about the classifier (e.g., confusion matrix, precision, recall, ROC score, etc.).

Last Version: 1.0.10

Release Date:
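
A sketch of using the wrapper, assuming it is exposed as weka.classifiers.functions.LibSVM and that libsvm.jar is on the classpath; the dataset path and the option values (-S SVM type, -K kernel, -C cost, -G gamma, mirroring libsvm's own codes) are placeholders.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.LibSVM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LibSVMDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // -S 0 = C-SVC, -K 2 = RBF kernel; cost and gamma are illustrative values.
        LibSVM svm = new LibSVM();
        svm.setOptions(new String[]{"-S", "0", "-K", "2", "-C", "1.0", "-G", "0.1"});

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svm, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```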

kfKettle

nz.ac.waikato.cms.weka : kfKettle

Knowledge Flow step that provides an entry point for data coming from the Kettle ETL tool.

Last Version: 1.0.5

Release Date:

multiLayerPerceptrons

nz.ac.waikato.cms.weka : multiLayerPerceptrons

This package currently contains classes for training multilayer perceptrons with one hidden layer, where the number of hidden units is user-specified. MLPClassifier can be used for classification problems and MLPRegressor is the corresponding class for numeric prediction tasks. The former has as many output units as there are classes, the latter only one output unit. Both minimise a penalised squared error with a quadratic penalty on the (non-bias) weights, i.e., they implement "weight decay", where this penalised error is averaged over all training instances. The size of the penalty can be determined by the user by modifying the "ridge" parameter to control overfitting. The sum of squared weights is multiplied by this parameter before being added to the squared error. Both classes use BFGS optimisation by default to find parameters that correspond to a local minimum of the error function, but optionally conjugate gradient descent is available, which can be faster for problems with many parameters. Logistic functions are used as the activation functions for all units apart from the output unit in MLPRegressor, which employs the identity function. Input attributes are standardised to zero mean and unit variance. MLPRegressor also rescales the target attribute (i.e., "class") using standardisation. All network parameters are initialised with small normally distributed random values.

Last Version: 1.0.10

Release Date:
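
A sketch of training MLPClassifier, assuming it lives in weka.classifiers.functions and exposes the number of hidden units via -N and the ridge (weight decay) via -R; the dataset path and parameter values are placeholders.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MLPClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MLPDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // One hidden layer with 8 units and a ridge (weight decay) of 0.01.
        MLPClassifier mlp = new MLPClassifier();
        mlp.setOptions(new String[]{"-N", "8", "-R", "0.01"});

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```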

streamingUnivariateStats

nz.ac.waikato.cms.weka : streamingUnivariateStats

This package provides a Knowledge Flow step to compute summary statistics incrementally.

Last Version: 1.0.1

Release Date:

prefuseGraph

nz.ac.waikato.cms.weka : prefuseGraph

A visualization component for displaying graph structures from those schemes that can output graphs (e.g., Bayes nets). This component is available from the popup menu in the Explorer's Classify panel. The component uses the prefuse visualization library.

Last Version: 1.0.4

Release Date:

distributedWekaHadoopCore

nz.ac.waikato.cms.weka : distributedWekaHadoopCore

This package provides loaders and savers for HDFS, plus Hadoop jobs and tasks that wrap the tasks provided in distributedWekaBase.

Last Version: 1.0.21

Release Date:

prefuseGraphViewer

nz.ac.waikato.cms.weka : prefuseGraphViewer

Knowledge Flow visualization component for displaying tree and graph structures from those schemes that can output them. This component is an alternative to the Knowledge Flow's built-in GraphViewer and uses the PrefuseTree and PrefuseGraph packages which, in turn, use the prefuse visualization library.

Last Version: 1.0.4

Release Date:

kfGroovy

nz.ac.waikato.cms.weka : kfGroovy

Knowledge Flow plugin that provides a Knowledge Flow step that wraps around a Groovy script. The plugin generates a fully compilable template Groovy script that implements various Knowledge Flow interfaces. The user can fill in the methods that are necessary to accomplish the desired logic. The script is compiled at runtime and the Groovy component passes incoming events to the script and collects and passes on generated events.

Last Version: 1.0.12

Release Date:

bayesianLogisticRegression

nz.ac.waikato.cms.weka : bayesianLogisticRegression

Implements Bayesian Logistic Regression for both Gaussian and Laplace priors. For more information, see Alexander Genkin, David D. Lewis, David Madigan (2004). Large-Scale Bayesian Logistic Regression for Text Categorization.

Last Version: 1.0.5

Release Date:

largeScaleKernelLearning

nz.ac.waikato.cms.weka : largeScaleKernelLearning

This package provides filters to enable kernel-based learning from large datasets. It currently only contains the Nystroem method.

Last Version: 1.0.1

Release Date:

iterativeAbsoluteErrorRegression

nz.ac.waikato.cms.weka : iterativeAbsoluteErrorRegression

Provides a regression scheme that uses Schlossmacher's iteratively reweighted least squares method to fit a model that minimizes absolute error. The scheme can be used with any base learner in WEKA that performs least-squares regression.

Last Version: 1.0.0

Release Date:

isolationForest

nz.ac.waikato.cms.weka : isolationForest

Class for building and using a classifier built on the Isolation Forest anomaly detection algorithm. For more information, see Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou: Isolation Forest. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, 413-422, 2008.

Last Version: 1.0.2

Release Date:

cascadeKMeans

nz.ac.waikato.cms.weka : cascadeKMeans

k-means clustering with automatic selection of k. Restarts k-means and selects the best k using the Calinski and Harabasz criterion, without cross-validation.

Last Version: 1.0.4

Release Date:

niftiLoader

nz.ac.waikato.cms.weka : niftiLoader

Package for loading a directory containing MRI data in NIfTI format. The directory to be loaded must contain as many subdirectories as there are classes of MRI data. Each subdirectory name will be used as the class label for the corresponding .nii files in that subdirectory. (This is the same strategy as the one used by WEKA's TextDirectoryLoader.) Currently, the package only reads volume information for the first time slot from each .nii file. The readDoubleVol(short ttt) method from the Nifti1Dataset class (http://niftilib.sourceforge.net/java_api_html/Nifti1Dataset.html) is used to read the data for each volume into a sparse WEKA instance (with ttt=0). For an LxMxN volume (the dimensions must be the same for each .nii file in the directory!), the order of values in the generated instance is [(z_1, y_1, x_1), ..., (z_1, y_1, x_L), (z_1, y_2, x_1), ..., (z_1, y_M, x_L), (z_2, y_1, x_1), ..., (z_N, y_M, x_L)]. If the volume is an image, then only x and y coordinates are used.

Last Version: 1.0.1

Release Date:

LibLINEAR

nz.ac.waikato.cms.weka : LibLINEAR

A wrapper class for the liblinear tools (the liblinear classes, typically the jar file, need to be in the classpath to use this classifier). Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin (2008). LIBLINEAR - A Library for Large Linear Classification.

Last Version: 1.9.7

Release Date:

distributedWekaHadoop

nz.ac.waikato.cms.weka : distributedWekaHadoop

This package provides loaders and savers for HDFS, plus Hadoop jobs and tasks that wrap the tasks provided in distributedWekaBase. Includes libraries for Hadoop 1.1.2.

Last Version: 1.0.15

Release Date:

timeSeriesFilters

nz.ac.waikato.cms.weka : timeSeriesFilters

Provides a set of filters for time series. Currently contains PAA and SAX transformation filters and a filter that converts symbolic time series to string attribute values. The time series need to be given as values of a relation-valued attribute in the ARFF file. For example data in ARFF format, check the data directory of this package.

Last Version: 1.0.0

Release Date:

alternatingModelTrees

nz.ac.waikato.cms.weka : alternatingModelTrees

Grows an alternating model tree by minimising squared error. For more information see "Eibe Frank, Michael Mayo, Stefan Kramer: Alternating Model Trees. In: Proceedings of the ACM Symposium on Applied Computing, Data Mining Track, 2015".

Last Version: 1.0.0

Release Date:

RBFNetwork

nz.ac.waikato.cms.weka : RBFNetwork

RBFNetwork implements a normalized Gaussian radial basis function network. It uses the k-means clustering algorithm to provide the basis functions and learns either a logistic regression (discrete class problems) or linear regression (numeric class problems) on top of that. Symmetric multivariate Gaussians are fit to the data from each cluster. If the class is nominal, it uses the given number of clusters per class. RBFRegressor implements radial basis function networks for regression, trained in a fully supervised manner using WEKA's Optimization class by minimizing squared error with the BFGS method. It is possible to use conjugate gradient descent rather than BFGS updates, which is faster for cases with many parameters, and to use normalized basis functions instead of unnormalized ones.

Last Version: 1.0.8

Release Date:

classifierBasedAttributeSelection

nz.ac.waikato.cms.weka : classifierBasedAttributeSelection

This package provides two classes - one for evaluating the merit of individual attributes using a classifier (ClassifierAttributeEval), and one for evaluating the merit of subsets of attributes using a classifier (ClassifierSubsetEval). Both invoke a user-specified classifier to perform the evaluation, either under cross-validation or on the training data.

Last Version: 1.0.5

Release Date:
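
A sketch of ranking attributes with ClassifierAttributeEval, assuming the evaluator lives in weka.attributeSelection and is driven through the standard AttributeSelection/Ranker API; the dataset path is a placeholder and default evaluator settings are used.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.ClassifierAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifierAttributeEvalDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Rank individual attributes by the merit a classifier assigns to each.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new ClassifierAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);

        System.out.println(selector.toResultsString());
    }
}
```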