Project Group: nz.ac.waikato.cms.weka

consistencySubsetEval

nz.ac.waikato.cms.weka : consistencySubsetEval

Evaluates the worth of a subset of attributes by the level of consistency in the class values when the training instances are projected onto the subset of attributes. The consistency of any subset can never be lower than that of the full set of attributes, hence the usual practice is to use this subset evaluator in conjunction with a Random or Exhaustive search which looks for the smallest subset with consistency equal to that of the full set of attributes. See: H. Liu, R. Setiono: A probabilistic approach to feature selection - A filter solution. In: 13th International Conference on Machine Learning, 319-327, 1996.
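
A minimal usage sketch in Java (the ARFF file name is a placeholder; assumes this package is installed so that weka.attributeSelection.ConsistencySubsetEval is available):

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.ConsistencySubsetEval;
    import weka.attributeSelection.RandomSearch;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ConsistencyDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(new ConsistencySubsetEval());
            sel.setSearch(new RandomSearch()); // per the description: Random or Exhaustive search
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }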

Last Version: 1.0.4

Release Date:

predictiveApriori

nz.ac.waikato.cms.weka : predictiveApriori

Class implementing the predictive apriori algorithm for mining association rules. It searches with an increasing support threshold for the best 'n' rules with respect to a support-based corrected confidence value. For more information see: Tobias Scheffer: Finding Association Rules That Trade Support Optimally against Confidence. In: 5th European Conference on Principles of Data Mining and Knowledge Discovery, 424-435, 2001.
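
A sketch of mining rules with this package (the ARFF file name is a placeholder; PredictiveApriori expects nominal attributes):

    import weka.associations.PredictiveApriori;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class PredictiveAprioriDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("transactions.arff"); // placeholder dataset
            PredictiveApriori pa = new PredictiveApriori();
            pa.setNumRules(100); // the best 'n' rules mentioned above
            pa.buildAssociations(data);
            System.out.println(pa); // prints the mined rules
        }
    }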

Last Version: 1.0.4

Release Date:

fuzzyUnorderedRuleInduction

nz.ac.waikato.cms.weka : fuzzyUnorderedRuleInduction

FURIA: Fuzzy Unordered Rule Induction Algorithm. For details please see: Jens Christian Huehn, Eyke Huellermeier (2009). FURIA: An Algorithm for Unordered Fuzzy Rule Induction. Data Mining and Knowledge Discovery.

Last Version: 1.0.2

Release Date:

classAssociationRules

nz.ac.waikato.cms.weka : classAssociationRules

Class association rules algorithms (including an implementation of the CBA algorithm). For more information see: W. Li, J. Han, J. Pei: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. In: ICDM'01, 369-376, 2001. B. Liu, W. Hsu, Y. Ma: Integrating Classification and Association Rule Mining. In: KDD'98, 80-86, 1998.

Last Version: 1.0.3

Release Date:

DTNB

nz.ac.waikato.cms.weka : DTNB

Class for building and using a decision table/naive Bayes hybrid classifier. At each point in the search, the algorithm evaluates the merit of dividing the attributes into two disjoint subsets: one for the decision table, the other for naive Bayes. A forward selection search is used: initially all attributes are modelled by the decision table, and at each step selected attributes are modelled by naive Bayes and the remainder by the decision table. At each step, the algorithm also considers dropping an attribute entirely from the model. For more information, see: Mark Hall, Eibe Frank: Combining Naive Bayes and Decision Tables. In: Proceedings of the 21st Florida Artificial Intelligence Society Conference (FLAIRS), 318-319, 2008.
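
A sketch of cross-validating the hybrid (the ARFF file name is a placeholder):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.DTNB;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class DTNBDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new DTNB(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }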

Last Version: 1.0.3

Release Date:

complementNaiveBayes

nz.ac.waikato.cms.weka : complementNaiveBayes

Class for building and using a Complement class Naive Bayes classifier. For more information see: Jason D. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger: Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In: ICML, 616-623, 2003. Note: TF, IDF and length normalization transforms, as described in the paper, can be performed through weka.filters.unsupervised.attribute.StringToWordVector.
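
A sketch combining the StringToWordVector transforms mentioned above with the classifier (the ARFF file name and its layout are placeholders):

    import weka.classifiers.bayes.ComplementNaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class CNBDemo {
        public static void main(String[] args) throws Exception {
            Instances raw = DataSource.read("textdata.arff"); // string attribute + nominal class
            raw.setClassIndex(raw.numAttributes() - 1);
            StringToWordVector stwv = new StringToWordVector();
            stwv.setTFTransform(true);  // TF transform from the paper
            stwv.setIDFTransform(true); // IDF transform from the paper
            stwv.setInputFormat(raw);
            Instances vectors = Filter.useFilter(raw, stwv);
            ComplementNaiveBayes cnb = new ComplementNaiveBayes();
            cnb.buildClassifier(vectors);
            System.out.println(cnb);
        }
    }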

Last Version: 1.0.3

Release Date:

conjunctiveRule

nz.ac.waikato.cms.weka : conjunctiveRule

This class implements a single conjunctive rule learner that can predict for numeric and nominal class labels. A rule consists of antecedents "AND"ed together and the consequent (class value) for the classification/regression. In this case, the consequent is the distribution of the available classes (or mean for a numeric value) in the dataset. If the test instance is not covered by this rule, then it is predicted using the default class distribution/value of the data not covered by the rule in the training data. This learner selects an antecedent by computing the Information Gain of each antecedent and prunes the generated rule using Reduced Error Pruning (REP) or simple pre-pruning based on the number of antecedents. For classification, the Information of one antecedent is the weighted average of the entropies of both the data covered and not covered by the rule. For regression, the Information is the weighted average of the mean-squared errors of both the data covered and not covered by the rule. In pruning, the weighted average of the accuracy rates on the pruning data is used for classification, while the weighted average of the mean-squared errors on the pruning data is used for regression.

Last Version: 1.0.4

Release Date:

costSensitiveAttributeSelection

nz.ac.waikato.cms.weka : costSensitiveAttributeSelection

This package provides two meta attribute selection evaluators - one for performing cost-sensitive attribute evaluation (CostSensitiveAttributeEval) and a second for performing cost-sensitive subset evaluation (CostSensitiveSubsetEval). Both methods take a cost matrix and a base evaluator. If the base evaluator can handle instance weights, then the training data is weighted according to the cost matrix, otherwise the training data is sampled according to the cost matrix.
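
A sketch of cost-sensitive subset evaluation (the ARFF file name is a placeholder, and the setter names follow the usual Weka conventions; treat them as assumptions):

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.BestFirst;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.CostSensitiveSubsetEval;
    import weka.classifiers.CostMatrix;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CostSensitiveSelectionDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            CostMatrix costs = new CostMatrix(2);  // two-class cost matrix
            costs.setCell(0, 1, 1.0);              // false positive cost
            costs.setCell(1, 0, 5.0);              // false negative cost
            CostSensitiveSubsetEval eval = new CostSensitiveSubsetEval();
            eval.setEvaluator(new CfsSubsetEval()); // base subset evaluator
            eval.setCostMatrix(costs);
            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(eval);
            sel.setSearch(new BestFirst());
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }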

Last Version: 1.0.3

Release Date:

dagging

nz.ac.waikato.cms.weka : dagging

This meta classifier creates a number of disjoint, stratified folds of the data and feeds each chunk to a copy of the supplied base classifier. Predictions are made via majority vote, since all the generated base classifiers are put into the Vote meta classifier. Useful for base classifiers whose training time is quadratic or worse in the number of training instances. For more information, see: Ting, K. M., Witten, I. H.: Stacking Bagged and Dagged Models. In: Fourteenth International Conference on Machine Learning, San Francisco, CA, 367-375, 1997.
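
A sketch pairing Dagging with a base learner whose training cost grows super-linearly (the ARFF file name is a placeholder):

    import weka.classifiers.functions.SMO;
    import weka.classifiers.meta.Dagging;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class DaggingDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            Dagging dagging = new Dagging();
            dagging.setClassifier(new SMO()); // SVM training is worse than linear in #instances
            dagging.setNumFolds(10);          // ten disjoint, stratified chunks
            dagging.buildClassifier(data);
            System.out.println(dagging);
        }
    }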

Last Version: 1.0.3

Release Date:

denormalize

nz.ac.waikato.cms.weka : denormalize

An instance filter that collapses instances with a common grouping ID value into a single instance. Useful for converting transactional data into a format that Weka's association rule learners can handle. IMPORTANT: assumes that the incoming batch of instances has been sorted on the grouping attribute. The values of nominal attributes are converted to indicator attributes. These can be either binary (with f and t values) or unary with missing values used to indicate absence. The latter is Weka's old market basket format, which is useful for Apriori. Numeric attributes can be aggregated within groups by computing the average, sum, minimum or maximum.

Last Version: 1.0.3

Release Date:

clojureClassifier

nz.ac.waikato.cms.weka : clojureClassifier

Wrapper classifier for classifiers written in the Clojure language.

Last Version: 1.0.1

Release Date:

alternatingDecisionTrees

nz.ac.waikato.cms.weka : alternatingDecisionTrees

Binary-class and multi-class alternating decision trees. For more information see: Freund, Y., Mason, L.: The alternating decision tree learning algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 124-133, 1999. Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Eibe Frank, Mark Hall: Multiclass alternating decision trees. In: ECML, 161-172, 2001.

Last Version: 1.0.5

Release Date:

attributeSelectionSearchMethods

nz.ac.waikato.cms.weka : attributeSelectionSearchMethods

This package provides four search methods for attribute selection: ExhaustiveSearch, GeneticSearch, RandomSearch and RankSearch. See: David E. Goldberg (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley. Mark Hall, Geoffrey Holmes (2003). Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering. 15(6):1437-1447.
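
A sketch of plugging one of these search methods into Weka's attribute selection driver (the ARFF file name is a placeholder; CfsSubsetEval stands in for any subset evaluator):

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.GeneticSearch;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class GeneticSearchDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(new CfsSubsetEval());
            sel.setSearch(new GeneticSearch()); // or RandomSearch, RankSearch, ExhaustiveSearch
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }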

Last Version: 1.0.7

Release Date:

bestFirstTree

nz.ac.waikato.cms.weka : bestFirstTree

Class for building a best-first decision tree classifier. This class uses binary splits for both nominal and numeric attributes. For missing values, the method of 'fractional' instances is used. For more information, see: Haijian Shi (2007). Best-first decision tree learning. Hamilton, NZ. Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics. 28(2):337-407.

Last Version: 1.0.4

Release Date:

chiSquaredAttributeEval

nz.ac.waikato.cms.weka : chiSquaredAttributeEval

Evaluates the worth of an attribute by computing the value of the chi-squared statistic with respect to the class.
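
Since this is a single-attribute evaluator, it is paired with the Ranker search; a sketch (the ARFF file name is a placeholder):

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.ChiSquaredAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ChiSquaredDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            AttributeSelection sel = new AttributeSelection();
            sel.setEvaluator(new ChiSquaredAttributeEval());
            Ranker ranker = new Ranker();
            ranker.setNumToSelect(10); // keep the ten highest-scoring attributes
            sel.setSearch(ranker);
            sel.SelectAttributes(data);
            System.out.println(sel.toResultsString());
        }
    }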

Last Version: 1.0.4

Release Date:

DilcaDistance

nz.ac.waikato.cms.weka : DilcaDistance

This package implements the parameter-free version of the DILCA distance. The approach learns value-to-value distances between each pair of values for each attribute of the dataset. The distance between two values is computed indirectly, based on their distributions with respect to a carefully chosen set of related attributes (the context).

Last Version: 1.0.1

Release Date:

J48graft

nz.ac.waikato.cms.weka : J48graft

Class for generating a grafted (pruned or unpruned) C4.5 decision tree. For more information, see Geoff Webb: Decision Tree Grafting From the All-Tests-But-One Partition.

Last Version: 1.0.3

Release Date:

thresholdSelector

nz.ac.waikato.cms.weka : thresholdSelector

A metaclassifier that selects a mid-point threshold on the probability output by a classifier. The threshold is set so that a given performance measure is optimized; currently this is the F-measure. Performance is measured either on the training data, a hold-out set, or using cross-validation. In addition, the probabilities returned by the base learner can have their range expanded so that the output probabilities reside between 0 and 1 (this is useful if the scheme normally produces probabilities in a very narrow range).
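
A sketch with default settings, which optimize the F-measure as described (the ARFF file name is a placeholder; a two-class dataset is assumed):

    import weka.classifiers.functions.Logistic;
    import weka.classifiers.meta.ThresholdSelector;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ThresholdSelectorDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("binary.arff"); // placeholder two-class dataset
            data.setClassIndex(data.numAttributes() - 1);
            ThresholdSelector ts = new ThresholdSelector();
            ts.setClassifier(new Logistic()); // base learner producing class probabilities
            ts.buildClassifier(data);         // selects the threshold that optimizes F-measure
            System.out.println(ts);
        }
    }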

Last Version: 1.0.3

Release Date:

userClassifier

nz.ac.waikato.cms.weka : userClassifier

Interactively classify through visual means. You are presented with a scatter plot of the data against two user-selectable attributes, as well as a view of the decision tree. You can create binary splits by drawing polygons around data plotted on the scatter plot, and you can let another classifier take over at points in the decision tree should you see fit. For more information see: Malcolm Ware, Eibe Frank, Geoffrey Holmes, Mark Hall, Ian H. Witten (2001). Interactive machine learning: letting users build classifiers. Int. J. Hum.-Comput. Stud. 55(3):281-292.

Last Version: 1.0.3

Release Date:

localOutlierFactor

nz.ac.waikato.cms.weka : localOutlierFactor

A filter that applies the LOF (Local Outlier Factor) algorithm to compute an outlier score for each instance in the data. Can use multiple cores/CPUs to speed up the LOF computation for large datasets. Nearest neighbor search methods and distance functions are pluggable. For more information, see: Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander (2000). LOF: Identifying Density-Based Local Outliers. ACM SIGMOD Record. 29(2):93-104.
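
A sketch of scoring a dataset, assuming the filter is installed as weka.filters.unsupervised.attribute.LOF (the ARFF file name is a placeholder):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.LOF;

    public class LOFDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("mydata.arff"); // placeholder dataset
            LOF lof = new LOF();
            lof.setInputFormat(data);
            Instances scored = Filter.useFilter(data, lof); // appends a LOF score attribute
            System.out.println(scored);
        }
    }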

Last Version: 1.0.4

Release Date:

kernelLogisticRegression

nz.ac.waikato.cms.weka : kernelLogisticRegression

This package contains a classifier that can be used to train a two-class kernel logistic regression model with the kernel functions that are available in WEKA. It optimises the negative log-likelihood with a quadratic penalty. Both BFGS and conjugate gradient descent are available as optimisation methods, but the former is normally faster. It is possible to use multiple threads, but the speed-up is generally marginal when used with BFGS optimisation; with conjugate gradient descent, greater speed-ups can be achieved. With the default kernel, the dot product kernel, this method produces results close to identical to those obtained using standard logistic regression in WEKA, provided a sufficiently large value is used in both cases for the parameter determining the size of the quadratic penalty.

Last Version: 1.0.0

Release Date:

oneClassClassifier

nz.ac.waikato.cms.weka : oneClassClassifier

Performs one-class classification on a dataset. The classifier reduces the class being classified to just a single class, and learns the data without using any information from other classes. The testing stage will classify instances as 'target' or 'outlier' - so in order to calculate the outlier pass rate the dataset must contain information from more than one class. Also, the output varies depending on whether the label 'outlier' exists in the instances used to build the classifier. If so, then 'outlier' will be predicted; if not, then the label will be considered missing when the prediction does not favour the target class. The 'outlier' class will not be used to build the model if there are instances of this class in the dataset. It can simply be used as a flag; you do not need to relabel any classes. For more information, see: Kathryn Hempstalk, Eibe Frank, Ian H. Witten: One-Class Classification by Combining Density and Class Probability Estimation. In: Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECML/PKDD 2008, Berlin, 505-519, 2008.

Last Version: 1.0.4

Release Date:

SMOTE

nz.ac.waikato.cms.weka : SMOTE

Resamples a dataset by applying the Synthetic Minority Oversampling TEchnique (SMOTE). The original dataset must fit entirely in memory. The amount of SMOTE and the number of nearest neighbors may be specified. For more information, see: Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16:321-357.
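
A sketch of oversampling the minority class (the ARFF file name is a placeholder):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.instance.SMOTE;

    public class SMOTEDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("imbalanced.arff"); // placeholder dataset
            data.setClassIndex(data.numAttributes() - 1);
            SMOTE smote = new SMOTE();
            smote.setPercentage(200.0);   // create 200% additional minority-class instances
            smote.setNearestNeighbors(5); // k used when interpolating synthetic instances
            smote.setInputFormat(data);
            Instances balanced = Filter.useFilter(data, smote);
            System.out.println(balanced.numInstances() + " instances after SMOTE");
        }
    }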

Last Version: 1.0.3

Release Date:

metaCost

nz.ac.waikato.cms.weka : metaCost

This metaclassifier makes its base classifier cost-sensitive using the method specified in Pedro Domingos: MetaCost: A general method for making classifiers cost-sensitive. In: Fifth International Conference on Knowledge Discovery and Data Mining, 155-164, 1999. This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
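
A sketch wrapping an interpretable base learner, as the description suggests (the ARFF file name is a placeholder; CostMatrix.setCell follows Weka's usual API and is an assumption):

    import weka.classifiers.CostMatrix;
    import weka.classifiers.meta.MetaCost;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MetaCostDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("binary.arff"); // placeholder two-class dataset
            data.setClassIndex(data.numAttributes() - 1);
            CostMatrix costs = new CostMatrix(2);
            costs.setCell(0, 1, 1.0); // cost of a false positive
            costs.setCell(1, 0, 5.0); // cost of a false negative
            MetaCost mc = new MetaCost();
            mc.setClassifier(new J48()); // single interpretable cost-sensitive model
            mc.setCostMatrix(costs);
            mc.buildClassifier(data);
            System.out.println(mc);
        }
    }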

Last Version: 1.0.3

Release Date:

averagedOneDependenceEstimators

nz.ac.waikato.cms.weka : averagedOneDependenceEstimators

AODE achieves highly accurate classification by averaging over all of a small space of alternative naive-Bayes-like models that have weaker (and hence less detrimental) independence assumptions than naive Bayes. The resulting algorithm is computationally efficient while delivering highly accurate classification on many learning tasks. For more information, see G. Webb, J. Boughton, Z. Wang (2005). Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning. 58(1):5-24.

Last Version: 1.2.1

Release Date:

WekaODF

nz.ac.waikato.cms.weka : WekaODF

WekaODF adds support for directly reading from and writing to spreadsheets in ODF (Open Document Format for Office Applications, ISO/IEC 26300:2006) format. ODF is used by the OpenOffice.org suite, for instance. WekaODF uses jOpenDocument (http://www.jOpenDocument.org, GPL) in order to read/write ODF spreadsheets.

Last Version: 1.0.4

Release Date:

phmm4weka

nz.ac.waikato.cms.weka : phmm4weka

This Java software implements Profile Hidden Markov Models (PHMMs) for protein classification for the WEKA workbench. Standard PHMMs and newly introduced binary PHMMs are used. In addition the software allows propositionalisation of PHMMs.

Last Version: 1.1.3

Release Date:

rotationForest

nz.ac.waikato.cms.weka : rotationForest

An ensemble learning method inspired by bagging and random sub-spaces. Trains an ensemble of decision trees on random subspaces of the data, where each subspace has been transformed using principal components analysis.

Last Version: 1.0.3

Release Date:

decorate

nz.ac.waikato.cms.weka : decorate

DECORATE is a meta-learner for building diverse ensembles of classifiers by using specially constructed artificial training examples. Comprehensive experiments have demonstrated that this technique is consistently more accurate than the base classifier, Bagging and Random Forests. Decorate also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. For more details see: P. Melville, R. J. Mooney: Constructing Diverse Classifier Ensembles Using Artificial Training Examples. In: Eighteenth International Joint Conference on Artificial Intelligence, 505-510, 2003; P. Melville, R. J. Mooney (2004). Creating Diversity in Ensembles Using Artificial Data. Information Fusion: Special Issue on Diversity in Multiclassifier Systems.

Last Version: 1.0.3

Release Date:

fastCorrBasedFS

nz.ac.waikato.cms.weka : fastCorrBasedFS

Feature selection method based on correlation measure and relevance and redundancy analysis. Use in conjunction with an attribute set evaluator (SymmetricalUncertAttributeEval). For more information see: Lei Yu, Huan Liu: Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In: Proceedings of the Twentieth International Conference on Machine Learning, 856-863, 2003.

Last Version: 1.0.2

Release Date:

normalize

nz.ac.waikato.cms.weka : normalize

An instance filter that normalizes instances, considering only numeric attributes and ignoring the class index.

Last Version: 1.0.2

Release Date:

prefuseTree

nz.ac.waikato.cms.weka : prefuseTree

A visualization component for displaying tree structures from those schemes that can output trees (e.g. decision tree learners, Cobweb clusterer etc.). This component is available from the popup menu in the Explorer's classify and cluster panels. The component uses the prefuse visualization library.

Last Version: 1.0.3

Release Date:

CLOPE

nz.ac.waikato.cms.weka : CLOPE

Implements the CLOPE clustering algorithm for transactional data. For more information see: Yiling Yang, Xudong Guan, Jinyuan You: CLOPE: a fast and effective clustering algorithm for transactional data. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 682-687, 2002.

Last Version: 1.0.2

Release Date:

DMNBtext

nz.ac.waikato.cms.weka : DMNBtext

Class for building and using a Discriminative Multinomial Naive Bayes classifier. For more information see: Jiang Su, Harry Zhang, Charles X. Ling, Stan Matwin: Discriminative Parameter Learning for Bayesian Networks. In: ICML 2008, 2008.

Last Version: 1.0.2

Release Date:

EMImputation

nz.ac.waikato.cms.weka : EMImputation

Replaces missing numeric values using Expectation Maximization with a multivariate normal model. Described in: Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. New York: Chapman and Hall.

Last Version: 1.0.2

Release Date:

NNge

nz.ac.waikato.cms.weka : NNge

Nearest-neighbor-like algorithm using non-nested generalized exemplars (which are hyperrectangles that can be viewed as if-then rules). For more information, see Brent Martin (1995). Instance-Based learning: Nearest Neighbor With Generalization. Hamilton, New Zealand. Sylvain Roy (2002). Nearest Neighbor With Generalization. Christchurch, New Zealand.

Last Version: 1.0.2

Release Date:

associationRulesVisualizer

nz.ac.waikato.cms.weka : associationRulesVisualizer

A visualization component for displaying association rules that uses a modified version of the Association Rules Viewer from DESS IAGL of Lille. Requires Java 3D to be installed.

Last Version: 1.0.2

Release Date:

citationKNN

nz.ac.waikato.cms.weka : citationKNN

Modified version of the Citation kNN multi-instance classifier. For more information see: Jun Wang, Jean-Daniel Zucker: Solving the Multiple-Instance Problem: A Lazy Learning Approach. In: 17th International Conference on Machine Learning, 1119-1125, 2000.

Last Version: 1.0.2

Release Date:

ensembleLibrary

nz.ac.waikato.cms.weka : ensembleLibrary

Manages a library of ensemble classifiers.

Last Version: 1.0.4

Release Date:

fuzzyLaticeReasoning

nz.ac.waikato.cms.weka : fuzzyLaticeReasoning

The Fuzzy Lattice Reasoning Classifier uses the notion of Fuzzy Lattices for creating a Reasoning Environment. The current version can be used for classification using numeric predictors. For more information see: I. N. Athanasiadis, V. G. Kaburlasos, P. A. Mitkas, V. Petridis: Applying Machine Learning Techniques on Air Quality Data for Real-Time Decision Support. In: 1st Intl. NAISO Symposium on Information Technologies in Environmental Engineering (ITEE-2003), Gdansk, Poland, 2003; V. G. Kaburlasos, I. N. Athanasiadis, P. A. Mitkas, V. Petridis (2003). Fuzzy Lattice Reasoning (FLR) Classifier and its Application on Improved Estimation of Ambient Ozone Concentration.

Last Version: 1.0.2

Release Date:

generalizedSequentialPatterns

nz.ac.waikato.cms.weka : generalizedSequentialPatterns

Class implementing a GSP algorithm for discovering sequential patterns in a sequential data set. The attribute identifying the distinct data sequences contained in the set can be determined by the respective option. Furthermore, the set of output results can be restricted by specifying one or more attributes that have to be contained in each element/itemset of a sequence. For further information see: Ramakrishnan Srikant, Rakesh Agrawal (1996). Mining Sequential Patterns: Generalizations and Performance Improvements.

Last Version: 1.0.2

Release Date:

hiddenNaiveBayes

nz.ac.waikato.cms.weka : hiddenNaiveBayes

Constructs a Hidden Naive Bayes classification model with high classification accuracy and AUC. For more information refer to: H. Zhang, L. Jiang, J. Su: Hidden Naive Bayes. In: Twentieth National Conference on Artificial Intelligence, 919-924, 2005.

Last Version: 1.0.2

Release Date:

kfPMMLClassifierScoring

nz.ac.waikato.cms.weka : kfPMMLClassifierScoring

A Knowledge Flow plugin that provides a Knowledge Flow step for scoring test sets or instance streams using a PMML classifier.

Last Version: 1.0.3

Release Date:

lazyBayesianRules

nz.ac.waikato.cms.weka : lazyBayesianRules

Lazy Bayesian Rules Classifier. The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. Lazy Bayesian Rules selectively relaxes the independence assumption, achieving lower error rates over a range of learning tasks. LBR defers processing to classification time, making it a highly efficient and accurate classification algorithm when small numbers of objects are to be classified. For more information, see: Zijian Zheng, G. Webb (2000). Lazy Learning of Bayesian Rules. Machine Learning. 41(1):53-84.

Last Version: 1.0.2

Release Date:

levenshteinEditDistance

nz.ac.waikato.cms.weka : levenshteinEditDistance

Computes the Levenshtein edit distance between two strings.

Last Version: 1.0.2

Release Date:

linearForwardSelection

nz.ac.waikato.cms.weka : linearForwardSelection

Extension of BestFirst that takes a restricted number k of attributes into account. Fixed-set selects a fixed number k of attributes, whereas fixed-width increases k at each step. The search uses either the initial ordering to select the top k attributes, or performs a ranking (with the same evaluator the search uses later on). The search direction can be forward, or floating forward selection (with optional backward search steps). For more information see: Martin Guetlein (2006). Large Scale Attribute Selection Using Wrappers. Freiburg, Germany.

Last Version: 1.0.2

Release Date:

multilayerPerceptronCS

nz.ac.waikato.cms.weka : multilayerPerceptronCS

An extension of the standard MultilayerPerceptron classifier in Weka that adds context-sensitive Multiple Task Learning (csMTL).

Last Version: 1.0.2

Release Date:

ordinalLearningMethod

nz.ac.waikato.cms.weka : ordinalLearningMethod

An implementation of the Ordinal Learning Method (OLM). Further information regarding the algorithm and variants can be found in: Arie Ben-David (1992). Automatic Generation of Symbolic Multiattribute Ordinal Knowledge-Based DSSs: Methodology and Applications. Decision Sciences. 23:1357-1372.

Last Version: 1.0.2

Release Date:

ordinalStochasticDominance

nz.ac.waikato.cms.weka : ordinalStochasticDominance

An implementation of the Ordinal Stochastic Dominance Learner. Further information regarding the OSDL algorithm can be found in: S. Lievens, B. De Baets, K. Cao-Van (2006). A Probabilistic Framework for the Design of Instance-Based Supervised Ranking Algorithms in an Ordinal Setting. Annals of Operations Research; Kim Cao-Van (2003). Supervised ranking: from semantics to algorithms; Stijn Lievens (2004). Studie en implementatie van instantie-gebaseerde algoritmen voor gesuperviseerd rangschikken [Study and implementation of instance-based algorithms for supervised ranking].

Last Version: 1.0.2

Release Date:

probabilisticSignificanceAE

nz.ac.waikato.cms.weka : probabilisticSignificanceAE

Evaluates the worth of an attribute by computing the Probabilistic Significance as a two-way function (attribute-classes and classes-attribute association). For more information see: Amir Ahmad, Lipika Dey (2004). A feature selection technique for classificatory analysis.

Last Version: 1.0.2

Release Date: