For example, ZeroR's model just consists of a single value: The specific form and creation of this mapping, or model, differs from classifier to classifier. Surprisingly little is needed for a basic classifier:Ī routine which generates a classifier model from a training dataset (= buildClassifier) and another routine which produces a classification for a given instance (= classifyInstance), or generates a probability distribution for all classes of the instance (= distributionForInstance).Ī classifier model is an arbitrary complex mapping from predictor attributes to the class attribute.
Java 45Loader c45_filestem > data.arffĪny classification or regression algorithm in WEKA is derived from the abstract Classifier class.
re offers some other useful routines, e.g., converters.C45Loader and converters.CSVLoader, which can be used to convert C45 datasets and comma/tab-separated datasets respectively, e.g.: java data.csv > data.arff Some basic statistics and validation of given ARFF files can be obtained via the main() routine of : java data/soybean.arff rest of the dataset consists of the token followed by comma-separated values for the attributes - one line per example. In our case it is a nominal attribute with two values, making this a binary classification problem. The last attribute is the default target or class variable used for prediction. Here we state the internal name of the dataset. % Any relation to real weather is purely coincidental.Ĭomment lines at the beginning of the dataset should give an indication of its source, context and meaning. % This is a toy example, the UCI weather dataset. A complete description of the ARFF file format can be found here. The external representation of an Instances class is an ARFF file, which consists of a header describing the attribute types and the data as comma-separated list. WEKA also supports date attributes and relational attributes. Each Instance consists of a number of attributes, any of which can be nominal (= one of a predefined list of values), numeric (= a real or integer number) or a string (= an arbitrary long list of characters, enclosed in "double quotes"). A dataset is a collection of examples, each one of class Instance. In WEKA, it is implemented by the Instances class. A dataset is roughly equivalent to a two-dimensional spreadsheet or database table. Basic concepts DatasetĪ set of data items, the dataset, is a very basic concept of machine learning. If you want to know exactly what is going on, take a look at the source code, which can be found in weka-src.jar and can be extracted via the jar utility from the Java Development Kit. Prepare to use it since this introduction is not intended to be complete. Note that, in the doc directory of the WEKA installation directory, you can find documentation of all Java classes in WEKA. Afterwards, some practical examples are given.
Following that, we will consider some machine learning algorithms that generate classification models.
Then, we will describe the weka.filters package, which is used to transform input data, e.g., for preprocessing, transformation, feature generation and so on. We will begin by describing basic concepts and ideas. This document serves as a brief introduction to using WEKA from the command line interface.
#WEKA JAR ONLY INCLUDE FILES YOU NEED SERIES#
Regression, association rule mining, time series prediction, and clustering algorithms have also been implemented. Its main strengths lie in the classification area, where many of the main machine learning approaches have been implemented within a clean, object-oriented Java class hierarchy. WEKA is a comprehensive workbench for machine learning and data mining.