edu.usc.ir:age-predictor-cli

Ensemble age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum data, built with Apache OpenNLP and Apache Spark.

License	The Apache License, Version 2.0
Categories	CLI User Interface
GroupId	edu.usc.ir
ArtifactId	age-predictor-cli
Last Version	1.0
Release Date	06-Jul-2017
Type	jar
Description	Ensemble age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum data, built with Apache OpenNLP and Apache Spark.

Download age-predictor-cli

Filename	Size
age-predictor-cli-1.0.pom
age-predictor-cli-1.0.jar	96 KB
age-predictor-cli-1.0-sources.jar	48 KB
age-predictor-cli-1.0-javadoc.jar	187 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/edu.usc.ir/age-predictor-cli/ -->
<dependency>
    <groupId>edu.usc.ir</groupId>
    <artifactId>age-predictor-cli</artifactId>
    <version>1.0</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/edu.usc.ir/age-predictor-cli/
implementation 'edu.usc.ir:age-predictor-cli:1.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/edu.usc.ir/age-predictor-cli/
implementation ("edu.usc.ir:age-predictor-cli:1.0")

Apache Buildr

'edu.usc.ir:age-predictor-cli:jar:1.0'

Apache Ivy

<dependency org="edu.usc.ir" name="age-predictor-cli" rev="1.0">
  <artifact name="age-predictor-cli" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='edu.usc.ir', module='age-predictor-cli', version='1.0')
)

Scala SBT

libraryDependencies += "edu.usc.ir" % "age-predictor-cli" % "1.0"

Leiningen

[edu.usc.ir/age-predictor-cli "1.0"]

Dependencies

compile (4)

Group / Artifact	Type	Version
edu.usc.ir : age-predictor-opennlp	jar	1.0
org.slf4j : slf4j-log4j12	jar	1.7.12
commons-io : commons-io	jar	2.5
org.apache.spark : spark-mllib_2.10	jar	2.0.0

test (1)

Group / Artifact	Type	Version
junit : junit	jar	4.12

Project Modules

There are no modules declared in this project.

Author Age Prediction

This is an author age categorizer that leverages the Apache OpenNLP Maximum Entropy Classifier. It takes a text sample and classifies it into one of the following age categories: xx-18|18-24|25-34|35-49|50-64|65-xx.
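
To illustrate the category labels above, here is a minimal Python sketch that maps a numeric age to one of the six labels. The helper name `age_category` is hypothetical, and the boundary handling (for example, that 18 falls into 18-24 rather than xx-18) is an assumption; the tool's own binning may differ.

```python
# Hypothetical helper: the bin boundaries are an assumption, not taken from the tool.
CATEGORIES = [
    (18, "xx-18"),  # ages below 18
    (25, "18-24"),
    (35, "25-34"),
    (50, "35-49"),
    (65, "50-64"),
]


def age_category(age):
    """Return the category label for a non-negative numeric age."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for upper_bound, label in CATEGORIES:
        if age < upper_bound:
            return label
    return "65-xx"  # everything from 65 upward
```

A helper like this can be handy when the training data carries raw ages but category labels are wanted.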

Usage

How to train an Age Classifier

Note: The training data should contain one sample per line, with each line starting with the age or age category, followed by a tab and the text associated with that age.

Usage: bin/authorage AgeClassifyTrainer [-factory factoryName] [-featureGenerators featuregens] [-tokenizer tokenizer] -model modelFile [-params paramsFile] -lang language -data sampleData [-encoding charsetName]

Arguments description:
	-factory factoryName
		a sub-class of DoccatFactory where to get implementation and resources.
	-featureGenerators featuregens
		comma separated feature generator classes. Bag of words default.
	-tokenizer tokenizer
		tokenizer implementation. WhitespaceTokenizer is used if not specified.
	-model modelFile
		output model file.
	-params paramsFile
		training parameters file.
	-lang language
		language which is being processed.
	-data sampleData
		data to be used, usually a file name.
	-encoding charsetName
		encoding for reading and writing text, if absent the system default is used.

Example Usage:

bin/authorage AgeClassifyTrainer -model model/en-ageClassify.bin -lang en -data data/train.txt -encoding UTF-8

Training data format: age and text separated by a tab on each line, as <AGE><Tab><TEXT>.
Sample training data:

12	I am just 12 year old
25	I am little bigger
35	I am mature
45	I am getting old
60	I am old like wine

How to evaluate an Age Classifier Model

Usage: bin/authorage AgeClassifyEvaluator -model model [-misclassified true|false] -data sampleData [-encoding charsetName]

Arguments description:
	-model model
		the model file to be evaluated.
	-misclassified true|false
		if true will print false negatives and false positives.
	-data sampleData
		data to be used, usually a file name.
	-encoding charsetName
		encoding for reading and writing text, if absent the system default is used.

Example Usage:

bin/authorage AgeClassifyEvaluator -model model/en-ageClassify.bin -data data/test.txt -encoding UTF-8

How to run the Age Classifier

Note: Each document must be followed by an empty line to be detected as a separate case from the others.

Usage: bin/authorage AgeClassify model < documents

Usage: bin/authorage AgePredict ./model/classify-unigram.bin ./model/regression-global.bin data/sample_test.txt
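
The note above says that each document must be followed by an empty line. The Python sketch below shows one way to prepare such input; the names `join_documents` and `split_documents` are hypothetical. It assumes that a blank line inside a document would be read as a separator, so those are dropped.

```python
def join_documents(documents):
    """Join documents so that each one is followed by an empty line."""
    chunks = []
    for doc in documents:
        # Drop blank lines inside a document so they are not read as separators.
        lines = [ln.strip() for ln in doc.splitlines() if ln.strip()]
        if lines:
            chunks.append("\n".join(lines) + "\n\n")
    return "".join(chunks)


def split_documents(text):
    """Inverse of join_documents: split text on empty lines."""
    docs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            docs.append("\n".join(current))
            current = []
    if current:
        docs.append("\n".join(current))
    return docs
```

The string returned by `join_documents` can be written to a file and passed on standard input, as in the AgeClassify usage line above.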

Downloads

For AgePredict to work, download en-pos-maxent.bin, en-sent.bin, and en-token.bin from http://opennlp.sourceforge.net/models-1.5/ and place them in the model/opennlp/ directory.
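
As a convenience, the Python sketch below checks which of the three model files are still missing and fetches them. The helper names are hypothetical, and it assumes each file is served directly under the URL above with the same file name; if that layout differs, download the files manually.

```python
import os
from urllib.request import urlretrieve

BASE_URL = "http://opennlp.sourceforge.net/models-1.5/"
REQUIRED_MODELS = ["en-pos-maxent.bin", "en-sent.bin", "en-token.bin"]


def missing_models(model_dir="model/opennlp"):
    """Return (url, destination_path) pairs for model files not yet on disk."""
    pending = []
    for name in REQUIRED_MODELS:
        dest = os.path.join(model_dir, name)
        if not os.path.isfile(dest):
            # Assumption: the file sits directly under BASE_URL.
            pending.append((BASE_URL + name, dest))
    return pending


def download_missing(model_dir="model/opennlp"):
    """Download every missing model file (requires network access)."""
    os.makedirs(model_dir, exist_ok=True)
    for url, dest in missing_models(model_dir):
        urlretrieve(url, dest)
```

Calling `download_missing()` from the project root fills model/opennlp/ and skips files that are already present.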

Citation

If you use this work, please cite:

@inproceedings{hong2017ensemble,
  title={Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction},
  author={Hong, Joey and Mattmann, Chris and Ramirez, Paul},
  booktitle={Information Reuse and Integration (IRI), 2017 IEEE 18th International Conference on},
  organization={IEEE},
  year={2017}
}

Contributors

Chris A. Mattmann, JPL & USC
Joey Hong, Caltech
Madhav Sharan, JPL & USC

License

Apache License, Version 2.0

USC Information Retrieval & Data Science

USC Information Retrieval and Data Science Group

Versions

Version	Release Date
1.0	06-Jul-2017
