General Sequential Pattern Mining

Java implementation of GSP for frequent pattern mining

License: MIT
Categories: Java Languages
GroupId: com.github.chen0040
ArtifactId: java-sequential-pattern-mining
Last Version: 1.0.1
Type: jar
Description: General Sequential Pattern Mining. Java implementation of GSP for frequent pattern mining
Project URL: https://github.com/chen0040/java-sequential-pattern-mining
Source Code Management: https://github.com/chen0040/java-sequential-pattern-mining

Download java-sequential-pattern-mining

How to add to project

Maven:
<!-- https://jarcasting.com/artifacts/com.github.chen0040/java-sequential-pattern-mining/ -->
<dependency>
    <groupId>com.github.chen0040</groupId>
    <artifactId>java-sequential-pattern-mining</artifactId>
    <version>1.0.1</version>
</dependency>

Gradle (Groovy DSL):
implementation 'com.github.chen0040:java-sequential-pattern-mining:1.0.1'

Gradle (Kotlin DSL):
implementation("com.github.chen0040:java-sequential-pattern-mining:1.0.1")

Buildr:
'com.github.chen0040:java-sequential-pattern-mining:jar:1.0.1'

Ivy:
<dependency org="com.github.chen0040" name="java-sequential-pattern-mining" rev="1.0.1">
  <artifact name="java-sequential-pattern-mining" type="jar" />
</dependency>

Grape:
@Grapes(
    @Grab(group='com.github.chen0040', module='java-sequential-pattern-mining', version='1.0.1')
)

SBT:
libraryDependencies += "com.github.chen0040" % "java-sequential-pattern-mining" % "1.0.1"

Leiningen:
[com.github.chen0040/java-sequential-pattern-mining "1.0.1"]

Dependencies

compile (3)

Group / Artifact Type Version
org.slf4j : slf4j-api jar 1.7.20
org.slf4j : slf4j-log4j12 jar 1.7.20
org.apache.commons : commons-math3 jar 3.2

provided (1)

Group / Artifact Type Version
org.projectlombok : lombok jar 1.16.6

test (10)

Group / Artifact Type Version
org.testng : testng jar 6.9.10
org.hamcrest : hamcrest-core jar 1.3
org.hamcrest : hamcrest-library jar 1.3
org.assertj : assertj-core jar 3.5.2
org.powermock : powermock-core jar 1.6.5
org.powermock : powermock-api-mockito jar 1.6.5
org.powermock : powermock-module-junit4 jar 1.6.5
org.powermock : powermock-module-testng jar 1.6.5
org.mockito : mockito-core jar 2.0.2-beta
org.mockito : mockito-all jar 2.0.2-beta

Project Modules

There are no modules declared in this project.

java-sequential-pattern-mining

This package provides a Java implementation of the GSP sequential pattern mining algorithm.


Overview of GSP

The implementation follows the algorithm described by Srikant & Agrawal (1996).

The algorithm makes multiple passes over the data. The first pass determines the support of each item, that is, the number of data-sequences that include the item. At the end of the first pass, the algorithm knows which items are frequent, that is, have at least the minimum support. Each such item yields a 1-element frequent sequence consisting of that item.
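The first pass can be illustrated with a minimal sketch. It is not the library's code: the class `FirstPassSketch`, its methods, and the three-sequence database are all made up for illustration, and a data-sequence is modelled simply as a list of item sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FirstPassSketch {

    // Hypothetical helper: turns strings such as "1,2" into one data-sequence
    // (a list of item sets). It is not part of the library's API.
    static List<Set<String>> sequence(String... elements) {
        List<Set<String>> seq = new ArrayList<>();
        for (String element : elements) {
            seq.add(new HashSet<>(Arrays.asList(element.split(","))));
        }
        return seq;
    }

    // Support of an item = number of data-sequences that contain it at least once.
    static Map<String, Integer> itemSupport(List<List<Set<String>>> database) {
        Map<String, Integer> support = new TreeMap<>();
        for (List<Set<String>> seq : database) {
            // Collect distinct items so repeats inside one sequence count once.
            Set<String> seen = new HashSet<>();
            for (Set<String> element : seq) {
                seen.addAll(element);
            }
            for (String item : seen) {
                support.merge(item, 1, Integer::sum);
            }
        }
        return support;
    }

    // Keeps only the items whose support reaches the minimum.
    static List<String> frequentItems(Map<String, Integer> support, int minSupport) {
        List<String> frequent = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : support.entrySet()) {
            if (entry.getValue() >= minSupport) {
                frequent.add(entry.getKey());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<List<Set<String>>> database = new ArrayList<>();
        database.add(sequence("1", "1,2", "3"));
        database.add(sequence("1,3", "2"));
        database.add(sequence("4", "1"));

        Map<String, Integer> support = itemSupport(database);
        System.out.println(support);                   // prints {1=3, 2=2, 3=2, 4=1}
        System.out.println(frequentItems(support, 2)); // prints [1, 2, 3]
    }
}
```

Item 1 occurs twice in the first sequence but contributes only once to its support, because support counts data-sequences rather than occurrences.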

Each subsequent pass starts with a seed set: the frequent sequences found in the previous pass. The seed set is used to generate new potentially frequent sequences, called candidate sequences. Each candidate sequence has one more item than a seed sequence; so all the candidate sequences in a pass will have the same number of items. The support for these candidate sequences is found during the pass over the data. At the end of the pass, the algorithm determines which of the candidate sequences are actually frequent. These frequent candidates become the seed for the next pass.
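The seed-and-candidate loop can be sketched as follows. This is a deliberately simplified illustration, not the library's implementation: every element of a sequence is assumed to hold a single item, and candidates are formed by appending each frequent item to each seed instead of using the join step from the paper. The class `LevelWiseSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class LevelWiseSketch {

    // True if the pattern's items appear in the sequence in order (gaps allowed).
    static boolean isSubsequence(List<String> pattern, List<String> sequence) {
        int matched = 0;
        for (String item : sequence) {
            if (matched < pattern.size() && pattern.get(matched).equals(item)) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    // Number of data-sequences that contain the pattern.
    static int support(List<String> pattern, List<List<String>> database) {
        int count = 0;
        for (List<String> sequence : database) {
            if (isSubsequence(pattern, sequence)) {
                count++;
            }
        }
        return count;
    }

    static List<List<String>> mine(List<List<String>> database, int minSupport) {
        if (minSupport < 1) {
            throw new IllegalArgumentException("minSupport must be at least 1");
        }
        // Pass 1: frequent single items become the first seed set.
        Set<String> items = new TreeSet<>();
        for (List<String> sequence : database) {
            items.addAll(sequence);
        }
        List<String> frequentItems = new ArrayList<>();
        List<List<String>> seeds = new ArrayList<>();
        for (String item : items) {
            List<String> single = Collections.singletonList(item);
            if (support(single, database) >= minSupport) {
                frequentItems.add(item);
                seeds.add(single);
            }
        }
        List<List<String>> result = new ArrayList<>(seeds);
        // Later passes: extend each seed by one item, count, keep the frequent ones.
        while (!seeds.isEmpty()) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> seed : seeds) {
                for (String item : frequentItems) {
                    List<String> candidate = new ArrayList<>(seed);
                    candidate.add(item);
                    if (support(candidate, database) >= minSupport) {
                        next.add(candidate);
                    }
                }
            }
            result.addAll(next);
            seeds = next; // frequent candidates seed the next pass
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> database = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a", "c"),
                Arrays.asList("b", "c"),
                Arrays.asList("a", "b"));
        System.out.println(mine(database, 2));
        // prints [[a], [b], [c], [a, b], [a, c], [b, c]]
    }
}
```

The loop stops once a pass produces no frequent candidates. In this toy database that happens in the third pass, because no 3-item sequence is contained in two or more data-sequences.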

Install

Add the following dependency to your POM file:

<dependency>
  <groupId>com.github.chen0040</groupId>
  <artifactId>java-sequential-pattern-mining</artifactId>
  <version>1.0.1</version>
</dependency>

Usage

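The sample code below illustrates how to use GSP to find the frequent sequential patterns in a simple sequence database.

// GSP, MetaData, Sequence and Sequences come from this library;
// import them from its packages in addition to the JDK classes below.
import java.util.ArrayList;
import java.util.List;

List<Sequence> database = new ArrayList<>();

// The database holds the following 4 sequences of transactions: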
/*
S1 	(1), (1 2 3), (1 3), (4), (3 6)
S2 	(1 4), (3), (2 3), (1 5)
S3 	(5 6), (1 2), (4 6), (3), (2)
S4 	(5), (7), (1 6), (3), (2), (3)
*/

database.add(Sequence.make("1", "1,2,3", "1,3", "4", "3,6"));
database.add(Sequence.make("1,4", "3", "2,3", "1,5"));
database.add(Sequence.make("5,6", "1,2", "4,6", "3", "2"));
database.add(Sequence.make("5", "7", "1,6", "3", "2", "3"));

GSP method = new GSP();
method.setMinSupportLevel(2);
List<String> uniqueItems = new MetaData(database).getUniqueItems();
Sequences result = method.minePatterns(database, uniqueItems, -1);

result.getSequences().stream().forEach(sequence -> {
 System.out.println("sequence: " + sequence);
});
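To make the notion of support concrete for this database, the sketch below counts how many of the four sequences contain a given pattern, using the usual GSP containment rule without time constraints: each pattern element must be a subset of a distinct element of the data-sequence, in the same order. The class `ContainmentSketch` and its helpers are hypothetical and independent of the library, so the library's actual output is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainmentSketch {

    // Hypothetical helper that accepts the same "1,2,3" string format as the
    // sample code; it is not the library's Sequence.make.
    static List<Set<String>> sequence(String... elements) {
        List<Set<String>> seq = new ArrayList<>();
        for (String element : elements) {
            seq.add(new HashSet<>(Arrays.asList(element.split(","))));
        }
        return seq;
    }

    // Greedy left-to-right match: each pattern element must be a subset of a
    // distinct element of the data-sequence, preserving order.
    static boolean contains(List<Set<String>> data, List<Set<String>> pattern) {
        int next = 0;
        for (Set<String> element : data) {
            if (next < pattern.size() && element.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Number of data-sequences in the database that contain the pattern.
    static int support(List<List<Set<String>>> database, List<Set<String>> pattern) {
        int count = 0;
        for (List<Set<String>> data : database) {
            if (contains(data, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<Set<String>>> database = new ArrayList<>();
        database.add(sequence("1", "1,2,3", "1,3", "4", "3,6"));
        database.add(sequence("1,4", "3", "2,3", "1,5"));
        database.add(sequence("5,6", "1,2", "4,6", "3", "2"));
        database.add(sequence("5", "7", "1,6", "3", "2", "3"));

        // <(1 2), (3)> is contained in S1 and S3 only.
        System.out.println(support(database, sequence("1,2", "3"))); // prints 2
        // Item 7 appears only in S4.
        System.out.println(support(database, sequence("7")));        // prints 1
    }
}
```

With a minimum support of 2, the pattern (1 2), (3) would qualify as frequent under this definition, while the single item 7 would not.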

Versions

Version
1.0.1