GTF parser for Java Dataframes
A GTF Reader and Writer for Java DataFrames.
The GTF Format is implemented according to this documentation:
Documentation
Install
Add this to your pom.xml:
<dependencies>
...
<dependency>
<groupId>de.unknownreality</groupId>
<artifactId>dataframe-gtf</artifactId>
<version>0.2.4</version>
</dependency>
...
</dependencies>
Build
To build the library from sources:
- Clone the GitHub repository:
$ git clone https://github.com/nRo/DataFrame-GTF.git
- Change to the created folder and run mvn install:
$ cd DataFrame-GTF
$ mvn install
- Include it by adding the following to your project's pom.xml:
<dependencies>
...
<dependency>
<groupId>de.unknownreality</groupId>
<artifactId>dataframe-gtf</artifactId>
<version>0.2.4-SNAPSHOT</version>
</dependency>
...
</dependencies>
Usage
Create a DataFrame from a GTF file
File gtfFile = new File("genome.gtf");
DataFrame df = DataFrame.load(gtfFile, GTFFormat.GTF);
By default, all GTF fields are included in the resulting DataFrame. Attributes can be included by adding them to the GTF reader:
GTFReader gtfReader = GTFReaderBuilder.create()
.withAttribute("gene_id")
.build();
DataFrame df = DataFrame.load(gtfFile, gtfReader);
The column types of the GTF fields are predefined:
| GTF field | type |
|---|---|
| seqname | String |
| source | String |
| feature | String |
| start | Long |
| end | Long |
| score | Double |
| strand | String |
| frame | Integer |
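To make the type mapping concrete, the following is a minimal standalone sketch that parses one GTF line into values of the types listed above. It does not use or reproduce this library's implementation: the class `GtfLineSketch`, the nested `GtfRecord`, and the choice to represent a missing score or frame (written as `.` in GTF) as `null` are assumptions made for illustration.

```java
public class GtfLineSketch {

    /** The eight fixed GTF fields, typed as in the table above, plus the raw attribute column. */
    public static final class GtfRecord {
        public final String seqname;
        public final String source;
        public final String feature;
        public final Long start;
        public final Long end;
        public final Double score;   // null when the file contains "." (assumption of this sketch)
        public final String strand;
        public final Integer frame;  // null when the file contains "." (assumption of this sketch)
        public final String attributes;

        GtfRecord(String[] f) {
            seqname = f[0];
            source = f[1];
            feature = f[2];
            start = Long.valueOf(f[3]);
            end = Long.valueOf(f[4]);
            score = isMissing(f[5]) ? null : Double.valueOf(f[5]);
            strand = f[6];
            frame = isMissing(f[7]) ? null : Integer.valueOf(f[7]);
            attributes = f[8];
        }
    }

    /** Splits a line on tabs and converts each field to its column type. */
    public static GtfRecord parse(String line) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length != 9) {
            throw new IllegalArgumentException(
                "Expected 9 tab-separated columns, got " + fields.length);
        }
        return new GtfRecord(fields);
    }

    private static boolean isMissing(String value) {
        return value.equals(".");
    }
}
```

Boxed types (`Long`, `Double`, `Integer`) are used here so that a missing value can be expressed without a sentinel number.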
The type of an attribute can be specified:
GTFReader gtfReader = GTFReaderBuilder.create()
.withAttribute("gene_id")
.withAttribute("test_value", DoubleColumn.class)
.build();
DataFrame df = DataFrame.load(gtfFile, gtfReader);
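For readers unfamiliar with the GTF attribute column, the sketch below shows what selecting an attribute and giving it a type amounts to conceptually: the ninth column holds `key "value";` pairs, and a chosen key is looked up and converted. This is a standalone illustration, not the code behind `GTFReaderBuilder`. The class `GtfAttributeSketch`, its methods, and the regular expression are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GtfAttributeSketch {

    // key, whitespace, then either a quoted value or a bare token, then an optional ';'
    private static final Pattern ATTRIBUTE =
        Pattern.compile("(\\S+)\\s+(?:\"([^\"]*)\"|([^;\\s]+))\\s*;?");

    /** Parses the attribute column into key/value pairs, preserving file order. */
    public static Map<String, String> parseAttributes(String column) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = ATTRIBUTE.matcher(column);
        while (matcher.find()) {
            // Group 2 holds a quoted value, group 3 an unquoted one.
            String value = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
            result.put(matcher.group(1), value);
        }
        return result;
    }

    /** Returns the attribute converted to Double, or null if the key is absent. */
    public static Double doubleAttribute(Map<String, String> attributes, String key) {
        String raw = attributes.get(key);
        return raw == null ? null : Double.valueOf(raw);
    }
}
```

With this sketch, `doubleAttribute(parseAttributes(column), "test_value")` plays the role that a `DoubleColumn` attribute plays in the reader configuration above.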
DataFrames can be written in the GTF format:
dataFrame.write(new File("result.gtf"), GTFFormat.GTF);