Field | Value |
---|---|
License | |
Categories | PDF Data |
GroupId | de.cit-ec.scie |
ArtifactId | pdf-extractor |
Last Version | 2.0.1 |
Release Date | |
Type | jar |
Description | SCIE PDF Text Extractor. This is an optimized version of Apache PDFBox. It extracts the rough structure of a document (pages, blocks of text, and paragraphs, as well as formatting information) and is intended to improve text extraction results for scientific papers. The output can easily be converted to plain text (toString) or to an XML format (toXML). |
Project URL | |
Source Code Management | |
Filename | Size |
---|---|
pdf-extractor-2.0.1.pom | |
pdf-extractor-2.0.1.jar | 33 KB |
pdf-extractor-2.0.1-sources.jar | 32 KB |
pdf-extractor-2.0.1-javadoc.jar | 127 KB |
Maven:
<!-- https://jarcasting.com/artifacts/de.cit-ec.scie/pdf-extractor/ -->
<dependency>
    <groupId>de.cit-ec.scie</groupId>
    <artifactId>pdf-extractor</artifactId>
    <version>2.0.1</version>
</dependency>

Gradle (Groovy DSL):
// https://jarcasting.com/artifacts/de.cit-ec.scie/pdf-extractor/
implementation 'de.cit-ec.scie:pdf-extractor:2.0.1'

Gradle (Kotlin DSL):
// https://jarcasting.com/artifacts/de.cit-ec.scie/pdf-extractor/
implementation("de.cit-ec.scie:pdf-extractor:2.0.1")

Buildr:
'de.cit-ec.scie:pdf-extractor:jar:2.0.1'

Ivy:
<dependency org="de.cit-ec.scie" name="pdf-extractor" rev="2.0.1">
    <artifact name="pdf-extractor" type="jar" />
</dependency>

Grape:
@Grapes(
    @Grab(group='de.cit-ec.scie', module='pdf-extractor', version='2.0.1')
)

SBT:
libraryDependencies += "de.cit-ec.scie" % "pdf-extractor" % "2.0.1"

Leiningen:
[de.cit-ec.scie/pdf-extractor "2.0.1"]
Group / Artifact | Type | Version |
---|---|---|
org.apache.pdfbox : pdfbox | jar | 1.8.2 |
Group / Artifact | Type | Version |
---|---|---|
junit : junit | jar | 4.11 |