Patch for using TagSoup with Semargl

NxParser is a Java open source, streaming, non-validating parser for the Nx format, where x = Triples, Quads, or any other number.

Categories: Ant Build Tools
GroupId: org.semanticweb.yars
ArtifactId: nxparser-parsers-external-rdfa-semargl-tagsoup
Last Version: 4.0.0
Type: jar

How to add to project

Maven

<!-- https://jarcasting.com/artifacts/org.semanticweb.yars/nxparser-parsers-external-rdfa-semargl-tagsoup/ -->
<dependency>
    <groupId>org.semanticweb.yars</groupId>
    <artifactId>nxparser-parsers-external-rdfa-semargl-tagsoup</artifactId>
    <version>4.0.0</version>
</dependency>

Gradle (Groovy DSL)

// https://jarcasting.com/artifacts/org.semanticweb.yars/nxparser-parsers-external-rdfa-semargl-tagsoup/
implementation 'org.semanticweb.yars:nxparser-parsers-external-rdfa-semargl-tagsoup:4.0.0'

Gradle (Kotlin DSL)

// https://jarcasting.com/artifacts/org.semanticweb.yars/nxparser-parsers-external-rdfa-semargl-tagsoup/
implementation ("org.semanticweb.yars:nxparser-parsers-external-rdfa-semargl-tagsoup:4.0.0")

Buildr

'org.semanticweb.yars:nxparser-parsers-external-rdfa-semargl-tagsoup:jar:4.0.0'

Ivy

<dependency org="org.semanticweb.yars" name="nxparser-parsers-external-rdfa-semargl-tagsoup" rev="4.0.0">
  <artifact name="nxparser-parsers-external-rdfa-semargl-tagsoup" type="jar" />
</dependency>

Grape

@Grapes(
    @Grab(group='org.semanticweb.yars', module='nxparser-parsers-external-rdfa-semargl-tagsoup', version='4.0.0')
)

SBT

libraryDependencies += "org.semanticweb.yars" % "nxparser-parsers-external-rdfa-semargl-tagsoup" % "4.0.0"

Leiningen

[org.semanticweb.yars/nxparser-parsers-external-rdfa-semargl-tagsoup "4.0.0"]

Dependencies

compile (1)

Group / Artifact                    Type  Version
org.ccil.cowan.tagsoup : tagsoup    jar   1.2.1

test (1)

Group / Artifact    Type  Version
junit : junit       jar   4.13.1

Project Modules

There are no modules declared in this project.

Welcome to NxParser

NxParser is a Java open source, streaming, non-validating parser for the Nx format, where x = Triples, Quads, or any other number. For more details see the specification for the N-Quads format, an extension of the N-Triples RDF format. Note that the parser handles any combination (cf. generalised triples) or number of N-Triples syntax terms on each line (the number of terms per line can also vary).

NxParser ate 2 mil. quads (~4 GB, ~240 MB gzipped) on a T60p (Win7, 2.16 GHz) in ~1 min 35 s (1:18 min). Overall, it is more than twice as fast as the previous version at reading Nx.

The NxParser is non-validating, meaning that, e.g., it will happily eat non-conformant N-Triples. Also, the NxParser will not parse certain valid N-Triples files where the RDF terms are not separated by whitespace. We pass all positive W3C N-Triples test cases except one, where the RDF terms are not separated by whitespace (surprise!).
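
To make the line-based model concrete, here is a minimal sketch of splitting one Nx line into its terms. It is not NxParser's own tokenizer; the class and method names (NxLineSketch, splitTerms) are hypothetical, and only JDK classes are used. Like the behaviour described above, it relies on whitespace between terms, accepts any number of terms per line, and performs no validation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not NxParser's own tokenizer.
class NxLineSketch {

    /** Splits one Nx line into its terms; no validation is performed. */
    static List<String> splitTerms(String line) {
        List<String> terms = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (c == '"') {
                // Literal: skip to the closing quote, honouring backslash escapes.
                i++;
                while (i < n && line.charAt(i) != '"') {
                    i += (line.charAt(i) == '\\') ? 2 : 1;
                }
                i++; // step over the closing quote
            }
            // Consume the rest of the term (IRI, blank node, or a literal
            // suffix such as @en or ^^<datatype>) up to the next whitespace.
            while (i < n && !Character.isWhitespace(line.charAt(i))) {
                i++;
            }
            String term = line.substring(start, Math.min(i, n));
            if (!term.equals(".")) { // the trailing dot only ends the statement
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String quad = "<http://example.org/s> <http://example.org/p> "
                + "\"hello world\"@en <http://example.org/g> .";
        for (String term : splitTerms(quad)) {
            System.out.println(term);
        }
    }
}
```

Because the sketch splits on whitespace, a line whose final term is glued to the closing dot would be tokenized incorrectly, which illustrates why input without separating whitespace is a problem for this style of parsing.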

Other formats

The NxParser family also includes an RDF/XML parser and a Turtle parser. Moreover, we attached a JSON-LD parser (jsonld-java) and an RDFa parser (semargl) such that they emit triples in the NxParser API.

Binaries

Compiled binaries are available on Maven Central. The groupId is org.semanticweb.yars. Choose the artifactId according to the part you need: if you only want to use the data model, use nxparser-model. If you want to make use of the parsers, use nxparser-parsers. If you want to use the RDF support for JAX-RS, use nxparser-jax-rs. The modules are linked as required.

<dependency>
  <groupId>org.semanticweb.yars</groupId>
  <artifactId>nxparser-parsers</artifactId>
  <version>2.3.3</version>
</dependency>

Legacy binaries

Old binaries are in the repository on Google Code, which we no longer maintain. To use them nevertheless, add

<repository>
 <id>nxparser-repo</id>
 <url>http://nxparser.googlecode.com/svn/repository</url>
</repository>
<repository>
 <id>nxparser-snapshots</id>
 <url>http://nxparser.googlecode.com/svn/snapshots</url>
</repository>

to your pom.xml.

Code Examples

Read Nx from a file

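import java.io.FileInputStream;

import org.semanticweb.yars.nx.Node;
import org.semanticweb.yars.nx.parser.NxParser;

FileInputStream is = new FileInputStream("path/to/file.nq");

NxParser nxp = new NxParser();
nxp.parse(is);

for (Node[] nx : nxp) {
  // prints the subject, e.g. <http://example.org/>
  System.out.println(nx[0]);
}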
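
The benchmark figures above mention gzipped input, and the Jython example further down wraps its stream in GZIPInputStream. A minimal Java sketch of the same idea follows. The helper class GzipInputSketch and its open method are hypothetical, and deciding by the .gz file name suffix is an assumption of this sketch, not something NxParser does. Only JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper; only JDK classes are used.
class GzipInputSketch {

    /** Opens a file, decompressing it on the fly when the name ends in .gz. */
    static InputStream open(Path file) {
        try {
            InputStream in = Files.newInputStream(file);
            if (!file.toString().endsWith(".gz")) {
                return in;
            }
            try {
                return new GZIPInputStream(in);
            } catch (IOException e) {
                in.close(); // do not leak the file handle on a bad gzip header
                throw e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: GzipInputSketch path/to/file.nq.gz");
            return;
        }
        try (InputStream is = open(Paths.get(args[0]))) {
            // "is" can be passed to nxp.parse(is) as in the example above.
            System.out.println("opened " + args[0]);
        }
    }
}
```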

Use a blank node

// true means you are supplying proper N-Triples RDF terms that do not need to be processed
Resource subjRes = new Resource("<http://example.org/123>", true);
Resource predRes = new Resource("<http://example.org/123>", true);
BNode bn = new BNode("_:bnodeId", true);

Node[] triple = new Node[]{subjRes, predRes, bn};
// yields [<http://example.org/123>, <http://example.org/123>, _:bnodeId]
System.out.println(Arrays.toString(triple));

Use Unicode characters

String japaneseString = "祝福は、チーズのメーカーです。";
Literal japaneseLiteral = new Literal(japaneseString, "ja");

// yields "\u795D\u798F\u306F\u3001\u30C1\u30FC\u30BA\u306E\u30E1\u30FC\u30AB\u30FC\u3067\u3059\u3002"@ja
System.out.println(japaneseLiteral);

// yields 祝福は、チーズのメーカーです。
System.out.println(japaneseLiteral.getLabel());
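
The escaped form printed above can be reproduced with a few lines of plain Java. The following is a minimal sketch under the assumption that every non-ASCII UTF-16 code unit is written as an uppercase four-digit escape; NxParser's own escaping may differ in detail, and the class name UnicodeEscapeSketch is hypothetical.

```java
// Hypothetical illustration only; NxParser's own escaping may differ in detail.
class UnicodeEscapeSketch {

    /** Replaces every non-ASCII UTF-16 code unit with a four-digit hex escape. */
    static String escape(String label) {
        StringBuilder sb = new StringBuilder();
        for (char c : label.toCharArray()) {
            if (c > 0x7F) {
                // Backslash, the letter u, then four uppercase hex digits.
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First two characters of the Japanese example above.
        System.out.println(escape("祝福"));
    }
}
```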

Use datatyped literals

Example: Get a Calendar object from an xsd:dateTime-typed Literal

Literal dtl; // parser-generated
XSDDateTime dt = (XSDDateTime)DatatypeFactory.getDatatype(dtl); 
GregorianCalendar cal = dt.getValue();
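
For comparison, the same kind of conversion can be sketched with the JDK alone when only the lexical form of an xsd:dateTime value is at hand. Note that javax.xml.datatype.DatatypeFactory used here is a JDK class and is unrelated to the NxParser class of the same simple name in the snippet above. The class name XsdDateTimeSketch and the sample value are made up for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// JDK-only sketch; javax.xml.datatype.DatatypeFactory is unrelated to the
// NxParser class with the same simple name.
class XsdDateTimeSketch {

    /** Parses an xsd:dateTime lexical form, e.g. the label of a typed literal. */
    static GregorianCalendar toCalendar(String lexical) {
        try {
            return DatatypeFactory.newInstance()
                    .newXMLGregorianCalendar(lexical)
                    .toGregorianCalendar();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no XML datatype factory available", e);
        }
    }

    public static void main(String[] args) {
        GregorianCalendar cal = toCalendar("2009-05-17T14:30:00Z"); // made-up value
        System.out.println(cal.get(Calendar.YEAR));
    }
}
```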

Use from Python

This works provided you use the Jython implementation of Python (thanks to Uldis Bojars; the example is saved from his now-offline blog).

import sys
sys.path.append("./nxparser.jar")

from org.semanticweb.yars.nx.parser import *
from java.io import FileInputStream
from java.util.zip import GZIPInputStream

def all_triples(fname, use_gzip=False):
  in_file = FileInputStream(fname)
  if use_gzip:
    in_file = GZIPInputStream(in_file)

  nxp = NxParser()
  nxp.parse(in_file)

  while nxp.hasNext():
    triple = nxp.next()
    n3 = [i.toString() for i in triple]
    yield n3

The code above defines a generator function which will yield a stream of NQuad records. We can now add some demo code in order to see it in action:

def main():
  gzfname = "sioc-btc-2009.gz"

  for line in all_triples(gzfname, use_gzip=True):
    print line

# The guard must sit at module level; nested inside main() it would never run.
if __name__ == "__main__":
  main()

results in:

[u'<http://2008.blogtalk.net/node/29>', u'<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', u'<http://rdfs.org/sioc/ns#Post>', u'<http://2008.blogtalk.net/sioc/node/29>']
[u'<http://2008.blogtalk.net/node/65>', u'<http://rdfs.org/sioc/ns#content>', u'"We\'ve created a map showing the main places of interest (event locations, restaurants, pubs, shopping locations and tourist sights) during BlogTalk 2008.  The conference venue is shown on the left-hand side of the map.  We will also have a hardcopy for all attendees. View Larger Map"', u'<http://2008.blogtalk.net/sioc/node/65>']

Issues with Eclipse

We had an issue with Eclipse not being able to create its folder structure for nxparser-parsers; running mvn eclipse:eclipse did the trick.

Versions

Version
4.0.0
3.0.1
3.0.0