vertx-datacollector
A framework to collect and post-process data from any source.
Import
Maven
<dependency>
    <groupId>info.pascalkrause</groupId>
    <artifactId>vertx-datacollector</artifactId>
    <version>0.0.6</version>
    <scope>compile</scope>
</dependency>
Gradle
compile 'info.pascalkrause:vertx-datacollector:0.0.6'
Get Started
CollectorJob
The first step is to implement the actual collector job (e.g. crawling a dataset from a website). The collector job is implemented in the Handler returned by the collect() method: the handler receives a Future, which it completes (or fails) with the CollectorJobResult. The handler is executed in a separate worker thread, so blocking operations are allowed here.
public Handler<Future<CollectorJobResult>> collect(String requestId, JsonObject feature);
After the collection step is done, post-processing (e.g. writing the result into a database) can be done in the Handler returned by the postCollectAction() method. It receives the result of the collection step and may also contain blocking operations.
public Handler<Future<CollectorJobResult>> postCollectAction(AsyncResult<CollectorJobResult> result);
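The two-phase contract described above (a handler that completes a Future on a worker thread, followed by a post-collect handler that sees the outcome) can be sketched with plain JDK types. This is not the library's code: SketchCollectorJob, SketchResult and SketchJobRunner are hypothetical stand-ins for CollectorJob, CollectorJobResult and the library's executor, with Consumer<CompletableFuture<T>> playing the role of Vert.x's Handler<Future<T>>, and the outcome passed as a (result, error) pair instead of an AsyncResult.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical stand-in for CollectorJobResult.
final class SketchResult {
    final String requestId;
    final String payload;

    SketchResult(String requestId, String payload) {
        this.requestId = requestId;
        this.payload = payload;
    }
}

// Hypothetical stand-in for CollectorJob: each phase returns a handler that
// receives a future and must complete (or fail) it.
interface SketchCollectorJob {
    Consumer<CompletableFuture<SketchResult>> collect(String requestId, String feature);

    // Receives the outcome of collect(): either a result or an error (the other is null).
    Consumer<CompletableFuture<SketchResult>> postCollectAction(SketchResult result, Throwable error);
}

final class SketchJobRunner {
    // Runs the collect handler on a worker thread, then the post-collect handler
    // on a worker thread as well, so both phases may block.
    static CompletableFuture<SketchResult> run(SketchCollectorJob job, String requestId,
            String feature, ExecutorService workers) {
        CompletableFuture<SketchResult> collected = new CompletableFuture<>();
        workers.execute(() -> {
            try {
                job.collect(requestId, feature).accept(collected);
            } catch (RuntimeException e) {
                collected.completeExceptionally(e); // a throwing handler fails the phase
            }
        });

        CompletableFuture<SketchResult> processed = new CompletableFuture<>();
        collected.whenComplete((result, error) -> workers.execute(() -> {
            try {
                job.postCollectAction(result, error).accept(processed);
            } catch (RuntimeException e) {
                processed.completeExceptionally(e);
            }
        }));
        return processed;
    }
}
```

Using a separate future per phase keeps a failure in collect() visible to postCollectAction(), which mirrors the AsyncResult parameter in the real signature.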
DataCollectorServiceVerticle
After implementing the CollectorJob, the verticle can be deployed. Its constructor takes the following parameters:
- ebAddress: The event bus address
- job: The job which will be processed in the CollectorJobExecutor
- workerPoolSize: The pool size of the CollectorJobExecutor
- queueSize: The size of the queue for CollectorJob requests
- enableMetrics: Enables metrics for the DataCollectorService
DataCollectorServiceVerticle verticle = new DataCollectorServiceVerticle(
ebAddress, job, workerPoolSize, queueSize, enableMetrics);
vertx.deployVerticle(verticle);
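How workerPoolSize and queueSize interact is not spelled out above. One common way to get these semantics in plain Java is a ThreadPoolExecutor with a fixed number of threads and a bounded ArrayBlockingQueue; the sketch below assumes that model. The class BoundedCollectorPool is hypothetical, and the real CollectorJobExecutor may behave differently (for example, it may report a full queue as a failed request rather than returning false).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a job executor with a fixed worker pool and a bounded request queue.
final class BoundedCollectorPool {
    private final ThreadPoolExecutor executor;

    BoundedCollectorPool(int workerPoolSize, int queueSize) {
        // Core size == max size: exactly workerPoolSize threads. Extra requests wait
        // in a queue of capacity queueSize; anything beyond that is rejected.
        this.executor = new ThreadPoolExecutor(
                workerPoolSize, workerPoolSize,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns false instead of throwing when both the pool and the queue are full.
    boolean trySubmit(Runnable job) {
        try {
            executor.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Number of requests currently waiting for a free worker.
    int queuedRequests() {
        return executor.getQueue().size();
    }

    // Stops accepting requests and waits briefly for running jobs to finish.
    void shutdown() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With workerPoolSize = 1 and queueSize = 1, the first request runs, the second waits in the queue, and a third is rejected until a worker becomes free.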
DataCollectorService
Once the verticle has been deployed successfully, a DataCollectorService can connect to it. A list of the methods offered by the DataCollectorService can be found here.
String ebAddress = "addressOfCollectorVerticle";
DataCollectorServiceFactory factory = new DataCollectorServiceFactory(vertx, ebAddress);
DataCollectorService dcs = factory.create();
// or with DeliveryOptions (configure as needed, e.g. a send timeout)
DeliveryOptions delOpts = new DeliveryOptions();
DataCollectorService dcsWithOptions = factory.create(delOpts);
DataCollectorServiceClient
The DataCollectorService is a Vert.x service proxy, which must obey some restrictions so that the service can also be translated into other languages. The DataCollectorServiceClient is a Java client that acts as a facade for the DataCollectorService: it offers higher-level functions and performs Java-specific conversions, e.g. error transformation. A list of the methods offered by the DataCollectorServiceClient can be found here.
DataCollectorService dcs = .....
DataCollectorServiceClient dcsc = new DataCollectorServiceClient(dcs);
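The error transformation mentioned above can be pictured as a thin facade that turns generic, code-based failures from the proxy into typed Java exceptions. The sketch below uses only the JDK; ProxyFailure, the failure codes, QueueFullException, UnknownFeatureException and SketchServiceClient are all hypothetical and are not the real DataCollectorServiceClient API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

// Hypothetical generic failure, as a language-neutral proxy might report it.
final class ProxyFailure extends RuntimeException {
    final int code;

    ProxyFailure(int code, String message) {
        super(message);
        this.code = code;
    }
}

// Hypothetical typed exceptions offered by a Java-specific facade.
class QueueFullException extends RuntimeException {
    QueueFullException(String message) { super(message); }
}

class UnknownFeatureException extends RuntimeException {
    UnknownFeatureException(String message) { super(message); }
}

final class SketchServiceClient {
    static final int QUEUE_FULL = 1;      // hypothetical failure code
    static final int UNKNOWN_FEATURE = 2; // hypothetical failure code

    // Wraps a proxy call and rewrites generic failures into typed exceptions.
    static <T> CompletableFuture<T> call(Supplier<CompletableFuture<T>> proxyCall) {
        return proxyCall.get().handle((value, error) -> {
            if (error == null) {
                return value;
            }
            // Dependent stages wrap failures in CompletionException; unwrap first.
            Throwable cause = error instanceof CompletionException ? error.getCause() : error;
            throw transform(cause);
        });
    }

    static RuntimeException transform(Throwable cause) {
        if (cause instanceof ProxyFailure) {
            ProxyFailure failure = (ProxyFailure) cause;
            switch (failure.code) {
                case QUEUE_FULL:
                    return new QueueFullException(failure.getMessage());
                case UNKNOWN_FEATURE:
                    return new UnknownFeatureException(failure.getMessage());
                default:
                    break;
            }
        }
        // Unknown failures are passed through unchanged (or wrapped if checked).
        return cause instanceof RuntimeException ? (RuntimeException) cause : new RuntimeException(cause);
    }
}
```

Callers can then catch specific exception types instead of inspecting numeric codes, which is the kind of Java-specific convenience a language-neutral proxy cannot offer directly.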
Architecture
JavaDoc
The latest JavaDoc can be found here.
Run tests
./gradlew test
Contribute
We use Gerrit, so pull requests on GitHub will probably be overlooked. Please use GerritHub.io to contribute changes. The project name is caspal/vertx-datacollector.
Code Style
- Files must be encoded in UTF-8.
- Every change must have a commit message.
- Line endings must be LF (Linux).
- The maximum line length should be between 80 and 120 characters.
- Use spaces instead of tabs.
- Use 4 spaces for indentation.
- No trailing whitespace.
- Avoid unnecessary empty lines.
- Adapt your code to the surrounding code.
- Follow the default style guide of the language.
An Eclipse formatter configuration can be found in the resources folder.
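Several of the style rules above can be checked mechanically before submitting a change. The following StyleCheck class is a hypothetical helper, not part of the project; it is a sketch that flags CR line endings, tabs, trailing whitespace and lines longer than 120 characters (the upper bound from the rules), while the Eclipse formatter remains the reference for everything else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for a subset of the code style rules.
final class StyleCheck {
    static final int MAX_LINE_LENGTH = 120; // upper bound of the allowed range

    // Returns one message per violation, using 1-based line numbers.
    static List<String> check(String content) {
        List<String> problems = new ArrayList<>();
        if (content.indexOf('\r') >= 0) {
            problems.add("file contains CR characters; use LF line endings");
        }
        // Limit -1 keeps trailing empty strings, so line numbers stay accurate.
        String[] lines = content.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].replace("\r", ""); // CR is already reported above
            int lineNo = i + 1;
            if (line.indexOf('\t') >= 0) {
                problems.add(lineNo + ": tab character; use 4 spaces");
            }
            if (!line.isEmpty() && Character.isWhitespace(line.charAt(line.length() - 1))) {
                problems.add(lineNo + ": trailing whitespace");
            }
            if (line.length() > MAX_LINE_LENGTH) {
                problems.add(lineNo + ": line longer than " + MAX_LINE_LENGTH + " characters");
            }
        }
        return problems;
    }
}
```

Feeding it the content of each changed file (for example from a pre-commit hook) surfaces most formatting problems before review.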