Data Highway
Start using
Overview
What is Data Highway?
The Data Highway is a service that allows data to be easily produced and consumed via JSON messages over HTTPS/WSS. Data is first defined using a schema and a "road" is created which will accept messages that conform to this schema. Producers of data sets thus only need to define the structure of their data and are then able to send their data to a REST endpoint and not be concerned with what happens next. Data Highway will ensure that this data is made available for streaming consumption and also stored reliably in a "data lake" in the cloud for access by end users.
Architecture
Paver
Paver is Data Highway's administration endpoint. It provides the following features:
- Road creation (a road is synonymous with a Kafka topic).
- Schema registration and (soft) deletion.
- Data-at-rest to Hive/S3 configuration.
- Road-level producer and consumer authorisation.
Onramp
Onramp is Data Highway's producer endpoint. It allows users to submit messages to roads in JSON format over HTTPS.
Offramp
Offramp is Data Highway's consumer endpoint. It allows users to consume messages from roads in JSON format over WSS.
Tollbooth
Tollbooth is the core of Data Highway. It provides the mechanism by which mutations to a road's model are persisted. Mutations can come from users (via Paver) or from internal agents. Anything wishing to make a mutation submits a JSON Patch onto a deltas Kafka topic. Tollbooth consumes this topic, continuously applying patches to models and persisting them back onto the main Model (compact) topic.
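The patch-and-persist idea can be illustrated with a minimal sketch of applying JSON Patch operations to a road model. It supports only the add, replace and remove operations from RFC 6902. The function name apply_patch and the "schemas" field of the model are assumptions made for illustration; this is not Tollbooth's actual code.

```python
import copy


def apply_patch(model, patch):
    """Apply JSON Patch operations (add, replace, remove only) to a copy of model.

    Hypothetical helper: patches that target the whole document (path "")
    are not supported in this sketch.
    """
    result = copy.deepcopy(model)
    for op in patch:
        # Split "/a/b" into ["a", "b"], unescaping ~1 then ~0 (RFC 6901).
        tokens = [t.replace("~1", "/").replace("~0", "~")
                  for t in op["path"].split("/")[1:]]
        # Walk down to the container that holds the target.
        parent = result
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            if kind == "add":
                index = len(parent) if last == "-" else int(last)
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[int(last)] = op["value"]
            elif kind == "remove":
                del parent[int(last)]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "add":
                parent[last] = op["value"]
            elif kind == "replace":
                if last not in parent:
                    raise KeyError(last)
                parent[last] = op["value"]
            elif kind == "remove":
                del parent[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return result


# Illustrative model and deltas; the field layout is assumed, not taken from Data Highway.
model = {"name": "my_road", "enabled": False, "schemas": {}}
deltas = [
    {"op": "replace", "path": "/enabled", "value": True},
    {"op": "add", "path": "/schemas/1", "value": {"type": "record", "name": "my_record"}},
]
updated = apply_patch(model, deltas)
```

Working on a deep copy keeps the previous model intact, which mirrors the idea of producing a new model state for each consumed patch.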
Traffic Control
Traffic Control is the Kafka Agent. It is primarily responsible for managing Kafka topics in response to changes in models.
Loading Bay / Truck Park
Loading Bay is responsible for orchestrating the landing of data to S3 on a configured interval and managing Hive tables - creation, schema mutation and the addition of partitions.
Try it out
Try Test Drive, an in-memory version of Data Highway that exposes all the public-facing endpoints in a single Spring Boot application or Docker container.
docker run -p 8080:8080 hotelsdotcom/road-test-drive:<tag>
Examples
Using a local instance of Test Drive, try creating a road, registering a schema, and producing and consuming messages using the built-in user account user:pass.
Note: For the examples below, cURL will prompt for a password, which is pass.
Create a road
curl -sk \
-u user \
-X POST \
-H "Content-Type: application/json" \
-d '{
"name": "my_road",
"description": "My Road",
"teamName": "TEAM",
"contactEmail": "[email protected]",
"partitionPath": "$.foo",
"enabled": true,
"authorisation": {
"onramp": {
"cidrBlocks": ["0.0.0.0/0"],
"authorities": ["*"]
},
"offramp": {
"authorities": {
"*": ["PUBLIC"]
}
}
}
}' https://localhost:8080/paver/v1/roads
Register a schema
curl -sk \
-u user \
-X POST \
-H "Content-Type: application/json" \
-d '{
"type" : "record",
"name" : "my_record",
"fields" : [
{"name":"foo","type":"string"},
{"name":"bar","type":"string"}
]
}' https://localhost:8080/paver/v1/roads/my_road/schemas
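To make "messages that conform to this schema" concrete, here is a minimal sketch that checks a JSON message against the record schema registered above. It is a simplified illustration that handles only a few primitive field types and ignores undeclared fields; the names check_message and PRIMITIVES are hypothetical, and Data Highway's own validation is not described in this document and may differ.

```python
# The schema registered in the example above.
SCHEMA = {
    "type": "record",
    "name": "my_record",
    "fields": [
        {"name": "foo", "type": "string"},
        {"name": "bar", "type": "string"},
    ],
}

# Assumed mapping from schema type names to Python types (sketch only).
PRIMITIVES = {
    "string": (str,),
    "boolean": (bool,),
    "int": (int,),
    "long": (int,),
    "float": (int, float),
    "double": (int, float),
}


def check_message(message, schema):
    """Return a list of problems; an empty list means the message fits the schema."""
    if not isinstance(message, dict):
        return ["message is not a JSON object"]
    problems = []
    for field in schema["fields"]:
        name, type_name = field["name"], field["type"]
        if name not in message:
            problems.append(f"missing field: {name}")
            continue
        value = message[name]
        expected = PRIMITIVES.get(type_name) if isinstance(type_name, str) else None
        if expected is None:
            problems.append(f"field {name}: unsupported type in this sketch")
        elif isinstance(value, bool) and bool not in expected:
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"field {name}: expected {type_name}, got bool")
        elif not isinstance(value, expected):
            problems.append(f"field {name}: expected {type_name}, got {type(value).__name__}")
    return problems


ok = check_message({"foo": "foo1", "bar": "bar1"}, SCHEMA)
bad = check_message({"foo": 1}, SCHEMA)
```

Here ok is an empty list, while bad reports the wrong type for foo and the missing bar field.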
Produce messages
curl -sk \
-u user \
-H "Content-Type: application/json" \
-d '[{"foo":"foo1","bar":"bar1"}]' \
https://localhost:8080/onramp/v1/roads/my_road/messages
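For producers that are not shell scripts, the same call can be sketched with Python's standard library. The function names build_onramp_request and send are hypothetical; the URL, payload and credentials are the ones from the cURL example above. The send function disables certificate verification, as curl's -k flag does, which is only appropriate against a local Test Drive instance.

```python
import base64
import json
import ssl
import urllib.request


def build_onramp_request(base_url, road, messages, user, password):
    """Build (but do not send) a POST of a JSON array of messages to a road."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/onramp/v1/roads/{road}/messages",
        data=json.dumps(messages).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request):
    """Send the request without TLS verification (local testing only)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.status, response.read().decode("utf-8")


request = build_onramp_request(
    "https://localhost:8080",
    "my_road",
    [{"foo": "foo1", "bar": "bar1"}],
    "user",
    "pass",
)
# With Test Drive running locally: status, body = send(request)
```

Separating request construction from sending keeps the payload and headers easy to inspect before anything goes over the network.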
Consume messages
echo '{"type":"REQUEST","count":1}' |\
websocat -nk 'wss://localhost:8080/offramp/v2/roads/my_road/streams/my_stream/messages?defaultOffset=EARLIEST'
See: websocat
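The two pieces a consumer needs from the example above, the stream URL and the REQUEST frame, can be built programmatically as sketched below. The helper names offramp_url and request_frame are hypothetical, and only the frame shape shown in the websocat example is covered; other frame types and the response format are not described in this document. Python's standard library has no WebSocket client, so actually connecting would need a third-party library.

```python
import json
from urllib.parse import urlencode


def offramp_url(host, road, stream, default_offset="EARLIEST"):
    """Build the WSS URL for a named stream on a road."""
    query = urlencode({"defaultOffset": default_offset})
    return f"wss://{host}/offramp/v2/roads/{road}/streams/{stream}/messages?{query}"


def request_frame(count):
    """Build the JSON text frame that asks Offramp for `count` messages."""
    return json.dumps({"type": "REQUEST", "count": count}, separators=(",", ":"))


url = offramp_url("localhost:8080", "my_road", "my_stream")
frame = request_frame(1)
```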
Building
Build and load docker images to the local docker daemon:
mvn clean package -Djib.goal=dockerBuild
Build without docker images:
mvn clean package -Djib.skip
Build and push docker images to a repo:
mvn clean package -Ddocker.repo=my.docker.repo
Contributors
Special thanks to the following for making data-highway possible!
- Kyriakos Sideris
- Sandeep Solanki
This project follows the all-contributors specification.
Legal
This project is available under the Apache 2.0 License.
Copyright 2019 Expedia, Inc.