com.coxautodata:waimak-azure-table_2.12

License	Apache License, Version 2.0
Categories	Data Auto Application Layer Libs Code Generators
GroupId	com.coxautodata
ArtifactId	waimak-azure-table_2.12
Last Version	2.6
Release Date	Jun 6, 2019
Type	jar
Description	Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. Copyright 2018 Cox Automotive UK Limited; Licensed under the Apache License, Version 2.0.

Download waimak-azure-table_2.12

Filename	Size
waimak-azure-table_2.12-2.6.pom
waimak-azure-table_2.12-2.6.jar	47 KB
waimak-azure-table_2.12-2.6-tests.jar	32 KB
waimak-azure-table_2.12-2.6-test-sources.jar	3 KB
waimak-azure-table_2.12-2.6-sources.jar	10 KB
waimak-azure-table_2.12-2.6-javadoc.jar	1 MB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.coxautodata/waimak-azure-table_2.12/ -->
<dependency>
    <groupId>com.coxautodata</groupId>
    <artifactId>waimak-azure-table_2.12</artifactId>
    <version>2.6</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.coxautodata/waimak-azure-table_2.12/
implementation 'com.coxautodata:waimak-azure-table_2.12:2.6'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.coxautodata/waimak-azure-table_2.12/
implementation ("com.coxautodata:waimak-azure-table_2.12:2.6")

Apache Buildr

'com.coxautodata:waimak-azure-table_2.12:jar:2.6'

Apache Ivy

<dependency org="com.coxautodata" name="waimak-azure-table_2.12" rev="2.6">
  <artifact name="waimak-azure-table_2.12" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.coxautodata', module='waimak-azure-table_2.12', version='2.6')
)

Scala SBT

libraryDependencies += "com.coxautodata" % "waimak-azure-table_2.12" % "2.6"

Leiningen

[com.coxautodata/waimak-azure-table_2.12 "2.6"]

Dependencies

compile (2)

Group / Artifact	Type	Version
com.coxautodata : waimak-core_2.12	jar	2.6
com.microsoft.azure : azure-storage	jar	6.1.0

provided (5)

Group / Artifact	Type	Version
org.scala-lang : scala-library	jar	${scala.lang.version}
org.scala-lang : scala-compiler	jar	${scala.lang.version}
org.apache.spark : spark-core_${scala.compatible.version} (optional)	jar	${spark.version}
org.apache.spark : spark-sql_${scala.compatible.version} (optional)	jar	${spark.version}
org.apache.spark : spark-hive_${scala.compatible.version} (optional)	jar	${spark.version}

test (3)

Group / Artifact	Type	Version
com.coxautodata : waimak-core_2.12	test-jar	2.6
org.scalatest : scalatest_${scala.compatible.version}	jar	3.0.3
org.apache.spark : spark-sql_${scala.compatible.version} (optional)	test-jar	${spark.version}

Project Modules

There are no modules declared in this project.

Waimak

What is Waimak?

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Waimak aims to abstract the more complex parts of Spark application development (such as orchestration) away from the business logic, allowing users to get their business logic in a production-ready state much faster. By using a framework written by Data Engineers, the teams defining the business logic can write and own their production code.

Our metaphor for this framework is the braided river: it splits and rejoins itself repeatedly on its journey. By describing a Spark application as a sequence of flow transformations, Waimak can execute independent branches of the flow in parallel, making more efficient use of compute resources and greatly reducing the execution time of complex flows.
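The braided-river idea can be illustrated without Spark at all. The following is a minimal sketch in plain Scala using `Future`; it is not Waimak's scheduler, and the names `BraidedFlowSketch`, `loadItems`, `loadPeople` and `join` are hypothetical stand-ins chosen for illustration. It shows two independent branches being started together and a later step that waits for both.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for flow actions; these are NOT Waimak types.
object BraidedFlowSketch {

  // Two independent branches: neither needs the other's output.
  def loadItems(): Future[Seq[String]] = Future(Seq("hammer", "nails"))
  def loadPeople(): Future[Seq[String]] = Future(Seq("alice", "bob"))

  // The river rejoins: this step depends on both branches.
  def join(items: Seq[String], people: Seq[String]): Seq[(String, String)] =
    for (p <- people; i <- items) yield (p, i)

  def run(): Seq[(String, String)] = {
    // Start both branches before waiting on either, so they can run concurrently.
    val itemsF = loadItems()
    val peopleF = loadPeople()
    val joined = for {
      items  <- itemsF
      people <- peopleF
    } yield join(items, people)
    // Block only at the very end, once the whole flow has been described.
    Await.result(joined, 10.seconds)
  }
}
```

Calling `BraidedFlowSketch.run()` returns every person paired with every item. The point of the sketch is the ordering: both futures are created before the for-comprehension, so the dependent step waits on work that is already in flight.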

Versions

From Waimak version 2.9 onwards we will be focusing our attention on Spark 3+ and Scala 2.12. While some of the jars will still be published for Scala 2.11, the build targets 2.12 first. This is because an increasing number of the dependencies (Databricks config and Deequ, for example) are now produced for Scala 2.12. If you still require Scala 2.11 support for new versions, please open an issue and we can work with you on this.

Why would I use Waimak?

We developed Waimak to:

allow teams to own their own business logic without owning an entire production Spark application
reduce the time it takes to write production-ready Spark applications
provide an intuitive structure to Spark applications by describing them as a sequence of transformations forming a flow
increase the performance of Spark data flows by making more efficient use of the Spark executors

Importantly, Waimak is a framework for building Spark applications by describing a sequence of composed Spark transformations. To create those transformations Waimak exposes the complete Spark API, giving you the power of Apache Spark with added structure.

How do I get started?

You can import Waimak into your Maven project using the following dependency details:

        <dependency>
            <groupId>com.coxautodata</groupId>
            <artifactId>waimak-core_2.11</artifactId>
            <version>${waimak.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>

Waimak marks the Spark dependency as optional so that it does not depend on any specific release of Spark. You must therefore declare the version of Spark you wish to use as a dependency yourself. Waimak should run on any version of Spark 2.2+; the officially tested versions are listed under "What versions of Spark are supported?".
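For sbt users, the same pair of dependencies might look like the sketch below. The version values are assumptions chosen for illustration (Spark 2.4.5 with Scala 2.11 appears in the tested-versions table; the Waimak version is a placeholder), so substitute the releases your project actually needs.

```scala
// build.sbt (a sketch; the version values below are illustrative assumptions)
scalaVersion := "2.11.12"

val waimakVersion = "2.6"   // assumption: replace with the Waimak release you need
val sparkVersion  = "2.4.5" // assumption: replace with the Spark version on your cluster

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix, resolving to waimak-core_2.11
  "com.coxautodata" %% "waimak-core" % waimakVersion,
  // Spark is declared explicitly and marked Provided, mirroring the Maven snippet
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided
)
```

Marking Spark as `Provided` keeps it off the packaged classpath, which matches the `<scope>provided</scope>` setting in the Maven example.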

The following code snippet demonstrates a basic Waimak example taken from the unit tests:

// Required imports
import com.coxautodata.waimak.dataflow.Waimak

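// Initialise basic Waimak objects
// (`spark`, `basePath` and `baseDest` are not defined in this snippet: they are
// assumed to be an existing SparkSession and the input and output locations)
val emptyFlow = Waimak.sparkFlow(spark)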

// Add actions to the flow
val basicFlow = emptyFlow
    .openCSV(basePath)("csv_1", "csv_2")
    .alias("csv_1", "items")
    .alias("csv_2", "person")
    .writeParquet(baseDest)("items", "person")

// Run the flow
basicFlow.execute()

This example is very small, but in practice flow definitions can become very large, depending on the number of inputs and outputs in a job.

The project wiki page provides best practices for structuring your project when dealing with large flows.

What Waimak modules are available?

Waimak currently consists of the following modules:

Artifact ID	Purpose	Maven Release
`waimak-core`	Core Waimak functionality and generic actions	Maven Central
`waimak-configuration-databricks`	Databricks-specific configuration provider using secret scopes (Scala 2.11 only)	Maven Central
`waimak-impala`	Impala implementation of the `HadoopDBConnector` used for committing labels to an Impala DB	Maven Central
`waimak-hive`	Hive implementation of the `HadoopDBConnector` used for committing labels to a Hive Metastore	Maven Central
`waimak-rdbm-ingestion`	Functionality to ingest inputs from a range of RDBM sources	Maven Central
`waimak-storage`	Functionality for providing a hot/cold region-based ingestion storage layer	Maven Central
`waimak-app`	Functionality providing Waimak application templates and orchestration	Maven Central
`waimak-experimental`	Experimental features currently under development	Maven Central
`waimak-dataquality`	Functionality for monitoring and alerting on data quality	Maven Central
`waimak-deequ`	Amazon Deequ implementation of data quality monitoring (Scala 2.11 only)	Maven Central

What versions of Spark are supported?

Waimak is tested against the following versions of Spark:

Package Maintainer	Spark Version	Scala Version
Apache Spark	2.4.5	2.11
Apache Spark	3.0.1	2.12

Other versions of Spark >= 2.2 are also likely to work and can be added to the list of tested versions if there is sufficient need.
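As a rough illustration of the ">= 2.2" rule, the sketch below parses a Spark version string and checks it against that minimum. The object `SparkVersionCheck` and its methods are hypothetical helpers written for this example, not part of Waimak, and passing this check says nothing about whether a version has been officially tested.

```scala
import scala.util.Try

// Hypothetical helper, not part of Waimak.
object SparkVersionCheck {

  // Parse "major.minor[.patch...]" into (major, minor); None if malformed.
  def majorMinor(version: String): Option[(Int, Int)] =
    version.split("\\.").toList match {
      case major :: minor :: _ => Try((major.toInt, minor.toInt)).toOption
      case _                   => None
    }

  // True when the version is 2.2 or later; malformed input counts as unsupported.
  def meetsMinimum(version: String): Boolean =
    majorMinor(version).exists { case (major, minor) =>
      major > 2 || (major == 2 && minor >= 2)
    }
}
```

In an application, the string could come from `spark.version` on an active `SparkSession`, for example `SparkVersionCheck.meetsMinimum(spark.version)`.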

Where can I learn more?

You can find the latest documentation for Waimak on the project wiki page. This README file contains basic setup instructions and general project information.

You can also find details of what's in the latest releases in the changelog.

Finally, you can talk to the developers and other users directly in our Gitter room.

Can I contribute to Waimak?

We welcome all users to contribute to the development of Waimak by raising pull requests. We kindly ask that you include suitable unit tests along with your proposed changes.

How do I test my contributions?

Waimak is tested against different versions of Spark 2.x to ensure uniform compatibility. The versions of Spark tested by Waimak are given in the <profiles> section of the POM. You can activate a given profile in the POM by using the -P flag:

        mvn clean package -P apache-2.3.0_2.11

The integration tests of the RDBM ingestion module require Docker, so the Docker service must be running and the current user must be able to access it.

What is Waimak licensed under?

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Cox Automotive Data Solutions

Versions

Version
2.6 Jun 6, 2019
2.5 May 29, 2019
2.4.2 May 21, 2019
2.4.1 May 14, 2019
2.4 May 9, 2019
2.3.1 Apr 29, 2019
2.3 Apr 9, 2019
2.2 Mar 29, 2019
2.1.1 Mar 15, 2019
2.1 Mar 8, 2019
2.0 Dec 17, 2018
