# zipkin-sparkstreaming
This is a streaming alternative to Zipkin's collector.
Zipkin's collector receives span messages reported by applications directly or via Kafka. It does little besides store them for later query, and it offers limited options for downsampling or other processing.
This project provides a more flexible pipeline, including the ability to:
- receive spans from other sources, like files
- perform dynamic sampling, like retain only latent or error traces
- process data in real-time, like reporting or alternate visualization tools
- adjust data, like scrubbing private data or normalizing service names
## Status
Many features are incomplete. Please join us to help complete them.
## Usage
The quickest way to get started is to fetch the latest released job as a self-contained executable jar. Note that the Zipkin Spark Streaming job requires JRE 7 or later. For example:
### Download the latest job
The following downloads the latest version using wget:

```bash
wget -O zipkin-sparkstreaming-job.jar 'https://search.maven.org/remote_content?g=io.zipkin.sparkstreaming&a=zipkin-sparkstreaming-job&v=LATEST'
```
### Run the job
You can run the job in either local or cluster mode. Here's an example of each:
```bash
# run local
java -jar zipkin-sparkstreaming-job.jar \
  --zipkin.log-level=debug \
  --zipkin.storage.type=elasticsearch \
  --zipkin.storage.elasticsearch.hosts=http://127.0.0.1:9200 \
  --zipkin.sparkstreaming.stream.kafka.bootstrap-servers=127.0.0.1:9092
```

```bash
# run in a cluster
java -jar zipkin-sparkstreaming-job.jar \
  --zipkin.log-level=debug \
  --zipkin.storage.type=elasticsearch \
  --zipkin.storage.elasticsearch.hosts=http://127.0.0.1:9200 \
  --zipkin.sparkstreaming.stream.kafka.bootstrap-servers=127.0.0.1:9092 \
  --zipkin.sparkstreaming.master=spark://127.0.0.1:7077
```
## Key Components
The diagram below shows the internal architecture of the Zipkin Spark Streaming job. StreamFactory is an extensible interface that ingests data from Kafka or any other transport. The filtering step drops spans based on criteria like service name (#33). The aggregation phase groups spans by time or trace ID. The adjuster phase makes adjustments to spans that belong to the same trace; for example, the FinagleAdjuster fixes known bugs in the old Finagle Zipkin tracer. The final consumer stage persists the data to a storage system like Elasticsearch.
```
┌────────────────────────────┐
│           Kafka            │
└────────────────────────────┘
┌────────────────┼────────────────┐
│                ▼                │
│ ┌────────────────────────────┐  │
│ │       StreamFactory        │  │
│ └────────────────────────────┘  │
│                │                │
│                ▼                │
│ ┌────────────────────────────┐  │
│ │         Filtering          │  │
│ └────────────────────────────┘  │
│                │                │
│                ▼                │
│ ┌────────────────────────────┐  │
│ │        Aggregation         │  │
│ └────────────────────────────┘  │
│                │                │
│                ▼                │
│ ┌────────────────────────────┐  │
│ │          Adjuster          │  │
│ └────────────────────────────┘  │
│                │                │
│                ▼                │
│ ┌────────────────────────────┐  │
│ │          Consumer          │  │
│ └────────────────────────────┘  │
└─────────────────┼───────────────┘
                  ▼
  ┌──────────────────────────┐
  │ Storage (ES, Cassandra)  │
  └──────────────────────────┘
```
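The stages above can be sketched as plain Java. This is a simplified illustration of the data flow only, not the project's actual API: `Span`, `Adjuster`, and `TraceConsumer` here are hypothetical stand-ins, and the real job implements each stage as a Spark transformation:

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of the pipeline: filter -> aggregate by trace ID ->
// adjust -> consume. The real job runs these stages as Spark transformations.
class PipelineSketch {
  record Span(String traceId, String service, boolean error) {}

  interface Adjuster { List<Span> adjust(List<Span> trace); }
  interface TraceConsumer { void accept(List<Span> trace); }

  static void run(List<Span> input, Predicate<Span> filter,
                  Adjuster adjuster, TraceConsumer consumer) {
    // Filtering stage: drop spans that don't match the criteria.
    // Aggregation stage: group the surviving spans by trace ID.
    Map<String, List<Span>> traces = input.stream()
        .filter(filter)
        .collect(Collectors.groupingBy(Span::traceId));
    // Adjuster stage, then consumer stage, once per complete trace.
    traces.values().forEach(t -> consumer.accept(adjuster.adjust(t)));
  }
}
```

The key design point the sketch captures is that filtering sees individual spans, while adjusters and consumers always see a whole group of spans sharing a trace ID.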
## Stream
A stream is a source of JSON- or Thrift-encoded span messages.
For example, a message stream could be a Kafka topic named "zipkin".
| Stream | Description |
|---|---|
| Kafka | Ingests spans from a Kafka topic. |
## Adjuster
An adjuster conditionally changes spans sharing the same trace ID.
You can write adjusters to fix up data reported by instrumentation, or to scrub private data. This example shows how to add a custom adjuster to the Spark job.
Below is the list of prepackaged adjusters.
| Adjuster | Description |
|---|---|
| Finagle | Fixes up spans reported by Finagle. |
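As an illustration of the scrubbing use case, a custom adjuster might remove sensitive tags from every span in a trace. This is a hedged sketch, not the project's actual adjuster interface: the `Span` type and the tag names are simplified stand-ins for the real Zipkin model:

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical adjuster that scrubs private tags from all spans sharing a
// trace ID. Span and the tag names are simplified stand-ins, not Zipkin's model.
class ScrubAdjuster {
  record Span(String traceId, Map<String, String> tags) {}

  static final Set<String> PRIVATE_TAGS = Set.of("password", "auth.token");

  // Returns copies of the spans with private tags removed; inputs are untouched.
  static List<Span> adjust(List<Span> trace) {
    return trace.stream()
        .map(s -> new Span(s.traceId(),
            s.tags().entrySet().stream()
                .filter(e -> !PRIVATE_TAGS.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue))))
        .collect(Collectors.toList());
  }
}
```

Because an adjuster receives all spans for a trace at once, it can also make cross-span fixes, such as reconciling timestamps, that a per-span filter could not.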
## Consumer
A consumer is an end-recipient of potentially adjusted spans sharing the same trace ID.
This could be a Zipkin storage component, like Elasticsearch, or another sink, such as a streaming visualization tool.
| Consumer | Description |
|---|---|
| Storage | Writes spans to a Zipkin storage component. |
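To illustrate the "another sink" case, here is a minimal sketch of a consumer that, instead of persisting spans, keeps a running span count per service, the kind of live aggregate a reporting or visualization sink might maintain. The `Span` type and class name are hypothetical, not part of this project:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical consumer: rather than writing to storage, it maintains a live
// span count per service, as a reporting sink might. Span is a stand-in type.
class CountingConsumer {
  record Span(String traceId, String service) {}

  private final Map<String, Long> spansPerService = new ConcurrentHashMap<>();

  // Called once per (potentially adjusted) trace.
  void accept(List<Span> trace) {
    for (Span s : trace) {
      spansPerService.merge(s.service(), 1L, Long::sum);
    }
  }

  Map<String, Long> counts() {
    return spansPerService;
  }
}
```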