otl-agent

A simple light-weight opentracing framework implementation

License

License

GroupId

GroupId

io.opns.otl
ArtifactId

ArtifactId

otl-agent
Last Version

Last Version

0.2.16-alpha
Release Date

Release Date

Type

Type

jar
Description

Description

otl-agent
A simple light-weight opentracing framework implementation
Source Code Management

Source Code Management

https://github.com/sudiptasish/opentracing-lite/tree/master/otl-agent

Download otl-agent

How to add to project

<!-- https://jarcasting.com/artifacts/io.opns.otl/otl-agent/ -->
<dependency>
    <groupId>io.opns.otl</groupId>
    <artifactId>otl-agent</artifactId>
    <version>0.2.16-alpha</version>
</dependency>
// https://jarcasting.com/artifacts/io.opns.otl/otl-agent/
implementation 'io.opns.otl:otl-agent:0.2.16-alpha'
// https://jarcasting.com/artifacts/io.opns.otl/otl-agent/
implementation ("io.opns.otl:otl-agent:0.2.16-alpha")
'io.opns.otl:otl-agent:jar:0.2.16-alpha'
<dependency org="io.opns.otl" name="otl-agent" rev="0.2.16-alpha">
  <artifact name="otl-agent" type="jar" />
</dependency>
@Grapes(
@Grab(group='io.opns.otl', module='otl-agent', version='0.2.16-alpha')
)
libraryDependencies += "io.opns.otl" % "otl-agent" % "0.2.16-alpha"
[io.opns.otl/otl-agent "0.2.16-alpha"]

Dependencies

compile (2)

Group / Artifact Type Version
io.opentracing : opentracing-api jar 0.33.0
io.opentracing : opentracing-util jar 0.33.0

test (2)

Group / Artifact Type Version
org.mockito : mockito-all jar 1.9.5
org.junit.jupiter : junit-jupiter-api jar 5.6.1

Project Modules

There are no modules declared in this project.

Opentracing-LiTe

Opentracing-LiTe is a distributed tracing framework built on the specification of https://opentracing.io. Those who want to learn more about opentracing, please refer to link https://opentracing.io/docs/overview/what-is-tracing/.

Background

In a micro service based architecture, often there is a requirement to trace the request flow. Consider the below usecase: There are two services: Employee Service and Department Service.

Here is the request flow:

  1. User tries to create an employee along with the department (s)he belongs to.
  2. It invokes the employee service end point.
  3. Employee service creates the employee.
  4. As part of the flow, invokes Department service to create the department.
  5. Department service creates the department.
  6. It sends response back to Employee service.
  7. Employee service sends the final response to User.

In a cloud environment, there can be multiple POD instances running, each logging information in their own file (local/shared mountpoint). This log data is eventually shipped to some "Log Aggregator" (e.g., Splunk, Elasticsearch, etc) where they are indexed and stored. Later when developer wants to view the data, they can logging onto the dashboard and see the log. Today's Log Aggregator framework provides sophisticated searching capability to fetch the relevant log statements based on the input provided.

Once log data is shipped to a common store, they are all mixed up. Now if you want to trace a complete end-to-end flow you need some sort of correlation id that is present in all the log statements generated by the two services while serving a single request. What that means is, the client facing service must generate a unique id (call it traceId), dump it as part of the log statement, and pass it on to subsequent service(s) on the request chain. The next service(s) would use the same traceId and dump it in the log. Now if you search for a specific traceId on the Aggregator Dashboard, it should fetch you all the relevant logs that has this traceId present. And your request tracing is complete.

Why Opentracing-LiTe

Simple and easy to use

The core library of Opentracing lite is independent of any "logging framework". Even if you are not using any logging framework in your project, you can still use it. And if you have any standard logging framework for your project (as that is expected for any production ready app), the integration is seemless.

Typically developer uses different logging framework in their project. Some of the common frameworks are:

  1. log4j2
  2. logback
  3. commons-logging
  4. java util logging

Out of these, log4j2 and logback are stable framework, as they follow the standard/specification set by SLF4J.

Opentracing-Lite provides upfront support for all logging frameworks that are currently compliant with SLF4J. For other non-compliant logging framework, one has to write an adapter, and that, too, is very simple.

Rich set of metrics

It exposes a set of MBeans to monitor the span creation, scope activation/deactivation, etc. User can monitor the behavior of the framework via any management console. There are other metrics currently being built that will give developer an idea about the trend. Note that, all these metrics are transient, that means, once you restart your app, you chance to loose these metrics. I may consider persisting these metrics, but currently that is out of scope. See Appendix A for list of metrics.

Schema less

Opentracing-LiTE does not have a schema on it's own. It does not enforce it either. It relies on the format defined by the developer. There are few reasons for that. Main reason being, it always tries to be developer-friendly. A developer who used to see the logs in the plain text format, should not be forced to switch to a different format, like xml of json. As it may hinder the visibility.

Opentracing-LiTE, on the other side give developers the freedom to choose the schema and/or format thet want to define, and start logging in prescribed format.

Easy segregation of log and span

It enables easy segregation of log and span data, by redirecting them to different log files. The regular log data, which is mainly in plain text format, can be sent to splunk which is the most efficient log aggregator tool as of today. The span data, which gets generated in json format, can be sent to other storage (e.g., elasticsearch) where they can be indexed and queried later to view span graph. That is another reason as to why opentracing-lite is schema less today.

Libraries for popular framework

It provides the ready-made libraries for popular framework.

Modules

Below are the core modules:

otl-api

This contains the standard api library and specification for various providers. You will most of the time be using this module for tracing.

otl-util

This library contains some utility classes used by the above modules , plus other modules as well.

otl-metrics

It contains the mbeans for capturing various metrics and exposing them via standard jmx console. It's not mandatory to use this module. Should a developer is interested in capturing the metricss, include this module.

otl-agent

Opentracing-LiTE leverages the opentracing-util's GlobalTracer to store the platform provided tracer. The platform tracer has to be initialized first before it can be used. Now, an application can have multiple entry points, e.g., a web application might be initialized either through a servlet or through a context listener. A spring boot application has a different entry point for startup, a typical messaging system can have another entry point. It is not possible to provide a hook for every single app. Hence it is decided to instantiate and load the platfor tracer via standard java agent. The otl-agent takes care of initializing your tracer, registering various MBeans with underlying platform MBean server, etc.

otl-slf4j

If your application is using some SLF4J compliant logging framework, you need this library. Note that no other configuration change is required. You can continue to use your existing appender and message format.

How to use

Let's understand this with an example:

Imagine you have a standalone java application, that uses SLF4J compliant logger (either log4j2 or logback). Here are the classes:

  1. The pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    
    <groupId>io.opns./groupId>
    <artifactId>test_app</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    
    <dependencies>
        <dependency>
            <groupId>io.opns.otl</groupId>
            <artifactId>otl-api</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>io.opns.otl</groupId>
            <artifactId>otl-slf4j</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
        
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
    </dependencies>
</project>

  1. logback.xml
<?xml version="1.0"?>
<configuration debug="true">

    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd_HH:mm:ss.SSS} [%thread] [%X{trc}] [%X{spn}] [%X{pspn}] [%X{bgi}] %-5relative %-5level %logger{36} - %msg %n</pattern>
        </encoder>
    </appender>
    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>span.log</file>
        <append>true</append>
        <encoder>
            <pattern>%msg %n</pattern>
        </encoder>
    </appender>
    
    <logger name="otl.span.log" level="INFO" additivity="false">
        <appender-ref ref="FILE" />
    </logger>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>

Note that the special logger "otl.span.log", which uses FILE appender to write the messages to span.log. At the end, this file will have only the span data. And regular log data can be sent to other file (here redirected to CONSOLE).

Apart from the regular pattern, 4 new attributes were added:

  • trc : Will print the traceId, if present
  • spn : Will print the span, if present
  • pspn : Will print the parent spanId, if present
  • bgi : Will print the baggage items, if present
  1. Employee.java

private class Employee {
    private Integer id;
    private String name;
    private String location;
    
    private Department dept;
    
    public Employee(Integer id, String name, String location) {
        this.id = id;
        this.name = name;
        this.location = location;
    }
    
    public void setDepartment(Department dept) {
        this.dept = dept;
    }
    
    public Department getDepartment() {
        return dept;
    }
    
    public Integer getId() {
        return id;
    }
}

  1. Department.java

private class Department {
    private Integer id;
    private String name;
    
    public Department(Integer id, String name) {
        this.id = id;
        this.name = name;
    }
    
    public Integer getId() {
        return id;
    }
}

  1. Registration.java

import io.opentracing.Scope;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;
import java.util.Map;
import java.util.HashMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Registration {
    
    private static final Logger logger = LoggerFactory.getLogger(Registration.class);
    
    private static final Map<Integer, Employee> employees = new HashMap<>();
    private static final Map<Integer, Department> departments = new HashMap<>();
    
    private static final Tracer tracer = GlobalTracer.get();

    public static void main(String[] args) {
        logger.info("Starting employee registration");
        
        Span span = tracer.buildSpan("main").start();
        try (Scope scope = tracer.activateSpan(span)) {
            Employee emp = new Employee(10, "Socrates", "Greece");
            Department dept = new Department(201, "Philosophy");
            emp.setDepartment(dept);

            createEmployee(emp);
            logger.info("Employee registration is complete");
        }
        finally {
            span.finish();
        }
    }
    
    private static void createEmployee(Employee emp) {
        Span span = tracer.buildSpan("createEmployee").start();
        
        try (Scope scope = tracer.activateSpan(span)) {
            employees.put(emp.getId(), emp);
            logger.info("Employee created successfully");

            Department dept = emp.getDepartment();
            if (dept != null) {
                createDepartment(dept);
            }
        }
        finally {
            span.finish();
        }
    }
    
    private static void createDepartment(Department dept) {
        Span span = tracer.buildSpan("createDepartment").start();
        
        try (Scope scope = tracer.activateSpan(span)) {
            departments.put(dept.getId(), dept);
            logger.info("Department created successfully");
        }
        finally {
            span.finish();
        }
    }
}

What we are trying here is:

  1. When the main() method is invoked, system will create a new trace T1 and span S1.
  2. It then calls createEmployee() method to create the employee. This method will create a new span S2 to identify it's work. At this point, S1 becomes a parent of S2.
  3. Post that, it will call createDepartment() API to create the associated department. This method, too, will create a new span S3 to represent its work. at this stage S2 becomes a parent of S3.
  4. Once the work is complete, the spans and their associated scopes will be closed in reverse order.

S3 closes first, followed by S2 and S1. Note that they all use the same traceId T1. And that's how we can correlate the logs generated by individual methods. Now imagine there are two services instead of two methods. With the help of traceId, we can link them.

Let's see how to run them.

Compile:

mvn clean install

Get the classpath, we will use it in the "Run" command.

mvn dependency:build-classpath

Run:

java -javaagent:/path/to/otl-agent-1.0-SNAPSHOT.jar -classpath <classpath_entries> io.opns.test_app.Registration

CONSOLE Log:

2020-04-03_02:00:26.232 [main] [] [] [] [] 122   INFO  io.opns.test_app.Registration - Starting employee registration 
2020-04-03_02:00:26.284 [main] [4EtY9Q32NnFPKnwpuPbuKBmPFoVA97gC] [EPAJ2GPdfTccdK92] [0lSiHNkZMIHygAvm] [] 174   INFO  io.opns.test_app.Registration - Employee created successfully 
2020-04-03_02:00:26.284 [main] [4EtY9Q32NnFPKnwpuPbuKBmPFoVA97gC] [zD5KhTynBuBW8dEP] [EPAJ2GPdfTccdK92] [] 174   INFO  io.opns.test_app.Registration - Department created successfully 
2020-04-03_02:00:26.286 [main] [4EtY9Q32NnFPKnwpuPbuKBmPFoVA97gC] [0lSiHNkZMIHygAvm] [] [] 176   INFO  io.opns.test_app.Registration - Employee registration is complete

Note that the traceId, spanId and parent spanId details are logged. Notice the relationship.

FILE (span.log)

{"traceId":"VJrpXmWr59zFx5lKMXOrt2rN3xFfTQO4","spanId":"dDgzj1o98xBpP6nt","references":[ {"type":"child_of","spanId":"KTXsB8YG41MOGrMO"}],"operation":"createEmployee","start":726296232442,"end":726296232635}
{"traceId":"VJrpXmWr59zFx5lKMXOrt2rN3xFfTQO4","spanId":"KTXsB8YG41MOGrMO","references":[ {"type":"child_of","spanId":"lmKQNyfPe7chHojZ"}],"operation":"createDepartment","start":726296231932,"end":726296234088}
{"traceId":"VJrpXmWr59zFx5lKMXOrt2rN3xFfTQO4","spanId":"lmKQNyfPe7chHojZ","operation":"main","start":726296185318,"end":726296234222}

So the span log is segregated. Note that it is generated in json format. Wheareas log data continues to generate in existing (plain text) format.

Appendix A

Metric

Span

  1. Total Span
  2. Average Span Creation Time
  3. Max Span Creation Time
  4. Average Span duration
  5. Max Span duration.
  6. Total finished Span

Scope

  1. Total number of Scope activated
  2. Total number of Scope closed

Tracer

  1. Scope Manager
  2. Propagation Registry
  3. Visitor

Other span category metrics are currently being built.

MBean Snapshots

alt text

alt text

Versions

Version
0.2.16-alpha