Monitoring Akka applications with Mesmer and OpenTelemetry

The terms monitoring and observability have been bandied around a lot lately, often used interchangeably and in confusing contexts. They are closely connected, but in essence they approach the problem from different directions.

But why do we need monitoring and observability at all? The distributed systems we build are becoming increasingly complex. And even without the distributed aspect, we use ever higher levels of abstraction, often without realizing how they interact with the lower levels, all the way down to the OS and hardware. At the same time, these systems take on more and more responsibilities, and we sometimes have to rely on them completely. In such circumstances, failures can be very costly or even disastrous, so being able to quickly pinpoint a problem as it arises is a priority. Which microservice failed? What was its state at the moment of failure? Was it a memory problem or an application error? Where exactly was it located? We need all of these answers to mitigate the problem and make sure it doesn’t happen again. Otherwise, all we can do is guess or give the famous answer: “have you tried turning it off and on again?”

Some other valuable use cases for monitoring include:

Most of the time telemetry data can be split into the following categories:

Traces allow us to track and measure the path of a single action through the system. As the action (or request) flows through the system, the individual units of work (called spans) are composed into a tree-like structure. A visualization of a trace allows us to quickly identify where most of the time was spent and in what order the particular components were invoked.

The sequence could look something like this:

As you can see, we can identify how long the request took in total, that the database call was the longest part, whether the cache lookup was a hit or a miss, and that particular validations were run in parallel.

Although this is very useful, it’s not practical to store and analyze every single request trace. In practice, traces are sampled and only a subset of them is stored. Sampling also allows us to focus on outliers, i.e. requests that took a disproportionate toll on the application. Traces are also a great place to start an analysis when something goes wrong, although there are better ways to measure overall application performance.
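To make the span structure more concrete, here is a minimal sketch of manual span creation with the OpenTelemetry API on the JVM. The tracer name, span names, and nesting are illustrative assumptions for this example; they are not something Mesmer requires you to write, since Mesmer instruments Akka automatically.

```scala
import io.opentelemetry.api.GlobalOpenTelemetry

// Obtain a tracer; the instrumentation name is an arbitrary example.
val tracer = GlobalOpenTelemetry.getTracer("checkout-service")

// Root span covering the whole request.
val requestSpan = tracer.spanBuilder("handle-request").startSpan()
val scope = requestSpan.makeCurrent()
try {
  // Child span: attached under "handle-request" because that span is current.
  val dbSpan = tracer.spanBuilder("database-call").startSpan()
  try {
    // ... perform the database call ...
  } finally dbSpan.end()
} finally {
  scope.close()
  requestSpan.end()
}
```

Each `startSpan`/`end` pair becomes one node in the tree described above, which is what the trace visualization renders.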

Metrics are a type of telemetry data that tracks raw measurements. These could be summaries such as CPU usage, current Kafka lag, or HTTP response time percentiles, or running totals such as the number of bytes received or the cumulative response time. The latter open the way for rate calculations on the observability backend side. Metrics are much better suited for monitoring the overall performance of an application and are usually the first indicator of a performance problem. A word of caution, however: it’s important to know what a specific metric stands for and what its measurement unit is. Different aggregations apply to different types of metric, and not adhering to those rules might produce nonsensical values.
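As a small illustration of recording such a running total with the OpenTelemetry metrics API, here is a sketch; the meter name, metric name, and unit are assumptions made for the example.

```scala
import io.opentelemetry.api.GlobalOpenTelemetry

// Obtain a meter; the instrumentation name is an arbitrary example.
val meter = GlobalOpenTelemetry.getMeter("checkout-service")

// A monotonically increasing counter; rates can be derived from it on the backend.
val bytesReceived = meter
  .counterBuilder("bytes_received_total")
  .setUnit("By") // UCUM symbol for bytes
  .setDescription("Total number of bytes received")
  .build()

// Record 512 bytes received.
bytesReceived.add(512L)
```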

Logs represent independent events throughout the application lifecycle. They are the most granular type of telemetry data; however, it’s very hard to tell anything about application performance by looking solely at logs. There are tools that build dashboards from log files and bring a lot of clarity, but such setups are brittle: any change in the log format might break them completely.

OpenTelemetry has its roots in two competing projects — OpenTracing and OpenCensus. Both were aimed at creating a vendor-neutral set of tools and APIs, and both made their own architectural decisions and tradeoffs.

Eventually, the two projects decided to merge, and OpenTelemetry came to life (https://medium.com/opentracing/a-roadmap-to-convergence-b074e5815289). At the time of writing, the project is a Sandbox project of the Cloud Native Computing Foundation and is supported by all major observability vendors and communities.

OpenTelemetry is based on a simple flow borrowed from other solutions, but makes it more powerful by being vendor-agnostic.

In the case of more complicated infrastructures with multiple monitored components (microservices, for example), the recommended approach is to have a collector per component (as a sidecar) and an additional aggregating/gateway collector.

As you can see, the OpenTelemetry infrastructure is built around collectors that are responsible for receiving telemetry data, optionally processing it, and exporting it to a backend.

OpenTelemetry collectors support many vendors on both the receiving and exporting sides. They also support gathering data from multiple inputs and exporting to multiple outputs at the same time. This is great when introducing OpenTelemetry into an existing telemetry infrastructure or when migrating between vendors.
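As a rough sketch of what such a fan-in/fan-out setup can look like, here is a hypothetical collector configuration that receives OTLP data and exports metrics to two destinations at once. The endpoint values and the choice of exporters are assumptions for illustration, not a recommended production setup.

```yaml
receivers:
  otlp:
    protocols:
      grpc:            # accept OTLP over gRPC, e.g. from an application's OTLP exporter

processors:
  batch:               # batch data before exporting

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"   # exposed for Prometheus to scrape (illustrative endpoint)
  logging:                     # dump to stdout, useful while experimenting

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, logging]
```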

The OpenTelemetry API defines a set of interfaces that can be used to instrument a monitored application, as well as a no-op implementation. The SDK, on the other hand, provides a concrete implementation of the API together with the means to configure and run the OpenTelemetry machinery.

From the application developer’s perspective, this means that you will use the API to create counters, spans, and so on, as they represent the cross-cutting concerns of monitoring, and the SDK to instantiate an exporter and other runtime monitoring components, possibly somewhere at the entry point of your application.
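A minimal sketch of such an entry-point setup, assuming the OTLP exporter and a local collector on the default gRPC port; the endpoint and reader interval are assumptions, and the exact builder methods may vary between OpenTelemetry SDK versions.

```scala
import java.time.Duration
import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.metrics.SdkMeterProvider
import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader

// Export metrics over OTLP/gRPC to a local collector (illustrative endpoint).
val exporter = OtlpGrpcMetricExporter.builder()
  .setEndpoint("http://localhost:4317")
  .build()

// Push the collected metrics periodically.
val meterProvider = SdkMeterProvider.builder()
  .registerMetricReader(
    PeriodicMetricReader.builder(exporter)
      .setInterval(Duration.ofSeconds(30))
      .build()
  )
  .build()

// Register globally so API calls (and libraries such as Mesmer) pick up this SDK instance.
val sdk = OpenTelemetrySdk.builder()
  .setMeterProvider(meterProvider)
  .buildAndRegisterGlobal()
```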

For library designers (as in the case of Mesmer) the separation of API and SDK means that the library should depend solely on the API.

Mesmer consists of two parts: a Java agent, which is required for Mesmer to do the bytecode instrumentation, and an Akka extension, which processes and exports the metrics.

As Mesmer is an auto-instrumentation library, adding it to an existing project doesn’t require any changes to the existing code, apart from setting up the OpenTelemetry SDK exporter. And even that might not be necessary if you’re already using OpenTelemetry in your project.

The architecture diagram below best describes the place of Mesmer in the OpenTelemetry pipeline.

Akka is an advanced and complex piece of technology. It builds on the concept of actors and extends in many different directions: HTTP servers, remoting, clustering, streaming, persistence, and more. Thanks to this, we can build complete solutions that handle almost every scenario. But even such solutions have limits and can run into problems, and it’s very important to know where those limits and problems are. This is where monitoring in general, and Mesmer in particular, comes in.

Here are some examples of metrics that are accessible out of the box by just including Mesmer in your deployment:

Akka Actors

Akka Streams

Akka HTTP

Akka Persistence

Akka Cluster

As already mentioned, Mesmer consists of two major components: an actor system extension and a Java agent. Generating most of the interesting metrics requires the Java agent, which modifies the bytecode of Java classes as they are loaded. There are several ways to attach a Java agent to a process, but the recommended way is to do it from the command line with the `-javaagent` flag, as shown below. Attaching the agent at runtime instead might leave some features unavailable: classes that have already been loaded can still be transformed, but the transformation is much less powerful.
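For illustration only, attaching an agent from the command line looks roughly like this; the jar path and application name are hypothetical placeholders, not the actual Mesmer artifact names.

```bash
java -javaagent:/path/to/mesmer-agent.jar -jar my-akka-app.jar
```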

The Mesmer extension is an ordinary Akka extension and can be enabled either from configuration or from code. As the Mesmer agent is stateless, the Akka extension holds the state needed to calculate metrics from the modified classes and exports them to the OpenTelemetry collector for further processing.
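Enabling an Akka extension from configuration typically amounts to a single line of HOCON. The fully qualified class name below is only a plausible placeholder; check the Mesmer documentation for the exact extension to register.

```hocon
# application.conf -- the extension class name is an assumption, verify it against the Mesmer docs
akka.actor.typed.extensions = ["io.scalac.mesmer.extension.AkkaMonitoring"]
```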

Additional resources:
