
Envisioning a new tracing system

I’ve begun designing a new kind of tracing system for distributed systems. By a tracing system I mean something like Zipkin or Prometheus: tools that trace the flow of information through a distributed system.

I think the tool should offer a complementary view of the system: it’s not going to offer detailed logs or individual traces.

Instead, it’s something I call an overhead monitor: an observatory from above. It doesn’t track individual elements flowing in the system; to put it metaphorically, it lives too high up to see the individual ants in a colony. What it can see is the flow and not-flow of ants, and it can measure their speed.

I’m interested in the big picture. Is information flowing the way it should? How fast is it traveling? Is my context propagated correctly? How much information flows per second, minute, or hour?

The idea is to monitor flow rate, a bit like a traffic monitor: you could read, at a glance, how much information is flowing through the system. The flow would be represented as a directed graph.

Figure 1. Something like this.
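
To make that concrete, here is a minimal sketch of how such a flow graph might be modeled in Scala; all the names here are mine, not from the design document:

    // A minimal sketch of the flow graph model; names are illustrative.
    final case class Node(name: String)

    // A directed edge carries the observed rate between two nodes.
    final case class Flow(from: Node, to: Node, eventsPerSecond: Double)

    final case class FlowGraph(flows: List[Flow]) {
      // All flows leaving a given node.
      def outgoing(node: Node): List[Flow] = flows.filter(_.from == node)

      // Total rate of information leaving a node.
      def outflowRate(node: Node): Double = outgoing(node).map(_.eventsPerSecond).sum
    }

    object FlowGraphExample {
      def main(args: Array[String]): Unit = {
        val a = Node("A"); val b = Node("B"); val c = Node("C")
        val graph = FlowGraph(List(Flow(a, b, 120.0), Flow(a, c, 30.5)))
        println(graph.outflowRate(a)) // 150.5
      }
    }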

Earlier last year I sketched a design document, so I won’t go into too much detail here; if you’re interested, go read it.

So far, I have settled on the following design characteristics:

  • All you need to provide is the names of the nodes in the network; the system will figure out the directions of the information flows.

  • Optionally, you can define quality attributes, but these require manual configuration. For example, you’d have to state that "95% of requests from A to B should happen within 200 ms".

  • The system is God; it sees all. That is to say, all the events in the network should be fed into the system.

  • Because God systems are terrible and prone to failure, the aim is to support distribution: you can run multiple observers and partition the events among them. The big picture is then assembled by combining the observer data.

  • Configuration should be easy, with a humane syntax like YAML, and the possibility to override it; a sketch of what this could look like follows this list.
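
To illustrate the last two points, here is what such a configuration might look like; the field names are hypothetical, invented for this sketch rather than taken from the design document:

    # Hypothetical configuration sketch; the field names are invented
    # for illustration, not taken from the design document.
    observatory:
      nodes:
        - frontend
        - orders
        - billing
      quality:
        - from: frontend
          to: orders
          percentile: 95   # "95% of requests from frontend to orders..."
          within: 200ms    # "...should happen within 200 ms"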

The system doesn’t have to worry about spans; that’s for other systems. All you need to do is propagate context.

I have thought about distributed tracing before, and I’ve found that many of those questions are still unanswered. Most tracing systems like Zipkin and Prometheus do very well with individual elements and span lengths, but they naturally require you to instrument the spans in the code.

My aim with the observatory-like approach is to make it extremely simple to track flow rates. All you need is the following:

  • Propagate context in the system, like a tracing token coupled with a timestamp (sketched after this list)

  • Aggregate logs to the observatory
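
For the first point, here is a minimal sketch of what the propagated context could look like, assuming the token-plus-timestamp design above; all names are hypothetical:

    import java.time.Instant
    import java.util.UUID

    // Hypothetical trace context: a token identifying the flow plus the
    // time the event entered the system.
    final case class TraceContext(token: UUID, startedAt: Instant)

    object TraceContext {
      def fresh(): TraceContext = TraceContext(UUID.randomUUID(), Instant.now())
    }

    // A message travelling between nodes carries the context along unchanged,
    // so the observatory can correlate sightings of the same token.
    final case class Message[A](context: TraceContext, payload: A)

    object PropagationExample {
      def main(args: Array[String]): Unit = {
        val incoming = Message(TraceContext.fresh(), payload = "order-123")
        // A node transforms the payload but forwards the same context.
        val outgoing = incoming.copy(payload = incoming.payload.toUpperCase)
        println(outgoing.context == incoming.context) // true
      }
    }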

The second problem is the harder one: if you centralize observation, every node in the system logs its events to a single place, and that place becomes a single point of failure. My idea is to overcome this limitation by decentralizing observation, and then using something like a gossip protocol to combine the results.

It doesn’t need to react immediately (within seconds is probably fine), so the slowness of gossip is not a threat to the design. Observation data is also initially ephemeral; I’d most likely use event sourcing to build the observation model.
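
Combining the results safely is the interesting part: gossip can deliver an observer’s state more than once and in any order, so the merge has to be idempotent and commutative. One approach, in the spirit of a grow-only counter (G-counter) CRDT, would be for each observer to increment only its own slot, with merges taking the per-observer maximum. A sketch with hypothetical names:

    // Hypothetical per-edge counter merged across observers, in the spirit
    // of a G-counter CRDT: each observer increments only its own slot, and
    // merging takes the per-observer maximum, so repeated or out-of-order
    // gossip deliveries never double count.
    final case class EdgeCounter(counts: Map[String, Long]) {
      def increment(observerId: String): EdgeCounter =
        EdgeCounter(counts.updated(observerId, counts.getOrElse(observerId, 0L) + 1))

      def merge(other: EdgeCounter): EdgeCounter =
        EdgeCounter((counts.keySet ++ other.counts.keySet).map { id =>
          id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
        }.toMap)

      def total: Long = counts.values.sum
    }

    object GossipMergeExample {
      def main(args: Array[String]): Unit = {
        val atObserver1 = EdgeCounter(Map.empty).increment("obs-1").increment("obs-1")
        val atObserver2 = EdgeCounter(Map.empty).increment("obs-2")
        val combined = atObserver1.merge(atObserver2)
        println(combined.total) // 3, regardless of merge order or repetition
      }
    }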

I haven’t chosen the technology yet, but most likely it will be written in Scala, with the event sourcing based on Kafka.
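
Assuming Kafka, each observation would be appended as an immutable event to a topic, and that topic becomes the log the observation model is rebuilt from. A rough sketch using the Kafka Java client from Scala; the topic name and payload shape are made up:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ObservationPublisher {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // Each observation is an immutable event keyed by the edge it
        // describes; the topic is the log the model is rebuilt from.
        val record = new ProducerRecord[String, String](
          "observations", "frontend->orders", """{"token":"t-1","ts":1510000000}""")
        producer.send(record)
        producer.close()
      }
    }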

Now I just need to find the time to build it.
