Messari has a small engineering team. As such, we're always looking for ways to deliver features faster, better, and cheaper. Our governance product offering is quite young and badly needed a customer-facing notification system. This system would send emails to users with details of all off-chain and on-chain proposal events (e.g. created, voting started, voting ending soon, voting ended, proposal executed). Some of these emails would be sent immediately, and others would be sent in a digest (e.g. daily, weekly).
At the start of the project, Messari had two notification systems in use: one internal and one external. They were custom-built to very specific product requirements with minimal feature overlap. Moreover, they had vastly different philosophies and languages: one was event-driven and written in Python, and the other was cron-driven and written in Java and Golang. Lastly, they had different taxonomies, failure handling, unit test coverage, observability, and metrics. Reusing either one quickly proved unrealistic given the new governance product requirements and the engineering goals of resiliency, observability, and scalability. We decided to build a new universal customer notification system that all our services would eventually use.
Resilient, Observable, and Scalable
A notification system should be resilient because important notifications cannot be lost or unduly delayed. It should also gracefully recover from system failures. To that end, the system is eventually consistent with an at-least-once design.
The system should provide comprehensive observability to gain insights into its internal operations and debug any issues that may arise. There should be monitoring to get a sense of what is normal, when the system is experiencing problems, and how much capacity is needed in the future. Moreover, troubleshooting issues should be relatively simple and straightforward.
Lastly, the system should be scalable. It should scale to natural user growth, usage spikes, and additional use cases.
The Approach
The first decision to make was whether the new notification system should be choreographed or orchestrated.
By default, many pick choreography because it is a well-known pattern that can be implemented with very well-known tools. A typical architecture would be a set of workers consuming and producing events via persistent queues. Additionally, it is a relatively simple system to stand up, and a functional prototype can be written quickly.
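As a rough illustration of the choreographed style, the sketch below wires two workers together with Go channels. The channels are only a stand-in for persistent queues, and the event kinds and the `render` and `send` handlers are made up for the example. Note that no single piece of code describes the whole flow: each worker only knows the event it consumes and the event it produces.

```go
package main

import "fmt"

// Event is a hypothetical message passed between workers.
type Event struct {
	Kind    string
	Payload string
}

// stage starts a worker that consumes events from in, handles each one,
// and publishes the result to a new queue that it returns.
func stage(in <-chan Event, handle func(Event) Event) <-chan Event {
	out := make(chan Event)
	go func() {
		defer close(out)
		for ev := range in {
			out <- handle(ev)
		}
	}()
	return out
}

// render turns a proposal event into a rendered email event.
func render(ev Event) Event {
	return Event{Kind: "email.rendered", Payload: "Subject: " + ev.Payload}
}

// send pretends to deliver the email and emits a confirmation event.
func send(ev Event) Event {
	return Event{Kind: "email.sent", Payload: ev.Payload}
}

// runPipeline publishes one event per proposal and collects the final events.
func runPipeline(proposals []string) []Event {
	source := make(chan Event)
	go func() {
		defer close(source)
		for _, p := range proposals {
			source <- Event{Kind: "proposal.created", Payload: p}
		}
	}()
	var results []Event
	for ev := range stage(stage(source, render), send) {
		results = append(results, ev)
	}
	return results
}

func main() {
	for _, ev := range runPipeline([]string{"Proposal A", "Proposal B"}) {
		fmt.Println(ev.Kind, "|", ev.Payload)
	}
}
```

This is quick to stand up, which is the appeal. The cost shows up later, when the overall behavior has to be reconstructed by reading every worker.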
Orchestration, on the other hand, is a much less common solution because it is not a well-known pattern and cannot easily be implemented from first principles. Worse yet, the OSS orchestration systems are so complex that it is best to look for a PaaS provider, which adds an adoption hurdle.
We chose orchestration using Temporal via their PaaS offering.
Why orchestration?
Messari has a small engineering team. As such, we value visibility and maintainability. In a choreographed system, as product requirements become more complex, it becomes very difficult for one person to know the entire system. Debugging is time-consuming because of the high complexity of microservice interactions, and adding a new feature may require deep knowledge of the entire system. Overall, it isn't a good developer experience. As one teammate jokes, "I have PTSD from my last message-based project." In short, orchestration promises a simpler system that a small team can more easily own.
Why Temporal?
There are a few orchestration platforms (e.g. Netflix Conductor) to consider. Temporal was chosen for a few reasons. Firstly, it's easy to use and its key concepts are easy to understand. They have great documentation with plenty of code samples. The philosophy is very closely aligned with Golang, and Messari is a Golang shop. Additionally, there's a lively support forum and community Slack. Secondly, they have built it in a way such that visibility is a first-class citizen. They have out-of-the-box logging, metrics, and tracing. All workflow steps are saved in event histories with their inputs and outputs, and one can replay event histories for local debugging. Lastly, Temporal has a code-first approach. By default, there are no DSLs, YAMLs, JSONs, etc. All workflows and accompanying activities can be completely defined in code.
Execution
One of the unique features of the governance notification product is the ability to send digest emails on a weekly cadence given a user's desired day. This feature was straightforward to implement using Temporal because Temporal provides powerful and simple building blocks.
At the crux of the implementation is a long-lived digest workflow. It waits until the scheduled time, at which point it executes a few activities:
To complicate matters, the workflow can receive two types of data updates:
Changing the digest day requires the workflow to change when the activities should run next.
To further increase visibility of running workflows, there are two queries that can retrieve information from a workflow:
The above image is the Temporal UI for a weekly digest workflow. It describes the following chronological events:
Overall, we're super happy with Temporal and Temporal Cloud. Temporal does the heavy lifting so that we can focus on the business logic and on delivering resilient products at speed.
Future
With our new notification system up and running, we have several follow-up projects in mind:
If that sounds fun, why not join us on our next project?
If you’re a software engineer interested in helping us contextualize and categorize the world’s crypto data, we’re hiring. Check out our open engineering positions to find out more.