Why Backing Up Your Business Events is Crucial


Most organisations know that backing up their databases is essential, but they don’t consider this need when it comes to an event hub like Kafka. The data in transit that flows through these processing hubs generally isn’t thought of as something that needs to be kept safe. However, there are several reasons why backing up your business events should be at the top of your list when it comes to safeguarding your company’s data. In this blog, we’ll explore why and how you can back up your events in order to keep your data safe and secure.

Why people don’t consider backing up business events

We typically use queueing systems to process data sets asynchronously, often via batch jobs. If something goes wrong, we can simply restart the entire process without any loss of data. We can also use queues and event hubs for point-to-point integrations between different systems: the source system produces events by reading from its own database, and the other system processes them at its own pace. In case of any issue, we can always reproduce the data from its original source.

Perhaps the most important reason of all is that event hubs can be deployed over multiple zones, with all data replicated across these zones. As soon as a problem occurs in one zone, the other cluster nodes take over, and the applications and services that use the event hub won’t even notice that there is a problem. When the node becomes available again, the system makes sure that all data is replicated before that part of the cluster is activated again.

This protects against the most likely failure, where a single zone goes down for a certain amount of time. In situations where redundancy is key, you might also consider deploying an extra cluster in either an active-passive or active-active setup. However, this comes with a substantial cost. Moreover, switching all applications to a passive cluster when your primary cluster fails can prove quite challenging, even in a microservice environment.
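Much of that transparency is a matter of client configuration. As a minimal sketch (the broker addresses and topic name are hypothetical), a producer that waits for acknowledgement from all in-sync replicas keeps writing safely even while one zone is recovering:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ZoneAwareProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical brokers spread over three zones.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker-zone-a:9092,broker-zone-b:9092,broker-zone-c:9092");
        // Wait until all in-sync replicas have the event before considering it written.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-123", "{\"status\":\"SHIPPED\"}"));
        }
    }
}
```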

Which events should you consider backing up?

In modern Event-Driven Architectures, we tend not to think in terms of point-to-point integrations anymore. Applications produce business events as they occur, without even knowing which consuming applications there are. These events, however, are exactly what makes it possible to loosely couple and reuse those applications.

Keep in mind that business events are not an exact replica of the records in the database: they express what really happened and when it happened. For important entities, event hubs will often store far more historical data than databases do. This is behavioural data, which expresses what happened in an organisation, and that information is priceless for the business.
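As a hypothetical illustration (the event name and fields are made up), a business event captures the change itself and the moment it occurred, whereas the corresponding database row would only hold the current state:

```java
import java.time.Instant;

// Hypothetical business event: it records what happened and when it happened,
// while the matching database row would only store the customer's current address.
public record CustomerMovedEvent(
        String customerId,
        String previousAddress,
        String newAddress,
        Instant occurredAt) {
}
```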

Storing this data in a data lake is certainly useful, but in an Event-Driven Architecture, you may want to take it one step further by making these business events always available to be processed in an ordered sequence.

The events are typically stored on the hub for a longer period, and consuming applications can use them to build up read models in memory or in a persistent data store, so they can quickly read this data whenever necessary. Consuming applications are not the only ones building read models, though: read models are also built on the event hub itself, in the form of compacted topics, for streaming purposes.
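A compacted topic keeps the latest event per key, which is what makes it behave like a read model on the hub. As a minimal sketch (the broker address and topic name are assumptions), such a topic can be created with the Kafka admin client:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // Compaction retains the most recent event for every key,
            // turning the topic into a queryable snapshot of the entity.
            NewTopic snapshots = new NewTopic("customer-snapshots", 6, (short) 3) // hypothetical topic name
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(snapshots)).all().get();
        }
    }
}
```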

Imagine a situation where you want to stream data from two topics and enrich event B with data from event A whenever either event arrives. To do so, you need the relevant data stored on the event hub in topics with appropriate keys (see the sketch after this list). This allows you to:

  • Stream the data: read events from multiple topics and aggregate them, comparable to join operations in a database.

  • Replay the data, for example when a consuming application has encountered a bug.

  • Phase in new microservices whenever you want. The new services will be able to read all events from the get-go and react to them as if the microservice had always existed.
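A minimal Kafka Streams sketch of the join scenario above (the application id, broker address, and topic names are hypothetical): both topics are keyed on the same business key, and whenever either side changes, the latest value of the other side is joined in.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class EnrichmentJoin {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-a-b-join");    // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Both topics are keyed on the same business key, so the hub can compact
        // them and the join stays co-partitioned.
        KTable<String, String> eventA = builder.table("event-a"); // hypothetical topic names
        KTable<String, String> eventB = builder.table("event-b");

        // Whenever either side changes, the latest value of the other side is joined in,
        // comparable to a join between two database tables.
        eventA.join(eventB, (a, b) -> a + "|" + b)
              .toStream()
              .to("event-a-enriched-with-b");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because both inputs are tables backed by the topics on the hub, a restarted or newly phased-in service can rebuild its state simply by reading them from the beginning.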

Why do you need to back up events?

Human mistakes

Imagine someone changing the retention on a topic configuration to one day. Even in an Infrastructure as Code (IaC) setup, a reviewer may not flag this as a problem. These kinds of mistakes lead to the loss of data on all replicated clusters, since replication and cluster linking also replicate topic configurations. In this example, the data would be deleted from all clusters after just one day.
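Part of what makes this mistake so easy is how small the change looks. A sketch of the equivalent admin-client call (the broker address and topic name are assumptions):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class AccidentalRetentionChange {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders"); // hypothetical topic
            // 86400000 ms = 1 day; events older than this become eligible for deletion,
            // and, as described above, replication tooling can mirror this setting to other clusters.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "86400000"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
        }
    }
}
```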

Disaster recovery

Nobody can guarantee that a cluster will always be up and running, or even that a cluster can be started again after failure. In a worst-case scenario, you will need a plan to recreate the cluster from scratch. To make matters worse, these kinds of disasters usually don’t come alone. Other databases and applications may also be impacted, and they will need to be fixed at the same time.

Using Infrastructure as Code (IaC), a new event hub cluster can often be set up in just thirty minutes. This includes all configuration, but not the actual data. Luckily, an optimised backup solution will be able to restore the data for crucial business events in a reasonable amount of time as well. In other words, setting up a new cluster in the unlikely event of a disaster can be a worthwhile alternative to the expense of permanently replicated clusters, which also come with other downsides.

Testing environment

Many organisations are used to setting up test environments with specific data sets, often subsets of non-sensitive production data. An event hub is no different. To ensure that the hub is in line with the master data in the databases, a cluster will need to be set up and events will need to be produced on specific topics. A restore process will make it possible to restore topics to different environments or even other topic names, giving you the flexibility to set up test environments quickly.
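A minimal sketch of that restore idea, assuming for simplicity that the source of the restore is another topic that can be read back directly as plain key/value pairs (the cluster addresses and topic names are hypothetical): events are copied to a differently named topic on a test cluster, keeping the original keys so partitioning and compaction behave the same.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class RestoreToTestTopic {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "source-cluster:9092"); // assumed address
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "restore-orders-to-test");
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "test-cluster:9092"); // assumed address
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("orders")); // hypothetical source topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
                if (records.isEmpty()) {
                    break; // caught up with the source for this simple sketch
                }
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Keep the original key so partitioning and compaction behave the same
                    // on the target topic, even though the topic name differs.
                    producer.send(new ProducerRecord<>("orders-test", record.key(), record.value()));
                }
            }
            producer.flush();
        }
    }
}
```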

Multiple copies and air gapping

Having multiple backups gives you a safety net in case something goes wrong with any of your data backups or storage, whether it's phishing attacks, malware, or even natural disasters.

Multiple backups can easily be created when the data is stored in files, for example on cloud file storage, and there are solutions to copy and air gap that storage. Air-gapped backups are kept offline and inaccessible, so the backup cannot be remotely hacked or corrupted.

How can you back up events?

Off-the-shelf Kafka solutions are readily available, including source and sink connectors on the Confluent platform, but these solutions have several limitations.

  • They do not keep track of the original offset.

  • They cannot restore data up to a specific moment in time. If you restore a database backup, you won’t be able to sync your topic data to the exact time of that backup.

  • In some cases, connectors suffer from performance issues.

  • Because of the amount of data, a managed connector is often expensive.

  • These connectors require custom configuration and development with a limited set of parameters, since they are not made for backup purposes.

  • Connectors do not have an easy-to-use interface.
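To illustrate what a purpose-built backup has to keep track of, here is a minimal sketch (the broker address, topic, and file name are assumptions) that writes each event together with its original partition, offset, and timestamp, which is exactly the metadata needed to restore in order or up to a specific moment in time:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Base64;
import java.util.List;
import java.util.Properties;

public class MinimalTopicBackup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-backup");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        Base64.Encoder b64 = Base64.getEncoder();
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
             BufferedWriter out = Files.newBufferedWriter(Path.of("orders-backup.jsonl"))) { // hypothetical file
            consumer.subscribe(List.of("orders")); // hypothetical topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
                if (records.isEmpty()) {
                    break; // reached the current end of the topic for this simple sketch
                }
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    // Store partition, offset and timestamp alongside the payload so a restore
                    // can preserve ordering and stop at a specific point in time.
                    byte[] key = r.key() == null ? new byte[0] : r.key();
                    byte[] value = r.value() == null ? new byte[0] : r.value();
                    out.write(String.format(
                            "{\"partition\":%d,\"offset\":%d,\"timestamp\":%d,\"key\":\"%s\",\"value\":\"%s\"}%n",
                            r.partition(), r.offset(), r.timestamp(),
                            b64.encodeToString(key), b64.encodeToString(value)));
                }
            }
        }
    }
}
```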

To address these shortcomings, we have created Kannika. Kannika is a straightforward, purpose-built solution that has been optimised for high throughput and performance. It offers a user-friendly way of taking control over your backups, and makes restoring events to a specific moment in time a piece of cake.

Interested in our Event-Driven Architecture solutions and how they can help transform your business? Cymo is at the forefront of EDA innovation and has a proven track record of helping enterprises rethink the way they work. Contact us to find out more and discuss the possibilities.

Written by Kris Van Vlaenderen