Everyday Superpowers: What is event sourcing and why you should care

This is the second entry in a five-part series about event sourcing:

  1. Why I Finally Embraced Event Sourcing—And Why You Should Too
  2. What is event sourcing and why you should care
  3. Preventing painful coupling
  4. Event-driven microservice in a monolith
  5. Get started with event sourcing today

In my last blog post, I introduced the concept of event sourcing and some of its benefits. In this post, I’ll discuss the pattern in more depth.

What is event sourcing?

Event sourcing is an architectural pattern for software development that has two components:

  • To change the state of the application, you save the data associated with that change in an append-only log.
  • The current state of an item is derived by querying the log for related events and building the state from those events.

It emerged from the domain-driven design community over twenty years ago, and like many things in the development world, its definition can vary drastically from the original.

However, these two components are the core of event sourcing. I’ve seen people include eventual consistency, CQRS, and event streaming in their definitions of event sourcing, but these are optional additions to the pattern.

It’s best to see an example. If you compare a shopping cart application built in a traditional way with one built in an event-sourced way, you’ll see a stark difference in the following scenario:

A user:

  • adds a teeny weenie beanie to their shopping cart
  • adds a warm sweater
  • adds a scarf
  • adds one of those hats that has ear flaps
  • removes the teeny weenie beanie
  • checks out

A traditional application would store the current state:

| cart_id | product_ids | purchased_at        |
|---------|-------------|---------------------|
| 1234    | 1,2,5       | 2025-03-04T15:06:24 |

Whereas the event-sourced application would have saved all the changes:

| event_id | cart_id | event_type  | data              | timestamp           |
|----------|---------|-------------|-------------------|---------------------|
| 23       | 1234    | CartCreated | {}                | 2025-01-12T11:01:31 |
| 24       | 1234    | ItemAdded   | {"product_id": 3} | 2025-01-12T11:01:31 |
| 25       | 1234    | ItemAdded   | {"product_id": 2} | 2025-01-12T11:02:48 |
| 26       | 1234    | ItemAdded   | {"product_id": 1} | 2025-01-12T11:04:15 |
| 27       | 1234    | ItemAdded   | {"product_id": 5} | 2025-01-12T11:05:42 |
| 28       | 1234    | ItemRemoved | {"product_id": 3} | 2025-01-12T11:09:59 |
| 29       | 1234    | CheckedOut  | {}                | 2025-01-12T11:10:20 |
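
To see the second component in action, here's a minimal Python sketch that replays the events above to rebuild the cart's current state. The event data comes from the table; `derive_cart` is a hypothetical helper, not code from a real system:

```python
# The events for cart 1234, in the order they were appended to the log.
events = [
    ("CartCreated", {}),
    ("ItemAdded", {"product_id": 3}),
    ("ItemAdded", {"product_id": 2}),
    ("ItemAdded", {"product_id": 1}),
    ("ItemAdded", {"product_id": 5}),
    ("ItemRemoved", {"product_id": 3}),
    ("CheckedOut", {}),
]

def derive_cart(events):
    """Fold the event log into the cart's current state."""
    product_ids = []
    checked_out = False
    for event_type, data in events:
        if event_type == "ItemAdded":
            product_ids.append(data["product_id"])
        elif event_type == "ItemRemoved":
            product_ids.remove(data["product_id"])
        elif event_type == "CheckedOut":
            checked_out = True
    return {"product_ids": product_ids, "checked_out": checked_out}

print(derive_cart(events))  # {'product_ids': [2, 1, 5], 'checked_out': True}
```

Notice the derived state matches the traditional table's row: products 1, 2, and 5, checked out.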

From this example, it’s clear that event sourcing uses more storage space than a similar traditional app. This extra storage isn’t just a tradeoff; it unlocks powerful capabilities. Some of my favorites include:

Having fast web views

Initially, what made me interested in event sourcing was the promise of fast web pages. I’ve worked on several projects where expensive database queries hampered performance and user experience.

In one project, we introduced a new feature that stored product-specific metadata for items. For example, a line of printers had specific dimensions, was available in three colors, and had scanning capabilities. For a line of shredders, however, we would save its shredding rate, the technique it used to shred, and its capacity.

This feature had a design flaw. The system needed to query the database multiple times to build the query that retrieved the item's information. This caused our service to slow whenever a client hit one of our more common endpoints.

Most applications use the same mechanism to save and read data from the database, often optimizing for data integrity rather than read performance. This can lead to slow queries, especially when retrieving complex data.

For example, the database tables supporting the feature I mentioned above looked something like this:

A look-up table to define the product based on the product type, manufacturer, and model:

| id | product_type_id | manufacturer_id | model_id |
|----|-----------------|-----------------|----------|
| 1  | 23              | 12              | 38       |
| 2  | 14              | 17              | 125      |

A table to define the feature names and what kind of data they are:

| id | name             | type        |
|----|------------------|-------------|
| 1  | available colors | list_string |
| 2  | has scanner      | boolean     |
| 3  | dimensions       | string      |
| 4  | capacity         | string      |

A table that held the values:

| id | product_id | feature_id | value                                      |
|----|------------|------------|--------------------------------------------|
| 1  | 1          | 1          | ["dark grey", "lighter grey", "gray grey"] |
| 2  | 1          | 2          | false                                      |
| 3  | 2          | 3          | "roughly 2 feet in diameter..."            |
| 4  | 2          | 4          | "64 cubic feet"                            |

The final query to retrieve the features for a line of printers would look something like this:

SELECT f.name, pf.value, f.type
FROM product_features pf
JOIN features f ON pf.feature_id = f.id
WHERE pf.product_id = (
    SELECT id FROM products
    WHERE product_type_id = 23 AND manufacturer_id = 12 AND model_id = 38
);

That would return:

| name             | value                                      | type        |
|------------------|--------------------------------------------|-------------|
| available colors | ["dark grey", "lighter grey", "gray grey"] | list_string |
| has scanner      | false                                      | boolean     |

One remedy is the CQRS (Command Query Responsibility Segregation) pattern. Instead of using the same data model for both reads and writes, CQRS separates them, allowing the system to maintain highly efficient, read-optimized views.

A read-optimized view of features could look like this:

| product_type_id | manufacturer_id | model_id | features |
|-----------------|-----------------|----------|----------|
| 23 | 12 | 38 | [{"name": "available colors", "value": ["dark grey", "lighter grey", "office grey", "gray grey"], "type": "list_string"}, {"name": "has scanner", "value": false, "type": "boolean"}, ...] |
| 14 | 17 | 125 | [{"name": "dimensions", "value": "roughly 2 feet in diameter at the mouth and 4 feet deep", "type": "string"}, {"name": "capacity", "value": "64 cubic feet", "type": "string"}, ...] |

And querying it would look like:

SELECT features FROM features_table
WHERE product_type_id = 23 AND manufacturer_id = 12 AND model_id = 38;

What a difference!

I recommend looking into CQRS even without using event sourcing.

Event sourcing pairs well with CQRS

Event sourcing aligns well with CQRS because once events have been written to the append-only log, the system can also publish each event to internal functions that do something with that data, like updating read-optimized views. This allows applications to maintain high performance and avoid complex queries at read time.
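
Here's a toy sketch of that fan-out. The in-process dispatcher, handler names, and dictionary "tables" are illustrative assumptions, not a prescribed implementation:

```python
# Stand-ins for the append-only store and a read-optimized table.
event_log = []   # write side: events, in order
read_view = {}   # read side: cart_id -> list of product_ids, always query-ready

def project_cart(event):
    """Keep the read-optimized view of each cart up to date."""
    cart = read_view.setdefault(event["cart_id"], [])
    if event["type"] == "ItemAdded":
        cart.append(event["data"]["product_id"])
    elif event["type"] == "ItemRemoved":
        cart.remove(event["data"]["product_id"])

projections = [project_cart]

def append_event(event):
    event_log.append(event)          # 1. append to the log
    for projection in projections:   # 2. publish to the read-side updaters
        projection(event)

append_event({"cart_id": 1234, "type": "ItemAdded", "data": {"product_id": 2}})
append_event({"cart_id": 1234, "type": "ItemAdded", "data": {"product_id": 5}})
print(read_view[1234])  # [2, 5]
```

Reading the cart is now a single dictionary lookup; the work happens once, at write time, instead of on every read.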

An event-sourced solution using CQRS would have allowed us to maintain a read-optimized table instead of constructing expensive queries dynamically.

While this specific case was painful, your project doesn’t have to be that bad to see a benefit. In today’s world of spinners and waiting for data to appear in blank interfaces, it’s refreshing to have a web app that loads quickly.

As a developer, it’s also nice not to chase data down across multiple tables. As someone once said, “I always like to get the data for a view by running `SELECT * FROM ui_specific_table WHERE id = 123;`.”

Not just web views

The same principles that make web views fast can also help with large reports or exports.

Another project I know about suffered performance problems whenever an admin user would request to download a report. Querying the data was expensive, and generating the file took a lot of memory. The whole process slowed the application down for every user and timed out occasionally, causing the process to start over.

The team changed their approach to storing files on the server and incrementally updating them as events happened. This turned what was an expensive operation that slowed the system for 20 or more seconds per request into a simple static file transfer that took milliseconds without straining the server at all.
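
That incremental approach can be sketched like this. The report format and event shape are assumptions for illustration; a real system would append to a file on disk rather than an in-memory buffer:

```python
import csv
import io

def append_order_to_report(report, event):
    """Extend the pre-built CSV report with one OrderPlaced event."""
    writer = csv.writer(report)
    writer.writerow([event["order_id"], event["total"]])

# io.StringIO stands in for the report file kept on the server.
report = io.StringIO()
append_order_to_report(report, {"order_id": 1, "total": "19.99"})
append_order_to_report(report, {"order_id": 2, "total": "5.00"})

# "Downloading the report" is now just serving this already-built file.
print(report.getvalue())
```

Each event costs one small write when it happens, so the expensive query-everything-then-render step disappears from the request path entirely.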

Schema changes without fear

Another thing I love about the event sourcing pattern is how easy it makes changing database schemas and experimenting with new features.

In my last blog post, I mentioned adding a duration column to a table that shows the status of files being processed by an application I'm working on. Since I wrote that, we've determined that we would like even more information. I will add the duration for each step in the process to that view.

This change is relatively simple from a database perspective. I will add new columns for each step's duration. But if I needed to change the table's schema significantly, I would still confidently approach this task.

I would look at the UI, see how the data would be formatted, and consider how we could store the data in that format. That would become the schema for a new table for this feature.

Then, I would write code that would query the store for each kind of event that changes the data. For example, I would have a function that creates a row whenever a `FileAdded` event is saved and another that updates the row's progress percent and duration information when a step finishes.
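
Those per-event functions might look something like this sketch, where the `Event` shape, its field names, and the in-memory stand-in for the new table are all assumptions for illustration:

```python
from dataclasses import dataclass, field

# Stand-in for the new read-optimized table: file_id -> row.
file_rows = {}

@dataclass
class Event:
    kind: str
    file_id: int
    data: dict = field(default_factory=dict)

def on_file_added(event):
    """FileAdded: create the row this view will keep updated."""
    file_rows[event.file_id] = {"progress": 0, "durations": {}}

def on_metadata_added(event):
    """FileMetadataProcessed: record the step's duration and new progress."""
    row = file_rows[event.file_id]
    row["durations"]["metadata"] = event.data["duration"]
    row["progress"] = event.data["progress"]
```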

Then, I would create a script that reads every event in the event log and calls any function associated with that event.

In Python, that script could look like this:

def populate_table(events):
    for event in events:
        if event.kind == 'FileAdded':
            on_file_added(event)
        elif event.kind == 'FileMetadataProcessed':
            on_metadata_added(event)
        ...  # one branch per event kind that affects this view

This would populate the table in seconds (without causing other side effects).

Then, I would have the web page load the data from that table to check my work. If something isn't right, I'd adjust and replay the events again.

I love the flexibility this pattern gives me. I can create and remove database tables as needed, confident that the system isn't losing data.

Up next

Once I started working on an event-sourced project, I found a new feature that became my favorite, to the point that it completely changed how I think about writing applications. In the next post, I'll explore how coupling is one of the biggest challenges in software and how the same properties that make event sourcing flexible also make it a powerful tool for reducing coupling.

