Django comes with "batteries included" to make CRUD (create, read, update, delete) operations easy. It's nice that the CR part (create and read) of CRUD is so easy, but have you ever paused to think about the UD part (update and delete)?
Let's look at delete. All you need to do is this:
ReallyImportantModel.objects.get(id=32).delete()  # gone from the database forever
Just one line, and your data is gone forever. It can be done accidentally. Or you can do it deliberately, only to realise later that your old data was valuable too.
Now what about updating?
Updating is deleting in disguise.
When you update, you're deleting the old data and replacing it with something new. It's still deletion.
important = ReallyImportantModel.objects.get(id=32)
important.data = {'new_data': 'This is new data'}
important.save()  # OLD DATA GONE FOREVER
Okay, but why do we care?
Let's say we want to know the state of ReallyImportantModel 6 months ago. Oh that's right, you've deleted it, so you can't get it back.
Well, that's not exactly true: you can recreate your data from backups (if you don't back up your database, stop reading right now and fix that immediately). But that's clumsy.
So by only storing the current state of the object, you lose all the contextual information on how the object arrived at this current state. Not only that, you make it difficult to make projections about the future.
Event sourcing [1] can help with that.
Event sourcing
The basic concept of event sourcing is this:
- Instead of just storing the current state, we also store the events that led up to the current state
- Events are replayable. We can travel back in time to any point by replaying every event up to that point in time
- That also means we can recover the current state just by replaying every event, even if the current state was accidentally deleted
- Events are append-only.
To gain an intuition, let's look at an event sourcing system you're familiar with: your bank account.
Your "state" is your account balance, while your "events" are your transactions (deposit, withdrawal, etc.).
Can you imagine a bank account that only shows you the current balance?
That is clearly unacceptable ("Why do I only have $50? Where did my money go? If only I could see the history."). So we always store the history of transfers as the source of truth.
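To make the replay idea concrete, here is a minimal sketch in plain Python, with no Django involved. The `Transaction` class and the `balance_at` function are hypothetical names invented for this illustration; the point is only that the balance is derived from the append-only history rather than stored on its own.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """One immutable event: a signed amount applied at a point in time."""
    when: datetime
    amount: Decimal  # positive = deposit, negative = withdrawal


def balance_at(history, moment):
    """Replay every transaction up to and including `moment`."""
    return sum(
        (t.amount for t in history if t.when <= moment),
        Decimal("0"),
    )


# The history is append-only: new events are added, old ones are never edited.
history = [
    Transaction(datetime(2016, 1, 5), Decimal("200")),
    Transaction(datetime(2016, 2, 10), Decimal("-120")),
    Transaction(datetime(2016, 3, 15), Decimal("-30")),
]

print(balance_at(history, datetime(2016, 1, 31)))   # → 200 (end of January)
print(balance_at(history, datetime(2016, 12, 31)))  # → 50 (current balance)
```

The same function answers both "what is my balance now?" and "what was it back then?", which is exactly what a stored balance alone cannot do.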
Implementing event sourcing in Django
Let's look at a few ways to do this in Django.
Ad-hoc models
If you have one or two important models, you probably don't need a generalizable event sourcing solution that applies to all models.
You could do it on an ad-hoc basis like this, if you can have a relationship that makes sense:
# in an app called 'account'
from django.conf import settings
from django.db import models


class Account(models.Model):
    """Bank account"""
    balance = models.DecimalField(max_digits=19, decimal_places=6)
    # on_delete is required since Django 2.0; PROTECT is an assumption here
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name='account')


class Transfer(models.Model):
    """
    Represents a transfer in or out of an account. A positive amount indicates
    that it is a transfer into the account, whereas a negative amount
    indicates that it is a transfer out of the account.
    """
    account = models.ForeignKey('account.Account', on_delete=models.PROTECT, related_name='transfers')
    amount = models.DecimalField(max_digits=19, decimal_places=6)
    date = models.DateTimeField()

In this case your "state" is in your Account model, whereas your Transfer model contains the "events".
Having Transfer objects makes it trivial to recreate the balance of any account.
Using an Event Store
You could also use a single Event model to store every possible event in any model. A nice way to do this is to encode the changes in a JSON field.
This example uses Postgres:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
# On Django 3.1+, django.db.models.JSONField replaces this import
from django.contrib.postgres.fields import JSONField
from django.db import models


class Event(models.Model):
    """Event table that stores all model changes"""
    content_type = models.ForeignKey(ContentType, on_delete=models.PROTECT)
    object_id = models.PositiveIntegerField()
    time_created = models.DateTimeField()
    content_object = GenericForeignKey('content_type', 'object_id')
    body = JSONField()

You can then add methods to any model that mutate the state:

import json

from django.conf import settings
from django.db import models
from django.utils import timezone


class Account(models.Model):
    balance = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    # on_delete is required since Django 2.0; PROTECT is an assumption here
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name='account')

    def make_deposit(self, amount):
        """Deposit money into account"""
        Event.objects.create(
            content_object=self,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'made_deposit',
                'amount': float(amount),  # json.dumps cannot serialize Decimal
            })
        )
        self.balance += amount
        self.save()

    def make_withdrawal(self, amount):
        """Withdraw money from account"""
        Event.objects.create(
            content_object=self,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'made_withdrawal',
                'amount': float(-amount),  # withdraw = negative amount
            })
        )
        self.balance -= amount
        self.save()

    @classmethod
    def create_account(cls, owner):
        """Create an account"""
        account = cls.objects.create(owner=owner, balance=0)
        Event.objects.create(
            content_object=account,
            time_created=timezone.now(),
            body=json.dumps({
                'type': 'created_account',
                'id': account.id,
                'owner_id': owner.id
            })
        )
        return account
So now you can do this:
import decimal

from django.contrib.auth import get_user_model
from django.contrib.contenttypes.models import ContentType

User = get_user_model()

account = Account.create_account(owner=User.objects.first())
account.make_deposit(decimal.Decimal(50.0))
account.make_deposit(decimal.Decimal(125.0))
account.make_withdrawal(decimal.Decimal(75.0))

events = Event.objects.filter(
    content_type=ContentType.objects.get_for_model(account),
    object_id=account.id
)
for event in events:
    print(event.body)
Which should give you this:
{"type": "created_account", "id": 2, "owner_id": 1}
{"type": "made_deposit", "amount": 50.0}
{"type": "made_deposit", "amount": 125.0}
{"type": "made_withdrawal", "amount": -75.0}

Again, this makes it straightforward to write utility methods that recreate any instance of Account, even if you accidentally dropped the whole accounts table.
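As a rough sketch of such a utility, the function below rebuilds account balances from event bodies that have already been decoded into dictionaries. It is written in plain Python so that it runs without a database. The name `rebuild_accounts` and the `(object_id, body)` pairs are assumptions made for this illustration; in a real project the pairs would come from an Event queryset ordered by `time_created`.

```python
from decimal import Decimal


def rebuild_accounts(events):
    """Replay (object_id, body) pairs in order and return the account states.

    `events` stands in for Event rows sorted by time_created.
    """
    accounts = {}
    for object_id, body in events:
        if body["type"] == "created_account":
            accounts[object_id] = {
                "owner_id": body["owner_id"],
                "balance": Decimal("0"),
            }
        elif body["type"] in ("made_deposit", "made_withdrawal"):
            # Withdrawals are stored as negative amounts, so both event
            # types can simply be added to the balance.
            # str() avoids carrying float rounding error into the Decimal.
            accounts[object_id]["balance"] += Decimal(str(body["amount"]))
        else:
            raise ValueError(f"Unknown event type: {body['type']}")
    return accounts


events = [
    (2, {"type": "created_account", "id": 2, "owner_id": 1}),
    (2, {"type": "made_deposit", "amount": 50.0}),
    (2, {"type": "made_deposit", "amount": 125.0}),
    (2, {"type": "made_withdrawal", "amount": -75.0}),
]

accounts = rebuild_accounts(events)
print(accounts[2]["balance"])  # → 100.0
```

Raising on unknown event types is a deliberate choice in this sketch: silently skipping an event during a rebuild would produce a state that looks valid but is wrong.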
Snapshotting
There will come a time when you have too many events to efficiently replay the entire history. In this case, a good optimisation step would be snapshots taken at various points in history. For example, in our accounting example one could save snapshots in an AccountBalance model, each recording the account's state at a point in time.
You could do this via a scheduled task. Celery [2] is a good option.
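Snapshotting can be sketched without Django or Celery. In the hypothetical code below, `Snapshot` plays the role of an AccountBalance row, `take_snapshot` is what a scheduled task might run, and `balance_from_snapshot` replays only the entries recorded after the snapshot. All names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEntry:
    when: datetime
    amount: Decimal  # signed: a negative amount is a withdrawal


@dataclass(frozen=True)
class Snapshot:
    """Balance of the account as of `when` (stand-in for AccountBalance)."""
    when: datetime
    balance: Decimal


def take_snapshot(entries, moment):
    """Full replay up to `moment`; the expensive step, done occasionally."""
    total = sum((e.amount for e in entries if e.when <= moment), Decimal("0"))
    return Snapshot(when=moment, balance=total)


def balance_from_snapshot(snapshot, entries, moment):
    """Start from the snapshot and replay only the entries after it.

    Assumes `moment` is not earlier than the snapshot itself.
    """
    newer = (e.amount for e in entries if snapshot.when < e.when <= moment)
    return sum(newer, snapshot.balance)


entries = [
    LedgerEntry(datetime(2016, 1, 5), Decimal("200")),
    LedgerEntry(datetime(2016, 2, 10), Decimal("-120")),
    LedgerEntry(datetime(2016, 3, 15), Decimal("-30")),
]

snapshot = take_snapshot(entries, datetime(2016, 2, 29))
print(snapshot.balance)  # → 80
print(balance_from_snapshot(snapshot, entries, datetime(2016, 12, 31)))  # → 50
```

The events remain the source of truth here; a snapshot is only a cache that can be thrown away and recomputed from the history at any time.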
Summary
Use event sourcing to maintain an append-only list of events for your critical data. This effectively allows you to travel in time to any point in history to see the state of your data at that time.
UPDATE: If you want to see an example repo, feel free to take a look here: https://github.com/yoongkang/event_sourcing_example
[1] Martin Fowler wrote a detailed description of event sourcing on his website: http://martinfowler.com/eaaDev/EventSourcing.html
[2] Celery project: http://www.celeryproject.org/