A Closer Look at the Django ORM and Many-To-Many Relationships
In the last post I did some work on the data model for the KidsTasks app and discovered that a plain many-to-many relationship would not allow multiple copies of the same task to exist in a given schedule. Further reading showed me, without much explanation, that using a “through” parameter on the relationship definition fixed that. In this post I want to take a closer look at what’s going on in that Django model magic.
Django Shell
As part of my research for this topic, I was led to a quick description of the Django shell, which is great for testing out ideas and playing with the models you’re developing. I found a good description here (which also gives a look at filters and QuerySets).
Additionally, for anyone wanting to play along at home, I’ll note that the following sequence of commands was quite helpful to have handy when testing different models.
    $ rm tasks/migrations db.sqlite3 -rf
    $ ./manage.py makemigrations tasks
    $ ./manage.py migrate
    $ ./manage.py shell
    Python 3.4.3 (default, Oct 14 2015, 20:33:09)
    [GCC 4.8.4] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
Many To Many without an Intermediate Class
I’ll start by examining what happened with my original model design where a DayOfWeekSchedule had a ManyToMany relationship with Task.
Simple Solution Code
The simplified model I’ll use here looks like this.
    from django.db import models


    class Task(models.Model):
        name = models.CharField(max_length=256)
        required = models.BooleanField()

        def __str__(self):
            return self.name


    class DayOfWeekSchedule(models.Model):
        # Direct many-to-many link; Django creates the join table itself.
        tasks = models.ManyToManyField(Task)
        name = models.CharField(max_length=20)

        def __str__(self):
            return self.name
Note that the ManyToMany field directly accesses the Task class. (Also note that I retained the __str__ methods to make the shell output more meaningful.)
Experiment
In the shell experiment shown in the listing below, I set up a few Tasks and a couple of DayOfWeekSchedules and then add “first task” and “second task” to one of the schedules. Once this is done, I attempt to add “first task” to the schedule again, and we see that it does not have the desired effect.
    >>> # import our models
    >>> from tasks.models import Task, DayOfWeekSchedule
    >>>
    >>> # populate our database with some simple tasks and schedules
    >>> Task.objects.create(name="first task", required=False)
    <Task: first task>
    >>> Task.objects.create(name="second task", required=True)
    <Task: second task>
    >>> Task.objects.create(name="third task", required=False)
    <Task: third task>
    >>> DayOfWeekSchedule.objects.create(name="sched1")
    <DayOfWeekSchedule: sched1>
    >>> DayOfWeekSchedule.objects.create(name="sched2")
    <DayOfWeekSchedule: sched2>
    >>> Task.objects.all()
    <QuerySet [<Task: first task>, <Task: second task>, <Task: third task>]>
    >>> DayOfWeekSchedule.objects.all()
    <QuerySet [<DayOfWeekSchedule: sched1>, <DayOfWeekSchedule: sched2>]>
    >>>
    >>> # add a task to a schedule
    >>> s = DayOfWeekSchedule.objects.get(name='sched2')
    >>> t = Task.objects.get(name='first task')
    >>> s.tasks.add(t)
    >>> s.tasks.all()
    <QuerySet [<Task: first task>]>
    >>>
    >>> # add other task to that schedule
    >>> t = Task.objects.get(name='second task')
    >>> s.tasks.add(t)
    >>> s.tasks.all()
    <QuerySet [<Task: first task>, <Task: second task>]>
    >>>
    >>> # attempt to add the first task to the schedule again
    >>> s = DayOfWeekSchedule.objects.get(name='sched2')
    >>> t = Task.objects.get(name='first task')
    >>> s.tasks.add(t)
    >>> s.tasks.all()
    <QuerySet [<Task: first task>, <Task: second task>]>
Note that at the end, we still only have a single copy of “first task” in the schedule.
Many To Many with an Intermediate Class
Now we’ll retry the experiment with the “through=” intermediate class specified in the ManyToMany relationship.
Not-Quite-As-Simple Solution Code
The model code for this is quite similar. Note the addition of the “through=” option and of the DayTask class.
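    from django.db import models


    class Task(models.Model):
        name = models.CharField(max_length=256)
        required = models.BooleanField()

        def __str__(self):
            return self.name


    class DayOfWeekSchedule(models.Model):
        # The relationship now goes through the explicit DayTask model.
        tasks = models.ManyToManyField(Task, through='DayTask')
        name = models.CharField(max_length=20)

        def __str__(self):
            return self.name


    class DayTask(models.Model):
        # on_delete is spelled out to match the CASCADE shown in the
        # generated migration; newer Django versions require it.
        task = models.ForeignKey(Task, on_delete=models.CASCADE)
        schedule = models.ForeignKey(DayOfWeekSchedule, on_delete=models.CASCADE)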
Experiment #2
This script is as close as possible to the first one. The only difference is the extra steps we need to take to add the ManyToMany relationship: we need to manually create a DayTask object, initialize it with the Task and DayOfWeekSchedule objects, and then save it. While this is slightly more cumbersome in the code, it does produce the desired result: two copies of “first task” are present in the schedule at the end.
    >>> # import our models
    >>> from tasks.models import Task, DayOfWeekSchedule, DayTask
    >>>
    >>> # populate our database with some simple tasks and schedules
    >>> Task.objects.create(name="first task", required=False)
    <Task: first task>
    >>> Task.objects.create(name="second task", required=True)
    <Task: second task>
    >>> Task.objects.create(name="third task", required=False)
    <Task: third task>
    >>> DayOfWeekSchedule.objects.create(name="sched1")
    <DayOfWeekSchedule: sched1>
    >>> DayOfWeekSchedule.objects.create(name="sched2")
    <DayOfWeekSchedule: sched2>
    >>> Task.objects.all()
    <QuerySet [<Task: first task>, <Task: second task>, <Task: third task>]>
    >>> DayOfWeekSchedule.objects.all()
    <QuerySet [<DayOfWeekSchedule: sched1>, <DayOfWeekSchedule: sched2>]>
    >>>
    >>> # add a task to a schedule
    >>> s = DayOfWeekSchedule.objects.get(name='sched2')
    >>> t = Task.objects.get(name='first task')
    >>> # cannot simply add directly, must create intermediate object see
    >>> # https://docs.djangoproject.com/en/1.9/topics/db/models/#extra-fields-on-many-to-many-relationships
    >>> # s.tasks.add(t)
    >>> d1 = DayTask(task=t, schedule=s)
    >>> d1.save()
    >>> s.tasks.all()
    <QuerySet [<Task: first task>]>
    >>>
    >>> # add other task to that schedule
    >>> t = Task.objects.get(name='second task')
    >>> dt2 = DayTask(task=t, schedule=s)
    >>> dt2.save()
    >>> # s.tasks.add(t)
    >>> s.tasks.all()
    <QuerySet [<Task: first task>, <Task: second task>]>
    >>>
    >>> # attempt to add the first task to the schedule again
    >>> s = DayOfWeekSchedule.objects.get(name='sched2')
    >>> t = Task.objects.get(name='first task')
    >>> dt3 = DayTask(task=t, schedule=s)
    >>> dt3.save()
    >>> s.tasks.all()
    <QuerySet [<Task: first task>, <Task: second task>, <Task: first task>]>
But…Why?
The short answer is that I’m not entirely sure why the intermediate class is needed to allow multiple instances. It’s fairly clear that it is tied to how the Django code manages those relationships. Evidence confirming that can be seen in the migration script generated for each of the models.
The first model generates these operations:
    operations = [
        migrations.CreateModel(
            name='DayOfWeekSchedule',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=20)),
            ],
        ),
        migrations.CreateModel(
            name='Task',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=256)),
                ('required', models.BooleanField()),
            ],
        ),
        migrations.AddField(
            model_name='dayofweekschedule',
            name='tasks',
            field=models.ManyToManyField(to='tasks.Task'),
        ),
    ]
Notice the final AddField call which adds “tasks” to the “dayofweekschedule” model directly.
The second model (shown above) generates a slightly different set of migration operations:
    operations = [
        migrations.CreateModel(
            name='DayOfWeekSchedule',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=20)),
            ],
        ),
        migrations.CreateModel(
            name='DayTask',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('schedule', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='tasks.DayOfWeekSchedule')),
            ],
        ),
        migrations.CreateModel(
            name='Task',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=256)),
                ('required', models.BooleanField()),
            ],
        ),
        migrations.AddField(
            model_name='daytask',
            name='task',
            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='tasks.Task'),
        ),
        migrations.AddField(
            model_name='dayofweekschedule',
            name='tasks',
            field=models.ManyToManyField(through='tasks.DayTask', to='tasks.Task'),
        ),
    ]
This time it adds a “task” field to the daytask model and a “tasks” field to the dayofweekschedule model. I have to admit here that I really wanted this to show the DayTask object being used in the DayOfWeekSchedule class as a proxy, but that’s not the case.
Examining the databases generated by these two models showed no significant differences there, either.
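One schema detail that may be worth double-checking is whether the join table carries a uniqueness constraint on the (schedule, task) pair. As far as I can tell, the join table Django creates on its own for a plain ManyToManyField is declared unique on that pair, while a hand-written model like DayTask has no such constraint unless you add one. The sketch below is not Django's generated schema; it uses the standard sqlite3 module with two simplified, hypothetical tables (auto_join and day_task are names I made up) just to show how such a constraint changes what a table will accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in for an auto-created join table
    CREATE TABLE auto_join (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        schedule_id INTEGER NOT NULL,
        task_id INTEGER NOT NULL,
        UNIQUE (schedule_id, task_id)
    );
    -- hypothetical stand-in for the explicit DayTask table
    CREATE TABLE day_task (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        schedule_id INTEGER NOT NULL,
        task_id INTEGER NOT NULL
    );
""")


def insert_pair(table, schedule_id, task_id):
    """Insert one (schedule, task) row; return False if the table rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO {} (schedule_id, task_id) VALUES (?, ?)".format(table),
                (schedule_id, task_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count_rows(table):
    """Return the number of rows currently stored in the given table."""
    return conn.execute("SELECT COUNT(*) FROM {}".format(table)).fetchone()[0]


# Try to link task 1 to schedule 2 twice in each table.
auto_results = [insert_pair("auto_join", 2, 1), insert_pair("auto_join", 2, 1)]
through_results = [insert_pair("day_task", 2, 1), insert_pair("day_task", 2, 1)]

print(auto_results, count_rows("auto_join"))
print(through_results, count_rows("day_task"))
```

In this toy version the constrained table refuses the second copy of the pair, while the unconstrained one stores both rows, each with its own id.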
A Quick Look at the Source
One of the beauties of working with open source software is the ability to dive in and see for yourself what’s going on. Looking at the Django source, you can find the code that adds a relationship in django/db/models/fields/related_descriptors.py (at line 918 in the version I checked out).
    def add(self, *objs):
        # ... stuff deleted ...
        self._add_items(self.source_field_name, self.target_field_name, *objs)
(Actually, _add_items can be called twice, once for a forward and once for a reverse relationship.) Looking at _add_items (line 1041 in my copy), we see, after it builds the set of new_ids to insert, this chunk of code:
    db = router.db_for_write(self.through, instance=self.instance)
    vals = (self.through._default_manager.using(db)
            .values_list(target_field_name, flat=True)
            .filter(**{
                source_field_name: self.related_val[0],
                '%s__in' % target_field_name: new_ids,
            }))
    new_ids = new_ids - set(vals)
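which I suspect provides the difference. This code fetches the target ids that are already present in the relation table for this source object and removes them from the set of new_ids. I believe that the filter here will respond differently if we have an intermediate class defined. It is also worth noting that Experiment #2 never calls add() at all; it saves DayTask objects directly, so this filtering step is never applied to those rows. NOTE: I did not run this code live to test this theory, so if I’m wrong, feel free to point out how and where in the comments.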
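To make the set arithmetic in that chunk concrete, here is a minimal plain-Python sketch of the same idea. It does not touch Django at all: filter_new_ids is a hypothetical helper of my own, and the list of (source_id, target_id) tuples is only a stand-in for the rows of the relation table.

```python
def filter_new_ids(existing_rows, source_id, new_ids):
    """Return the ids from new_ids that are not yet linked to source_id.

    existing_rows is a list of (source_id, target_id) tuples standing in
    for the rows of the relation table.
    """
    # Rough equivalent of the values_list(...).filter(...) query: the
    # target ids already linked to this source that also appear in new_ids.
    vals = [
        target for (source, target) in existing_rows
        if source == source_id and target in new_ids
    ]
    # Same final step as the Django snippet: drop what is already there.
    return set(new_ids) - set(vals)


rows = [(2, 1), (2, 2)]  # schedule 2 already holds tasks 1 and 2

print(filter_new_ids(rows, 2, {1}))     # set(): task 1 is already linked
print(filter_new_ids(rows, 2, {1, 3}))  # {3}: only task 3 is new
```

Under this simplified model, asking to add a task that is already linked leaves nothing to insert, which matches what the first experiment showed.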
Even if this is not quite correct, after walking through some code, I’m satisfied that the intermediate class definitely causes some different behavior internally in Django.
Next time I’ll jump back into the KidsTasks code.
Thanks for reading!