You need to make a copy of an object in a Python program. How difficult can it be? Not very. But you also need to know the difference between shallow and deep copy in Python and decide which one you need.
In this article, you’ll read about the difference between shallow and deep copy when used on simple data structures. Then, you’ll look at more complex structures, including objects created from a class you define yourself. In that example, in which I’ll be cloning myself (!), you’ll see some of the pitfalls of copying objects and how to avoid them.
In this article, you’ll learn more about:
- Creating copies of simple lists and other data structures
- Creating copies of more complex lists
- Using the copy module from the standard library
- Understanding the difference between shallow and deep copy in Python
- Using __copy__() to define how to shallow copy an object of a user-defined class
Yes, there’s also __deepcopy__(), but I’ll stop at __copy__() in this article.
What’s The Problem With Copying Objects?
Here’s a preview of the example you’ll write towards the end of this article. You’ll create a couple of simple classes to define a Person and a Car. Yes, I’m afraid it’s “person” and “car” again. You’ve seen these examples used very often in object-oriented programming tutorials. But it’s a bit different in this case, so bear with me, please.
If you want a tutorial about classes that doesn’t use the “same old classes” that every other tutorial does, you can read the chapter about Object-Oriented Programming in Python in The Python Coding Book.
Here are the two classes:
# household.py

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

    def add_mileage(self, miles: float):
        self.mileage += miles

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def drive(self, miles: float):
        self.car.add_mileage(miles)
You’ll walk through this example in more detail later. For now, I’ll highlight that the Car class has make, model, and mileage attributes. You can update the last of these using the add_mileage() method.
Person has attributes firstname and car. You can assign an object of type Car to the Person using buy_car(), and you can get the person to go for a drive using drive(), which adds mileage to the car.
You can use these classes in a new script:
# cloning_stephen.py

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

The output from the print() line is:
Stephen's mileage is 100 miles
Next, you’ll clone Stephen (as if one of me is not enough already!):

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.copy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)
print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)
And here’s where the problem lies. Look at the output from this code:
Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 100 miles
Stephen's car is clone's car: True

The clone’s car is also a BMW Series 1, which makes sense. The clone has the same tastes and needs as Stephen! But the clone’s car starts at 100 miles, even though you’ve just created the clone and he’s not been on a drive yet.
The final line explains what’s happening. Stephen and the clone have the same car. Not just the same make and model, but the exact same car.
If the clone goes for a drive now, Stephen’s mileage will also change. Here’s what will happen if you add the following lines to the end of cloning_stephen.py:

# cloning_stephen.py

# ...

# Clone goes for a drive:
clone.drive(68)

print(f"Stephen's mileage is {stephen.car.mileage} miles")
The output is:
Stephen's mileage is 168 miles
Stephen’s mileage increased by 68 miles even though it’s the clone who went for a drive. That’s because they are using the same car! It’s unlikely this is the behaviour you want when you create a copy of a Person.
You’ll return to this example a bit later.
Making a Copy of Simple Data Structures
I’ll go through this section quickly as the fun starts in the next section. Let’s copy a list and a dictionary:
>>> trip_mileages = [10, 12, 3, 59]
>>> copied_list = trip_mileages.copy()
>>> copied_list
[10, 12, 3, 59]
>>> copied_list is trip_mileages
False

>>> trips = {
...     "Supermarket": 2,
...     "Holiday": 129,
... }
>>> copied_dict = trips.copy()
>>> copied_dict
{'Supermarket': 2, 'Holiday': 129}
>>> copied_dict is trips
False

Both lists and dictionaries have a .copy() method. This makes it easy to create a new object containing the same information.
What if you have a tuple?
>>> trip_mileages_tuple = 10, 12, 3, 59
>>> trip_mileages_tuple.copy()
Traceback (most recent call last):
  ...
AttributeError: 'tuple' object has no attribute 'copy'

Tuples don’t have a .copy() method. In this case, you can try to use the copy module from the standard library:

>>> trip_mileages_tuple = 10, 12, 3, 59
>>> import copy
>>> copied_tuple = copy.copy(trip_mileages_tuple)
>>> copied_tuple
(10, 12, 3, 59)
>>> copied_tuple is trip_mileages_tuple
True
You’ve been able to create a “copy” of a tuple, except it’s not a copy at all! As tuples are immutable, when you try to copy the tuple, you get a new reference to the same tuple.
You may be wondering whether this is also the case if you use copy.copy() with mutable types such as lists and dictionaries:

>>> trip_mileages = [10, 12, 3, 59]
>>> import copy
>>> copied_list = copy.copy(trip_mileages)
>>> copied_list
[10, 12, 3, 59]
>>> copied_list is trip_mileages
False
No, in this case, copy.copy(trip_mileages) gives the same output as trip_mileages.copy(). You’ll see later on what determines how copy.copy() behaves on any object. But first, let’s look at more complex data structures and find out about shallow and deep copies.
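As an aside, .copy() and copy.copy() aren’t the only ways to copy a list or a dictionary. The short sketch below reuses the values from the examples above and shows a few other common idioms, namely slicing and passing the object to its own constructor. The names list_copies and dict_copies are chosen here just for illustration:

```python
import copy

trip_mileages = [10, 12, 3, 59]
trips = {"Supermarket": 2, "Holiday": 129}

# Four ways of creating a new list with the same items
list_copies = [
    trip_mileages.copy(),
    copy.copy(trip_mileages),
    trip_mileages[:],       # a full slice
    list(trip_mileages),    # the list constructor
]

for list_copy in list_copies:
    # Same contents, but a different object each time
    assert list_copy == trip_mileages
    assert list_copy is not trip_mileages

# The same idea applies to dictionaries
dict_copies = [trips.copy(), copy.copy(trips), dict(trips)]

for dict_copy in dict_copies:
    assert dict_copy == trips
    assert dict_copy is not trips

print("All copies are equal to, but distinct from, the originals")
```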
Making a Copy of Complex Data Structures
Consider a list of teams, where each team is a list of names. You create a copy of the list of teams:
>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
>>> copied_teams = teams.copy()
>>> copied_teams
[['Stephen', 'Mary'], ['Kate', 'Trevor']]
>>> copied_teams is teams
False

So far, this is the same result as the one in the previous section. But then Martin joins Stephen and Mary’s team. You choose to add him to the copied list as you’d like to keep the original teams list unchanged:

>>> copied_teams[0].append("Martin")
>>> copied_teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> copied_teams[0] is teams[0]
True
You add Martin to the first team in copied_teams. However, he was also added to the first team in teams, the original list, even though you didn’t append anything explicitly to it.
You can see why this happens in the last statement, in which you’re checking whether the first list in copied_teams is the same object as the first list in teams. Yes, they are both the same object.
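One way to check every inner list at once, rather than just the first one, is to compare the items pairwise using the is operator. This is a small sketch using the same teams list, and the name shared is only for illustration:

```python
teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
copied_teams = teams.copy()

# Compare each inner list in the original with its counterpart in the copy
shared = [
    original is copied
    for original, copied in zip(teams, copied_teams)
]
print(shared)  # [True, True]

# The outer lists are still two different objects
print(copied_teams is teams)  # False
```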
Creating Shallow and Deep Copies in Python
When you copied the list using teams.copy(), you created a shallow copy of the list. Let’s see what this means.
When you create a list, you’re creating a new object of type list which contains several items. However, the list actually contains references to other objects that are stored elsewhere. Therefore, teams[0] is a reference to another object, the list ['Stephen', 'Mary']. Look again at the line you used to create the teams list initially:
>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
This line creates three lists:
- The list ['Stephen', 'Mary']
- The list ['Kate', 'Trevor']
- The list named teams which has references to the other two lists
[Diagram: the outer list teams holds two references, one to each of the inner lists.]
When you use teams.copy() or copy.copy(teams), you’re creating a new outer list. However, you’re not copying the inner lists. Instead, you use the same lists ['Stephen', 'Mary'] and ['Kate', 'Trevor'] you already have.
[Diagram: teams and copied_teams are two separate outer lists whose items refer to the same two inner lists.]
teams[0] and copied_teams[0] are two references pointing to the same list. You have two ways of referring to the same object.
So, when you add Martin to copied_teams[0], you’re adding Martin’s name to the only existing list that holds the names of Stephen’s team members.
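It may help to contrast this with replacing an item in the copy rather than changing it in place. In the sketch below, which uses the same teams list, assigning a new list to copied_teams[1] only changes the reference stored in the copied outer list, so teams is unaffected. Appending to copied_teams[0] changes the shared inner list, so both outer lists show the change. The extra team member, Sarah, is made up for this example:

```python
teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
copied_teams = teams.copy()

# Mutating a shared inner list: visible through both outer lists
copied_teams[0].append("Martin")

# Rebinding an item of the copy: only the copy's reference changes
copied_teams[1] = ["Kate", "Trevor", "Sarah"]

print(teams)
# [['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
print(copied_teams)
# [['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor', 'Sarah']]
```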
Sometimes, this is not what you want. Instead, you want to create copies of the objects nested inside the object you’re copying as well.
Deep Copy
In this section, you’ll read about creating a deep copy of an object. But first, let’s recreate the example above using the functions in the copy module.
copy.copy() creates a shallow copy, so you’ll get the same output as the one in the section above:

>>> import copy
>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
>>> copied_teams = copy.copy(teams)
>>> copied_teams[0].append("Martin")
>>> copied_teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> copied_teams[0] is teams[0]
True
Therefore, for lists, copy.copy(teams) is the same as teams.copy().
Next, you can try using copy.deepcopy() instead:

>>> import copy
>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
>>> deepcopied_teams = copy.deepcopy(teams)
>>> deepcopied_teams
[['Stephen', 'Mary'], ['Kate', 'Trevor']]
>>> deepcopied_teams[0].append("Martin")
>>> deepcopied_teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> teams
[['Stephen', 'Mary'], ['Kate', 'Trevor']]
>>> deepcopied_teams[0] is teams[0]
False
When you append "Martin" to deepcopied_teams[0], the first team in the deep copy you created from the original list, the new item doesn’t appear when you display teams. And unlike the case with the shallow copy earlier, deepcopied_teams[0] is no longer the same object as teams[0].
When you create a deep copy, you’re copying the outer list, but you’re also creating copies of the inner lists. Therefore, the references in teams and those in deepcopied_teams point to different objects. The original and the copy created by deepcopy() are entirely separate from each other.
[Diagram: teams and deepcopied_teams are two separate outer lists, and each one refers to its own pair of inner lists.]
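A detail worth knowing is that copy.deepcopy() keeps track of the objects it has already copied. In the sketch below, the hypothetical list pairs contains two references to the same inner list. The deep copy contains a new inner list, but both items of the copy still refer to that one new list, so the structure of the original is preserved:

```python
import copy

team = ["Stephen", "Mary"]
pairs = [team, team]  # Two references to the same inner list

deepcopied_pairs = copy.deepcopy(pairs)

# The inner list was copied...
print(deepcopied_pairs[0] is team)  # False
# ...but only once, so the copy's two items are the same object
print(deepcopied_pairs[0] is deepcopied_pairs[1])  # True

deepcopied_pairs[0].append("Martin")
print(deepcopied_pairs)
# [['Stephen', 'Mary', 'Martin'], ['Stephen', 'Mary', 'Martin']]
print(team)
# ['Stephen', 'Mary']
```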
You can read more about shallow and deep copy in Python in the official documentation.
Copying Objects of Classes You’ve Defined Yourself
It’s time to create your own classes and explore what happens when you make copies of them. You’ve already come across the class definitions Car and Person at the beginning of this article. Let’s introduce these classes properly. You can define them in a script called household.py:

# household.py

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

    def add_mileage(self, miles: float):
        self.mileage += miles

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def drive(self, miles: float):
        self.car.add_mileage(miles)
You can initialise Car with a make and a model, both of which are strings. I’m using type hinting in this example to keep track of what the argument types are. A new car starts with a mileage of 0 miles (or kilometres, if you prefer).
As the name implies, the method add_mileage() is used to add miles whenever the person drives the car.
A Person is initialised with a first name, which is a string. The method buy_car() allows you to link an instance of the class Car to an instance of Person. The Car object is referenced using the .car attribute of the Person instance.
Whenever the person goes on a trip, you can call the drive() method, which logs the additional miles onto the person’s car.
In a new script called cloning_stephen.py, you can test these classes:

# cloning_stephen.py

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")
This is the same code you saw earlier. You create an instance of Person and call the buy_car() method for that instance. Stephen (I’m still talking about myself in the third person!) goes for a 100-mile drive. You log this by calling the drive() method. This updates the mileage attribute of the Car instance referenced in stephen.car. This code gives the following output:
Stephen's mileage is 100 miles
Copying An Object: The Default Case
Stephen is very busy these days! He decides to clone himself so he can get more things done. Let’s try this. You can copy the instance stephen in cloning_stephen.py using copy.copy() from the standard-library copy module:

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.copy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)
print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)
The output from this script, which you’ve already seen earlier, shows the problem with this type of copy:

Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 100 miles
Stephen's car is clone's car: True
This is a shallow copy. Therefore, although stephen and clone are different instances of the class Person, they both share the same instance of Car. Stephen has managed to clone himself, but he has to share the same car with his clone. That’s not good, as Stephen and the clone can’t be efficient if they can’t go to different places.
If the clone goes for a drive, he’s using the same car Stephen uses. Therefore, the extra mileage will also show up for Stephen:

# cloning_stephen.py

# ...

# Clone goes for a drive:
clone.drive(68)

print(f"Stephen's mileage is {stephen.car.mileage} miles")
This shows Stephen’s mileage has increased to 168 miles:
Stephen's mileage is 168 miles
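To see what the default shallow copy does with an instance, you can inspect the attributes of both objects. The sketch below repeats simplified versions of the two classes so that it runs on its own. It shows that the clone gets its own attribute dictionary, but the values in it are the same objects as in the original. The last few lines give the clone a new car by hand, and the Audi A3 is made up for this example:

```python
import copy

class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
        self.mileage = 0

class Person:
    def __init__(self, firstname):
        self.firstname = firstname
        self.car = None

stephen = Person("Stephen")
stephen.car = Car("BMW", "Series 1")
clone = copy.copy(stephen)

# Two Person instances, each with its own attribute dictionary...
print(clone is stephen)  # False
print(vars(clone) is vars(stephen))  # False
# ...but the attributes refer to the same objects
print(clone.car is stephen.car)  # True

# Rebinding the clone's attribute doesn't affect the original
clone.car = Car("Audi", "A3")
print(stephen.car.make)  # BMW
print(clone.car is stephen.car)  # False
```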
Using copy.deepcopy()
What if you try to create a deep copy instead of a shallow one? After all, this trick worked with the example of the list of team members earlier. You can update cloning_stephen.py to use copy.deepcopy() instead of copy.copy():

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.deepcopy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)
print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)
When you run this script, you’ll now get the following output:
Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 100 miles
Stephen's car is clone's car: False
Stephen’s mileage is still 100 miles. There’s no reason why this should be different as Stephen drove 100 miles.
The clone’s car is a BMW Series 1, the same as Stephen’s car make and model. This is what you want since Stephen’s clone has the same car preferences as Stephen!
Let’s skip to the last line of the output. Stephen’s car is no longer the exact same car as the clone’s car. This is different from the result you got with the shallow copy above. The clone’s car is a different instance of Car. So there are two cars now; one belongs to Stephen and the other to the clone.
However, the clone’s car already has 100 miles on the odometer even though the clone hasn’t driven yet. When you create a deep copy of stephen, the program creates a new instance of Car. However, all of the original car’s attributes are also copied. This means the clone’s car starts with whatever mileage Stephen’s car has when you create the deep copy.
From now on, the two cars are separate, so when the clone drives the car, the additional mileage won’t show up in Stephen’s car:

# cloning_stephen.py

# ...

# Clone goes for a drive:
clone.drive(68)

print(f"Stephen's mileage is {stephen.car.mileage} miles")
print(f"The clone's mileage is {clone.car.mileage} miles")
The output shows that Stephen’s mileage is still 100 miles, but the clone’s mileage is now 168 miles even though his one and only trip is 68 miles long:
...
Stephen's mileage is 100 miles
The clone's mileage is 168 miles
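The reason the clone’s odometer doesn’t start at zero is that a deep copy reproduces the current state of every object it reaches, not the state those objects had when they were first created. Here’s a minimal sketch, with a simplified Car class repeated so the snippet is self-contained:

```python
import copy

class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
        self.mileage = 0

car = Car("BMW", "Series 1")
car.mileage += 100

copied_car = copy.deepcopy(car)

# A different object with the same attribute values
print(copied_car is car)  # False
print(vars(copied_car))
# {'make': 'BMW', 'model': 'Series 1', 'mileage': 100}
```

With the default behaviour, __init__() isn’t run again for the copy, which is why mileage isn’t reset to 0.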
In the last section of this article, you’ll fix this by customising how an instance of Person should be copied.
Defining The __copy__ Dunder Method
You can override the default behaviour for copy.copy() and copy.deepcopy() for any class you define. In this article, I’ll only focus on defining the dunder method __copy__(), which determines what happens when you call copy.copy() for your object. There’s also a __deepcopy__() dunder method, aimed at creating deep copies, which is similar but provides a bit more functionality to deal with complex objects.
You can return to household.py where you define the class Person and add __copy__() to the class:
# household.py

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

    def add_mileage(self, miles: float):
        self.mileage += miles

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def drive(self, miles: float):
        self.car.add_mileage(miles)

    def __copy__(self):
        copy_instance = Person(self.firstname)
        # Only give the copy a car if the original has one
        if self.car is not None:
            copy_instance.buy_car(
                Car(
                    make=self.car.make,
                    model=self.car.model,
                )
            )
        return copy_instance
The __copy__() dunder method creates a new Person instance using the same first name as the instance you’re copying. It also creates a new Car instance using the make and model of the car you’re copying. You pass this new Car object as an argument in copy_instance.buy_car() and then return the new Person instance.
You can return to cloning_stephen.py, making sure you use copy.copy() to make a copy of stephen. This means that Person.__copy__() is used when creating the copy.

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.copy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)
print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)
Now, the output is:
Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 0 miles
Stephen's car is clone's car: False
The clone still has a different instance of Car, but now the car’s mileage starts at 0, as you’d expect! You’ve created a custom version of shallow copy by defining __copy__() for the class. In this case, you decided that when you copy a Person, the new instance has its own car which starts with 0 miles.
In more complex classes, you may want to define both __copy__() and __deepcopy__() if you want to distinguish between shallow and deep copy in your Python program.
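For completeness, here’s a rough sketch of what a __deepcopy__() method might look like for these classes. It isn’t part of this article’s code. It assumes you want a deep copy to behave like the custom shallow copy above, with the clone getting a brand-new car with zero miles. The method receives a memo dictionary, which copy.deepcopy() uses to keep track of the objects it has already copied:

```python
import copy

class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
        self.mileage = 0

class Person:
    def __init__(self, firstname):
        self.firstname = firstname
        self.car = None

    def __deepcopy__(self, memo):
        # Assumption: a deep-copied person gets a new car with no mileage
        copy_instance = Person(copy.deepcopy(self.firstname, memo))
        # Register the copy so repeated references to self are reused
        memo[id(self)] = copy_instance
        if self.car is not None:
            copy_instance.car = Car(self.car.make, self.car.model)
        return copy_instance

stephen = Person("Stephen")
stephen.car = Car("BMW", "Series 1")
stephen.car.mileage = 100

clone = copy.deepcopy(stephen)
print(clone.car is stephen.car)  # False
print(clone.car.mileage)  # 0
print(stephen.car.mileage)  # 100
```

Note that copy.deepcopy() doesn’t call __copy__(), so without a method like this one, a deep copy of a Person would still carry over the mileage.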
Final Words
Here’s a summary of the key points you covered in this article:
- You created copies of simple lists and other data structures
- You created copies of more complex lists
- You used the copy module from the standard library
- You learnt about the difference between shallow and deep copy in Python
- You used __copy__() to define how to shallow copy an object of a user-defined class
You’re now ready to safely copy any object, knowing what to look out for if the object references other objects.
Appendix: You Cannot Copy An Immutable Object
Do you recall when you used copy.copy() on a tuple earlier in the article? When you copied lists and dictionaries, you got a new instance containing the same values as the original. When you copied a tuple, you got the same instance back.
Whenever you pass an object of a built-in immutable type, such as a tuple, a string, or a number, to copy.copy(), it returns the object itself.
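You can check this for a few built-in immutable types. The sketch below also shows a case that’s easy to overlook: copy.deepcopy() does create a new tuple when the tuple contains a mutable object, since the inner object needs to be copied. The variable names are only for illustration:

```python
import copy

# copy.copy() returns the same object for these built-in immutable types
for item in (42, "Stephen", (10, 12, 3, 59), frozenset({1, 2})):
    assert copy.copy(item) is item

# A tuple containing only immutable items survives deepcopy() unchanged too
mileages = (10, 12, 3, 59)
print(copy.deepcopy(mileages) is mileages)  # True

# A tuple that contains lists is different
teams = (["Stephen", "Mary"], ["Kate", "Trevor"])
deepcopied_teams = copy.deepcopy(teams)
print(deepcopied_teams is teams)  # False
print(deepcopied_teams[0] is teams[0])  # False

# A shallow copy of the same tuple is still the tuple itself
print(copy.copy(teams) is teams)  # True
```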
Further Reading
You can read more about Object-Oriented Programming here:
- The chapter about Object-Oriented Programming in Python in The Python Coding Book
- Blog posts which use and talk about OOP in Python
The post Shallow and Deep Copy in Python and How to Use __copy__() appeared first on The Python Coding Book.