Channel: Planet Python

The Python Coding Blog: Shallow and Deep Copy in Python and How to Use __copy__()


You need to make a copy of an object in a Python program. How difficult can it be? Not very. But you also need to know the difference between shallow and deep copy in Python and decide which one you need.

In this article, you’ll read about the difference between shallow and deep copy when used on simple data structures. Then, you’ll look at more complex structures, including objects created from a class you define yourself. In that example, in which I’ll be cloning myself (!), you’ll see some of the pitfalls of copying objects and how to spot and avoid them.

In this article, you’ll learn more about:

  • Creating copies of simple lists and other data structures
  • Creating copies of more complex lists
  • Using the copy module from the standard library
  • Understanding the difference between shallow and deep copy in Python
  • Using __copy__() to define how to shallow copy an object of a user-defined class

Yes, there’s also __deepcopy__(), but I’ll stop at __copy__() in this article.

What’s The Problem With Copying Objects?

Here’s a preview of the example you’ll write towards the end of this article. You’ll create a couple of simple classes to define a Person and a Car. Yes, I’m afraid it’s “person” and “car” again. You’ve seen these examples used very often in object-oriented programming tutorials. But it’s a bit different in this case, so bear with me, please.

If you want a tutorial about classes that doesn’t use the “same old classes” that every other tutorial does, you can read the chapter about Object-Oriented Programming in Python in The Python Coding Book.

# household.py

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

    def add_mileage(self, miles: float):
        self.mileage += miles

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def drive(self, miles: float):
        self.car.add_mileage(miles)

You’ll walk through this example in a bit more detail later. For now, I’ll highlight that the Car class has make, model, and mileage attributes. The latter can be updated using the add_mileage() method.

Person has attributes firstname and car. You can assign an object of type Car to the Person using buy_car(), and you can get the person to go for a drive using drive(), which adds mileage to the car.

You can use these classes in a new script:

# cloning_stephen.py

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

The output from the print() line is:

Stephen's mileage is 100 miles

Next, you’ll clone Stephen (as if one of me is not enough already!)

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.copy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)

print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)

And here’s where the problem lies. Look at the output from this code:

Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 100 miles
Stephen's car is clone's car: True

The clone’s car is also a BMW Series 1, which makes sense. The clone has the same tastes and needs as Stephen! But the clone’s car starts at 100 miles, even though you’ve just created the clone and he’s not been on a drive yet.

The final line explains what’s happening. Stephen and the clone have the same car. Not just the same make and model, but the exact same car.

If the clone goes for a drive now, Stephen’s mileage will also change. Here’s what will happen if you add the following lines to the end of cloning_stephen.py:

# cloning_stephen.py

# ...

# Clone goes for a drive:
clone.drive(68)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

The output is:

Stephen's mileage is 168 miles

Stephen’s mileage increased by 68 miles even though it’s the clone who went for a drive. That’s because they are using the same car! It’s unlikely this is the behaviour you want when you create a copy of a Person.

You’ll return to this example a bit later.

Making a Copy of Simple Data Structures

I’ll go through this section quickly as the fun starts in the next section. Let’s copy a list and a dictionary:

>>> trip_mileages = [10, 12, 3, 59]
>>> copied_list = trip_mileages.copy()
>>> copied_list
[10, 12, 3, 59]
>>> copied_list is trip_mileages
False

>>> trips = {
...     "Supermarket": 2,
...     "Holiday": 129,
... }
>>> copied_dict = trips.copy()
>>> copied_dict
{'Supermarket': 2, 'Holiday': 129}
>>> copied_dict is trips
False

Both lists and dictionaries have a .copy() method. This makes it easy to create a new object containing the same information.
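The .copy() method is not the only route. As a minimal sketch reusing the names above, the list() and dict() constructors, a full slice, and unpacking should all give you a new object with the same contents:

```python
trip_mileages = [10, 12, 3, 59]
trips = {"Supermarket": 2, "Holiday": 129}

# Three further ways to make a new list holding the same items
via_constructor = list(trip_mileages)
via_slice = trip_mileages[:]
via_unpacking = [*trip_mileages]

# The dict constructor makes a new dictionary with the same pairs
trips_via_constructor = dict(trips)

for candidate in (via_constructor, via_slice, via_unpacking):
    # Equal contents, but a different object from the original
    print(candidate == trip_mileages, candidate is trip_mileages)

print(trips_via_constructor == trips, trips_via_constructor is trips)
```

Each line printed is True False: the values match, but the object is new. Like .copy(), all of these are shallow copies.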

What if you have a tuple?

>>> trip_mileages_tuple = 10, 12, 3, 59
>>> trip_mileages_tuple.copy()
Traceback (most recent call last):
  ...
AttributeError: 'tuple' object has no attribute 'copy'

Tuples don’t have a .copy() method. In this case, you can try to use the copy module from the standard library:

>>> trip_mileages_tuple = 10, 12, 3, 59

>>> import copy

>>> copied_tuple = copy.copy(trip_mileages_tuple)
>>> copied_tuple
(10, 12, 3, 59)
>>> copied_tuple is trip_mileages_tuple
True

You’ve been able to create a “copy” of a tuple, except it’s not a copy at all! As tuples are immutable, when you try to copy the tuple, you get a new reference to the same tuple.

You may be wondering whether this is also the case if you use copy.copy() with mutable types such as lists and dictionaries:

>>> trip_mileages = [10, 12, 3, 59]

>>> import copy

>>> copied_list = copy.copy(trip_mileages)
>>> copied_list
[10, 12, 3, 59]
>>> copied_list is trip_mileages
False

No, in this case, copy.copy(trip_mileages) gives the same output as trip_mileages.copy(). You’ll see later on what determines how copy.copy() behaves on any object. But first, let’s look at more complex data structures and find out about shallow and deep copies.

Making a Copy of Complex Data Structures

Consider a list of teams, where each team is a list of names. You create a copy of the list of teams:

>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
>>> copied_teams = teams.copy()
>>> copied_teams
[['Stephen', 'Mary'], ['Kate', 'Trevor']]
>>> copied_teams is teams
False

So far, this is the same result as the one in the previous section. But, Martin joins Stephen and Mary’s team. You choose to add this to the copied list as you’d like to keep the original teams list unchanged:

>>> copied_teams[0].append("Martin")
>>> copied_teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]

>>> teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]

>>> copied_teams[0] is teams[0]
True

You add Martin to the first team in copied_teams. However, he was also added to the first team in teams, the original list, even though you didn’t append anything explicitly to it.

You can see why this happens in the last statement in which you’re checking whether the first list in copied_teams is the same object as the first list in teams. Yes, they are both the same object.
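It’s worth separating two operations here: replacing an item in the copied list and changing an item in place. The short sketch below contrasts them (the extra name "Sarah" is made up for illustration):

```python
teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
copied_teams = teams.copy()

# Rebinding a slot in the copy only changes the copy's outer list
copied_teams[1] = ["Kate", "Trevor", "Sarah"]
print(teams[1])  # ['Kate', 'Trevor']

# Mutating a shared inner list is visible through both outer lists
copied_teams[0].append("Martin")
print(teams[0])  # ['Stephen', 'Mary', 'Martin']
```

Only the in-place change leaks back to the original, since that’s the only operation that touches an object both lists refer to.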

Creating Shallow and Deep Copies in Python

When you copied the list using teams.copy(), you created a shallow copy of the list. Let’s see what this means.

When you create a list, you’re creating a new object of type list which contains several items. However, the list actually contains references to other objects that are stored elsewhere. Therefore, teams[0] is a reference to another object, the list: ['Stephen', 'Mary']. Look again at the line you used to create the teams list initially:

>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]

This line creates three lists:

  • The list ['Stephen', 'Mary']
  • The list ['Kate', 'Trevor']
  • The list named teams which has references to the other two lists

You can picture this as an outer list, teams, with two slots, each holding a reference to one of the inner lists.

When you use teams.copy() or copy.copy(teams), you’re creating a new outer list. However, you’re not copying the inner lists. Instead, the new outer list refers to the same lists ['Stephen', 'Mary'] and ['Kate', 'Trevor'] you already have.

teams[0] and copied_teams[0] are two references pointing to the same list. You have two ways of referring to the same object.

So, when you add Martin to copied_teams[0], you are adding Martin’s name to the only existing list that holds the names of Stephen’s team members.
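If you’d like to confirm this picture yourself, one rough way is to count distinct objects using id(). This sketch assumes the same teams list as above:

```python
teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]
copied_teams = teams.copy()

# Identities of the outer lists and of every inner list they refer to
outer_ids = {id(teams), id(copied_teams)}
inner_ids = {id(team) for team in teams} | {id(team) for team in copied_teams}

print(len(outer_ids))  # 2: the shallow copy made a new outer list
print(len(inner_ids))  # 2: but no new inner lists were created

# Every slot in the copy refers to the same object as the original slot
shared = all(a is b for a, b in zip(teams, copied_teams))
print(shared)  # True
```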

Sometimes, this is not what you want. Instead, you want to create a copy of all the items inside objects.

Deep Copy

In this section, you’ll read about creating a deep copy of an object. But first, let’s recreate the example above using the functions in the standard library’s copy module.

copy.copy() creates a shallow copy, so you’ll get the same output as the one in the section above:

>>> import copy
>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]

>>> copied_teams = copy.copy(teams)

>>> copied_teams[0].append("Martin")
>>> copied_teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]

>>> copied_teams[0] is teams[0]
True

Therefore, for lists, copy.copy(teams) gives the same result as teams.copy().

Next, you can try using copy.deepcopy() instead:

>>> import copy
>>> teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]

>>> deepcopied_teams = copy.deepcopy(teams)
>>> deepcopied_teams
[['Stephen', 'Mary'], ['Kate', 'Trevor']]

>>> deepcopied_teams[0].append("Martin")
>>> deepcopied_teams
[['Stephen', 'Mary', 'Martin'], ['Kate', 'Trevor']]
>>> teams
[['Stephen', 'Mary'], ['Kate', 'Trevor']]

>>> deepcopied_teams[0] is teams[0]
False

When you append "Martin" to deepcopied_teams, which is the deep copy you created from the original list, the new item does not appear when you display teams. And unlike the case with the shallow copy earlier, deepcopied_teams[0] is no longer the same object as teams[0].

When you create a deep copy, you’re copying the outer list, but you’re also creating copies of the inner lists. Therefore, the references in teams and those in deepcopied_teams point to different objects. The original list and the copy created by deepcopy() are entirely separate from each other.
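For a list nested exactly two levels deep, you could get a similar result by hand with a list comprehension. The sketch below compares that approach with copy.deepcopy(), and also shows a case the by-hand version can’t deal with: a list that contains itself (the names manual_copy, loop, and loop_copy are only for illustration):

```python
import copy

teams = [["Stephen", "Mary"], ["Kate", "Trevor"]]

# Copy the outer list and each inner list by hand (two levels only)
manual_copy = [team.copy() for team in teams]
# Let copy.deepcopy() walk the whole structure, however deep it goes
deep_copy = copy.deepcopy(teams)

manual_copy[0].append("Martin")
deep_copy[1].append("Martin")
print(teams)  # [['Stephen', 'Mary'], ['Kate', 'Trevor']]

# deepcopy() also copes with a structure that refers to itself
loop = []
loop.append(loop)
loop_copy = copy.deepcopy(loop)
print(loop_copy is loop)          # False
print(loop_copy[0] is loop_copy)  # True
```

The comprehension is fine when you know the exact shape of your data. copy.deepcopy() is the safer choice when the nesting depth varies or objects may refer back to each other.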

You can read more about shallow and deep copy in Python in the official documentation.

Copying Objects of Classes You’ve Defined Yourself

It’s time to create your own classes and explore what happens when you make copies of them. You’ve already come across the class definitions Car and Person at the beginning of this article. Let’s introduce these classes properly. You can define them in a script called household.py:

# household.py

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

    def add_mileage(self, miles: float):
        self.mileage += miles

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def drive(self, miles: float):
        self.car.add_mileage(miles)

You can initialise Car with a make and a model, both of which are strings. I’m using type hinting in this example to keep track of what the argument types are. A new car starts with a mileage of 0 miles (or kilometres, if you prefer).

And as the name implies, the method add_mileage() is used to add miles whenever the person drives the car.

A Person is initialised with a first name which is a string. The method buy_car() allows you to link an instance of the class Car to an instance of Person. The Car object is referenced using the attribute Person.car.

Whenever the person goes on a trip, you can call the drive() method which logs the additional miles onto the person’s car.

In a new script called cloning_stephen.py, you can test these classes:

# cloning_stephen.py

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

This is the same code you saw earlier. You create an instance of Person and call the buy_car() method for that instance. Stephen (I’m still talking about myself in the third person!) goes for a 100-mile drive. You log this by calling the drive() method. This updates the mileage attribute of the Car instance referenced in stephen.car. This code gives the following output:

Stephen's mileage is 100 miles

Copying An Object: The Default Case

Stephen is very busy these days! He decides to clone himself so he can get more things done. Let’s try this. You can copy the instance stephen in cloning_stephen.py using copy.copy() from the standard library:

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.copy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)

print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)

The outputs from this script, which you’ve already seen earlier, show the problem with this type of copy:

Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 100 miles
Stephen's car is clone's car: True

This is a shallow copy. Therefore, although stephen and clone are different instances of the class Person, they both share the same instance of Car. Stephen has managed to clone himself, but he has to share the same car with his clone. That’s not good, as Stephen and the clone can’t be efficient if they can’t go to different places.
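You can look one level further to see what the default shallow copy does with an instance. The sketch below uses trimmed-down versions of the two classes so that it runs on its own, and it assumes CPython’s default behaviour for plain classes like these:

```python
import copy

class Car:
    def __init__(self, make, model):
        self.make = make
        self.model = model
        self.mileage = 0

class Person:
    def __init__(self, firstname):
        self.firstname = firstname
        self.car = None

stephen = Person("Stephen")
stephen.car = Car("BMW", "Series 1")
clone = copy.copy(stephen)

# Two Person objects, each with its own attribute dictionary...
print(clone is stephen)              # False
print(vars(clone) is vars(stephen))  # False
# ...but the values in those dictionaries are the same objects
print(clone.car is stephen.car)      # True

# Rebinding an attribute on the clone leaves Stephen untouched
clone.firstname = "Clone"
print(stephen.firstname)  # Stephen
```

This mirrors the list example: the outer container (here, the instance and its attributes dictionary) is new, but what it refers to is shared.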

If the clone goes for a drive, he’s using the same car Stephen uses. Therefore the extra mileage will also show up for Stephen:

# cloning_stephen.py

# ...

# Clone goes for a drive:
clone.drive(68)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

This shows Stephen’s mileage has increased to 168 miles:

Stephen's mileage is 168 miles

Using copy.deepcopy()

What if you try to create a deep copy instead of a shallow one? After all, this trick worked with the example of the list of team members earlier. You can update cloning_stephen.py to use copy.deepcopy() instead of copy.copy():

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.deepcopy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)

print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)

When you run this script, you’ll now get the following output:

Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 100 miles
Stephen's car is clone's car: False

Stephen’s mileage is still 100 miles. There’s no reason why this should be different as Stephen drove 100 miles.

The clone’s car is a BMW Series 1, the same as Stephen’s car make and model. This is what you want since Stephen’s clone has the same car preferences as Stephen!

Let’s skip to the last line of the output. Stephen’s car is no longer the exact same car as the clone’s car. This is different from the result you got with the shallow copy above. The clone’s car is a different instance of Car. So there are two cars now; one belongs to Stephen and the other to the clone.

However, the clone’s car already has 100 miles on the odometer even though the clone hasn’t driven yet. When you create a deep copy of stephen, the program creates a new instance of Car. However, all of the original car attributes are also copied. This means the clone’s car starts with whatever mileage Stephen’s car has when you create the deep copy.
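One reason the mileage carries over is that, for a plain class like Car, the copy module doesn’t call __init__() when it builds the new object. It creates an empty instance and then copies the existing attributes across. Here’s a small sketch to check this, using a made-up init_calls counter that isn’t part of the article’s Car class:

```python
import copy

class Car:
    init_calls = 0  # hypothetical counter, only for this demonstration

    def __init__(self, make, model):
        Car.init_calls += 1
        self.make = make
        self.model = model
        self.mileage = 0

car = Car("BMW", "Series 1")
car.mileage = 100

car_copy = copy.deepcopy(car)

# __init__() ran once, for the original car only
print(Car.init_calls)    # 1
# The copied car kept the state of the original
print(car_copy.mileage)  # 100
```

So mileage is never reset to 0 for the copy, since the line that sets it never runs again.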

From now on, the two cars are separate, so when the clone drives the car, the additional mileage won’t show up in Stephen’s car:

# cloning_stephen.py

# ...

# Clone goes for a drive:
clone.drive(68)

print(f"Stephen's mileage is {stephen.car.mileage} miles")
print(f"The clone's mileage is {clone.car.mileage} miles")

The output shows that Stephen’s mileage is still 100 miles, but the clone’s mileage is now 168 miles even though his one and only trip is 68 miles long:

...
Stephen's mileage is 100 miles
The clone's mileage is 168 miles

In the last section of this article, you’ll fix this by customising how an instance of Person should be copied.

Defining The __copy__ Dunder Method

You can override the default behaviour for copy.copy() and copy.deepcopy() for any class you define. In this article, I’ll only focus on defining the dunder method __copy__(), which determines what happens when you call copy.copy() for your object. There’s also a __deepcopy__() dunder method, aimed at creating deep copies, which is similar but provides a bit more functionality to deal with complex objects.

You can return to household.py where you define the class Person and add __copy__() to the class:

# household.py

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

    def add_mileage(self, miles: float):
        self.mileage += miles

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def drive(self, miles: float):
        self.car.add_mileage(miles)

    def __copy__(self):
        copy_instance = Person(self.firstname)
        copy_instance.buy_car(
            Car(
                make=self.car.make,
                model=self.car.model,
            )
        )
        return copy_instance

The __copy__() dunder method creates a new Person instance using the same first name of the instance you’re copying. It also creates a new Car instance using the make and model of the car you’re copying. You pass this new Car object as an argument in copy_instance.buy_car() and then return the new Person instance.
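One thing to watch out for: this __copy__() reads self.car.make, so it raises an AttributeError if the person hasn’t bought a car yet. Below is one possible variant, offered as a sketch and not as the article’s version, which checks for that case. It also uses type(self) in place of the hard-coded class name, which assumes any subclass keeps the same __init__() signature:

```python
import copy

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def buy_car(self, car: Car):
        self.car = car

    def __copy__(self):
        # type(self) means a subclass instance is copied as that subclass
        copy_instance = type(self)(self.firstname)
        # Only give the copy a car if the original owns one
        if self.car is not None:
            copy_instance.buy_car(
                Car(make=self.car.make, model=self.car.model)
            )
        return copy_instance

pedestrian = Person("Stephen")
pedestrian_clone = copy.copy(pedestrian)
print(pedestrian_clone.car)  # None

pedestrian.buy_car(Car("BMW", "Series 1"))
driver_clone = copy.copy(pedestrian)
print(driver_clone.car.mileage)  # 0
```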

You can return to cloning_stephen.py, making sure you use copy.copy() to make a copy of stephen. This means that Person.__copy__() is used when creating the copy.

# cloning_stephen.py

import copy

from household import Car, Person

# Create a person who buys a car
stephen = Person("Stephen")
stephen.buy_car(
    Car("BMW", "Series 1")
)

# Log how many miles driven
stephen.drive(100)

print(f"Stephen's mileage is {stephen.car.mileage} miles")

# Let's copy the Person instance
clone = copy.copy(stephen)

print(
    f"The clone's car is a {clone.car.make} {clone.car.model}"
)

print(f"The clone's mileage is {clone.car.mileage} miles")

# Let's check whether the two cars are exactly the same car
print(
    f"Stephen's car is clone's car: {stephen.car is clone.car}"
)

Now, the output is:

Stephen's mileage is 100 miles
The clone's car is a BMW Series 1
The clone's mileage is 0 miles
Stephen's car is clone's car: False

The clone still has a different instance of Car but now, the car’s mileage starts at 0, as you’d expect! You’ve created a custom version of shallow copy by defining __copy__() for the class. In this case, you decided that when you copy a Person, the new instance has its own car which starts with 0 miles.

In more complex classes, you may want to define both __copy__() and __deepcopy__() if you want to distinguish between shallow and deep copy in your Python program.
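To give a flavour of what that could look like, here’s a rough sketch of a Person that defines both methods. The design choice is an assumption made for illustration: copy.copy() gives the clone a fresh car with 0 miles, as above, while copy.deepcopy() makes a full replica that keeps the car’s current state. The memo argument is the dictionary copy.deepcopy() passes along to keep track of objects it has already copied:

```python
import copy

class Car:
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self.mileage = 0

class Person:
    def __init__(self, firstname: str):
        self.firstname = firstname
        self.car = None

    def __copy__(self):
        # Shallow copy as in the article: a fresh car with 0 miles
        copy_instance = Person(self.firstname)
        if self.car is not None:
            copy_instance.car = Car(self.car.make, self.car.model)
        return copy_instance

    def __deepcopy__(self, memo):
        # Hypothetical design: a full replica that keeps the car's state
        copy_instance = Person(self.firstname)
        # Register the new object so references back to self are reused
        memo[id(self)] = copy_instance
        copy_instance.car = copy.deepcopy(self.car, memo)
        return copy_instance

stephen = Person("Stephen")
stephen.car = Car("BMW", "Series 1")
stephen.car.mileage = 100

shallow_clone = copy.copy(stephen)
deep_clone = copy.deepcopy(stephen)

print(shallow_clone.car.mileage)  # 0
print(deep_clone.car.mileage)     # 100
print(deep_clone.car is stephen.car)  # False
```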

Final Words

Here’s a summary of the key points you covered in this article:

  • You created copies of simple lists and other data structures
  • You created copies of more complex lists
  • You used the copy module from the standard library
  • You learnt about the difference between shallow and deep copy in Python
  • You used __copy__() to define how to shallow copy an object of a user-defined class

You’re now ready to safely copy any object, knowing what to look out for if the object references other objects.

Appendix: You Cannot Copy An Immutable Object

Do you recall when you used copy.copy() on a tuple earlier in the article? Unlike when you copied lists and dictionaries, where you got a new instance containing the same values as the original, you got the same instance back when you tried to copy a tuple.

Whenever you pass an object of a built-in immutable type, such as a tuple, a string, or a number, to copy.copy(), it returns the object itself.
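Here’s a short check of this, assuming CPython’s copy module. It also shows one case worth knowing about: a tuple is immutable, but it can hold a mutable object, and copy.deepcopy() then has to build a new tuple (the variable names are only for illustration):

```python
import copy

numbers = (10, 12, 3, 59)
name = "Stephen"
mixed = (10, ["Stephen", "Mary"])  # immutable tuple holding a mutable list

# Shallow copies of built-in immutable objects hand back the same object
print(copy.copy(numbers) is numbers)  # True
print(copy.copy(name) is name)        # True
print(copy.copy(mixed) is mixed)      # True

# A deep copy of a tuple of immutable items is also the same object...
print(copy.deepcopy(numbers) is numbers)  # True

# ...but a tuple containing a list gets a new tuple and a new list
deep_mixed = copy.deepcopy(mixed)
print(deep_mixed is mixed)        # False
print(deep_mixed[1] is mixed[1])  # False
```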

Further Reading

You can read more about Object-Oriented Programming in The Python Coding Book.




The post Shallow and Deep Copy in Python and How to Use __copy__() appeared first on The Python Coding Book.

