Vasudev Ram: Exploring sizes of data types in Python

By Vasudev Ram

I was doing some experiments in Python to see how much data of various types could fit into the memory of my machine, for example by creating successively larger lists of integers (ints) to see at what point Python ran out of memory.

At one point, I got a MemoryError while trying to create a list of ints that I thought should fit into memory. Sample code:
>>> lis = range(10 ** 9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
After thinking a bit, I realized that the error was to be expected, since data types in dynamic languages such as Python tend to take more space than they do in static languages such as C, due to metadata, pre-allocation (for some types) and interpreter book-keeping overhead.
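
A rough way to see why the list did not fit is to estimate its footprint before building it. The sketch below assumes each element costs one int object plus one pointer slot in the list, and it ignores any over-allocation; estimate_int_list_bytes is a hypothetical helper name, and the figures it prints vary by Python version and platform.

```python
import struct
import sys

def estimate_int_list_bytes(n):
    """Rough estimate of the bytes needed for a list of n distinct ints."""
    pointer_size = struct.calcsize("P")  # bytes per list slot (4 on 32-bit, 8 on 64-bit)
    int_size = sys.getsizeof(10 ** 6)    # size of one typical int object on this build
    # Assumption: one pointer slot plus one int object per element.
    return sys.getsizeof([]) + n * (pointer_size + int_size)

estimate = estimate_int_list_bytes(10 ** 9)
print("Estimated bytes for 10**9 ints: {:,}".format(estimate))
```

With the 12-byte ints shown in the output further down and 4-byte list slots (an assumption, based on the 32-bit-looking addresses in that output), 10 ** 9 elements would need roughly 16 GB, far more than a 32-bit process can address.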

And I remembered the sys.getsizeof() function, which returns the size of its argument in bytes (the object itself, not any objects it refers to). So I wrote this code to display the sizes of values of some commonly used types in Python:
from __future__ import print_function
import sys

# data_type_sizes_w_list_comp.py
# A program to show the sizes, in bytes, of values of various
# Python data types.

# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram - https://vasudevram.github.io

#class Foo:
class Foo(object):
    pass

def gen_func():
    yield 1

def setup_data():
    a_bool = bool(0)
    an_int = 0
    a_long = long(0)  # Python 2 only; Python 3 has no separate long type
    a_float = float(0)
    a_complex = complex(0, 0)
    a_str = ''
    a_tuple = ()
    a_list = []
    a_dict = {}
    a_set = set()
    an_iterator = iter([1, 2, 3])
    a_function = gen_func
    a_generator = gen_func()
    an_instance = Foo()

    data = (a_bool, an_int, a_long, a_float, a_complex,
            a_str, a_tuple, a_list, a_dict, a_set,
            an_iterator, a_function, a_generator, an_instance)
    return data

data = setup_data()

print("\nPython data type sizes:\n")

header = "{} {} {}".format(
    "Data".center(10), "Type".center(15), "Length".center(10))
print(header)
print('-' * 40)

# First table: the ten basic values (everything except the last 4 items).
rows = [ "{} {} {}".format(
    repr(item).center(10), str(type(item)).center(15),
    str(sys.getsizeof(item)).center(10)) for item in data[:-4] ]
print('\n'.join(rows))
print('-' * 70)

# Second table: iterator, function, generator and class instance.
rows = [ "{} {} {}".format(
    repr(item).center(10), str(type(item)).center(15),
    str(sys.getsizeof(item)).center(10)) for item in data[-4:] ]
print('\n'.join(rows))
print('-' * 70)
(I broke out the last 4 objects above into a separate section/table, since the output for them is wider than for the ones above them.)

Although iterators, functions, generators and instances (of classes) are not traditionally considered data types, I included them as well, since they are all objects (see: almost everything in Python is an object), so they are data in a sense too, at least in the sense that programs can manipulate them. And while one is not likely to create tens of thousands or more objects of these types (except maybe class instances [1]), it's interesting to have an idea of how much space instances of them take in memory.

[1] As an aside, if you have to create thousands of class instances, the flyweight design pattern might be of help.
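
As a minimal sketch of that idea (not code from the original program), the hypothetical Color class below keeps a pool keyed by name and hands back the existing instance instead of creating a new one, so thousands of references share a handful of objects. It assumes the shared state is immutable after creation.

```python
from __future__ import print_function

class Color(object):
    """Flyweight: at most one shared instance per distinct name."""
    _pool = {}

    def __new__(cls, name):
        # Reuse the existing instance for this name if there is one.
        instance = cls._pool.get(name)
        if instance is None:
            instance = super(Color, cls).__new__(cls)
            instance.name = name
            cls._pool[name] = instance
        return instance

names = ["red", "green", "red", "blue", "red"] * 1000
colors = [Color(name) for name in names]
print(len(colors), "references,", len(Color._pool), "distinct instances")
```

Whether this helps depends on how much state the instances can actually share; objects with mostly unique state gain little from it.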

Here is the output of running the program with:
$ python data_type_sizes.py

Python data type sizes:
----------------------------------------
Data       Type               Length
----------------------------------------
False      <type 'bool'>      12
0          <type 'int'>       12
0L         <type 'long'>      12
0.0        <type 'float'>     16
0j         <type 'complex'>   24
''         <type 'str'>       21
()         <type 'tuple'>     28
[]         <type 'list'>      36
{}         <type 'dict'>      140
set([])    <type 'set'>       116
----------------------------------------------------------------------

----------------------------------------------------------------------
<listiterator object at 0x021F0FF0>        <type 'listiterator'>    32
<function gen_func at 0x021EBF30>          <type 'function'>        60
<generator object gen_func at 0x021F6C60>  <type 'generator'>       40
<__main__.Foo object at 0x022E6290>        <class '__main__.Foo'>   32
----------------------------------------------------------------------

[ When I used the old-style Python class definition for Foo (see the comment near the class keyword in the code), the output for an_instance was this instead:
<__main__.Foo instance at 0x021F6C88> <type 'instance'> 36
So old-style class instances actually take 36 bytes vs. new-style ones taking 32.
]

We can draw a few deductions from the above output.

- bool is a subclass of int, so a bool takes the same space as an int: 12 bytes.
- float takes a bit more space than a zero-valued long (a long grows with the magnitude of its value).
- complex takes even more.
- str and the data types below it in the first table above have a fair amount of overhead, even when empty.
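
One caveat when reading these numbers: sys.getsizeof() reports only the object's own footprint, not that of the objects it refers to. The sketch below assumes CPython; exact byte counts differ across versions and platforms, so it prints them rather than hard-coding any.

```python
from __future__ import print_function
import sys

empty = []
filled = list(range(1000))

# The container's own size grows with the number of slots it holds...
print("empty list :", sys.getsizeof(empty))
print("1000 items :", sys.getsizeof(filled))

# ...but it does not include the int objects stored in those slots.
shallow = sys.getsizeof(filled)
deep = shallow + sum(sys.getsizeof(item) for item in filled)
print("list + ints:", deep)
```

So the overhead figures in the table are a starting point; the total cost of a populated container also includes its elements.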

Finally, I first wrote the program with two for loops, then changed (and slightly shortened) it by using the two list comprehensions that you see above - hence the file name data_type_sizes_w_list_comp.py :)
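
The original two-loop version is not shown in this post, so the following is only one plausible shape of it, using a shortened sample tuple and a hypothetical format_row helper. It illustrates that an explicit loop and a list comprehension build the same rows.

```python
from __future__ import print_function
import sys

# Shortened, illustrative sample; not the full tuple from the program above.
data = (False, 0, 0.0, '', (), [], {}, set())

def format_row(item):
    """Format one row the same way the program above does."""
    return "{} {} {}".format(
        repr(item).center(10), str(type(item)).center(15),
        str(sys.getsizeof(item)).center(10))

# Explicit for loop.
rows_loop = []
for item in data:
    rows_loop.append(format_row(item))

# Equivalent list comprehension.
rows_comp = [format_row(item) for item in data]

print('\n'.join(rows_comp))
```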

- Enjoy.

- Vasudev Ram - Online Python training and consulting


