By Vasudev Ram
I was doing some experiments in Python to see how many values of various data types could fit into the memory of my machine. Things like creating successively larger lists of integers (ints), to see at what point Python ran out of memory.
At one point, I got a MemoryError while trying to create a list of ints that I thought should fit into memory. Sample code:
>>> lis = range(10 ** 9)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
After thinking a bit, I realized that the error was to be expected, since data types in dynamic languages such as Python tend to take more space than they do in static languages such as C, due to metadata, pre-allocation (for some types) and interpreter book-keeping overhead.
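As a rough back-of-the-envelope check on why that list does not fit, here is a minimal sketch that estimates the memory such a list would need. The function name estimate_int_list_bytes is hypothetical, and the calculation assumes each element costs one pointer slot in the list plus one separate int object, which ignores small-int caching and any over-allocation. The numbers it prints depend on the Python build (32-bit vs. 64-bit, Python 2 vs. 3).

```python
import struct
import sys


def estimate_int_list_bytes(n):
    """Rough estimate of the bytes needed for a list of n distinct ints."""
    pointer_size = struct.calcsize("P")  # one pointer per list slot
    int_size = sys.getsizeof(12345)      # one int object per element (build-dependent)
    list_overhead = sys.getsizeof([])    # fixed header of the list object itself
    return list_overhead + n * (pointer_size + int_size)


if __name__ == "__main__":
    n = 10 ** 9
    print("Estimated bytes for {:,} ints: {:,}".format(
        n, estimate_int_list_bytes(n)))
```

On a 32-bit build like the one used in this post (4-byte pointers, 12-byte ints), this comes to roughly 16 GB, far more than a 32-bit process can address.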
And I remembered the sys.getsizeof() function, which shows the number of bytes used by its argument. So I wrote this code to display the types and sizes of some commonly used types in Python:
from __future__ import print_function
import sys

# data_type_sizes_w_list_comp.py
# A program to show the sizes in bytes, of values of various
# Python data types.
# Author: Vasudev Ram
# Copyright 2016 Vasudev Ram - https://vasudevram.github.io

#class Foo:
class Foo(object):
    pass

def gen_func():
    yield 1

def setup_data():
    a_bool = bool(0)
    an_int = 0
    a_long = long(0)
    a_float = float(0)
    a_complex = complex(0, 0)
    a_str = ''
    a_tuple = ()
    a_list = []
    a_dict = {}
    a_set = set()
    an_iterator = iter([1, 2, 3])
    a_function = gen_func
    a_generator = gen_func()
    an_instance = Foo()
    data = (a_bool, an_int, a_long, a_float, a_complex,
            a_str, a_tuple, a_list, a_dict, a_set,
            an_iterator, a_function, a_generator, an_instance)
    return data

data = setup_data()

print("\nPython data type sizes:\n")
header = "{} {} {}".format(
    "Data".center(10), "Type".center(15), "Length".center(10))
print(header)
print('-' * 40)

# First table: the basic data types.
rows = [ "{} {} {}".format(
    repr(item).center(10), str(type(item)).center(15),
    str(sys.getsizeof(item)).center(10)) for item in data[:-4] ]
print('\n'.join(rows))
print('-' * 70)

# Second table: iterator, function, generator and class instance.
rows = [ "{} {} {}".format(
    repr(item).center(10), str(type(item)).center(15),
    str(sys.getsizeof(item)).center(10)) for item in data[-4:] ]
print('\n'.join(rows))
print('-' * 70)
(I broke out the last 4 objects above into a separate section/table, since the output for them is wider than for the ones above them.)
Although iterators, functions, generators and instances (of classes) are not traditionally considered as data types, I included them as well, since they are all objects (see: almost everything in Python is an object), so they are data in a sense too, at least in the sense that programs can manipulate them. And while one is not likely to create tens of thousands or more of objects of these types (except maybe class instances [1]), it's interesting to have an idea of how much space instances of them take in memory.
[1] As an aside, if you have to create thousands of class instances, the flyweight design pattern might be of help.
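To make the flyweight idea a little more concrete, here is a minimal sketch, not a definitive implementation. The class name Glyph and its char attribute are hypothetical, chosen only for illustration. The class keeps one shared instance per distinct character, so a long list of glyphs holds many references to just a few objects.

```python
class Glyph(object):
    """Hypothetical flyweight: one shared instance per distinct character."""
    _cache = {}

    def __new__(cls, char):
        # Reuse the instance already created for this character, if any.
        instance = cls._cache.get(char)
        if instance is None:
            instance = super(Glyph, cls).__new__(cls)
            instance.char = char
            cls._cache[char] = instance
        return instance


# 11000 references, but only as many objects as there are distinct characters.
glyphs = [Glyph(c) for c in "abracadabra" * 1000]
distinct = len(set(id(g) for g in glyphs))
print("{} references, {} distinct objects".format(len(glyphs), distinct))
```

This only helps when many instances would otherwise carry identical state; state that varies per use has to be kept outside the shared object.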
Here is the output of running the program with:
$ python data_type_sizes.py
Python data type sizes:
----------------------------------------
   Data          Type        Length
----------------------------------------
  False     <type 'bool'>      12
    0        <type 'int'>      12
    0L      <type 'long'>      12
   0.0      <type 'float'>     16
    0j     <type 'complex'>     24
    ''       <type 'str'>      21
    ()      <type 'tuple'>     28
    []      <type 'list'>      36
    {}      <type 'dict'>     140
 set([])     <type 'set'>     116
----------------------------------------------------------------------
----------------------------------------------------------------------
<listiterator object at 0x021F0FF0> <type 'listiterator'> 32
<function gen_func at 0x021EBF30> <type 'function'> 60
<generator object gen_func at 0x021F6C60> <type 'generator'> 40
<__main__.Foo object at 0x022E6290> <class '__main__.Foo'> 32
----------------------------------------------------------------------
[ When I used the old-style Python class definition for Foo (see the comment near the class keyword in the code), the output for an_instance was this instead:
<__main__.Foo instance at 0x021F6C88> <type 'instance'> 36
So old-style class instances actually take 36 bytes vs. new-style ones taking 32.
]
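One caveat when comparing instance sizes: sys.getsizeof() reports only the memory directly attributed to the object itself, not the objects it refers to, such as the instance's attribute dictionary. Here is a minimal sketch of a slightly fuller measurement; the helper name instance_size_with_dict is hypothetical, and the sizes it reports vary between Python versions and builds.

```python
import sys


class Foo(object):
    pass


def instance_size_with_dict(obj):
    """Shallow size of obj plus the shallow size of its attribute dict, if any."""
    size = sys.getsizeof(obj)
    attrs = getattr(obj, "__dict__", None)
    if attrs is not None:
        size += sys.getsizeof(attrs)
    return size


if __name__ == "__main__":
    foo = Foo()
    print("shallow size:   {}".format(sys.getsizeof(foo)))
    print("with __dict__:  {}".format(instance_size_with_dict(foo)))
```

Even this is not a full deep size, since the attribute values themselves are still not counted.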
We can draw a few deductions from the above output.
- bool is a subclass of the int type, so it takes the same space - 12 bytes.
- float takes a bit more space than an int or a zero-valued long (a long grows with the magnitude of its value).
- complex takes even more.
- str and the data types below it in the first table above have a fair amount of overhead, even when empty.
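A couple of these deductions can be checked directly. The following is a small sketch; the helper name str_growth is hypothetical. It confirms the bool/int relationship and shows how the size of an ASCII string grows beyond the fixed overhead of the empty string. The exact byte counts depend on the Python version and build, so none are shown here.

```python
import sys


def str_growth(n):
    """Extra bytes an n-character ASCII string needs over the empty string."""
    return sys.getsizeof("a" * n) - sys.getsizeof("")


print("bool is a subclass of int: {}".format(issubclass(bool, int)))
print("empty str: {} bytes".format(sys.getsizeof("")))
for n in (1, 10, 100):
    print("{:>3} chars: +{} bytes over the empty string".format(
        n, str_growth(n)))
```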
Finally, I first wrote the program with two for loops, then changed (and slightly shortened) it by using the two list comprehensions that you see above - hence the file name data_type_sizes_w_list_comp.py :)
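For comparison, here is a sketch of the two styles side by side. It is not the original two-loop version of the program, which is not shown in this post; format_row is a hypothetical helper and the data tuple is a shortened stand-in for the one in the program.

```python
import sys


def format_row(item):
    """Format one table row: value, type and size in bytes (hypothetical helper)."""
    return "{} {} {}".format(
        repr(item).center(10),
        str(type(item)).center(15),
        str(sys.getsizeof(item)).center(10))


data = (False, 0, 0.0, '', (), [], {})

# Version 1: an explicit for loop that appends each row to a list.
rows_loop = []
for item in data:
    rows_loop.append(format_row(item))

# Version 2: the equivalent list comprehension.
rows_comp = [format_row(item) for item in data]

print('\n'.join(rows_comp))
```

Both versions build the same list of rows; the comprehension just says it in one expression.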
- Enjoy.
- Vasudev Ram - Online Python training and consulting