Quantcast
Channel: Planet Python
Viewing all articles
Browse latest Browse all 23027

Artem Golubin: How Python saves memory when storing strings

$
0
0

Since Python 3, the str type uses Unicode representation. Unicode strings can take up to 4 bytes per character depending on the encoding, which sometimes can be expensive from a memory perspective.

To reduce memory consumption and improve performance, Python uses three kinds of internal representations for Unicode strings:

  • 1 byte per char (Latin-1 encoding)
  • 2 bytes per char (UCS-2 encoding)
  • 4 bytes per char (UCS-4 encoding)

When programming in Python all strings behave the same, and most of the time we don't notice any difference. However, the difference can be very remarkable and sometimes unexpected when working with large amounts of text.

To see the difference in internal representations, we can use the sys.getsizeof function, which returns the size of an object in bytes:

>>>importsys>>>string='hello'>>>sys.getsizeof(string)54>>># 1-byte encoding>>>

Viewing all articles
Browse latest Browse all 23027

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>