Channel: Planet Python

Python Morsels: Unicode character encodings


When working with text files in Python, it's considered a best practice to specify the character encoding that you're working with.
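As a minimal sketch of what specifying the encoding can look like, the snippet below passes the encoding argument to open for both writing and reading. It assumes the file is UTF-8, and the temporary directory and file path are hypothetical, used only so the example runs on its own.

```python
import tempfile
from pathlib import Path

# Hypothetical location, chosen only so this sketch is self-contained.
path = Path(tempfile.mkdtemp()) / "my_file.txt"

# Name the encoding explicitly instead of relying on the
# platform-dependent default.
with open(path, mode="w", encoding="utf-8") as f:
    f.write("This is a file ✨\n")

with open(path, encoding="utf-8") as f:
    contents = f.read()

print(contents)
```

Using the same encoding on both sides means the text that is read back matches the text that was written.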

All input starts as raw bytes

When you open a file in Python, the default mode is r or rt, for read text mode:

>>> with open("my_file.txt") as f:
...     contents = f.read()
...
>>> f.mode
'r'

That means that when we read our file, we'll get back strings that represent text:

>>> contents
'This is a file ✨\n'

But that's not what Python actually reads from disk.

If we open a file with the mode rb and read from it, we'll see what Python sees, which is bytes:

>>> with open("my_file.txt", mode="rb") as f:
...     contents = f.read()
...
>>> contents
b'This is a file \xe2\x9c\xa8\n'
>>> type(contents)
<class 'bytes'>

Bytes are what Python decodes to make strings.
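The decoding step can be sketched by hand with the bytes shown above. This assumes the file was saved as UTF-8. The variable names raw and text are illustrative and do not come from the article.

```python
# The same bytes that reading in "rb" mode returned above.
raw = b"This is a file \xe2\x9c\xa8\n"

# Decoding with the matching encoding turns the bytes into a string.
text = raw.decode("utf-8")
print(text)

# Decoding with an encoding that cannot represent these bytes fails.
try:
    raw.decode("ascii")
except UnicodeDecodeError as error:
    print("Could not decode as ASCII:", error.reason)
```

The three bytes \xe2\x9c\xa8 together are the UTF-8 representation of the single character ✨, which is why the choice of encoding matters when decoding.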

Encoding strings into bytes

If you have a string …

Read the full article: https://www.pythonmorsels.com/unicode-character-encodings-in-python/

